\B in Python regular expressions

23 July 2005   0 comments   Python

Today I learnt how to use \B in Python regular expressions. I've previously talked about the usefulness of \b, but sometimes there's a big benefit to using \B too.

\b matches at a word boundary: the zero-width position between a word character (a letter, a digit or the underscore) and anything else, including the start and end of the string. It allows you to find "peter" in "peter bengtsson" but not "peter" in "nickname: peterbe". In other words, the letters have to be both preceded and followed by a word boundary such as a space, a newline, the start of the string, the end of the string or a non-alphanumeric character like (.
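
As a quick check of the behaviour described above, here is a small sketch. The variable names and the '(peter)' test string are made up for illustration; the other two strings are the ones from the paragraph.

```python
import re

# \b sits between a word character (letter, digit, underscore) and anything else.
word = re.compile(r'\bpeter\b')

found_alone = word.findall('peter bengtsson')     # 'peter' is followed by a space
found_inside = word.findall('nickname: peterbe')  # 'peter' is followed by 'b'
found_parens = word.findall('(peter)')            # brackets are non-word characters

print(found_alone)   # ['peter']
print(found_inside)  # []
print(found_parens)  # ['peter']
```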

\B is the opposite of \b: it matches at every position that is not a word boundary. That makes it handy for finding non-alphanumerics, such as an operator with spaces around it. Example:

>>> import re
>>> re.compile(r'\bX\b').findall('X + Y')  # \b can find 'X'
['X']
>>> re.compile(r'\b\+\b').findall('X + Y')  # same technique can't find '+'
[]
>>> re.compile(r'\B\+\B').findall('X + Y')  # better to use \B when finding '+'
['+']
>>> re.compile(r'\BX\B').findall('X + Y')  # and \B can't find 'X' here
[]

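The lesson is: \b is a really useful tool, but it only helps when the thing you're looking for begins and ends with a word character (letters, digits and the underscore). To find a non-alphanumeric such as '+' that has spaces or other non-alphanumerics around it, \B is what you have to use.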
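
To see why this works, it may help to list where \b and \B actually match. The sketch below is only an illustration: the positions() helper and the 'X+Y' string are my own additions, not part of the example above. It suggests that what matters is the characters next to '+', not '+' itself.

```python
import re

def positions(pattern, text):
    """Return the offsets at which a zero-width pattern matches in text."""
    return [m.start() for m in re.finditer(pattern, text)]

spaced = 'X + Y'  # the string from the example above
tight = 'X+Y'     # hypothetical variant without spaces

boundaries_spaced = positions(r'\b', spaced)      # [0, 1, 4, 5]
non_boundaries_spaced = positions(r'\B', spaced)  # [2, 3], either side of '+'
non_boundaries_tight = positions(r'\B', tight)    # [], every gap is a boundary

plus_spaced = re.findall(r'\B\+\B', spaced)  # ['+']
plus_tight = re.findall(r'\B\+\B', tight)    # []
plus_tight_b = re.findall(r'\b\+\b', tight)  # ['+']
```

So with no spaces around the '+', it is \b that finds it and \B that doesn't.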

