Kung Fu Kung Fu

Fujian White Crane Kung Fu

Zope Zope

What I have and am doing with Zope

Photos Photos

Photoalbum, both old and new.

Receptsamlingen Receptsamlingen

In Swedish only. About my "Collection of Recipes" website.

Contact me Contact me

My contact details and how to contact me.

  Mobile version of this page Mobile version of this page


 

\B in Python regular expressions

regular expressions, regular expression, \B, \b, wordboundry, word boundry

23rd of July 2005

Today I learnt about how to use the \B gadget in Python regular expressions. I've previously talked about the usefulness of \b but there's a big benefit to using \B sometimes too.

What \b does is that it is a word-boundary for alphanumerics. It allows you to find "peter" in "peter bengtsson" but not "peter" in "nickname: peterbe". In other words, all the letters have to be grouped prefixed or suffixed by a wordboundry such as newline, start-of-line, end-of-line or a non alpha character like (.

What \b does for finding alphanumerics, \B does for finding non-alphanumerics. Example:

 >>> import re
 >>> re.compile(r'\bX\b').findall('X + Y') 
 ['X'] # it can find 'X'
 >>> re.compile(r'\b\+\b').findall('X + Y')
 [] # same technique can't find '+'
 >>> re.compile(r'\B\+\B').findall('X + Y')
 ['+'] # better to use \B when finding '+'
 >>> re.compile(r'\BX\B').findall('X + Y')
 [] # and use \B only for non-alphanumerics

The lesson is: \b is a really useful tool but it's limited to finding alphanumerics (numbers and A-Z). \B is what you have to use for finding non-alphanumerics.


Comment

 
Name:
Email:
hide my email address.

Your email address will be encoded to prevent email-extraction spiders from reading it so you won't get spammed if you decide to show your email address.