23 July 2005, Python
Today I learnt how to use the \B gadget in Python regular expressions. I've previously talked about the usefulness of \b, but there's a big benefit to using \B sometimes too.
What \b does is match a word boundary, i.e. the edge between an alphanumeric character and anything else. It allows you to find "peter" in "peter bengtsson" but not "peter" in "nickname: peterbe". In other words, the word has to be both preceded and followed by a word boundary: the start or end of the string, or a non-alphanumeric character such as a space or a newline.
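To check the "peter" examples with real code, here is a minimal sketch using Python's standard re module; the variable name word_peter is just for illustration.

```python
import re

# \b on both sides: "peter" must be a whole word, not the start of a longer one
word_peter = re.compile(r'\bpeter\b')

print(word_peter.findall('peter bengtsson'))    # ['peter']
print(word_peter.findall('nickname: peterbe'))  # []  ("peter" is followed by "b")
```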
So what \b does for finding alphanumerics, \B does for finding non-alphanumerics: it matches at every position where \b doesn't, such as between a space and a '+'. Example:
>>> import re
>>> re.compile(r'\bX\b').findall('X + Y')
['X']  # it can find 'X'
>>> re.compile(r'\b\+\b').findall('X + Y')
[]  # the same technique can't find '+'
>>> re.compile(r'\B\+\B').findall('X + Y')
['+']  # better to use \B when finding '+'
>>> re.compile(r'\BX\B').findall('X + Y')
[]  # and use \B only for non-alphanumerics
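To see why r'\B\+\B' works on 'X + Y' while r'\b\+\b' doesn't, it helps to list the positions (between characters) where each anchor matches. This sketch uses re.finditer with the zero-width patterns on their own; the variable names are only illustrative.

```python
import re

text = 'X + Y'

# Zero-width matches: each start() is a position between characters,
# where 0 is before 'X' and 5 is after 'Y'
boundaries = [m.start() for m in re.finditer(r'\b', text)]
non_boundaries = [m.start() for m in re.finditer(r'\B', text)]

print(boundaries)      # [0, 1, 4, 5]
print(non_boundaries)  # [2, 3]
```

The '+' sits at index 2, between positions 2 and 3. Both of those are \B positions (a space next to a '+'), so r'\B\+\B' matches, whereas r'\b\+\b' finds no word boundary there.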
The lesson is:
\b is a really useful tool, but it only matches at the edges of word characters (letters, digits and the underscore).
\B is what you have to use for finding non-alphanumerics, such as a '+' with spaces on either side.
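One caveat worth checking: \b and \B only look at whether the characters on either side are word characters, so the choice depends on the surrounding text as well as on the thing you're looking for. A minimal sketch, assuming Python 3's re module and two made-up inputs ('X+Y' without spaces, and 'snake_case'):

```python
import re

tight = 'X+Y'  # no spaces around the '+'

# X|+ and +|Y are word/non-word edges, so \b matches there and \B doesn't
with_b = re.findall(r'\b\+\b', tight)
with_big_b = re.findall(r'\B\+\B', tight)

# The underscore counts as a word character, so there is no \b before "case"
in_snake = re.findall(r'\bcase\b', 'snake_case')

print(with_b)      # ['+']
print(with_big_b)  # []
print(in_snake)    # []
```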