
\b in Python regular expressions


14th of June 2005

Boy did that shut me up! The \b special character in Python regular expressions is so useful. I've used it before but had forgotten about it. The following code:

import re

def createStandaloneWordRegex(word):
    """ return a regular expression that can find 'peter'
    only if it's written alone (next to space, start of
    string, end of string, comma, etc) but not if inside
    another word like peterbe """

    return re.compile(r"""
      (
      ^ %s
      (?=\W | $)
      |
      (?<=\W)
      %s
      (?=\W | $)
      )
      """ % (re.escape(word), re.escape(word)),
      # re.L dropped: Python 3 rejects it for str patterns
      re.I|re.M|re.X)

can, with the help of \b, be simplified to this:

def createStandaloneWordRegex(word):
    """ return a regular expression that can find 'peter'
    only if it's written alone (next to space, start of
    string, end of string, comma, etc) but not if inside
    another word like peterbe """

    # re.escape keeps any special characters in word literal
    return re.compile(r'\b%s\b' % re.escape(word), re.I)

Quite a lot simpler, isn't it? The simplified version passes all of the few unit tests I had.
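For illustration, here is a minimal sketch of the kind of checks such unit tests might contain. These are not my actual tests, and the function name standalone_word_regex is made up for this example. It also shows one case where \b behaves differently from the lookaround version: a word that starts or ends with a non-word character.

```python
import re


def standalone_word_regex(word):
    """Compile a case-insensitive pattern matching `word` only as a whole word.

    Hypothetical helper for this example; it mirrors the \\b approach.
    """
    # re.escape keeps characters such as '.' or '+' in `word` literal
    return re.compile(r"\b%s\b" % re.escape(word), re.I)


peter = standalone_word_regex("peter")

# Matches when the word stands alone: next to spaces, punctuation,
# or the start/end of the string.
assert peter.search("peter")
assert peter.search("Hello, Peter!")
assert peter.search("call peter, please")

# Does not match inside a longer word.
assert peter.search("peterbe") is None
assert peter.search("speter") is None

# The underscore counts as a word character, so there is no boundary here.
assert peter.search("peter_be") is None

# With re.escape, the '.' in "a.b" only matches a literal dot.
assert standalone_word_regex("a.b").search("see a.b here")
assert standalone_word_regex("a.b").search("see axb here") is None

# Caveat: \b needs a word character on one side. Between '+' and ' '
# there is no boundary, so a word ending in '+' is not found this way.
assert standalone_word_regex("c++").search("I like c++ a lot") is None
```

If the words you search for can start or end with punctuation, the longer lookaround version may still be the safer choice.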



Comment

Matt Schinckel - 29th June 2005
Excellent! I was wondering how to do this for a script (http://schinckel.blogsome.com/2005/06/27/ecto-auto-abbracronym/) that automatically adds abbr and acronym tags to text in ecto, a blogging client for MacOS X.

YuppY - 1st July 2005
First variant could be shorter:

re.compile(r'((?<=\W)|^)%s(?=\W|$)' % re.escape(word), re.I)

re.escape is necessary.

Peter Bengtsson - 1st July 2005
shorter but certainly less readable.
 