Boy did that shut me up! The \b special sequence (word boundary) in Python regular expressions is so useful. I've used it before but had forgotten about it. The following code:

import re

def createStandaloneWordRegex(word):
   """ return a regular expression that can find 'peter'
   only if it's written alone (next to space, start of
   string, end of string, comma, etc) but not if inside
   another word like peterbe """
   # re.L is left out of the flags: Python 3 rejects it for str patterns
   return re.compile(r"""
     (
     ^ %s
     (?=\W | $)
     |
     (?<=\W)
     %s
     (?=\W | $)
     )
     """% (re.escape(word), re.escape(word)),
           re.I|re.M|re.X)

can be simplified with \b to this:

def createStandaloneWordRegex(word):
   """ return a regular expression that can find 'peter'
   only if it's written alone (next to space, start of
   string, end of string, comma, etc) but not if inside
   another word like peterbe """
   # re.escape keeps characters such as '.' or '+' in the word literal
   return re.compile(r'\b%s\b' % re.escape(word), re.I)

Quite a lot simpler, isn't it? The simplified version passes the few unit tests I had.
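
For anyone who wants to try it, here is a small sketch of the kind of checks I mean. These are not my actual unit tests; the helper name standalone_word_regex and the sample strings are made up for illustration, and the helper escapes the word the same way the longer version does.

```python
import re

def standalone_word_regex(word):
    # Hypothetical helper for this example; \b matches at a word boundary,
    # and re.escape makes any special characters in the word literal.
    return re.compile(r'\b%s\b' % re.escape(word), re.I)

regex = standalone_word_regex('peter')

# The word on its own is found, regardless of case or nearby punctuation.
print(regex.findall('peter, Peter and peterbe'))

# Inside a longer word there is no boundary, so nothing is found.
print(regex.search('peterbe'))
print(regex.search('speter'))

# Underscore counts as a word character, so this is not a match either.
print(regex.search('peter_pan'))
```

Note that \b is defined in terms of word characters, so a word that itself starts or ends with punctuation would probably need the longer lookaround version instead.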

Matt Schinckel - 30 June 2005
Excellent! I was wondering how to do this for a script (http://schinckel.blogsome.com/2005/06/27/ecto-auto-abbracronym/) that automatically adds abbr and acronym tags to text in ecto, a blogging client for MacOS X.
YuppY - 01 July 2005
First variant could be shorter:

re.compile(r'((?<=\W)|^)%s(?=\W|$)' % re.escape(word), re.I)

re.escape is necessary.
Peter Bengtsson - 01 July 2005
Shorter, but certainly less readable.

