String comparison function in Python (alpha)

22 December 2007   7 comments   Python


I was working on a unit test which, when it failed, would say "this string != that string". Because some of these strings were very long (the output of an HTML lib I wrote which spits out snippets of HTML code), it became hard to spot how they differed. So I decided to override the usual self.assertEqual(str1, str2) in Python's unittest class instance with this little baby:

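def assertEqualLongString(a, b):
    # NOT marks a matching position, POINT marks a differing one
    NOT, POINT = '-', '*'
    if a != b:
        print a
        o = ''
        for i, e in enumerate(a):
            try:
                if e != b[i]:
                    o += POINT
                else:
                    o += NOT
            except IndexError:
                # a is longer than b: everything past the end of b differs
                o += '*'

        o += NOT * (len(a)-len(o))
        if len(b) > len(a):
            # b is longer than a: mark the extra tail as different
            o += POINT * (len(b)-len(a))

        print o
        print b

        raise AssertionError, '(see string comparison above)'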

It's far from perfect and doesn't really work when you've got Unicode characters that your terminal can't print properly. It might not look great on really long strings either, but I'm sure that can be solved too. After all, this is just a quick hack that helped me spot that the difference between one snippet and another was that one produced <br/> and the other produced <br />. Below are some examples of this utility function in action.
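The function above uses Python 2 syntax (print statements and the old raise form). For readers on Python 3, here is a minimal sketch of the same marker logic. The names diff_markers and assert_equal_long_string are made up for this illustration, and putting the comparison into the exception message instead of printing it is my assumption, not part of the original hack.

```python
def diff_markers(a, b, same='-', diff='*'):
    """Return a marker line: `same` where a and b agree, `diff` elsewhere."""
    markers = []
    for i in range(max(len(a), len(b))):
        # Positions past the end of the shorter string count as differences.
        if i < len(a) and i < len(b) and a[i] == b[i]:
            markers.append(same)
        else:
            markers.append(diff)
    return ''.join(markers)


def assert_equal_long_string(a, b):
    """Raise AssertionError showing a, the marker line, and b on three lines."""
    if a != b:
        report = '\n'.join(['strings differ:', a, diff_markers(a, b), b])
        raise AssertionError(report)
```

Because the three lines end up in the assertion message, a test runner shows them right next to the failing test without any extra printing.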

Note that you can use this in whichever way fits your needs, so I'm not going to focus on how it's executed:

u = MyUnittest()
u.assertEqualLongString('Peter Bengtsson 123', 'Peter PengtsXon 124'); print ""
u.assertEqualLongString('Bengtsson','Bengtzzon'); print ""
u.assertEqualLongString('Bengtsson','BengtzzonLonger'); print ""
u.assertEqualLongString('BengtssonLonger','Bengtzzon'); print ""
u.assertEqualLongString('Bengtssonism','Bengtsson'); print ""

# Results:

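Peter Bengtsson 123
------*-----*-----*
Peter PengtsXon 124

Bengtsson
-----**--
Bengtzzon

Bengtsson
-----**--******
BengtzzonLonger

BengtssonLonger
-----**--******
Bengtzzon

Bengtssonism
---------***
Bengtsson
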
I love writing python oneliners :)

"".join([x[0]==x[1] and "-" or "*" for x in map(None, a, b)])

In Python 2.5 you can use the ternary operator to make it slightly more elegant.
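For anyone trying this on Python 3, where map(None, a, b) no longer works, a rough equivalent of the one-liner could look like the sketch below. It swaps in itertools.zip_longest for the padding and uses a conditional expression instead of the and/or trick. The function name marker_line is just a placeholder.

```python
from itertools import zip_longest


def marker_line(a, b):
    # zip_longest pads the shorter string with None, much like
    # map(None, a, b) did in Python 2, so the extra tail shows up as "*".
    return "".join("-" if x == y else "*" for x, y in zip_longest(a, b))
```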
Marius Gedminas

I've often used Python's difflib.ndiff to make test failures easier to understand, back when I wrote PyUnit-style unit tests. It was especially useful for multiline strings.

Nowadays with doctests getting readable diffs is easy (#doctest: +REPORT_NDIFF). Unless you're comparing several-thousand-line-long HTML pages in your functional tests.
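A small sketch of the difflib.ndiff approach, written for Python 3. The helper name ndiff_report and the sample strings are only for illustration. ndiff wants sequences of lines, so the strings are split first.

```python
import difflib


def ndiff_report(expected, actual):
    """Return an ndiff-style report for two (possibly multiline) strings."""
    lines = difflib.ndiff(expected.splitlines(), actual.splitlines())
    # The "? " hint lines that ndiff emits carry their own trailing newline,
    # so strip it before joining to avoid blank lines in the report.
    return "\n".join(line.rstrip("\n") for line in lines)


report = ndiff_report("<p>Hello<br/>world</p>", "<p>Hello<br />world</p>")
print(report)
```

Lines starting with "- " come from the first string, lines starting with "+ " from the second, and for similar lines ndiff adds a "? " line pointing at the characters that changed.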
Why don't you use the built-in difflib module?
Hi Peter,

I did the same thing recently, but I used the built-in difflib module to show me where the differences are. Might be worth a look. :)

Hi again,

Just FYI, I only posted a repeat of what others had already mentioned because with cookies off, I do not see any comments.

The warning about needing to turn on JavaScript in order to comment is nice, but with cookies off, your site looks like no one has commented.

Anyway, just thought I'd mention that. Sorry for the duplicate info.

Peter Bengtsson
It's not about cookies, it's just that the caching is set to 1 hour which might have tricked you. I haven't had the time to find a good solution to this yet.
Ian Bicking
lxml.html.usedoctest (and lxml.doctestcompare) implement a smarter comparison for HTML fragments. It's not perfect, but it's also somewhat less sensitive to unimportant differences in the HTML.
