String comparison function in Python (alpha)

Saturday, Dec 22, 2007
7 comments Python

I was working on a unit test that, when it failed, would say "this string != that string". Because some of these strings were very long (the output of an HTML library I wrote, which spits out snippets of HTML code), it was hard to spot how they differed. So I decided to swap the usual self.assertEqual(str1, str2) call in my unittest test case for this little baby:


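def assertEqualLongString(self, a, b):
    # NOT marks a matching position, POINT marks a differing one
    NOT, POINT = '-', '*'
    if a != b:
        print a
        o = ''
        for i, e in enumerate(a):
            try:
                if e != b[i]:
                    o += POINT
                else:
                    o += NOT
            except IndexError:
                # b is shorter than a: everything past its end is a difference
                o += POINT

        if len(b) > len(a):
            # b is longer than a: flag the extra characters as well
            o += POINT * (len(b)-len(a))

        print o
        print b

        raise AssertionError, '(see string comparison above)'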

It's far from perfect and doesn't really work when you've got Unicode characters that your terminal can't print properly. It might not look great on really long strings either, but I'm sure that can be solved too. After all, this is just a quick hack that helped me spot that the difference between one snippet and another was that one produced <br/> and the other produced <br />. Below are some examples of this utility function in action.

Bear in mind that you can wire this up in many different ways to fit your needs, so I'm not going to focus on how it's executed:


u = MyUnittest()
u.assertEqualLongString('Peter Bengtsson 123', 'Peter PengtsXon 124'); print ""
u.assertEqualLongString('Bengtsson','Bengtzzon'); print ""
u.assertEqualLongString('Bengtsson','BengtzzonLonger'); print ""
u.assertEqualLongString('BengtssonLonger','Bengtzzon'); print ""
u.assertEqualLongString('Bengtssonism','Bengtsson'); print ""

# Results:

Peter Bengtsson 123
------*-----*-----*
Peter PengtsXon 124

Bengtsson
-----**--
Bengtzzon

Bengtsson
-----**--******
BengtzzonLonger

BengtssonLonger
-----**--******
Bengtzzon

Bengtssonism
---------***
Bengtsson
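
For the really long strings mentioned earlier, one option is to print the comparison in fixed-width blocks so each marker row stays directly under the text it describes. The following is only a sketch, written for Python 3 rather than the Python 2 of the original function; the name format_comparison and the width parameter are made up for illustration and are not part of the original hack.

```python
from itertools import zip_longest


def format_comparison(a, b, width=60, same="-", diff="*"):
    """Return the a / markers / b comparison, split into blocks of `width` columns."""
    # zip_longest pads the shorter string with None, so surplus characters
    # on either side are flagged as differences.
    markers = "".join(same if x == y else diff for x, y in zip_longest(a, b))
    blocks = []
    for start in range(0, len(markers), width):
        end = start + width
        blocks.append("\n".join([a[start:end], markers[start:end], b[start:end]]))
    return "\n\n".join(blocks)


print(format_comparison("Peter Bengtsson 123", "Peter PengtsXon 124", width=10))
# Peter Beng
# ------*---
# Peter Peng
#
# tsson 123
# --*-----*
# tsXon 124
```

Returning a string instead of printing inside the function also means the text could be passed as the message of the AssertionError, so the test runner shows it alongside the failure.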

Comments


Ivo December 22, 2007

I love writing Python one-liners :)

"".join([x[0]==x[1] and "-" or "*" for x in map(None, a, b)])

In Python 2.5 you can use the ternary operator to make it slightly more elegant.
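
A sketch of what that conditional-expression form might look like, written for Python 3 (an assumption, since this thread predates it). map(None, a, b) no longer exists in Python 3, so itertools.zip_longest stands in for it here; the function name diff_markers is made up for the example.

```python
from itertools import zip_longest


def diff_markers(a, b):
    # zip_longest pads the shorter string with None, much like
    # map(None, a, b) did in Python 2, so the surplus is marked "*".
    return "".join("-" if x == y else "*" for x, y in zip_longest(a, b))


print(diff_markers("Bengtsson", "Bengtzzon"))        # -----**--
print(diff_markers("Bengtsson", "BengtzzonLonger"))  # -----**--******
```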

Marius Gedminas December 22, 2007

Nice.

I've often used Python's difflib.ndiff to make test failures easier to understand, back when I wrote PyUnit-style unit tests. It was especially useful for multiline strings.

Nowadays with doctests getting readable diffs is easy (#doctest: +REPORT_NDIFF). Unless you're comparing several-thousand-line-long HTML pages in your functional tests.
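
A rough sketch of the difflib.ndiff approach, in Python 3. The helper name ndiff_report is hypothetical; only difflib.ndiff itself comes from the standard library. ndiff works on sequences of lines, so the strings are split first.

```python
import difflib


def ndiff_report(expected, actual):
    """Return an ndiff-style, line-by-line report for two (possibly multiline) strings."""
    lines = difflib.ndiff(expected.splitlines(), actual.splitlines())
    # The "? " guide lines that ndiff generates carry their own trailing
    # newline, so strip it before joining.
    return "\n".join(line.rstrip("\n") for line in lines)


print(ndiff_report("<p>one<br/>two</p>", "<p>one<br />two</p>"))
```

Lines prefixed with "- " come from the first string, "+ " from the second, and the "? " lines point at the characters that changed within a line. In a test this text could be used as the failure message, for example self.fail(ndiff_report(expected, actual)).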

pacificator December 22, 2007

Why not use the built-in difflib module?

Krys December 22, 2007

Hi Peter,

I did the same thing recently, but I used the built-in difflib module to show me where the differences are. Might be worth a look. :)

HTH

Krys December 22, 2007

Hi again,

Just FYI, I only posted a repeat of what others had already mentioned because with cookies off, I do not see any comments.

The warning about needing to turn on JavaScript in order to comment is nice, but with cookies off, your site looks like no one has commented.

Anyway, just thought I'd mention that. Sorry for the duplicate info.

Krys

Peter Bengtsson December 24, 2007

It's not about cookies, it's just that the caching is set to 1 hour which might have tricked you. I haven't had the time to find a good solution to this yet.

Ian Bicking December 26, 2007

lxml.html.usedoctest (and lxml.doctestcompare) implement a smarter comparison for HTML fragments. It's not perfect, but it's also somewhat less sensitive to unimportant differences in the HTML.
