06 July 2006 Python
I've just quickly put together a little script that computes the difference between two texts in a human-readable format. The output you get from running diff is a bit difficult for a human being to understand, and I wanted something more "humane" that quickly summarises what's different on one simple line, e.g. "Added 2 lines, changed 1 line".
This little script is going to be part of an undo function in the new CMS that I'm working on. Instead of just picking which revision date you want to go back to, you'll also be able to see what the differences were between each revision in the CMS's undo history.
It's important to note that my target usage is for a CMS where the texts to compare are average chunks of HTML. The script works like this:
>>> from humanreadablediff import compare
>>> before = open('version1.1.txt').read()
>>> after = open('version1.2.txt').read()
>>> compare(before, before)
No difference
>>> compare(before, after)
Added 2 lines, removed 1 line
>>> compare(after, before)
Added 1 line, removed 2 lines
To see it in action, use the test page.
You can download it here: humanreadablediff.py
Questions and challenges
Sometimes it's not so easy to tell a change apart from a remove+add. If, for example, you start with:
Peter
David
Andrew
and change the text to:
Petter
David
Zahid
The result should be "Added 1 line, removed 1 line, changed 1 line", shouldn't it? My script claims to understand and spot that.
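For the curious, here is a minimal sketch of one way to make that distinction with Python's standard difflib module. It is not the code in humanreadablediff.py: the function name summarise and the 0.6 similarity threshold are assumptions made up for this illustration. The idea is that when a block of lines is replaced, each old line is paired with the new line in the same position, and the pair counts as a "change" only if the two lines are similar enough; otherwise it counts as one removal plus one addition.

```python
import difflib


def summarise(before, after, threshold=0.6):
    """Return a one-line summary of the line differences (hypothetical helper)."""
    a, b = before.splitlines(), after.splitlines()
    added = removed = changed = 0
    matcher = difflib.SequenceMatcher(None, a, b)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":
            added += j2 - j1
        elif tag == "delete":
            removed += i2 - i1
        elif tag == "replace":
            old, new = a[i1:i2], b[j1:j2]
            # Pair lines positionally: similar pairs count as a change,
            # dissimilar pairs as one removal plus one addition.
            for o, n in zip(old, new):
                if difflib.SequenceMatcher(None, o, n).ratio() >= threshold:
                    changed += 1
                else:
                    removed += 1
                    added += 1
            # Unpaired leftovers are plain removals or additions.
            removed += max(0, len(old) - len(new))
            added += max(0, len(new) - len(old))
    parts = []
    for verb, n in (("added", added), ("removed", removed), ("changed", changed)):
        if n:
            parts.append("%s %d line%s" % (verb, n, "" if n == 1 else "s"))
    if not parts:
        return "No difference"
    text = ", ".join(parts)
    return text[0].upper() + text[1:]


print(summarise("Peter\nDavid\nAndrew", "Petter\nDavid\nZahid"))
# prints: Added 1 line, removed 1 line, changed 1 line
```

"Peter" and "Petter" are nearly identical, so that pair is a change, while "Andrew" and "Zahid" share almost nothing and are reported as a removal plus an addition. The threshold is a judgment call; a CMS with very short lines might want a different value.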
Another challenge is, of course, word wrapping. Imagine a text which is just one long line of about 240 characters. When you view it in a small textarea (typical of a CMS) it will appear to be 3 lines, and if you make 3 changes in what appear to you to be three different lines, you'll expect the result "Changed 3 lines". I think I've got that under control too. Have a play with this text and with the word wrap number.
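The word wrap idea can be sketched roughly like this, again using only the standard library. This is an illustration under assumptions, not the actual script: wrap_lines, count_touched_lines and the width parameter are hypothetical names. Each real line is first wrapped to the width of the textarea with textwrap.wrap, and the comparison is then done on the wrapped lines, so one long physical line can count as several visible lines.

```python
import difflib
import textwrap


def wrap_lines(text, width=80):
    """Split text into the lines a user would see at the given wrap width."""
    wrapped = []
    for line in text.splitlines():
        # textwrap.wrap returns [] for an empty line, so keep blanks explicitly.
        wrapped.extend(textwrap.wrap(line, width) or [""])
    return wrapped


def count_touched_lines(before, after, width=80):
    """Count visible (wrapped) lines that differ between the two texts."""
    a, b = wrap_lines(before, width), wrap_lines(after, width)
    touched = 0
    matcher = difflib.SequenceMatcher(None, a, b)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            touched += max(i2 - i1, j2 - j1)
    return touched


# One long line of 30 six-letter words, edited in three places.
before = " ".join("word%02d" % i for i in range(30))
after = (before.replace("word03", "WORD03")
               .replace("word14", "WORD14")
               .replace("word25", "WORD25"))

print(count_touched_lines(before, after, width=70))    # prints: 3
print(count_touched_lines(before, after, width=1000))  # prints: 1
```

With a wrap width of 70 the single long line is seen as three lines of ten words each, and each edit lands on a different one. With a very large width it is one line with one change. Note that an edit which changes the length of a word can shift the wrapping of everything after it, which this simple sketch does not try to compensate for.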