DifferenceFinder (aka. humanreadablediff.py)

06 July 2006   4 comments   Python

/plog/humanreadablediff/test.html


I've just quickly put together a little script that computes the difference between two texts and describes it in a human-readable format. The output of diff is hard for a human being to take in at a glance, and I wanted something more "humane" that summarises what's different on one simple line, e.g. "Added 2 lines, changed 1 line".

This little script is going to be part of an undo function in the new CMS I'm working on. Instead of just picking which revision date you want to go back to, you'll also be able to see what changed between each revision in the undo history.

It's important to note that my target usage is for a CMS where the texts to compare are average chunks of HTML. The script works like this:

>>> from humanreadablediff import compare
>>> before = open('version1.1.txt').read()
>>> after = open('version1.2.txt').read()
>>> compare(before, before)
No difference
>>> compare(before, after)
Added 2 lines, removed 1 line
>>> compare(after, before)
Added 1 line, removed 2 lines
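The post doesn't show what compare() does internally. As a rough illustration only, here is a minimal sketch of a one-line summary built on Python's standard difflib module. human_diff and _plural are hypothetical names, not the code in humanreadablediff.py, and this version deliberately reports every replaced line as one removal plus one addition.

```python
import difflib


def _plural(n, word):
    """Format a count like '1 line' or '2 lines'."""
    return f"{n} {word}" if n == 1 else f"{n} {word}s"


def human_diff(before, after):
    """Summarise line-level differences between two texts in one sentence.

    Hypothetical sketch: every line inside a 'replace' block is reported as
    removed/added; no attempt is made to detect in-place changes.
    """
    old, new = before.splitlines(), after.splitlines()
    added = removed = 0
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            removed += i2 - i1
        if tag in ("insert", "replace"):
            added += j2 - j1
    parts = []
    if added:
        parts.append("added " + _plural(added, "line"))
    if removed:
        parts.append("removed " + _plural(removed, "line"))
    if not parts:
        return "No difference"
    summary = ", ".join(parts)
    # Capitalise only the first letter of the whole sentence.
    return summary[0].upper() + summary[1:]
```

With the strings "a\nb" and "a\nc", for instance, this sketch returns "Added 1 line, removed 1 line".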

To see it in action, use the test page.

You can download it here: humanreadablediff.py

Questions and challenges

Sometimes it's not easy to tell a change apart from a remove+add. For example, if you start with:

Peter
David
Andrew

and change the text to:

Petter
David
Zahid

The result should be "Added 1 line, removed 1 line, changed 1 line", shouldn't it? My script claims to understand and spot that.
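One way to approximate the change-versus-remove+add decision, assuming the texts are compared line by line with difflib, is to score each pair of old/new lines inside a replaced block and only call it a change when the two lines are similar enough. The 0.6 threshold and the in-order pairing are arbitrary assumptions for this sketch, not what the script actually uses.

```python
import difflib

# Hypothetical threshold: line pairs at least this similar count as "changed".
SIMILARITY_THRESHOLD = 0.6


def classify_lines(before, after, threshold=SIMILARITY_THRESHOLD):
    """Return (added, removed, changed) line counts between two texts.

    Inside each 'replace' block, old and new lines are paired up in order;
    a pair whose character-level similarity ratio reaches the threshold is
    counted as one changed line, otherwise as one removed plus one added.
    """
    old, new = before.splitlines(), after.splitlines()
    added = removed = changed = 0
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "delete":
            removed += i2 - i1
        elif tag == "insert":
            added += j2 - j1
        elif tag == "replace":
            old_block, new_block = old[i1:i2], new[j1:j2]
            for old_line, new_line in zip(old_block, new_block):
                ratio = difflib.SequenceMatcher(None, old_line, new_line).ratio()
                if ratio >= threshold:
                    changed += 1
                else:
                    removed += 1
                    added += 1
            # Unpaired leftovers in a lopsided block are plain removals/additions.
            removed += max(0, len(old_block) - len(new_block))
            added += max(0, len(new_block) - len(old_block))
    return added, removed, changed
```

For the Peter/David/Andrew example, classify_lines returns (1, 1, 1): "Peter" vs "Petter" scores about 0.91 and counts as a change, while "Andrew" vs "Zahid" scores about 0.18 and counts as one removal plus one addition.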

Another challenge is of course word wrapping. Imagine a text that is just one long line of about 240 characters. In a small textarea (typical of a CMS) it will appear as 3 lines, and if you make 3 changes in what appear to you to be three different lines, you'll expect the result "Changed 3 lines". I think I've got that under control too. Have a play with this text and try different word wrap widths.
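The post doesn't say how the wrapping is handled. A minimal sketch, assuming the textarea width is known in characters (the hypothetical width parameter, standing in for the test page's word wrap number), is to word-wrap both texts with the standard textwrap module before diffing, so the comparison runs on visual lines rather than hard newlines. visual_lines and changed_visual_lines are illustrative names only.

```python
import difflib
import textwrap


def visual_lines(text, width):
    """Split text into the lines a reader would see in a textarea `width` chars wide.

    Each hard line is word-wrapped separately; blank lines are kept so that
    paragraph breaks still count as lines.
    """
    lines = []
    for hard_line in text.splitlines():
        lines.extend(textwrap.wrap(hard_line, width) or [""])
    return lines


def changed_visual_lines(before, after, width):
    """Count visual lines that differ, given an assumed wrap width.

    Each non-equal opcode contributes the larger of its two sides, so a
    block of 3 old lines replaced by 3 new lines counts as 3.
    """
    matcher = difflib.SequenceMatcher(
        None, visual_lines(before, width), visual_lines(after, width), autojunk=False
    )
    return sum(
        max(i2 - i1, j2 - j1)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    )
```

With a long single line that wraps into three visual lines at width 70, editing one word on each visual line gives 3; with a width wide enough to keep everything on one line, the same edit counts as 1. Note that an edit which changes a word's length can reflow the rest of the paragraph, so later visual lines may be reported as changed as well.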

Comments

Paddy3118
I am not sure of your result for Example1.

Going from:
Peter
David
Andrew

To:
Peter
David
Zahid

Can be done with either:
A.1) Locate the third line.
A.2) Change the line to be Zahid.
Or:
B.1) Locate the second line.
B.2) Remove the next line.
B.3) Add a line after the current, of Zahid.

You could do sequence A followed by sequence B but sequence B after sequence A would not change the text.

So, the correct result for me would be either:
Removed 1 line, added 1 line.
Or:
Changed 1 line.
But not both.

Cheers, Paddy.
Jan Kokoska
Paddy, the line that was changed was the first line from "Petter" to "Peter", so this is OK.

I have more trouble understanding the 0th example, though.

It looks to me like either 2 added and 1 removed OR 2 changed and 1 added. One thing is for sure, the difference shows in THREE lines, so the explanation's numbers need to add up. What am I missing here?
Peter Bengtsson
Good point, Jan. I'll adjust the distance calculation using this as an example. The numbers I used are rough guesses and I wanted to test my way into the best match.
Peter Bengtsson
Fixed that 0th example now.