Spellcorrector

18 April 2007   3 comments   Python


Spellcorrector being used on my not-yet-released web app

I think a lot of Python people have seen Peter Norvig's beautiful article, How to Write a Spelling Corrector. So have I, and I couldn't wait to write my own little version of it to fit my needs.

The changes I added were:

If you're still reading at this point it's quite likely that you're a coder, so you'll prefer to see how it works in code:

>>> from spellcorrector import Spellcorrector
>>> sc = Spellcorrector('en')
>>> sc.correct('caracter')
u'character'
>>> sc.correct(u'caracter')
u'character'
>>> sc.suggestions(u'caracter')
[u'character']
>>> sc.suggestions(u'spell')
[u'smell', u'shell', u'sell', u'spell', u'swell', u'spill', u'spells']
>>> sc.suggestions(u'spel')
[u'spell', u'sped']
>>> sc.suggestions(u'spel', detailed=True)
[{'count': 9, 'percentage': 90.0, 'word': u'spell'}, \
{'count': 1, 'percentage': 10.0, 'word': u'sped'}]
>>> # Physics database usage example
... 
>>> sc.correct('Planck')
u'black'
>>> sc.correct('Curie')
u'sure'
>>> sc.train(['Planck','Curie','Einstein','Heisenberg'])
>>> sc.correct('Planck')
u'planck'
>>> sc.correct('curie')
u'curie'
>>> sc.save('Physicist_words.txt')
>>> del sc
>>> file('Physicist_words.txt').read()
'planck\ncurie\neinstein\nheisenberg'

A lot more can probably be done to improve it, but it works quite well as a foundation for an application that mimics Google's "Did you mean: ..." feature.
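For readers who haven't seen Norvig's approach, here is a minimal sketch of the idea in modern Python 3. It is not the actual spellcorrector code: the class name `TinyCorrector` and its internals are hypothetical, and I'm assuming that the `percentage` in the detailed output is each word's share of the total count among the suggestions, which matches the 9/1 and 90.0/10.0 figures in the session above.

```python
from collections import Counter
import string


class TinyCorrector:
    """Hypothetical stand-in for illustration, not the real Spellcorrector."""

    def __init__(self, words=()):
        self.counts = Counter()  # word -> how often it was seen in training
        self.train(words)

    def train(self, words):
        # Assumption: words are stored lower-cased.
        self.counts.update(w.lower() for w in words)

    def _edits1(self, word):
        # Every string one delete, transpose, replace or insert away from word.
        letters = string.ascii_lowercase
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes = [a + b[1:] for a, b in splits if b]
        transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
        replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
        inserts = [a + c + b for a, b in splits for c in letters]
        return set(deletes + transposes + replaces + inserts)

    def suggestions(self, word, detailed=False):
        word = word.lower()
        # Keep only candidates that were seen in training, most frequent first.
        known = {w for w in self._edits1(word) | {word} if w in self.counts}
        ranked = sorted(known, key=lambda w: (-self.counts[w], w))
        if not detailed:
            return ranked
        total = sum(self.counts[w] for w in ranked)
        return [
            {
                "word": w,
                "count": self.counts[w],
                "percentage": 100.0 * self.counts[w] / total,
            }
            for w in ranked
        ]

    def correct(self, word):
        word = word.lower()
        if word in self.counts:
            return word  # known words are left alone
        ranked = self.suggestions(word)
        return ranked[0] if ranked else word


sc = TinyCorrector(["spell"] * 9 + ["sped"])
print(sc.correct("spel"))      # spell
print(sc.suggestions("spel"))  # ['spell', 'sped']
```

The sketch only looks one edit away from the input, so it will miss words that need two or more corrections. It also ranks purely by training count, which may or may not be what the real package does.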

I've actually already implemented this on a search feature of a not-yet-launched website for art. Since the art site contains non-English names like "Corneille", "Doucet" or "Belartio" I had to train my spellcorrector for that particular application so that a perfectly fine search for "attentif" didn't become "Did you mean: _attentive_".
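To illustrate why the extra training matters, here is a small sketch of a "Did you mean" check that stays quiet for words it already knows. The function `did_you_mean` and the two word sets are made up for this example, and the fuzzy matching uses the standard library's `difflib.get_close_matches` as a stand-in rather than the edit-distance approach from Norvig's article or the real spellcorrector.

```python
import difflib


def did_you_mean(query, vocabulary):
    """Return a suggested replacement for query, or None if no hint is needed.

    Hypothetical helper: vocabulary is any collection of known lower-case words.
    """
    word = query.lower()
    if word in vocabulary:
        return None  # a known word is a perfectly fine search, so say nothing
    # Stand-in similarity search; the cutoff of 0.75 is an arbitrary choice.
    matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=0.75)
    return matches[0] if matches else None


english = {"attentive", "character", "spell"}
site_words = {"attentif", "corneille", "doucet", "belartio"}

# Without site-specific training the French word gets "corrected".
print(did_you_mean("attentif", english))               # attentive

# Once the site's own words are part of the vocabulary, no hint is shown.
print(did_you_mean("attentif", english | site_words))  # None
```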

I'll blog more about that application once I get it up and running on a public domain name.

To take this early code experiment for a spin, download:

spellcorrector-0.1.2.tar.bz2 (6.7Mb)
spellcorrector-0.1.4.tar.bz2 (6.7Mb)
spellcorrector-0.1.5.tar.bz2 (6.7Mb)

Comments

justin
Hi Peter,

This looks interesting. I'm working on a Zope application and am looking for spelling support for epoZ.

best,

Justin
Peter Bengtsson
That's not what this is for. For Epoz I'd look at ispell or something similar.

BTW, why Epoz?? Why not ZTinyMCE?
Volkan
Unfortunately the download file is not found. How can I reach it? Thanks

