Spellcorrector being used on my not-yet-released web app

I think a lot of Python people have seen Peter Norvig's beautiful article How to Write a Spelling Corrector. So have I, and I couldn't wait to write my own little version of it to fit my needs.

The changes I made were:

  • Python 2.4 compatible
  • Uses a picklable dict instead of a collections.defaultdict
  • Compiled a huge list of Swedish words
  • Skips edit distance 2 for words longer than 10 characters
  • Added a suggestions() method
  • All Unicode
  • A class instead of a function
  • Ability to train on your own words and to save that training persistently
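The reason for skipping edit distance 2 on long words is cost. With a 26-letter alphabet, an n-letter word has 54n+25 single-edit variants (deletes, transposes, replaces, inserts, before removing duplicates), and distance 2 roughly squares that. Below is a minimal Python 3 sketch of the idea, modelled on the edits1 approach in Norvig's article. It is not the code in the download: the function names, the ASCII-only alphabet and the max_len_for_edits2 parameter are assumptions for illustration.

```python
import string

# Hypothetical alphabet: ASCII only. A Swedish word list would also need 'åäö'.
ALPHABET = string.ascii_lowercase


def edits1(word, alphabet=ALPHABET):
    """All strings one edit (delete, transpose, replace, insert) away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in alphabet]
    inserts = [a + c + b for a, b in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)


def candidates(word, counts, max_len_for_edits2=10):
    """Known words close to `word`, trying smaller edit distances first.

    `counts` is a plain dict of word -> frequency. Edit distance 2 is only
    tried for words of at most max_len_for_edits2 characters, because the
    number of distance-2 strings grows roughly quadratically with length.
    """
    if word in counts:
        return {word}
    known1 = {w for w in edits1(word) if w in counts}
    if known1:
        return known1
    if len(word) > max_len_for_edits2:
        return set()  # too long: give up rather than generate edits of edits
    return {e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in counts}
```

For example, with counts = {'spell': 9, 'sped': 1}, candidates('spel', counts) returns {'spell', 'sped'}, both one edit away.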

If you're still reading at this point, you're quite likely a coder, so you'll probably prefer code to see how it works:

>>> from spellcorrector import Spellcorrector
>>> sc = Spellcorrector('en')
>>> sc.correct('caracter')
u'character'
>>> sc.correct(u'caracter')
u'character'
>>> sc.suggestions(u'caracter')
[u'character']
>>> sc.suggestions(u'spell')
[u'smell', u'shell', u'sell', u'spell', u'swell', u'spill', u'spells']
>>> sc.suggestions(u'spel')
[u'spell', u'sped']
>>> sc.suggestions(u'spel', detailed=True)
[{'count': 9, 'percentage': 90.0, 'word': u'spell'}, {'count': 1, 'percentage': 10.0, 'word': u'sped'}]
>>> # Physics database usage example
... 
>>> sc.correct('Planck')
u'black'
>>> sc.correct('Curie')
u'sure'
>>> sc.train(['Planck','Curie','Einstein','Heisenberg'])
>>> sc.correct('Planck')
u'planck'
>>> sc.correct('curie')
u'curie'
>>> sc.save('Physicist_words.txt')
>>> del sc
>>> file('Physicist_words.txt').read()
'planck\ncurie\neinstein\nheisenberg'
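To show how an interface like the one in the session above could hang together, here is a hypothetical MiniSpellcorrector written for Python 3. It is not the class from the download: only the method names correct, suggestions, train and save come from the session; the internals, the extra load method and the newline-separated save format (inferred from the file contents shown above) are assumptions.

```python
import string


class MiniSpellcorrector:
    """Hypothetical sketch of a Norvig-style corrector with trainable words.

    Method names mirror the session above; everything else is an assumption.
    """

    alphabet = string.ascii_lowercase

    def __init__(self, counts=None):
        # Plain dict (picklable) mapping lowercase word -> frequency.
        self.counts = dict(counts or {})
        self.trained = []  # words added via train(), written out by save()

    def train(self, words):
        for word in words:
            word = word.lower()
            self.counts[word] = self.counts.get(word, 0) + 1
            if word not in self.trained:
                self.trained.append(word)

    def _edits1(self, word):
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        return set(
            [a + b[1:] for a, b in splits if b]
            + [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
            + [a + c + b[1:] for a, b in splits if b for c in self.alphabet]
            + [a + c + b for a, b in splits for c in self.alphabet]
        )

    def _candidates(self, word):
        if word in self.counts:
            return {word}  # known words are returned unchanged
        known1 = {w for w in self._edits1(word) if w in self.counts}
        if known1 or len(word) > 10:  # skip distance 2 for long words
            return known1
        return {e2 for e1 in self._edits1(word) for e2 in self._edits1(e1)
                if e2 in self.counts}

    def suggestions(self, word, detailed=False):
        # Most frequent first; ties broken alphabetically.
        found = sorted(self._candidates(word.lower()),
                       key=lambda w: (-self.counts[w], w))
        if not detailed:
            return found
        total = sum(self.counts[w] for w in found)
        return [{'word': w, 'count': self.counts[w],
                 'percentage': 100.0 * self.counts[w] / total} for w in found]

    def correct(self, word):
        found = self.suggestions(word)
        # Fall back to the lowercased input when nothing known is close enough.
        return found[0] if found else word.lower()

    def save(self, path):
        with open(path, 'w', encoding='utf-8') as f:
            f.write('\n'.join(self.trained))

    def load(self, path):
        with open(path, encoding='utf-8') as f:
            self.train(line.strip() for line in f if line.strip())
```

Seeded with counts such as {'black': 50, 'sure': 40}, this sketch reproduces the Planck/Curie behaviour: before training, "planck" and "curie" are each two edits from a known word; after train(), the exact match wins because known words are returned as they are.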

A lot more can probably be done to improve it, but it works quite well as a foundation for an application that mimics Google's "Did you mean: ..." feature.

I've actually already implemented this in the search feature of a not-yet-launched art website. Since the site contains non-English names such as "Corneille", "Doucet" and "Belartio", I had to train my spellcorrector for that particular application so that a perfectly fine search for "attentif" wouldn't produce "Did you mean: _attentive_".
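A "Did you mean" hint needs only one rule on top of correct(): show it when the corrected query differs from what was typed. That rule is also why training matters here: once "attentif" is a known word, correct() hands it back unchanged and no hint appears. Below is a hypothetical did_you_mean() helper in Python 3, assuming a corrector that returns lowercase words the way sc.correct does in the session above. It is a sketch, not the code running on the art site.

```python
def did_you_mean(query, correct):
    """Build a Google-style "Did you mean: ..." hint for a search query.

    `correct` is any callable mapping a word to its most likely spelling,
    e.g. the bound method sc.correct. Returns None when every word already
    matches its correction, i.e. when no hint should be shown.
    """
    words = query.lower().split()
    corrected = [correct(w) for w in words]
    if corrected == words:
        return None
    return 'Did you mean: ' + ' '.join(corrected)
```

Passing the bound method rather than a corrector object keeps the helper independent of any particular spell-correction class.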

I'll blog more about that application once I get it up and running on a public domain.

To take this early code experiment for a spin, download one of:

  • spellcorrector-0.1.2.tar.bz2 (6.7 MB)
  • spellcorrector-0.1.4.tar.bz2 (6.7 MB)
  • spellcorrector-0.1.5.tar.bz2 (6.7 MB)

justin - 03 July 2007
Hi Peter,

This looks interesting. I'm working on a Zope application and am looking for spelling support for epoZ.

best,

Justin
Peter Bengtsson - 04 July 2007
That's not what this is for. For Epoz I'd look at ispell or something similar.

BTW, why Epoz?? Why not ZTinyMCE?
Volkan - 28 February 2014
Unfortunately, the download file can't be found. How can I get it? Thanks

