Spellcorrector

18th of April 2007

[Image: Spellcorrector being used on my not-yet-released web app]

I think a lot of Python people have seen Peter Norvig's beautiful article, How to Write a Spelling Corrector. So have I, and I couldn't wait to write my own little version of it to fit my needs.

The changes I added were:

If you're still reading at this point it's quite likely that you're a coder so you'll prefer code to see how it works:

 >>> from spellcorrector import Spellcorrector
 >>> sc = Spellcorrector('en')
 >>> sc.correct('caracter')
 u'character'
 >>> sc.correct(u'caracter')
 u'character'
 >>> sc.suggestions(u'caracter')
 [u'character']
 >>> sc.suggestions(u'spell')
 [u'smell', u'shell', u'sell', u'spell', u'swell', u'spill', u'spells']
 >>> sc.suggestions(u'spel')
 [u'spell', u'sped']
 >>> sc.suggestions(u'spel', detailed=True)
 [{'count': 9, 'percentage': 90.0, 'word': u'spell'}, \
 {'count': 1, 'percentage': 10.0, 'word': u'sped'}]
 >>> # Physics database usage example
 ... 
 >>> sc.correct('Planck')
 u'black'
 >>> sc.correct('Curie')
 u'sure'
 >>> sc.train(['Planck','Curie','Einstein','Heisenberg'])
 >>> sc.correct('Planck')
 u'planck'
 >>> sc.correct('curie')
 u'curie'
 >>> sc.save('Physicist_words.txt')
 >>> del sc
 >>> file('Physicist_words.txt').read()
 'planck\ncurie\neinstein\nheisenberg'

A lot more could probably be done to improve it, but it works quite well as a foundation for an application that mimics Google's "Did you mean: ..." feature.
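For readers who haven't read Norvig's article, here is a minimal sketch of the general idea, written for Python 3. It is not the code from the download. The names `MiniCorrector` and `edits1` are made up for illustration, it only looks one edit away (which is why it could never turn "Planck" into "black" the way the real session does), and the way `percentage` is computed, as each candidate's share of the combined counts, is my assumption about what a detailed suggestion could report.

```python
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """Return every string one delete, transpose, replace or insert away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


class MiniCorrector:
    """Hypothetical stand-in for Spellcorrector, for illustration only."""

    def __init__(self, words=()):
        self.counts = Counter()
        self.train(words)

    def train(self, words):
        # Words are stored lowercased, so 'Planck' is learned as 'planck'.
        self.counts.update(w.lower() for w in words)

    def suggestions(self, word, detailed=False):
        word = word.lower()
        # Known words among the word itself and everything one edit away.
        candidates = {w for w in edits1(word) | {word} if w in self.counts}
        # Most frequent first; ties are broken alphabetically.
        ranked = sorted(candidates, key=lambda w: (-self.counts[w], w))
        if not detailed:
            return ranked
        total = sum(self.counts[w] for w in ranked)
        return [
            {
                "word": w,
                "count": self.counts[w],
                "percentage": 100.0 * self.counts[w] / total,
            }
            for w in ranked
        ]

    def correct(self, word):
        word = word.lower()
        if word in self.counts:
            return word  # a known word is never "corrected"
        ranked = self.suggestions(word)
        return ranked[0] if ranked else word


sc = MiniCorrector(["spell"] * 9 + ["sped"])
print(sc.suggestions("spel"))
print(sc.suggestions("spel", detailed=True))
```

With that toy word list, "spell" has been seen nine times and "sped" once, so "spell" is ranked first and gets the larger share.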

I've actually already implemented this in the search feature of a not-yet-launched website for art. Since the art site contains non-English names like "Corneille", "Doucet" or "Belartio", I had to train my spellcorrector for that particular application so that a perfectly fine search for "attentif" didn't become "Did you mean: _attentive_".
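The effect of that training can be sketched in a few lines of Python 3. This is only an illustration and not the site's code: `did_you_mean` is a hypothetical helper, the word lists are tiny made-up samples, and `difflib.get_close_matches` from the standard library stands in for the real Spellcorrector.

```python
import difflib


def did_you_mean(query, vocabulary):
    """Return a suggested word, or None if the query needs no correction."""
    word = query.lower()
    if word in vocabulary:
        # Already a known word, so stay silent.
        return None
    # Otherwise offer the single closest known word, if any is close enough.
    matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=0.7)
    return matches[0] if matches else None


# Untrained: only (a tiny sample of) English is known.
english = {"attentive", "character", "spell"}
print(did_you_mean("attentif", english))

# "Trained": the site's own names and titles are added to the vocabulary.
art_site = english | {"attentif", "corneille", "doucet", "belartio"}
print(did_you_mean("attentif", art_site))
```

The first call suggests "attentive"; the second returns None because "attentif" is now a known word, so no "Did you mean" line would be shown.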

I'll blog more about that application once I get it up and running on a public domain name.

To take this early code experiment for a spin, download one of: spellcorrector-0.1.2.tar.bz2 (6.7Mb), spellcorrector-0.1.4.tar.bz2 (6.7Mb), spellcorrector-0.1.5.tar.bz2 (6.7Mb)


Comment

justin - 3rd July 2007
Hi Peter,

This looks interesting. I'm working on a Zope application and am looking for spelling support for epoZ.

best,

Justin
Peter Bengtsson - 4th July 2007
That's not what this is for. For Epoz I'd look at ispell or something similar.

BTW, why Epoz?? Why not ZTinyMCE?
 