Unlike previous incarnations of Spellcorrector, it now does not load the two huge language files for English and Swedish by default. Alternatively, or additionally, you can load your own language file. The difference between loading a language file and training on your own words is that trained words are always assumed to be correct.

Another major change in this release is that a pickle file is created once the language file or your own training file has been parsed. This works like a cache: if the original text file changes, the pickle file is recreated. The outcome is that the first time you create a Spellcorrector instance it takes a few seconds if the language file is large, but the second time it takes virtually no time at all.
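
The caching idea can be sketched roughly like this. It is not the code from spellcorrector.py, just a minimal illustration assuming Python 3; the function name `load_word_counts`, the `.pickle` suffix and the modification-time check are all my own assumptions about one way to do it:

```python
import os
import pickle
import re
from collections import Counter


def load_word_counts(text_path):
    """Return word frequencies for text_path, using a pickle file as a cache."""
    cache_path = text_path + '.pickle'  # hypothetical naming scheme

    # Reuse the cache only if it is at least as new as the text file.
    if (os.path.exists(cache_path)
            and os.path.getmtime(cache_path) >= os.path.getmtime(text_path)):
        with open(cache_path, 'rb') as f:
            return pickle.load(f)

    # Cache is missing or stale: parse the text again and rewrite the cache.
    with open(text_path, encoding='utf-8') as f:
        counts = Counter(re.findall(r'\w+', f.read().lower()))
    with open(cache_path, 'wb') as f:
        pickle.dump(counts, f)
    return counts
```

The slow part (reading and counting a large text file) only happens when the cache is missing or older than the text file; otherwise a single `pickle.load` is all that is needed.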

So, to recap, here are the different ways of loading the 'Spellcorrector':


>>> Spellcorrector('en')

>>> import os
>>> assert os.path.isdir('languagefiles')
>>> Spellcorrector('en', load_language_files=True)

>>> Spellcorrector('en', load_language_file='/home/peterbe/text.txt')

>>> Spellcorrector('en', own_training_file='/home/peterbe/names.txt')

The load_language_file parameter expects a readable file full of ordinary text. The text doesn't have to be written as one word per line; punctuation, brackets and other non-word characters are stripped.
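
To give an idea of what that stripping amounts to, here is a minimal sketch assuming Python 3. The helper name `words` and the exact regular expression are my assumptions for illustration; the real file may well do it differently:

```python
import re


def words(text):
    """Lowercase the text and keep only runs of letters."""
    # [^\W\d_] matches any Unicode letter: a word character that is
    # neither a digit nor an underscore.
    return re.findall(r'[^\W\d_]+', text.lower())


print(words("Hello, (world)! Is this [junk]?"))
# prints ['hello', 'world', 'is', 'this', 'junk']
```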

The own_training_file has to be a file with one word per line. You can combine the two like this:


>>> Spellcorrector('en', load_language_file='/home/peterbe/text.txt',
                   own_training_file='/home/peterbe/names.txt')

There have also been a few other fixes and improvements. For example, there are now two basic unit tests at the bottom of the file that might give you some clues about how it can work for you.

Download spellcorrector.py 0.2. I really ought to put this on PyPI. Something for my todo list.

Comments

bruno GALLART

Hi,
I am interested in your version of Peter Norvig's corrector. I have tried it for my language from the south of France (Occitan). I did a test with a .txt file. It works well, but in my language there are many letters like ò ó ì í ù ú à á è é ç. When there is an accented vowel in a word, the correction method and the suggestion method cut the word. I am not a very good Python programmer and I don't know how to resolve this little problem. Can you give me some hints?
Compliments for your corrector,
Regards,
Bruno

Peter Bengtsson

It supports Unicode. But you'll have to modify it and write down the alphabet of your language.
Oh, and make sure you write the .txt file in UTF8!
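
A minimal sketch of what that might look like, assuming Python 3 and a Norvig-style `edits1` function. The alphabet is only a guess built from the letters you listed, and the names `ALPHABET`, `edits1` and `read_words` are illustrative, not necessarily what spellcorrector.py uses:

```python
import re

# Assumed alphabet: ASCII letters plus the accented letters mentioned above.
ALPHABET = 'abcdefghijklmnopqrstuvwxyz' + 'òóìíùúàáèéç'


def edits1(word, alphabet=ALPHABET):
    """Return all strings that are one edit away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in alphabet]
    inserts = [a + c + b for a, b in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)


def read_words(path):
    """Read a UTF-8 text file and return its words, accents intact."""
    with open(path, encoding='utf-8') as f:
        return re.findall(r'[^\W\d_]+', f.read().lower())
```

With the accented letters in the alphabet, 'març' becomes a candidate for the misspelling 'marc'; with a plain a-z alphabet it never would.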

bruno GALLART

Thanks for your answer, Peter. In the evening I looked up some information about Unicode etc. in Python, and I think that the file's format is not UTF-8!
Thanks a lot,
Bruno
