Teach me about OCR

25 March 2006   0 comments   Linux

http://www.gnu.org/software/ocrad/ocrad.html

I might soon need a good OCR program to read scanned pages, but these aren't cleanly scanned pages from a novel. What I'm scanning is things like printed invoices, with tables, headers, logos, footers, etc.

The only program I've looked at so far is ocrad, and I've had pretty decent results with it. I scanned an invoice and, thanks to a quick Python script, was able to work out the correct rotation with 57% confidence (the second best was 37%). That's a start. ocrad seems very flexible, and quite actively developed judging from the mailing list.
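
The script itself isn't included in this post, so what follows is only a rough sketch of how such a rotation check could work, not the original code. It assumes ocrad is on the PATH, that the scan has already been converted to a PNM file (the format ocrad reads), and that ocrad's `--transform` option is used for the rotations. The names `run_ocrad`, `score_text`, `rank_rotations` and `guess_rotation`, the word-list scoring, and the "share of total score" notion of confidence are all made up for illustration.

```python
import re
import subprocess

# Rotations to try. "none" means: run ocrad without any --transform flag.
TRANSFORMS = ["none", "rotate90", "rotate180", "rotate270"]

# Only runs of 3+ letters count as candidate words (an arbitrary choice).
WORD_RE = re.compile(r"[A-Za-z]{3,}")


def run_ocrad(pnm_path, transform):
    """Run ocrad on a PNM image and return the text it recognized."""
    cmd = ["ocrad"]
    if transform != "none":
        cmd.append("--transform=" + transform)
    cmd.append(pnm_path)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def score_text(text, vocabulary):
    """Count recognized words that appear in a set of known words."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    return sum(1 for w in words if w in vocabulary)


def rank_rotations(texts, vocabulary):
    """Map {transform: ocr_text} to [(transform, share)], best first.

    The share is each rotation's score divided by the sum of all scores,
    which is one simple way to get a percentage-like confidence.
    """
    scores = {name: score_text(text, vocabulary) for name, text in texts.items()}
    total = sum(scores.values())
    if total == 0:
        return [(name, 0.0) for name in scores]
    shares = [(name, score / total) for name, score in scores.items()]
    return sorted(shares, key=lambda pair: pair[1], reverse=True)


def guess_rotation(pnm_path, vocabulary):
    """OCR the image in all four orientations and rank them."""
    texts = {t: run_ocrad(pnm_path, t) for t in TRANSFORMS}
    return rank_rotations(texts, vocabulary)
```

With a word list loaded into a set (for example from a system dictionary file, if one is available), `guess_rotation("invoice.pgm", words)` would return the four orientations ordered by how many real-looking words each produced. The file name here is just a placeholder.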

I guess I need to do more research into tuning ocrad with the right charsets, image formats and some of its more obvious options before I give up. When I scanned my invoice, the words it found did look like words, but there wasn't much of real use in the output. The name of the company that sent the invoice, for example, was nowhere among the recognized words :(

What do people use out there? I bet Amazon didn't just use ocrad when they built Search Inside the Book.
