Sunday, January 11, 2009

Choosing the best OCR software for you

It proved to be a daunting task (for me, at least). I was trying to get some scanned data into Excel, but ran into multiple problems:
- scanning quality (some pages were too dark for OCR, hence unusable)
- misrecognition (getting an O instead of a 0, or Gs instead of 6s, can become a common pain in the @@$)
- complex tables (this is the biggest challenge, since messed-up columns are hard to fix in post-processing).
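The O-for-0 and G-for-6 mix-ups are the one problem on this list that can be partly cleaned up after the fact. Here is a minimal Python sketch of that idea, not something I actually ran on my data. The confusion table and the function name fix_numeric_cell are my own assumptions for illustration, and the rule is deliberately cautious: it only touches a cell that already contains a real digit and that looks like a number once the letters are swapped.

```python
import re

# Assumed confusion table: letters an OCR engine may emit in place of digits.
# Extend or trim it to match the errors you actually see in your output.
CONFUSIONS = str.maketrans({
    "O": "0", "o": "0", "G": "6",
    "l": "1", "I": "1", "S": "5", "B": "8",
})

# After substitution, a numeric cell is a digit followed by digits and separators.
NUMERIC = re.compile(r"^[+-]?\d[\d.,]*$")


def fix_numeric_cell(cell: str) -> str:
    """Swap letter-for-digit confusions, but only in cells that look numeric."""
    candidate = cell.strip().translate(CONFUSIONS)
    has_real_digit = any(ch.isdigit() for ch in cell)
    if has_real_digit and NUMERIC.match(candidate):
        return candidate
    # Leave names and other text cells exactly as they were.
    return cell


if __name__ == "__main__":
    for raw in ["1O5G", "12,3OO.5O", "Gold Corp", "BIG"]:
        print(repr(raw), "->", repr(fix_numeric_cell(raw)))
```

This will still get some cells wrong (a code like "B2B" would be turned into a number), so it only makes sense on columns you know are supposed to be numeric, and it does nothing for dark pages or scrambled columns.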
Choosing OCR software
I tried several programs, on both Windows and Mac. First, my personal favorite: a small, simple OCR program called Able2Extract Professional. It does the job for simple tables, usually from clean-cut PDFs, but in most cases it doesn't go beyond that. Then I moved up to the big guns: ABBYY FineReader Pro 9.0, IRIS Readiris Pro 11.5.6 and OmniPage Professional 16.0. However, none of them blew me away. ABBYY is terribly slow but seems to have a few more customization options; OmniPage is the fastest and gives the best quality, but I had trouble getting it to do what I wanted.
Conclusion:
In the end, none of the above could do OCR that was both fast and high-quality, given the difficulties with my PDF files. The Chinese names and other foreign firm names in my data were painful to distinguish even for me, never mind for any OCR software, so I opted for manual data recognition (MDR): I just typed everything into Excel myself. Lots and lots of hours and nerves wasted, but I think it would have taken as many hours (or more) just to check, correct and change the OCR output.
