Tag: CMS

Using Optical Character Recognition (OCR): Observations

In my current contract I had the opportunity to work with optical character recognition (OCR). We had over 50 documents in paper format that were published before 1991 that needed to get digitized and published on the internet. While these documents were old, they have really in-depth knowledge that simply needed to be shared with the world. OCR, however, has its quirks and is not all that straight forward. Some are due to the age and handling of the original documents over the years, and some are due to the original typographical or layout decisions of the original publishers. No matter the reason, they are not to be found and you need these documents on the internet, so the monkey is now on your back.