OCR results not good
-
When scanning newspaper text which I later intend to edit through Word, the results are not good. There are three options I can use: “Text and Graphic(s) as Image”, “Editable Text”, or “Editable Text with Graphic(s)”. The best results are generally obtained when using the last option. How can I improve the quality of text scans from newspapers? The computer is running Windows XP with a Hewlett Packard PSC 1200.
What you are using is Optical Character Recognition (OCR), a process in which a scanned document is saved as editable text rather than as an image. This is much more difficult than scanning a document as a graphic, since the OCR software has to examine the shapes on the page and work out which letters and words they represent. The OCR software which comes with the PSC 1200 is called Readiris OCR (www.irisusa.com). Generally, the quality of the text output depends on the OCR package and its built-in recognition engine. That said, the format of the document being scanned also plays a part. Newspaper articles can be difficult for several reasons. They are often set in multiple columns, which some OCR packages do not recognise. The paper is thin, so text on the reverse side can show through and confuse the software. The greyish newsprint also gives poorer contrast with the black lettering than a plain white page does.
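To illustrate the contrast problem, here is a minimal Python sketch of the kind of adjustment a scanner's brightness and contrast controls perform. It is not how Readiris or the HP software works internally. The function name `stretch_contrast`, the grey levels and the `low`/`high` cut-offs are all assumptions chosen for the example.

```python
def stretch_contrast(pixels, low, high):
    """Rescale grey levels so `low` becomes 0 (black) and `high` becomes 255 (white).

    `pixels` is a list of rows of 8-bit grey values (0 = black, 255 = white).
    Values outside the low..high range are clipped before rescaling.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    scale = 255 / (high - low)
    result = []
    for row in pixels:
        new_row = []
        for value in row:
            clipped = min(max(value, low), high)
            new_row.append(round((clipped - low) * scale))
        result.append(new_row)
    return result


# Hypothetical grey levels for a tiny patch of a newspaper scan:
# 40 = printed ink, 130 = faint show-through from the reverse side,
# 180 = greyish newsprint background.
scan = [
    [180, 40, 180, 130],
    [180, 40, 40, 180],
]

# Anything at or above 120 is pushed to pure white, so both the grey
# background and the show-through disappear while the ink stays black.
cleaned = stretch_contrast(scan, low=40, high=120)
print(cleaned)
```

In this made-up patch the ink ends up at 0 and everything else at 255, which is the clean black-on-white page that OCR software handles best. On a real scan the cut-offs would need to be found by trial and error, just as with the scanner's own settings.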
According to the feature list for Readiris Pro 10, that version supports layouts such as multiple columns. However, it’s likely that the version which came with your scanner is a lite version without the full feature set of the Professional version. Upgrading to the Professional version costs US$152.00. As you can see, good OCR software is not cheap, as you are paying for accuracy and speed. It may be worthwhile trying some different OCR packages to see which one suits your needs best. You should be able to get free trial versions of OCR software from various manufacturers’ websites.
Before you try other packages, there are a few tips which may help you scan newspaper articles using your existing software. Firstly, place a sheet of black paper (or cardboard) behind the newspaper article when scanning. This may help stop print from the other side of the article showing through. Secondly, try scanning each column of the newspaper article individually, in case your OCR software has difficulty interpreting multiple columns. Either select just one column to be scanned in the OCR software or, if no such function is available, block out all columns except one on the scanner (e.g. using a blank sheet of paper). Thirdly, scan the text at 300 dpi or greater, and adjust the brightness, contrast and other settings so that the text looks crisp and the background paper is as clean as possible.
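For readers comfortable with a little programming, the second tip can also be sketched in Python. The example below finds text columns in an already cleaned-up black-and-white page by looking for blank vertical strips (gutters) between them. It is only a sketch under simple assumptions: the page is a small grid of 1s (ink) and 0s (blank paper), the scan is not skewed, and the helper names `find_columns` and `crop_column` are made up for illustration rather than taken from any OCR package.

```python
def find_columns(page, min_gutter=2):
    """Return (start, end) x-ranges of the text columns on a page.

    `page` is a list of rows; 1 marks an ink pixel, 0 marks blank paper.
    A gutter is a run of at least `min_gutter` pixel columns with no ink.
    `end` is exclusive, so a range can be used directly as a slice.
    """
    width = len(page[0])
    # For each x position, does any row contain ink there?
    has_ink = [any(row[x] for row in page) for x in range(width)]

    columns = []
    start = None      # x where the current text column began
    blank_run = 0     # consecutive blank x positions seen inside it
    for x in range(width):
        if has_ink[x]:
            if start is None:
                start = x
            blank_run = 0
        elif start is not None:
            blank_run += 1
            if blank_run >= min_gutter:
                # Wide enough to be a gutter: close the current column.
                columns.append((start, x - blank_run + 1))
                start = None
                blank_run = 0
    if start is not None:
        columns.append((start, width - blank_run))
    return columns


def crop_column(page, start, end):
    """Cut one column out of the page so it can be processed on its own."""
    return [row[start:end] for row in page]


# A made-up 10-pixel-wide page: text in x = 0..2 and x = 6..9,
# separated by a three-pixel blank gutter.
page = [
    [1, 0, 1, 0, 0, 0, 1, 1, 0, 1],
    [1, 1, 0, 0, 0, 0, 1, 0, 0, 1],
]

columns = find_columns(page)
print(columns)
first_column = crop_column(page, *columns[0])
print(first_column)
```

The `min_gutter` value keeps the narrow gaps between letters from being mistaken for column breaks. On a real 300 dpi scan it would need to be far larger than 2 and chosen by experiment.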