[accessiblelinux] Re: help using gimp for OCR

  • From: David Ring <n1ea@xxxxxxxx>
  • To: accessiblelinux@xxxxxxxxxxxxx
  • Date: Tue, 11 Aug 2009 00:03:40 +0800

OCR programs like tesseract and others have a cleanup function that runs
automatically.

Using GIMP is too labor-intensive for whole books, but for a few pages the
contrast and shadow of the page can perhaps be edited out. I've done that
and saved the pages in the format that tesseract needed.
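David's cleanup was done by hand in GIMP. As a rough, GIMP-independent illustration of what removing page shading can mean, here is a minimal Python sketch of one common technique, global binarization with Otsu's threshold. It assumes the page is already loaded as rows of 8-bit gray values; the function names are hypothetical, and this is not necessarily what tesseract or any other OCR engine does internally.

```python
# Minimal sketch: flatten uneven page shading by global binarization
# (Otsu's method). Assumes the page is already loaded as rows of 8-bit
# gray values (0 = black, 255 = white); loading and saving are left out.


def otsu_threshold(pixels):
    """Return the gray level that best separates dark text from light paper."""
    hist = [0] * 256
    for row in pixels:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_between = 0, -1.0
    weight_dark = 0  # number of pixels at or below the candidate threshold
    sum_dark = 0
    for t in range(256):
        weight_dark += hist[t]
        sum_dark += t * hist[t]
        if weight_dark == 0:
            continue
        weight_light = total - weight_dark
        if weight_light == 0:
            break
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Between-class variance; Otsu picks the threshold that maximises it.
        between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t


def binarize(pixels, threshold=None):
    """Map every pixel to pure black (0) or pure white (255)."""
    if threshold is None:
        threshold = otsu_threshold(pixels)
    return [[255 if v > threshold else 0 for v in row] for row in pixels]


# Tiny "page": light, unevenly shaded paper (180-220) with dark text (25-40).
page = [
    [200, 210, 30, 220],
    [190, 25, 35, 215],
    [185, 205, 40, 180],
]
print(binarize(page))
# [[255, 255, 0, 255], [255, 0, 0, 255], [255, 255, 0, 255]]
```

A single global threshold can still fail where a shadow, such as the dark band near a book's binding, is as dark as the text itself; local (adaptive) thresholding is the usual refinement in that case.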

Best wishes,

David Ring
-30-
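On the unsharp-mask question in the quoted message below: GIMP's Unsharp Mask exposes radius, amount and threshold. The minimal Python sketch below models those three parameters with the textbook formula, sharpened = original + amount * (original - blurred), applied only where the difference reaches the threshold. It is not GIMP's code: it uses a box blur instead of a Gaussian for brevity and assumes an 8-bit grayscale image stored as rows of ints.

```python
# Minimal unsharp-mask sketch (box blur standing in for GIMP's Gaussian).


def box_blur(pixels, radius):
    """Mean of the (2*radius+1)^2 neighbourhood, clipped at the image edges."""
    h, w = len(pixels), len(pixels[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            total = count = 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += pixels[yy][xx]
                    count += 1
            row.append(total / count)
        out.append(row)
    return out


def unsharp_mask(pixels, radius=1, amount=0.5, threshold=0):
    """Return original + amount * (original - blurred), clamped to 0..255.

    Pixels whose difference from the blurred image is below `threshold`
    are left unchanged.
    """
    blurred = box_blur(pixels, radius)
    out = []
    for src_row, blur_row in zip(pixels, blurred):
        row = []
        for v, b in zip(src_row, blur_row):
            diff = v - b
            if abs(diff) < threshold:
                row.append(v)
            else:
                row.append(max(0, min(255, round(v + amount * diff))))
        out.append(row)
    return out


# A pure black-and-white scan: every sharpened value clamps back to 0 or 255.
scan = [[255, 255, 0, 0], [255, 255, 0, 0]]
print(unsharp_mask(scan, radius=2, amount=3.0) == scan)  # True
```

The example suggests why sharpening is of little use on a true black-and-white scan: if the image holds only pure black (0) and pure white (255), every sharpened value is clamped straight back to 0 or 255, so the filter has nothing to enhance. Sharpening can only act where the image keeps intermediate gray levels along the edges of characters.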

On Mon, Aug 10, 2009 at 9:14 PM, <aerospace1028@xxxxxxxxxxx> wrote:

> Greetings,
> Is anyone here familiar with using GIMP (or any other image processor) to
> clean up scanned pages for OCR? I'm trying to find a way to enhance some
> scanned text for better OCR results (in this case InftyReader, but in the
> future I want to try OCRopus, tesseract and a couple of other Linux OCR
> engines).
>
> I followed the directions at http://www.gimp.org/tutorials/Basic_Batch/ and
> created a "batch-unsharp-mask.scm" file and ran GIMP from the command
> line as suggested. I've experimented with the radius, amount and threshold
> values, but the enhanced image always comes out worse than the unenhanced
> one. Does anyone know whether a different filter would work better for
> text images, or what values would sharpen a 600 dpi black-and-white image?
>
> Also, I can't find the list of functions and their descriptions that the
> tutorial refers to. I also want to make a batch script to rotate the
> upside-down pages 180 degrees.
>
> Thanks in advance for any help you can provide:-)
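For rotating the upside-down pages, the tutorial's Script-Fu route would do the work inside GIMP. As a GIMP-independent alternative, here is a minimal Python sketch. It assumes the pages have already been converted to plain-text (P2) PGM files so that no image library is needed; `batch_rotate`, the `-rotated` suffix and the output naming are hypothetical choices for illustration.

```python
import glob
import os


def rotate_180(rows):
    """Rotate an image by 180 degrees: reverse the row order and each row."""
    return [row[::-1] for row in reversed(rows)]


def read_pgm_p2(path):
    """Read a plain (ASCII, "P2") PGM file; return (maxval, rows)."""
    tokens = []
    with open(path) as f:
        for line in f:
            tokens.extend(line.split("#", 1)[0].split())  # drop comments
    if not tokens or tokens[0] != "P2":
        raise ValueError(f"{path}: not a plain PGM (P2) file")
    width, height, maxval = (int(t) for t in tokens[1:4])
    values = [int(t) for t in tokens[4:4 + width * height]]
    return maxval, [values[r * width:(r + 1) * width] for r in range(height)]


def write_pgm_p2(path, maxval, rows):
    """Write rows of gray values as a plain (ASCII, "P2") PGM file."""
    with open(path, "w") as f:
        f.write(f"P2\n{len(rows[0])} {len(rows)}\n{maxval}\n")
        for row in rows:
            f.write(" ".join(str(v) for v in row) + "\n")


def batch_rotate(pattern, suffix="-rotated"):
    """Rotate every P2 PGM matching `pattern`; write NAME<suffix>.pgm beside it."""
    written = []
    for path in sorted(glob.glob(pattern)):
        base, ext = os.path.splitext(path)
        if base.endswith(suffix):
            continue  # skip output left over from an earlier run
        maxval, rows = read_pgm_p2(path)
        out_path = base + suffix + ext
        write_pgm_p2(out_path, maxval, rotate_180(rows))
        written.append(out_path)
    return written
```

For example, `batch_rotate("upside-down/*.pgm")` (a hypothetical directory) would write `page1-rotated.pgm` next to `upside-down/page1.pgm`. Plain PGM is bulky at 600 dpi, so for a whole book an image library or GIMP's own batch mode is likely the more practical choice.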
