[metro] Re: Fwd: intro

From: Thomas Nikjoo <thomas.nikjoo@xxxxxxxxx>
To: metro@xxxxxxxxxxxxx
Date: Wed, 26 Apr 2006 12:36:36 +0100

hi guys

I've looked through the intro and it's good. I've made some changes and attached them, so take a look. I'll go over it again later to see if it needs more work, as I've only just looked at it. Now on to the manual. Have fun and see you soon.

--
Thomas Nikjoo
thomas.nikjoo@xxxxxxxxx

An Introduction to Optical Character Recognition

Optical character recognition, also known as OCR, is a system used to translate images of text into computer-editable text, or to translate text into a character code, usually ASCII or Unicode. OCR now covers two specific terms: optical and digital character recognition. The optical side of recognition uses mirrors and lenses; the digital side uses scanners and algorithms.

In the 1950s David Shepard, a cryptanalyst working for the NSA, was asked by Frank Rowlett, who had broken the Japanese "Purple" diplomatic code during the war, to automate the process of converting printed messages into a form a computer could use. The result was "GISMO", which made the Washington Daily News on 27th April 1950 and was patented. David Shepard subsequently founded IMR (Intelligent Machines Research Corporation), which created the first OCR system in commercial operation.

IMR sold its first system to Reader's Digest in 1955. The second was sold to the Standard Oil Company of California, where it was used to read credit card carbon imprints for billing purposes. IMR sold many other systems, including a bill stub reader to the Ohio Bell Telephone Company and a page scanner to the U.S. Air Force for reading typewritten messages and transmitting them by teletype.

International Business Machines, better known as IBM, was among the companies later licensed under David Shepard's OCR patents.

Since 1965 the U.S. and Canadian postal services have used OCR to sort mail in their sorting offices. The system reads the address and prints a barcode on the letter in an ultraviolet ink, so that the barcode does not obscure the text already written and is visible only under a UV light. The barcode represents the postal code, which means less expensive equipment can read it and sort the post accordingly.

Typewritten text recognition is considered a solved problem in commercial OCR, but handwritten text still has to be conquered. Handwriting recognition already exists in palmtops and some mobile phones, but it has not yet been extended to handwritten script or scanned documents.

As mentioned previously, early OCR systems used lights and mirrors. They contained fixed slits for light to pass through and a moving disk with additional slits. The reflected image was broken into discrete bits of black and white data, which were presented to a photomultiplier tube and converted into electronic bits. Logic then converted these black-and-white codes into a specially designed character set. The second generation of OCR machines, used in the 1960s, contained cathode ray tubes but still used light and photomultipliers. This gave more flexibility in where the data could be located on the source image. This new approach was known as curve following. The third-generation systems used photodiode arrays: diodes were lined up, and reflected images of the source document were passed across them at a predetermined speed. The diodes were sensitive to infrared light, so red ink was used as the non-reading colour.

Things to consider when creating an OCR system

1. How is the input file created?
2. What style of fonts are to be used?
3. Does the text need to be in a "non-reading colour"?
4. What size should the text be?
5. Where should the data be read from on the document?
6. Will the system be efficient?
7. The system will probably not be 100% accurate.

A very important part of OCR is the choice of fonts the system is expected to read. Reading and interpreting the characters correctly is essential if the document is to be translated accurately. Common mistakes include misreading the "£" symbol as the letter "E", and mistaking an exclamation mark for an "I" and a full stop.

OCR plays an important role in the computer industry. One significant step remains: reading handwritten work accurately, whether optically or digitally. This part of OCR has not yet been fully solved, but with increasing processing power and speed, and general ingenuity, OCR will be fully operational and at the forefront of data entry.

http://www.aimglobal.org/technologies/othertechnologies/ocr.pdf
http://en.wikipedia.org/wiki/Optical_character_recognition
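The early machines described in the intro broke a reflected image into black-and-white bits and then used logic to map those bits onto a character set. A rough software analogue of that idea is binarisation followed by template matching. The Python below is only a minimal sketch under stated assumptions: the 5x3 glyphs in `TEMPLATES`, the default threshold of 128 and all function names are hypothetical, and picking the template with the fewest differing pixels (Hamming distance) is a simple stand-in, not the method any of the historical machines actually used.

```python
# Sketch: binarise a scanned glyph, then match it against a small set of
# bitmap templates by counting differing pixels (Hamming distance).
# The 5x3 glyphs below are hypothetical and only for illustration.

TEMPLATES = {
    "I": ("111", "010", "010", "010", "111"),
    "!": ("010", "010", "010", "000", "010"),
    ".": ("000", "000", "000", "000", "010"),
    "E": ("111", "100", "110", "100", "111"),
}


def binarise(gray, threshold=128):
    """Map a grayscale bitmap (0 = black, 255 = white) to bits: 1 = ink, 0 = paper."""
    return tuple(tuple(1 if px < threshold else 0 for px in row) for row in gray)


def to_bits(rows):
    """Convert a template given as strings of '0'/'1' into a tuple of bit rows."""
    return tuple(tuple(int(ch) for ch in row) for row in rows)


def hamming(a, b):
    """Count the pixels that differ between two equal-sized bitmaps."""
    return sum(x != y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def recognise(bits):
    """Return (character, distance) for the template closest to the input bits."""
    scores = {ch: hamming(bits, to_bits(t)) for ch, t in TEMPLATES.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


def render(rows, ink=0, paper=255):
    """Turn a template into a grayscale image, standing in for a scanned glyph."""
    return [[ink if ch == "1" else paper for ch in row] for row in rows]


if __name__ == "__main__":
    print(recognise(binarise(render(TEMPLATES["I"]))))   # ('I', 0)

    # An "I" whose serifs printed faintly (gray level 150).
    faint_i = render(TEMPLATES["I"])
    for r, c in [(0, 0), (0, 2), (4, 0), (4, 2)]:
        faint_i[r][c] = 150

    print(recognise(binarise(faint_i)))                  # ('!', 1)
    print(recognise(binarise(faint_i, threshold=160)))   # ('I', 0)
```

With the default threshold the faint serifs are read as paper, so the "I" comes out as "!", the same pair of characters the intro lists as a common mistake; raising the threshold recovers the "I". Practical systems use much larger bitmaps or feature-based classifiers, so this only illustrates the principle.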

References:
- [metro] intro
  - From: Anesh Ram
- [metro] Fwd: intro
  - From: Anesh Ram
