No, OCR does not work well against CAPTCHAs; the blurry lines running through the characters throw off any OCR program completely.
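
To make the point concrete, here is a toy sketch in Python. It is not how any real OCR engine works: the 5x5 letter bitmaps and the strike_through helper are made up for illustration, and matching is done by simply counting differing cells against each template. Even so, one line drawn through a C is enough to make it read as an E:

```python
# Toy illustration only: nearest-template matching over made-up 5x5 bitmaps.
TEMPLATES = {
    "C": ["#####",
          "#....",
          "#....",
          "#....",
          "#####"],
    "E": ["#####",
          "#....",
          "#####",
          "#....",
          "#####"],
    "L": ["#....",
          "#....",
          "#....",
          "#....",
          "#####"],
}


def distance(a, b):
    """Count the cells where two equally sized bitmaps differ."""
    return sum(ca != cb for ra, rb in zip(a, b) for ca, cb in zip(ra, rb))


def classify(bitmap):
    """Return the template letter with the fewest differing cells."""
    return min(TEMPLATES, key=lambda letter: distance(bitmap, TEMPLATES[letter]))


def strike_through(bitmap, row):
    """Draw a horizontal line across one row, as CAPTCHA noise might."""
    return ["#" * len(line) if i == row else line
            for i, line in enumerate(bitmap)]


clean = TEMPLATES["C"]
noisy = strike_through(clean, 2)   # one line through the middle of the C

print(classify(clean))   # C
print(classify(noisy))   # E
```

Real engines are far more sophisticated than this, but the noise works on the same weakness: the extra strokes look like part of the letter.
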
Macarty, Jay {PBSG} wrote:
I agree with inthane; I suppose it is possible. However, given that it usually isn't hard to throw off most OCR packages with an unusual font or background, I would not have high expectations of getting the code right.

-----Original Message-----
From: programmingblind-bounce@xxxxxxxxxxxxx [mailto:programmingblind-bounce@xxxxxxxxxxxxx] On Behalf Of inthaneelf
Sent: Saturday, September 15, 2007 8:14 PM
To: programmingblind@xxxxxxxxxxxxx
Subject: Re: j Macarty, could use your assistance! was: Re: Introducing PDF2OCR and seeking testers

It's possible, stress the possible. I would try selecting the graphic, copying it, and pasting it into an open Paint file, so you're not trying to dig the image out of the surrounding text that you already have access to, then run PrintScreen on it there.

And I say might, because some places, Microsoft for one, use blurry backgrounds and oddly formed but readable (to someone with really good eyesight) versions of the numbers and letters, and sometimes put something across the background to blur the letter shapes. That keeps them from being scanned and automatically entered by programs, which is the whole reason they're doing it: to block automated generation bots.

HTH,
inthane

. For Blind Programming assistance, Information, Useful Programs, and Links to Jamal Mazrui's Text tutorial packages and Applications, visit me at:
http://grabbag.alacorncomputer.com
. to be able to view a simple programming project in several programming languages, visit the Fruit basket demo site at:
http://fruitbasketdemo.alacorncomputer.com

----- Original Message -----
From: "Dale Leavens" <dleavens@xxxxxxx>
To: <programmingblind@xxxxxxxxxxxxx>
Sent: Saturday, September 15, 2007 10:31 AM
Subject: Re: j Macarty, could use your assistance!
was: Re: Introducing PDF2OCR and seeking testers

So, is it likely this could be used to read those graphics used to secure Web sites, the image of a password or number the user must enter to gain admission?

Dale Leavens, Cochrane Ontario Canada
DLeavens@xxxxxxx
Skype DaleLeavens
Come and meet Aurora, Nakita and Nanook at our polar bear habitat.

----- Original Message -----
From: "Jamal Mazrui" <empower@xxxxxxxxx>
To: <programmingblind@xxxxxxxxxxxxx>
Sent: Saturday, September 15, 2007 12:28 PM
Subject: RE: j Macarty, could use your assistance! was: Re: Introducing PDF2OCR and seeking testers

Thanks, Jay and Inthane. I did some web research and found out the following. The current version is PrintKey Pro 1.04, costing about $20 and apparently published in 2003. A 30-day, fully functioning demo is available from
http://www.warecentral.com

I installed the demo and converted the CHM help file to text in the archive at
http://www.empowermentzone.com/pkey_doc.zip

Several graphics output types are possible, though Microsoft Document Image format is not mentioned in the documentation or listed in the options dialog. TIFF is an output type, which I then tried to convert to text with Tesseract-OCR. Unfortunately, the result was gibberish, even though Tesseract recognized it as a TIFF type it could handle. I suspect the problem may be one of resolution, since Tesseract reported only 96 DPI. Unfortunately, however, I have not found a setting that can increase TIFF resolution. If anyone else does, please let me know.

Although the program would not be free, at a modest cost one could tie together components that let one perform OCR on a window that a screen reader fails to incorporate readably in its off screen model. The converted text could automatically be presented in a message box, multiline edit box, or JAWS user buffer.

Jamal

P.S.
I was confused by the one from Grab Bag partly because the PrintScreen key was not bringing up the program for me, and there had not been any messages when I ran the program. It turns out that my computer does not let PrintKey grab the PrintScreen key, but this is a known issue with some computers, so the program allows one to change the hot keys. I later learned about Alt+PrintScreen for the active window only, which does work in the current demo. The program is also available in the system tray.

On Sat, 15 Sep 2007, Macarty, Jay {PBSG} wrote:

Date: Sat, 15 Sep 2007 07:48:50 -0500
From: "Macarty, Jay {PBSG}" <Jay.Macarty@xxxxxxxx>
Reply-To: programmingblind@xxxxxxxxxxxxx
To: programmingblind@xxxxxxxxxxxxx
Subject: RE: j Macarty, could use your assistance! was: Re: Introducing PDF2OCR and seeking testers

inthane,

You have summarized the features of PrintKey nicely; thanks. Just a couple of additional notes.

1. The most recent version of PrintKey is a licensed software package (I don't recall the cost at this point). The latest free version was version 4.0, which is the one I posted about originally.

2. My most common method for invoking PrintKey to capture a window is to launch the software by selecting the printkey.exe program in Windows Explorer or FileDir. Then, I normally use the keystroke to capture only the specific window with focus; the key is alt+printscreen. That way I only get the window I am interested in and not everything around it.

3. Once you press printscreen or alt+printscreen, you will hear a sound like a camera and then the PrintKey window will pop up. I then select the print button, and my printer dialog is set to go to a .MDI document instead of a regular printer. It should be noted that one other useful action when the PrintKey dialog comes up is to press ctrl+c to send the window image to the clipboard. Then, you can paste it into a document or an e-mail.

4.
When the print selection creates the .MDI document, you will automatically be placed in the Microsoft image viewer program. You then select the Tools menu and press enter on Send this image to a Word document. You will then be asked if you want to perform OCR on the image.

I am not sure if you can override the OCR program the viewer invokes at this point, but it would be worth researching. I'm thinking that, if PrintKey has the ability to save the image in another format such as tif, it might be possible to try running the image through the tool included in Jamal's PDF2OCR package to compare. My general impression is that the OCR included with Office Tools isn't the greatest. PrintKey gives you the ability to play with some aspects of the image in order to get it like you want, and some of those settings may allow for better OCR translation. I just haven't been able to find what might improve it.

-----Original Message-----
From: programmingblind-bounce@xxxxxxxxxxxxx [mailto:programmingblind-bounce@xxxxxxxxxxxxx] On Behalf Of inthaneelf
Sent: Saturday, September 15, 2007 1:58 AM
To: programmingblind@xxxxxxxxxxxxx
Subject: Re: j Macarty, could use your assistance! was: Re: Introducing PDF2OCR and seeking testers

Jamal,

All I have at this time is the information that j Macarty's e-mail gave, which I included in the text around the link.

It's a stand-alone application, and if I remember right, one starts the application by pressing Enter on it, then uses the print screen key on the image one is trying to OCR, and I believe the application will then open to ask what you want to do with this captured image.

And I believe that you save it as a .MDI document, then you open that file, which should start the correct Microsoft program to OCR the file and render it for your reading.

If j catches this post, maybe he can clarify this whole thing further, and I'll nab his explanation and put it with the application for folks to get with it.
I set it up and ran through it once, found that it seemed to be working all right, and haven't put it back on my newly reloaded machine yet because I rarely deal with screen images, the old "once in a harvest moon" scenario.

There is stuff on the web about it, though; I found the app in a search after J's post on the list about it and its value to him.

Sorry I can't be of more help at this point, but if he doesn't, or if he misses this post and doesn't expand on it, I will see what I can find.

inthane

----- Original Message -----
From: "Jamal Mazrui" <empower@xxxxxxxxx>
To: <programmingblind@xxxxxxxxxxxxx>
Sent: Friday, September 14, 2007 8:21 PM
Subject: Re: Introducing PDF2OCR and seeking testers

Thanks for the info. Can you distribute more documentation with it? I guess PrintKey.exe is the utility, itself, rather than an installer. When I ran it, there were no messages, either from a GUI or a command prompt. I tried typical help command-line parameters to no avail. What is .mdi format (I have not heard of it before)? Can any version of Microsoft Word read it? How does one do this with Word?

Assuming the program works as you have described (though I need to learn specific steps of invocation), it seems considerably different in purpose than PDF2OCR. It seems that PrintKey is for learning about a screen image, whereas PDF2OCR is for learning about a PDF file. Both involve OCR (though PrintKey does not do this, itself), but the types of information to be accessed in a typically temporary, static screen image are usually different than the content of a potentially large or formal piece of writing in a PDF.
Jamal

On Fri, 14 Sep 2007, inthaneelf wrote:

Date: Fri, 14 Sep 2007 16:03:32 -0700
From: inthaneelf <inthaneelf@xxxxxxxxxxxxxx>
Reply-To: programmingblind@xxxxxxxxxxxxx
To: programmingblind@xxxxxxxxxxxxx
Subject: Re: Introducing PDF2OCR and seeking testers

You can google for it, but I have put it up on my grab bag site for download, and the text that accompanies the link tells you how to use it for this purpose.

It's a self-contained executable that you generally would go and click on to use. I put it in a folder in my program files folder and generally put a shortcut to it in
C:\Documents and Settings\[my user named folder]\Start Menu\Programs\Startup
or, for a universal setup (all users of the computer run it):
C:\Documents and Settings\All Users\Start Menu\Programs\Startup

This loads it at startup and makes it available by just hitting the old "print screen" button on your keyboard. But you can just make a shortcut to it anywhere and start it when you need it, so you're not loading unneeded items at boot up.

HTH,
Inthane

----- Original Message -----
From: "Eileen Lafond" <Eileen.Lafond@xxxxxxxxxxx>
To: <programmingblind@xxxxxxxxxxxxx>
Sent: Friday, September 14, 2007 2:38 PM
Subject: Re: Introducing PDF2OCR and seeking testers

Where do you get print screen?
Eileen La Fond
Phone (206) 386-0011
e.mail Eileen.LaFond@xxxxxxxxxxx

"inthaneelf" <inthaneelf@xxxxxxxxxxxxxx> 9/14/2007 2:29 PM >>>

I understand. Print screen is a small utility that Jay Macarty put out as an interim solution for folks who had to deal with screen images sent to them by clients, to OCR them. So I thought, in the spirit of seeing how this free OCR module that you're using does compared to other scanners that are also free, I would run it through it.

And yes, the setup I am talking about is for OCRing a completely graphic image and attempting to render it into a text version.

I wasn't trying to be offensive, Jamal, but testing your program against some others isn't a bad idea, to give you an idea of how well it, and this freebie OCR module, does compared to other things out here that are free or included in suites that a lot of us have.

inthane

----- Original Message -----
From: "Jamal Mazrui" <empower@xxxxxxxxx>
To: <programmingblind@xxxxxxxxxxxxx>
Sent: Friday, September 14, 2007 5:29 AM
Subject: Re: Introducing PDF2OCR and seeking testers

FYI -- the 1.0 release version is considerably better than the beta version due to performing OCR on a page-by-page basis and to the convenience of converting any number of PDFs in a directory with a single command. PDF2OCR does work on text-based as well as image-based PDFs, but the results on text-based ones are generally not as good as with other utilities that extract the text directly rather than analyzing the picture of each page.
Jamal

On Thu, 13 Sep 2007, inthaneelf wrote:

Date: Thu, 13 Sep 2007 15:52:12 -0700
From: inthaneelf <inthaneelf@xxxxxxxxxxxxxx>
Reply-To: programmingblind@xxxxxxxxxxxxx
To: programmingblind@xxxxxxxxxxxxx
Subject: Re: Introducing PDF2OCR and seeking testers

And print screen; let's see how this freebie works out, smile.

And Jamal, I appreciate what you're working on; I just want to make sure you're getting at least 60% of what you're hoping for out of it.

inthane

----- Original Message -----
From: "Lloyd Rasmussen" <lras@xxxxxxxxxxx>
To: <programmingblind@xxxxxxxxxxxxx>
Sent: Thursday, September 13, 2007 3:47 AM
Subject: RE: Introducing PDF2OCR and seeking testers

If someone wants to submit a sample, I can run it through the tools that come with OmniPage 15. Also, someone else should run the sample through the image printer and OCR built into MS Word.

Lloyd Rasmussen, Kensington, Maryland
Home: http://lras.home.sprynet.com
Work: http://www.loc.gov/nls

-----Original Message-----
From: programmingblind-bounce@xxxxxxxxxxxxx [mailto:programmingblind-bounce@xxxxxxxxxxxxx] On Behalf Of Jamal Mazrui
Sent: Thursday, September 13, 2007 5:42 AM
To: programmingblind@xxxxxxxxxxxxx
Subject: Re: Introducing PDF2OCR and seeking testers

I would expect better results from Kurzweil 1000, which uses arguably the highest grade commercial OCR technology. The free Adobe Reader does not do OCR, so would not make the sample file accessible.
Jamal
__________
View the list's information and change your settings at //www.freelists.org/list/programmingblind