
Recognizing text in an image: how do you do it?

Status
Not open for further replies.

Wired

Member
So I was planning on making a PDF file. I've scanned all the pages I'm going to use, but I was thinking I'd save some space if I could pull the text from the images and then just replace the images with the text (where needed).

Am I making any sense? I'm tired, so forgive me!
 
It's called OCR (optical character recognition), I think, and it's VERY difficult to do. Plus you would need some very expensive software to do it. Long story short: recovering text from an image is a pain in the ass. At least in the corporate world.
 
Nerevar said:
it's called OCR, I think, and it's VERY difficult to do. Plus you would need some very expensive software to do it.
The software that came with our scanner has the option. Far from completely accurate, though.
 
JoshuaJSlone said:
The software that came with our scanner has the option. Far from completely accurate, though.

Yeah, well, there's always cheap stuff. My experience has been that the "free software" bundled with hardware is usually pretty bad at it, though.
 
/nod

A good OCR package (usually costing some money, though never a ton) will generally give good, but rarely perfect, results. It has been years since I've actually worked with them, but I remember one that had a great success rate when I used it: across 20+ documents, I could probably count the mistakes on two hands.
 
It's been almost a decade since I've touched OCR software. I think Xerox's TextBridge worked best at the time. You can probably download demos of various packages, but OCR software in general works best with standard serif or sans-serif fonts, not script or decorative fonts.

What's the final product going to be? CD-ROM? Web? Email? It might be best just to skip the OCR stage completely. You may be able to replace the scanned text with regular text, but unless you've got something that actually lets you edit PDFs instead of just creating them, scanning the pages and converting them to PDF is your only option.
 
OCR on plain-text documents works pretty well these days... OCR on a page that mixes text and graphics? Meh... still pretty hit or miss.
 
DarienA said:
OCR on straight text documents works pretty well these days... OCR on a combination text and graphics page? Meh... still pretty hit or miss.
There should be an option to select regions to interpret as text.
 
aoi tsuki said:
There should be an option to select regions to interpret as text.

Good packages have that, but depending on what the actual region is and how the text is laid over the graphic, that still may not help.
 