Support NeoGAF

Wired · Feb 4, 2005

So I was planning on making a PDF file, and I've scanned all the pages I'm going to use, but I was thinking that I'd save some space if I were able to pull the text from the images and then just replace the images with the text (where needed).

Am I making any sense? I'm tired, so please forgive me!

Nerevar · Feb 4, 2005

it's called OCR, I think, and it's VERY difficult to do. Plus you would need some very expensive software to do it. Long story short - recovering text from an image is a pain in the ass. At least in the corporate world.

Cauliflower of Love · Feb 4, 2005

JoshuaJSlone · Feb 4, 2005

Nerevar said:
it's called OCR, I think, and it's VERY difficult to do. Plus you would need some very expensive software to do it.

The software that came with our scanner has the option. Far from completely accurate, though.

Nerevar · Feb 4, 2005

JoshuaJSlone said:
The software that came with our scanner has the option. Far from completely accurate, though.

yeah, well, there's always cheap stuff. My experience has been that the "free software" that comes with stuff is usually pretty bad at it, though.

borghe · Feb 4, 2005

/nod

a good OCR package (usually costing some money though never a ton) will always have good (but rarely ever perfect) results... it has been years since I've actually worked with them, but I remember one from years ago actually had a great success rate when I worked with it.. like 20+ documents and I could probably count the mistakes on just two hands...

aoi tsuki · Feb 4, 2005

It's been almost a decade since I've touched OCR software. I think it was Xerox's TextBridge software that worked best at the time. You can probably download a demo of various software, but OCR software in general works best with standard serif or sans serif fonts, no scripts or fancy fonts.

What's the final product going to be? CD-ROM? Web? Email? Might be best just to skip the OCR stage completely. You may be able to replace the scanned text with regular text, but unless you've got something that actually lets you edit PDFs instead of just making them, scanning and PDFing the pages is your only way.

DarienA · Feb 4, 2005

OCR on straight text documents works pretty well these days... OCR on a combination text and graphics page? Meh... still pretty hit or miss.

aoi tsuki · Feb 4, 2005

DarienA said:
OCR on straight text documents works pretty well these days... OCR on a combination text and graphics page? Meh... still pretty hit or miss.

There should be an option to select regions to interpret as text.

Azih · Feb 4, 2005

There probably is in the better OCR software.

DarienA · Feb 4, 2005

aoi tsuki said:
There should be an option to select regions to interpret as text.

good packages have that... depending on what the actual region is... and how the text is laid on the graphic, that still may not help.
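Several replies above describe OCR accuracy only loosely ("far from completely accurate", "count the mistakes on just two hands"). One common way to put a number on it is a character error rate: the edit distance between a hand-typed reference and the OCR output, divided by the length of the reference. The sketch below is only an illustration of that idea; the function names and the sample strings are made up for this example and do not come from any OCR package mentioned in the thread.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: fewest single-character edits turning reference into hypothesis."""
    # previous[j] holds the distance between the reference prefix seen so far
    # and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # drop a reference character
                current[j - 1] + 1,      # add a hypothesis character
                previous[j - 1] + cost,  # keep or substitute a character
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalised by the length of the reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical sample: a typed line and what an OCR pass might return for it.
    typed = "Recognizing text in an image"
    scanned = "Recogniz1ng text in an irnage"
    print(f"character error rate: {character_error_rate(typed, scanned):.3f}")
```

Running the same measurement over a few pages from a demo version would give a rough basis for comparing packages before paying for one.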

Recognizing text in an image, how do you do it?
