text sharpening on scanned docs

Feldgrau's WWII operational map project, map research, archives, tools and techniques, and research requests.

Moderator: Abicht

MadDog
Associate
Posts: 666
Joined: Thu Feb 20, 2003 7:39 pm

text sharpening on scanned docs

Post by MadDog »

On docs like these, what do people do to sharpen the text for OCR? I am not having much luck in Photoshop trying to get some NARA docs to sharpen up enough.

thanks,

Mad Dog
phylo_roadking
Patron
Posts: 8459
Joined: Thu Apr 28, 2005 2:41 pm

Re: text sharpening on scanned docs

Post by phylo_roadking »

MD - sometimes there is simply no way round the issue :( It's a problem I've come across VERY regularly for years now. For speed and to limit the size of files, i.e. to cram them onto a CD, images scanned by a "commercial" source are usually at a set - and very limited - DPI...

...and to put it simply, to "massage" an image successfully, the more raw data you have, i.e. the higher the DPI, the better. That's why, when you're scanning something yourself and time and file size are no object, you scan at the highest resolution you can. It may be slow to scan...but a 1MB scanned image of a picture just gives your photo manipulation software of choice more to work with for sharpening edges etc. than a 100KB image :(
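
To put rough numbers on this, here is a small Python sketch of how many pixels a scan spends on one character at different resolutions. The 0.1 inch character height and the DPI values are illustrative assumptions, not figures from NARA or from any particular scanner.

```python
# Rough arithmetic only: how many pixels tall is one printed character
# at a given scan resolution? All figures here are assumed for illustration.

def pixels_per_character(dpi, char_height_in=0.1):
    """Pixels spanning one character of the given height (in inches)."""
    return round(dpi * char_height_in)

for dpi in (100, 300, 600, 1200):
    print(f"{dpi:>5} DPI -> about {pixels_per_character(dpi)} px per character")
```

Under these assumptions a letter at the low end is only about ten pixels tall, which leaves very little for any later adjustment, or for an OCR program, to work with.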

Working with something like NARA scans - or any commercially available photo compendiums etc. - you're limited by the resolution they've chosen to scan at. Especially when you're THEN trying to improve the image further for a SECOND quality-dependent exercise like OCR.
"Well, my days of not taking you seriously are certainly coming to a middle." - Malcolm Reynolds
John P. Moore
Author & Moderator
Posts: 1868
Joined: Thu Jan 02, 2003 10:40 pm
Location: Portland, Oregon & France

Re: text sharpening on scanned docs

Post by John P. Moore »

Here is what I do when scanning microfilm. After scanning the document in Greyscale at 4800 DPI in Photoshop, I next crop the image. Then magnify the document some so you can clearly see the text characters. Then I go to IMAGE, then Adjustments, and Lighten the image until the text character lines are clear. After that, I darken the text using the Black and Midtone sliders to obtain the most pleasing image. On some badly damaged/faded documents you will need to select sections of the document to digitally enhance. If the frame is especially light or dark to begin with after running it through the Preview mode, use the Histogram tool to lighten or darken the frame before doing the final scan.
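
For anyone who wants to see the arithmetic behind this kind of adjustment, here is a minimal Python sketch. It assumes the Black and Midtone sliders behave like a Levels-style remap (black point, white point, midtone gamma); the post does not say which dialog is meant, and the function name `apply_levels` and every number below are made up for illustration. It works on a plain list of 8-bit greyscale values rather than a real image file.

```python
def apply_levels(pixels, black=0, white=255, gamma=1.0):
    """Levels-style remap of 8-bit greyscale rows (0 = black, 255 = white).

    Values at or below `black` become 0, values at or above `white`
    become 255, and `gamma` bends the midtones in between
    (gamma < 1 darkens them, gamma > 1 lightens them).
    """
    if not 0 <= black < white <= 255:
        raise ValueError("need 0 <= black < white <= 255")
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    span = white - black
    result = []
    for row in pixels:
        new_row = []
        for value in row:
            # Normalise to 0..1 between the black and white points, clipping outside.
            t = min(max((value - black) / span, 0.0), 1.0)
            new_row.append(round(255 * t ** (1.0 / gamma)))
        result.append(new_row)
    return result


# Hypothetical faded frame: grey "paper" around 190, faint text around 110-130.
scan = [
    [190, 185, 120, 110, 188],
    [192, 125, 115, 130, 186],
]

# Step 1 (lighten): pull the white point down so the paper goes to white.
lightened = apply_levels(scan, white=190)

# Step 2 (darken the text): raise the black point and lower the midtones.
final = apply_levels(lightened, black=140, gamma=0.6)

for row in final:
    print(row)
```

In this toy frame the paper values end up at or near 255 while the text values are pushed much closer to 0, which is the same separation the sliders are being used for.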

After you do this a few thousand times, you will become fairly skilled.

John
phylo_roadking
Patron
Posts: 8459
Joined: Thu Apr 28, 2005 2:41 pm

Re: text sharpening on scanned docs

Post by phylo_roadking »

"After scanning the document in Greyscale at 4800 DPI in Photoshop"
This is indeed the secret: putting as much data at your disposal as you can. That's giving your software the maximum to play with. There's another thread on here where you describe the whole process, isn't there? Specifications of the kit you use etc....

You're scanning from the microfilm - MadDog wasn't clear; MadDog, are you scanning from microfilm like John or from their pre-scanned CDs of images?
MadDog
Associate
Posts: 666
Joined: Thu Feb 20, 2003 7:39 pm

Re: text sharpening on scanned docs

Post by MadDog »

I am currently trying to massage pre-scanned images. I can sharpen things a bit just using Adjust Brightness-Contrast, but that's about as far as I have gotten. The Sharpen and Unsharp Mask filters etc. don't do much.

Honestly, there is probably no digital magic that gets me out of having to hand copy the text and then translate it.

Ultimately, I would love to OCR the text parts.

thanks,

Mad Dog
John P. Moore
Author & Moderator
Posts: 1868
Joined: Thu Jan 02, 2003 10:40 pm
Location: Portland, Oregon & France

Re: text sharpening on scanned docs

Post by John P. Moore »

The sharpen tools in Photoshop would just add noise to text. After you open the document page in Photoshop, use the Adjust Image tool as I previously described to sharpen the text. Then print the page out and scan it through a good OCR program (set for the language of the document) like OmniPage Pro and you should get a fairly good result, but you may need to spend a lot of time on the edits.
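
A small Python sketch of why sharpening tends to amplify noise on scanned text. It runs a basic unsharp mask over one row of slightly noisy "blank paper" pixels. The pixel values, the `amount` setting and the use of a simple box blur are all assumptions chosen for illustration, and Photoshop's own filters will differ in detail.

```python
def box_blur(row, radius=1):
    """Average each pixel with its neighbours; edges reuse the nearest pixel."""
    n = len(row)
    blurred = []
    for i in range(n):
        window = [row[min(max(j, 0), n - 1)]
                  for j in range(i - radius, i + radius + 1)]
        blurred.append(sum(window) / len(window))
    return blurred


def unsharp_mask(row, amount=1.0, radius=1):
    """Add back `amount` times the difference between each pixel and its blur."""
    blurred = box_blur(row, radius)
    return [min(255, max(0, round(value + amount * (value - soft))))
            for value, soft in zip(row, blurred)]


def spread(values):
    """Difference between the lightest and darkest value."""
    return max(values) - min(values)


# Hypothetical strip of "blank" paper with a little scanner noise.
background = [200, 204, 198, 203, 199, 202, 197, 201]
sharpened = unsharp_mask(background, amount=1.5)

print("before:", background, "spread", spread(background))
print("after: ", sharpened, "spread", spread(sharpened))
```

The filter cannot tell a character edge from random speckle, so the small variations in the paper are exaggerated along with the text, and the spread of the background values grows.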

John
MadDog
Associate
Posts: 666
Joined: Thu Feb 20, 2003 7:39 pm

Re: text sharpening on scanned docs

Post by MadDog »

John, when you say "Lighten", do you mean using the Hue/Saturation setting, or Brightness/Contrast?

Black and Midtone sliders - from Color Balance?

Needless to say, I am a hack at Photoshop.

thanks,

Mad Dog