Is there a process for "cleaning up" PDFs created from scanned documents? In this example, it is an old scientific document with symbols, Latin names, etc.
When I select a section and then save it as a PDF, symbols, Latin names, etc. are sometimes interpreted incorrectly.
Any suggestions?
Please post the exact name of the Adobe program you use so a moderator can move this message to that forum.
Adobe Acrobat Pro DC
It might be a font issue: older TrueType or PostScript fonts that used the ASCII character set (https://www.asciitable.com/), versus today's OpenType fonts that are based on the Unicode character set (https://www.unicode.org). The computer industry broadly adopted Unicode around 2000. Although older TrueType and PostScript fonts can still be used, they're missing the advanced characters of Unicode, such as foreign language glyphs, math/science symbols, and dingbats.
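If it helps to see the ASCII vs. Unicode difference concretely, here is a minimal Python sketch that lists the characters in a piece of text that fall outside the 128-character ASCII range. The function name `non_ascii_report` and the sample string are made up for illustration; they are not taken from your document or from anything in Acrobat.

```python
import unicodedata

def non_ascii_report(text):
    """Return (char, code point, Unicode name) for each distinct non-ASCII character."""
    seen = []
    for ch in text:
        # ASCII covers code points 0-127; anything above needs a wider character set.
        if ord(ch) > 127 and ch not in seen:
            seen.append(ch)
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in seen]

# Hypothetical sample resembling text from a scientific paper.
sample = "Sälg ♂ 25 µm ± 3"
for row in non_ascii_report(sample):
    print(row)
```

Pasting a few lines of the recognized text into something like this would show which symbols sit outside ASCII. Those are the characters most likely to be at risk if a font or the recognition step only handles a limited character set.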
If you look at the Fonts tab in File / Properties, tell us what fonts are listed.
This document was originally published in 6 parts between 1961 and 1968, in Sweden.
Under File / Properties / Fonts, Acrobat identifies 8 fonts:
Helvetica
Helvetica - Bold
Helvetica - Bold Oblique
Helvetica - Oblique
Times - Bold
Times - Bolditalic
Times - Italic
Times - Roman
I am also including one page before and after (after selecting the pages and saving them as a new PDF).
FYI, this publication is large: 2 files (698 and 546 pages).
I would recommend taking the original scans to something like Photoshop in order to clean them up, sharpen them, etc.
When done, convert them to a PDF file and then run Text Recognition on them. Acrobat is not really the tool to do the cleaning-up. It's not an image editor.
Before bringing the pages into Photoshop for clean-up (that's a lot of work for so many pages!), I'd try these 2 options first:
Because of the complexity of your content, I recommend the "Pro" versions of these programs rather than the cheaper versions. They have better recognition of unusual symbols, STEM characters, and languages, as well as controls for cleaning up the background crud that gets picked up in a scan.