I have a copy of a PDF that I downloaded from Google Books. It's a scan of a book from 1897 (Matheseos Libri VIII, edited by Kroll and Skutsch), and the quality is pretty good: even at 800% in Adobe Acrobat there are no obvious artifacts, only occasional inky marks. It's 295 pages, and the file is just over 6 MB.
When I use the "Enhance Scans" tool and select "Recognize Text" to create a searchable PDF, it works fine, but the file now has some artifacts at high magnification, and the searchable PDF is 264 MB. That is more than a forty-fold increase.
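To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the original is exactly 6 MB (it is actually "just over") and uses decimal units; the function name `size_summary` is made up for illustration.

```python
def size_summary(before_mb, after_mb, pages):
    """Return the growth factor and the average size per page (in KB)
    before and after OCR. Uses decimal units: 1 MB = 1000 KB."""
    ratio = after_mb / before_mb
    kb_per_page_before = before_mb * 1000 / pages
    kb_per_page_after = after_mb * 1000 / pages
    return ratio, kb_per_page_before, kb_per_page_after


# Figures quoted in the post: about 6 MB before, 264 MB after, 295 pages.
ratio, kb_before, kb_after = size_summary(6, 264, 295)
print(f"growth: about {ratio:.0f}x")
print(f"average per page: about {kb_before:.0f} KB -> {kb_after:.0f} KB")
```

So each page goes from roughly 20 KB to nearly 900 KB on average, which is far more than a hidden text layer alone would be expected to add.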
Any ideas why? Or isn't "Enhance Scans" the best way of creating a searchable PDF?
Chris.
Hi Chris,
Sorry for the delay in response.
As I understand the issue described above, the file size increases when you run OCR on the PDF. Is that correct?
Please refer to the following forum threads, which discuss a similar topic:
PDF file size increases several folds after OCR
https://acrobatusers.com/forum/general-acrobat-topics/file-size-after-ocr/
Thanks,
Shivam
Thanks for getting back, Shivam.
I didn't understand the advice in the first link - it says "choose the setting searchable image". I don't have that on my version of Acrobat - I'm using Adobe Acrobat Pro DC. The only OCR option I can see is "Enhance Scans", which then lets me recognise text. Is there another way of doing OCR?
When I do that, and Save As a new file, this is when I get the massive searchable file.
Here's a shorter example - it's an extract from a catalogue in the public domain, in the British Library. This is a 4-page PDF:
https://drive.google.com/open?id=1UEI806v44FwAjQtLHq23Dnxzcrulf4iG
It's 155 KB.
After doing Enhance Scans, I get a searchable version:
https://drive.google.com/open?id=17q9brvKxbPSEomCuSjqFmzfZCcJvWJ7H
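One possible way to investigate the difference between the two files is to compare which stream compression filters each one uses. The sketch below is only a crude heuristic, not a real PDF parser: it counts how often the standard PDF filter names appear in the raw bytes of a file. The function names and the file names in the usage comment are hypothetical, and a change in the counts would only be a hint that the page images were re-encoded, not proof of it.

```python
import re
from collections import Counter

# Standard PDF stream filter names. FlateDecode and LZWDecode are also used
# for non-image streams (fonts, page content), so their counts are only
# indicative.
FILTER_NAMES = (
    b"DCTDecode",       # JPEG
    b"JPXDecode",       # JPEG 2000
    b"JBIG2Decode",     # JBIG2 (bilevel images)
    b"CCITTFaxDecode",  # CCITT fax (bilevel images)
    b"FlateDecode",     # zlib/deflate
    b"LZWDecode",       # LZW
)


def count_filters(pdf_bytes):
    """Count occurrences of each filter name in raw PDF bytes."""
    counts = Counter()
    for name in FILTER_NAMES:
        pattern = b"/" + name + rb"\b"
        counts[name.decode("ascii")] = len(re.findall(pattern, pdf_bytes))
    return counts


def count_filters_in_file(path):
    """Read a PDF from disk and count the filter names it contains."""
    with open(path, "rb") as f:
        return count_filters(f.read())


# Hypothetical usage, assuming both files have been saved locally:
#   print(count_filters_in_file("original.pdf"))
#   print(count_filters_in_file("searchable.pdf"))
```

If, for example, the original mostly used one filter and the searchable version mostly uses another, that would suggest the OCR step rewrote the page images rather than just adding a text layer.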
Hi Chris,
Sorry for the delay in response.
The "searchable image" setting is the same as "Recognize Text" (OCR) in Acrobat Pro DC. I also tried to replicate the issue with the files you shared, on a Windows 10 machine with Acrobat Pro DC, but could not reproduce it.
Please find below the link for the file.
You may try changing the OCR settings and check whether that helps.
Thanks,
Shivam