I have spent a lot of time, in vain, trying to get the text of scanned documents to look crisp.
For one reason or another they look blurry.
I have attached 3 examples.
The source is a screenshot of text from Word, saved as a PNG file:
1. On the left is the screenshot opened in a viewer.
2. In the middle is that screenshot converted to PDF (right-click > Convert to PDF): no loss, or no significant loss.
3. I printed the PNG file (the printout looked fine) and then scanned it; the result is on the right.
The view is 100%, the scan is 300dpi, the document size is A4 (210x297mm), and the scanner is an Epson 5620.
FWIW I have put some screenshots here.
Hope someone has a clue...
Thanks in advance.
Hi there
Hope you are doing well and sorry to hear that.
Would you mind sharing the version of Acrobat DC you are using? To check the version, go to Help > About Acrobat and make sure you have the most recent version, 21.11.20039. If not, go to Help > Check for updates, install any updates, and reboot the computer once.
Also try to create the PDF from the scanner as described here https://helpx.adobe.com/acrobat/using/scan-documents-pdf.html and see if that works for you.
Also check for any missing or pending updates for the scanner driver and firmware, install them, and test again.
Regards
Amal
Hi Adwul62,
Quick question for you. First, thanks for showing us your application settings, that helps. However, there was one I did not see: the Settings in OCR. There are several ways to get to this setting; one of them is shown here.
After opening that up, there are three ways to process the content, see below:
Which of the three ways is your system set to? Whatever it is set to, please try one of the others.
Also, I noticed that in your 3rd screenshot, you show using PNG format. Please try selecting TIF.
Lastly, in the last screenshot, there is an option to use the scanner's software ("Show Scanner's User Interface"). Can you show what the default settings are when that box is checked?
Thank you,
Sorry for the delay. Was busy yesterday.
I am using Adobe Acrobat Pro 2020 (2020.004.30020) standalone version.
Also I am following the steps as per https://helpx.adobe.com/acrobat/using/scan-documents-pdf.html
This results in a poor-quality scan compared to scanning the same document to .jpg first and then converting the .jpg to .pdf.
Scanning to .jpg first and subsequently combining those files into a .pdf is a bit of a workaround.
@gary_sc
I'll have to go out again now, will do some further testing this afternoon.
Will get back on this.
Many thanks for the replies so far.
@gary_sc
For your info, the settings I am scanning with are in the 4th attachment: 300dpi, and at that point it is set to 'Exact Search'.
However, I just found out that I should disable 'Optimize Image'; that results in a fine scan.
I have added the screenshots as attachments, as I believe they don't come through when pasted into this thread.
Please do check out the attachments.
Hi Adwul62,
Glad you figured out a solution. And, I slapped my head as I should have noticed that.
During the initial scan, if you set Acrobat to Optimize image, it will decrease the quality of an image. Plus, since you had Exact Scan as the setting, it displays the scan of the words as opposed to an overlay of the text, which is what you get from Clear Scan.
If you have lots of images and wish to cut the size of a document, then you can either use Clear Scan (unless you're a govt. agency), or perform a "Reduce Document Size" after the OCR process.
Here's a breakdown of the options when performing an OCR for future reference:
#1 - Provides an OCR output whose glyphs have no stroke or fill -- so, "invisible" or "hidden".
This method also dresses up the image a wee bit. Thus, an altered image rather than the exact image as provided by the scanner.
Consequently, #1 is typically not acceptable to a FedGov agency (or any entity with an interest in a document of record having the proper "provenance").
#2 - An OCR output developed as in #1. But, the exact image remains untouched.
Typically this is what a FedGov agency requires if submitting a scanned image of text.
So, the original image out of the scanner maintains its integrity and the OCR output supports find / search.
#3 - ClearScan. Introduced a few versions back. When the bitmap of a character's image is recognized, it is replaced with a font character (the glyph is visible because it has fill and stroke applied). What is not recognized is left as is. And more magic...
Bottom line - That image out of the scanner that *was* the exact replica of the hardcopy and thus a valid/legal document of record is blown away, gone, dent de lion in the wind eh. Typically not acceptable for something submitted to a FedGov agency.
Oh, one last comment: As you may know, JPG is a lossy image format. That means that it reduces the size of an image by discarding parts of the content. The damage is usually very difficult to see in a photograph, but it shows up in images with sharp boundaries between regions of content (black and gray) and regions of no content (white), such as text. I STRONGLY prefer to scan only in the TIF format, which is non-lossy and produces excellent PDFs.
Here's a blog I wrote for Adobe on my scanning processes:
http://photosbycoyne.com/Gary's_Help/Scanning/clean-scanning.html
Hello again,
Thank you very much for your elaborate reply. Truly appreciated.
Usually, when scanning documents, the ADF of my printer is used (i.e. the Automatic Document Feeder).
Frankly, not being an expert at all, Acrobat is mostly set to use its default settings.
For example, I searched in vain for any settings relating to scanning, specifically how to disable 'Optimize Image' by default. It is something one tends to forget when starting the next scan, so the result may be blurry again and the scan has to be redone...
Any suggestions?
Another example: all these years, after scanning or whatever, Acrobat showed the page at a different zoom level (not 100%).
It could be 108%, or 112%, so I changed it to 100% in the dropdown each time.
I didn't know where or how to set this in Preferences > Page Display. I know, it is just a minor thing, but still...
Just now, whilst writing here and going through the settings again, I think I have found it.
Preferences > Accessibility:
Override Page Display
v Always use Page Layout Style: Single Page
v Always use Zoom Setting: 100% < instead of 'Fit Page'.
As for the PDF size, to be honest, I do not care too much. Disk space is cheap; some of my PDF files are 500-600 MB and one is even 750 MB.
As for PNG converting, I have set this as below. I believe this is correct for the best possible results?
I know TIFF is the best possible format (albeit a huge one); however, in my case the images that need converting are usually just screenshots.
Below are the settings.
Again thanks for your help!
Hi Adwul62,
OK, lots of stuff to cover here, let's see what I can do. First off, let me point out that I'm on a Mac, and my scanner is a flatbed photograph scanner (Epson V800), so my options are different from yours. As such, I cannot say for sure how to avoid having to reset "Optimize Image" every time. Please remember that Acrobat itself cannot scan; everything you see is Acrobat dealing with your scanner's software.
The 100% issue, congrats, that's what you needed.
Disk space issues: Yes, you are correct about how cheap disk space is. However, there are other considerations to be aware of. The larger the document size, the more computer resources may be needed to process and work with that document. Add to that the challenge of electronically sending that document to someone else (and the imposition of sending a very large document to someone who may not have as much RAM or disk storage as you). I might be comparing apples to avocados here, but my camera can take a 20 MB image. That's 5472 x 3648 pixels. If I converted that image to the TIF (no compression) format, it's about 57 MB storage size. If I bring that down to 1500 pixels wide (the size one should bring it to for email purposes), as a TIF image it's now down to 4.3 MB. If I save that as a JPG, it's 480 KB. Now, I have lots of storage space, but that does not justify sending you that image that's 57 MB. [Note: if you double the width and height of an image, the storage size quadruples.]
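For anyone who wants to check the arithmetic above, here is a minimal Python sketch. It assumes uncompressed 24-bit RGB (3 bytes per pixel) and ignores file headers and metadata, so real TIF files will be slightly larger. The function name `uncompressed_bytes` is made up for illustration, and the 1500-pixel-wide version is assumed to be 1000 pixels high (the same 3:2 aspect ratio as the camera image).

```python
def uncompressed_bytes(width_px, height_px, bits_per_pixel=24):
    """Raw pixel data size in bytes, ignoring headers and metadata."""
    return width_px * height_px * bits_per_pixel // 8


MIB = 1024 * 1024  # the "MB" that file managers typically show

full = uncompressed_bytes(5472, 3648)    # camera-size image
email = uncompressed_bytes(1500, 1000)   # resized for email (assumed 3:2)

print(f"full size : {full:,} bytes = {full / MIB:.1f} MB")
print(f"email size: {email:,} bytes = {email / MIB:.1f} MB")

# Doubling both width and height quadruples the storage size.
doubled = uncompressed_bytes(2 * 1500, 2 * 1000)
print(f"doubled   : {doubled // email}x the email size")
```

The two results come out at roughly 57 MB and 4.3 MB, which matches the figures quoted above. The JPG size cannot be predicted this way because it depends on the image content and the quality setting.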
PNG format: be aware that there are two common kinds of PNG: 8-bit and 24-bit. The former can contain up to 256 different shades and colors. The 24-bit can contain about 16.7 million shades and colors. But if you have a document that is a text document saved in 24-bit, it will be significantly larger than the 8-bit. Unfortunately, Adobe does not specify which one they are using (and I do not know).
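If you want to find out which kind of PNG a particular file is, the bit depth and color type are stored in the first chunk (IHDR) of every PNG file. The sketch below is only an illustration using Python's standard library; the function names are hypothetical, and it says nothing about which variant Acrobat itself writes.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Samples per pixel for each PNG color type:
# 0 grayscale, 2 RGB, 3 palette (indexed), 4 gray+alpha, 6 RGB+alpha
CHANNELS = {0: 1, 2: 3, 3: 1, 4: 2, 6: 4}


def png_bits_per_pixel(data: bytes) -> int:
    """Return bits per pixel from the first 26 bytes of a PNG file."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("IHDR chunk not found")
    _width, _height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return bit_depth * CHANNELS[color_type]


def png_bits_per_pixel_from_file(path: str) -> int:
    """Convenience wrapper: read only the header bytes from disk."""
    with open(path, "rb") as f:
        return png_bits_per_pixel(f.read(26))


def max_colors(bits_per_pixel: int) -> int:
    """Number of distinct values that fit in the given number of bits."""
    return 2 ** bits_per_pixel


print(max_colors(8))   # 256
print(max_colors(24))  # 16777216
```

Calling `png_bits_per_pixel_from_file` on one of your own screenshots (the path is whatever file you choose) returns 8 for a palette-based PNG and 24 for a plain RGB one; 32 means RGB plus transparency.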
Here's another piece of trivia: when I scan a full page of text as a TIF document, it's about 8 MB. When I convert that into a PDF, it's about 80-150 KB. Same document. [Caveat to that: if I have lots of background imagery, that gets PDFed as well and the document can easily be 2-4 MB.]
What all this is to say is that you need to experiment and see what works and what's best for you. Also, be considerate: what's best for you may not be best for the person to whom you send a document. It's the "and then what" syndrome. If you send a 750 MB document to someone, then what do they do with it? How will/can you send it?
Hey, if it was all easy, how much fun would that be??
Thank you so much for your elaborate reply. As for the PNG files, those are basically screenshots, so image quality does not matter the way it does with photos.
The .jpg files are photos or scanned documents.
Checked out properties of scanned documents, they are 2480x3507x24 JPEG
Checked out properties of photos and they read 4032x1908x24 JPEG
(Simple Samsung smartphone stuff. Sony camera 5472x3648x24 JPEG)
I remember the .raw files were 20-22MB but I deleted all those
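As a side note, the 2480x3507 size of the scanned documents is what an A4 page (210x297mm) works out to at 300dpi. A small Python sketch of that calculation follows; it assumes fractional pixels are simply dropped (other software may round up and report 3508), and the function name `scan_pixels` is made up for illustration.

```python
MM_PER_INCH = 25.4


def scan_pixels(width_mm, height_mm, dpi):
    """Pixel dimensions of a scan, dropping any fractional pixel."""
    width_px = int(width_mm / MM_PER_INCH * dpi)
    height_px = int(height_mm / MM_PER_INCH * dpi)
    return width_px, height_px


print(scan_pixels(210, 297, 300))  # (2480, 3507)
```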
[Off topic]
Those big .PDF files are archive files and are being indexed by X1 Search.
Can be really old stuff and are just for personal memory only. Correspondence, important events, purchases, you name it, they are all scanned.
Usually, when people try to remember things, the best they can come up with is, 'I think it must have been somewhere around 19...so-and-so'
Like, when and at what price did you buy your first car, or when did you first start using a computer, or when did you move from one city to another. It is always a bit of a 'wild guess'.
Anyway, that aside, again many thanks for all the help!