I receive PDF files from a client's service and have no part in their creation. The files are all text and searchable, but they are "read" as images. For example, a 54-page, all-text, black-and-white, mostly double-spaced document should not be 12 MB.
After checking Save as Optimized, then clicking Audit space usage, there was 97.69% used by images, 0.46% used by fonts and the image count was 54, confirming that every page is being read as an image (as if I didn't see the blue box around the page anyway). Optimizing only does so much and sometimes very little to reduce the size.
If I go to Recompress Images, it reduces the file size but the text quality is bad. I have adjusted numerous items in the drop-down menus, but the result is either bad quality, no reduction, or a bigger file. Below is the latest trial and error on the optimize screen:
The client's service is not the problem. They add one cover page (which does not show as an image), so the document is being sent to them or uploaded this way. Short of printing and rescanning to PDF (which is not happening), how can I reduce the file size to something closer to what it should be? Is there a way to convert the images so they are read as text pages and not separate images? Any other suggestions are welcome, as I've tried quite a few things that did not work. And please be kind, I am an end user....haha.
".. should not be 12 MB."
If the pages are scanned images, which it sounds like they are, that size is not at all out of line for a 54-page document. That's less than 250 KB per page, which is about right for a page-size 300 dpi grayscale image saved at Medium JPEG quality.
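To show where that per-page figure comes from, here is a quick back-of-the-envelope check in Python. It is only a sketch: the function name `per_page_kb` is made up for illustration, and it assumes 1 MB = 1024 KB and that the file size is spread evenly across the pages.

```python
def per_page_kb(total_mb: float, pages: int) -> float:
    """Average size per page in KB, assuming 1 MB = 1024 KB."""
    return total_mb * 1024 / pages


# The 54-page, 12 MB file from the original post.
average = per_page_kb(12, 54)
print(f"{average:.0f} KB per page")

# Check the "less than 250 KB per page" estimate.
print(average < 250)
```

The same function can be pointed at any other file: pass the size in MB and the page count.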
In any case, the file has been OCR'd, so it has a layer of text on top of the image for searchability, but that addition to the file size is negligible. Your suggestion to "print out and rescan" would be a useless endeavour, as your resulting PDF would still be scanned images.
Right now, your recompression settings are leaving things pretty much the same: although you downsampled to 150 ppi, your quality setting is High. This will give you only a slightly smaller file size compared to an original 300 ppi saved at Medium. Also, this is assuming the original was above 200 ppi; otherwise nothing will change, and in fact the file may get larger due to the High setting.
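For a rough sense of what downsampling does on its own, this small Python sketch counts the pixels on a page at 300 ppi versus 150 ppi. It assumes a US Letter page (8.5 x 11 inches), which is a guess about these documents, and `page_pixels` is just an illustrative name. It says nothing about the JPEG quality setting, which is the other half of the final file size.

```python
def page_pixels(ppi: int, width_in: float = 8.5, height_in: float = 11.0) -> int:
    """Pixel count of one scanned page (page size is an assumption: US Letter)."""
    return round(width_in * ppi) * round(height_in * ppi)


original = page_pixels(300)     # scan at 300 ppi
downsampled = page_pixels(150)  # after downsampling to 150 ppi

print(original, downsampled)
print(original / downsampled)   # how many times fewer pixels remain
```

Halving the resolution leaves a quarter of the pixels, but a higher quality setting spends more bytes on each of those pixels, which is why the saving can end up small.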
What I would try FIRST is NO downsampling, and change the Quality setting to Medium or Low. If the page is pretty much black text on a white background, the lower quality will be less noticeable than a reduced resolution.
In the end, images can only be compressed so much.
As for changing everything to text, that's a whole other ball-game and is probably not worth your time.
Can you show a sample page? I'm curious whether there is a "shadow" of the original document left. What I mean is what happens when you scan a newspaper and the gray of the newspaper page itself also ends up in the scan.
Do you have anything like that on the page?
[Note: I'm traveling and will probably not see this until tomorrow evening. Sorry]
Sorry I had to redact, but here is a half-page sample from a 39-page document that is 16 MB:
Sorry for being nitpicky, but do you happen to have a screenshot of the edge of the scanned page along with some of the blank screen around it? That is, one overlapping the border of the scan. What I'm trying to see is whether one can "see" the scan of the paper. That would mean it is a picture of the paper, and having a full-page picture would significantly increase the size of the document (even after OCR-ing the page). One other thing to try is to delete the background of the page (a different layer) and see what that does to the size.
I'd explain how to do that, but off the top of my head I do not remember. Right now I'm using my wife's computer and I do not keep any of my software on it. (Sorry).