Hi support community,
I use Acrobat Pro (on a Mac) and want to convert multiple JPEG files to PDFs. The JPEGs are of physical typed documents. So far, so easy. I can just go to File->Create->Combine Files into PDF and, in theory, job done. But here's the twist: I'd really like to turn the images from full color, high resolution, big sized pictures of documents into black and white, size adjusted, legible but manageable documents.
If it helps explain what I mean, Adobe Scan has this feature already, albeit one that is hard to use at scale. As well as scanning from scratch, you can manually select and load existing pictures from your library. The app then strips out needless colors from those pictures, autocrops and adjusts with Adobe Sensei, and gives you a document of a few KB that looks and handles like it has come from a flatbed scanner. My hope is that there is a way to do this, or something quite like it, on a desktop/laptop. But Adobe Scan isn't available on computers and I don't know how to mimic or recreate this feature without it.
Any ideas or help? Huge thanks in advance!
Hi Jnnnn,
My day got shifted, so I have time to respond to this. Note that the images I received were much smaller than your originals (you show 3024 x 4032 pixels, while I received 749 x 999), so the quality of my PDF will be dramatically lower than what you'll get from the originals. But no matter.
Several things make OCR a challenge with these documents. One is that your photographs were taken at an angle, so there is distortion in the image. The second is that the light is not uniform across the image, so while you can remove some of the shadowing, there's only so far you can go. But we can do the best we can.
#1) Open the image in Photoshop. Because the images are distorted, you can both crop and get better alignment with the same tool. In the Tools panel, click and hold on the Crop tool. It will turn into a dropdown menu, where you select the Perspective Crop Tool.
Now drag out a marquee on your image and place the corners of the crop tool on each of the corners of the page:
Now go to the Image menu, choose Mode, and select "Grayscale." This will remove any color in the image.
Now go back to the Image menu, move to Adjustments, and select Levels. This will bring up a histogram.
When an image has uniform shading, this works great. This image isn't uniform, so the results will be mixed. The slider on the right side controls what's considered "white." Anything at the far right is WHITE. Anything to the left of that gets gradually darker and darker until you reach the far left, and that is BLACK! Sliding the right slider to the left redefines which tones count as white. What you want is to tell Photoshop to treat light gray as white. However, because the bottom of the page is darker than the top, if you get the bottom white, then the top is overexposed. The best you can do here (without going into gradient overlays, and if you're not strong on PS I will not go into that) is to find a compromise. You can also move the middle slider left and right to help find that compromise.
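For anyone who wants to see the arithmetic behind those sliders, here is a minimal sketch of a Levels-style remap for a single 8-bit gray value. It uses a common textbook formulation (clip to the black and white points, stretch, then apply a gamma curve for the middle slider); it is an assumption that this matches the idea closely enough, not a claim about Photoshop's exact internals. The function name `apply_levels` and the sample values are hypothetical.

```python
def apply_levels(value, black=0, white=255, gamma=1.0):
    """Remap one 8-bit gray value in the style of a Levels adjustment."""
    if white <= black:
        raise ValueError("white point must be above black point")
    # Clip to the chosen input range, then stretch that range to 0..1.
    clipped = min(max(value, black), white)
    normalized = (clipped - black) / (white - black)
    # The middle slider behaves like a gamma curve: gamma > 1 brightens midtones.
    corrected = normalized ** (1.0 / gamma)
    return round(corrected * 255)


# Hypothetical sample: paper that photographs as light gray (200), dark ink (40).
paper, ink = 200, 40
print(apply_levels(paper, black=30, white=190))  # paper is pushed to pure white
print(apply_levels(ink, black=30, white=190))    # ink stays close to black
```

Because one black point and one white point apply to the whole image, a page that is darker at the bottom than at the top cannot be fully cleaned this way, which is the compromise described above.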
Once you're done with the adjustments, save this image as a TIF document. Do not worry about any of the secondary options; you'll be tossing this image in a moment.
Once you're done with the save, drag this image onto the Acrobat icon on your Dock. Acrobat will open, convert the image to PDF, AND process the OCR. (If you had saved it as a JPG, you'd have to do the OCR as an extra step; TIF documents do this automatically.)
For the most part, that's it. Let me know how it goes and how successful you are. If you still have questions, email me at <gary@thecoynes.com>. I'm more likely to see that message than a message in the Adobe world.
Unfortunately, because you took the photos by hand, there's not much you can automate. So put on some nice music, pour a glass of wine, and enjoy your process.
Hi Jnnn,
When you state "big sized pictures of documents," how big is big? Do you wish to get "down" to 11 x 8-1/2 inch documents?
I'm gathering you do not have a scanner? How were these images created?
As long as I'm asking questions, do you know the resolution of these images? That's important for improved quality of the OCR.
Some of this will best be done in Photoshop or GIMP or some other image manipulation program. Resizing in Acrobat is not really intended as a good approach. In addition, proper color desaturation and balancing of light and dark is SOOOOO much easier in Photoshop and essentially not really possible in Acrobat.
Looking forward to your answers here,
Thanks for the reply Gary, really appreciated. Let me try and answer the questions, but feel free to ask more if I fail to clarify.
1) How big? They're pictures taken with an iPhone (different models), with a size of 3024 × 4032, circa 2 MB per image. Add that up across every page image and you quickly get some pretty unwieldy PDFs. And the page dimensions are so large you need to zoom out to see the whole thing at one time, but this can make scrolling hard (lateral wiggle). I think that's the size I want to get down to, at least if that means a 'regular' sized PDF. As in, the size you would get if you print to PDF from Word, or download a PDF scan, etc.
2) What's the resolution? I'm not sure. Is that different from the above size dimensions? I don't see that explicitly listed under the file info, but may be missing something.
3) Do I have a scanner? No. But these images are from archives that don't permit scanners. And if they did, the time that each scan takes per page would itself add up to a huge amount. I was advised some time ago to take raw images, as these would remain stable and of good quality as scanning technology changed. That was probably good advice, but I'm now hoping to make the resulting images easier to use.
I should say that I did once set up a script to try to resize the images using Photoshop, but I might have done it wrong. I don't remember the exact steps (I was following a mix of YouTube how-to advice and some guesswork), but I set Photoshop to reduce the dimensions to something (guessed) manageable, and to turn the images to grayscale. This worked in the sense that the file sizes were reduced. But it didn't in the sense that the files were (1) still pretty big (the reduction was maybe to about 500 KB per image, not bad but far larger than scans, acceptable except for the second problem) and (2) more importantly, very grainy/pixelated. They were legible, but noticeably degraded in quality. The closer I got to satisfactory quality, the closer I got to the original image size! And most of the reduction seemed to come from the size rather than the colors, which surprised me.
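The surprise about dimensions versus color can be checked with rough arithmetic. The sketch below counts uncompressed bytes only, assuming 8 bits per channel; JPEG compression changes the absolute numbers, and since JPEG typically stores color at reduced detail already, the real saving from grayscale tends to be smaller still. The helper name `raw_bytes` is hypothetical.

```python
def raw_bytes(width, height, channels):
    """Uncompressed size in bytes of an image with 8 bits per channel."""
    return width * height * channels


original = raw_bytes(3024, 4032, 3)  # full-color iPhone photo
gray = raw_bytes(3024, 4032, 1)      # same pixels, a single gray channel
half = raw_bytes(1512, 2016, 3)      # half the width and height, still color

print(original / gray)  # 3.0: dropping color saves at most a factor of 3
print(original / half)  # 4.0: halving each dimension saves a factor of 4
```

Pixel count grows with the square of the dimensions, so resizing dominates the saving, but it is also what throws away the detail that keeps small text sharp.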
All this made me wonder if there was a way to do it differently, although maybe the same process done better could get the desired results. The main difference I see between what I did and the Scan app is that what I did reduced the size of the image indiscriminately, whereas the Scan app seems to be able to bleach out unnecessary detail without pixelating the actual text. And it can do this with some neat document edge recognition.
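One common way to get that "bleached" look without shrinking the image is adaptive (local) thresholding: each pixel is compared with the average of its own neighborhood rather than with one global cutoff. This is only an illustration of the general technique; it is an assumption that scanning apps do something along these lines, and nothing here reflects Adobe's actual method. The function name, parameters, and the tiny sample page are all hypothetical.

```python
def adaptive_threshold(gray, radius=1, offset=25):
    """Return a black/white copy of a 2-D list of 8-bit gray values.

    A pixel becomes black (0) only if it is darker than the mean of its
    neighborhood by more than `offset`; everything else becomes white (255).
    """
    height, width = len(gray), len(gray[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect the neighborhood, clipped at the image border.
            neighbors = [
                gray[ny][nx]
                for ny in range(max(0, y - radius), min(height, y + radius + 1))
                for nx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            local_mean = sum(neighbors) / len(neighbors)
            row.append(0 if gray[y][x] < local_mean - offset else 255)
        result.append(row)
    return result


# Hypothetical page: paper darkens from 220 (top) to 140 (bottom), and the
# middle column is "ink" that is 80 levels darker than the paper in each row.
page = [
    [220, 220, 140, 220, 220],
    [180, 180, 100, 180, 180],
    [140, 140, 60, 140, 140],
]
for row in adaptive_threshold(page):
    print(row)
```

In this sample the ink in the top row (140) is exactly as bright as the paper in the bottom row (140), so no single global cutoff can separate them, while the local comparison still marks only the middle column as black.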
Hope this helps, sorry if now too detailed. Thanks for your support.
It sounds like you do have access to PS, you mention creating scripts for it.
Can you send me (via DM) one of the images? It will help me figure out the best way to get all the things you want.
To size an image in PS:
#1 Open the image in PS and open Image Size (Command-Option-I).
#2 Set the resolution to 300 and the height to 10.5".
You can speed things up by making an Action for that. (no script necessary)
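To connect those two numbers to pixel dimensions: resolution (pixels per inch) is just the ratio between pixels and printed size. A small sketch of the arithmetic follows; the helper names are hypothetical, and the 72 ppi figure is an assumption about a common default tag on phone photos, not something confirmed for these files.

```python
def pixels_for(inches, ppi):
    """Pixels needed to cover a length in inches at a given resolution."""
    return round(inches * ppi)


def inches_at(pixels, ppi):
    """Page length in inches that a pixel count occupies at a given resolution."""
    return pixels / ppi


print(pixels_for(10.5, 300))  # 3150 pixels tall for a 10.5" page at 300 ppi
print(inches_at(4032, 300))   # 13.44" tall if the original is read at 300 ppi
print(inches_at(4032, 72))    # 56.0" tall if the original is read at 72 ppi
```

The last line is one plausible reason the pages open so large that they need zooming out: the same pixels read at a low resolution make a very big page.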
Thanks,
Thanks again Gary, I just sent you a DM with more info/pictures. Ironically I switched my subscription from the full CC suite to just Acrobat-related products a few months ago, I wasn't getting any real use out of the range. But if this is something that Photoshop can handle well then I can reactivate it to try. I've never used Gimp and have heard it's not very user friendly, but for something as specific and perhaps ultimately simple as this it might be a good or better option. I'll leave that to your better judgment!
This is a job for Photoshop, which lets you edit the images and export them all as a single PDF.
Thanks for the reply, JR. Could you elaborate? How would you do this with Photoshop? I won't repeat the same info here, but in my reply to Gary (above) you can see my previous attempt to do this with Photoshop, along with an explanation of why that attempt didn't really work. But as I also said there, I might have been doing it wrong so would really welcome any tips at recreating whatever it is the Adobe Scan app seems to do with photos!
Hi Jnnn,
Please send me one example (DM) and I'll be able to help you.
You will get better answers here: Photoshop ecosystem 😉
Thanks. Is there a specific thread you're referring to? I don't know what to search for, as I don't know what Photoshop method you have in mind. Thanks in advance!
I meant: ask your question in this Photoshop forum
😉
A stellar reply, many thanks again Gary. More in the private thread.
If anyone stumbles on this post in the future, I will add here, as I mentioned there, that this is the perfect solution for a few images. But my real problem is finding a good-enough solution for a quantity of images too large to crop or edit by hand. I actually put in a request to Adobe to add their Sensei AI technology to Acrobat Pro. That's the gadget that allows Adobe Scan to perform these functions automatically and at scale. I'm not holding my breath that they'll act on or even read the suggestion box, but I am crossing my fingers.
Thanks again for an excellent reply.