What is the best way to create a high-quality yet small PDF of a printed book that includes text and photos, suitable for long-term preservation? This PDF will also be subject to OCR.
Are you creating it from scratch or from an existing file? If from an existing file, what format is it currently in?
Generally speaking, any PDF will do, but you can use PDF/A for archiving purposes.
@try67 I am currently dealing with multiple cases. In the first case, I am scanning monographs from scratch. In the second case, I have received a collection of PDFs. The third case involves PDF files that are already part of my collection and were scanned at 300 DPI.
PDF/A was made for you: https://www.adobe.com/acrobat/hub/how-to-convert-pdf-to-pdfa.html
What format should my images be in when scanning documents: JPEG or PDF?
If I have a manuscript digitized in both high and low resolution, is PDF/A applicable as well?
- Either one will work. For higher quality, though, use PNG.
- Yes, PDF/A is applicable for all resolutions.
Could PDF/X help me in this situation, or is PDF/A more recommended for preservation? If we need to share the digital content with users for printing, which format should I use?
PDF/X is a printing standard.
PDF/A is an archive standard.
Only you can answer this question; it depends on how the files will most often be used and on the type of printing envisaged.
You don't need to use PDF/X unless you do professional printing. PDF and PDF/A are more than enough for everyday purposes.
What format should my images be in when scanning documents: JPEG or PDF?
PDF is not an image format.
If you are scanning with Acrobat, you can use JPEG or JPEG2000.
If you convert or import pre-scanned images with Acrobat, the settings in Acrobat's preferences will guide the conversion.
If I have a manuscript digitized in both high and low resolution, is PDF/A applicable as well?
Yes, but you don't need two versions; a single version at 250 dpi is sufficient.
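To put the 250 dpi figure in perspective, here is a small Python sketch (not from the original replies) that computes the pixel dimensions and the raw, uncompressed bitmap size of one page at 250 vs. 300 dpi. The 6 x 9 inch page size and 24-bit RGB color depth are assumptions chosen for illustration; the actual size inside a PDF depends heavily on the compression used (JPEG, JPEG2000, PNG).

```python
def pixel_dimensions(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel size of a page scanned at the given resolution (dots per inch)."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bytes(width_px: int, height_px: int,
                       channels: int = 3, bits_per_channel: int = 8) -> int:
    """Raw bitmap size before any JPEG/JPEG2000/PNG compression."""
    return width_px * height_px * channels * bits_per_channel // 8


if __name__ == "__main__":
    # Hypothetical 6 x 9 inch book page in 24-bit RGB (assumption for illustration)
    for dpi in (250, 300):
        w, h = pixel_dimensions(6, 9, dpi)
        size_mb = uncompressed_bytes(w, h) / 1_000_000
        print(f"{dpi} dpi: {w} x {h} px, ~{size_mb:.1f} MB uncompressed RGB")
```

Under these assumptions it prints about 10.1 MB per page at 250 dpi and 14.6 MB at 300 dpi. Because the pixel count grows with the square of the resolution, going from 250 to 300 dpi increases the raw data by (300/250)^2 = 1.44, i.e. about 44%.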
Thank you, Mr. Boulay, for your prompt replies. Your instructions are clear and I appreciate your support.
I have one last question regarding OCR: When I apply OCR to a PDF/A file that contains both images and text, the images lose quality and do not retain their resolution. I need to find a solution for two scenarios: existing PDFs and new documents that I want to scan. How can I perform OCR while preserving a high resolution for the photos?
You should not be editing a PDF/A file. Perform all the edits on the original PDF, and when you're done, save it as PDF/A.
I have already tested OCR on both a regular PDF and a PDF/A, and I found the same problem with a book that includes text and photos.
@try67 Thank you for your responses to all my questions.
How can I perform OCR while preserving a high resolution for the photos?
(Hmm, being on the West Coast, I only come in late on these threads. :D)
Fortunately, both @try67 and @JR Boulay have provided a great deal of excellent information on PDF vs. PDF/A and on how to preserve image quality.
If you are scanning, the biggest issue people have is assuming that all one does is start the scanner, push the scan button, and enjoy the results.
Unfortunately, IF you want the best capture of an image, you may have to sacrifice text quality, and vice versa. Below I'll post a link to an article I wrote a number of years ago about how to get the best-quality scan of text. Images are a whole different subject, but the general dynamics are similar: the more time you spend setting up the scanner for each image BEFORE you press the start button, the better the image quality will be.
Think of it this way: if you take a photo without putting the sun at your back, without adjusting for the kind of light you are encountering, without setting the aperture, etc., you cannot expect to fix it in Photoshop. Just pressing the start button on your scanner is not all that different.
Many people do not care about the end quality of a scan; it sounds like you do.
Let me know if you have any questions on the scanning part of this.