I would appreciate some advice on the correct workflow and Adobe apps to use when scanning loose pages with the intention of typesetting them into a book for personal use.
I have a favourite old mass market 80s paperback that has disintegrated and was never reprinted. It is substantially yellowed and the original printing was tightly spaced with slight showthrough of text from the opposite page. I can provide a picture of a sample page for assessment if it is permitted.
It is my intention to scan or image the pages, extract the text, check for errors then typeset it into a new document. It will then be laser printed and bound into a new hardback for personal use. Effectively I am moving the "story" from a decaying book into a new one.
I do not own a scanner but I do have an iPhone XS and iPad 8th Gen. I have already separated the book from its original spine so it is an ordered collection of loose, flat pages.
I already own Adobe Photoshop CS6 but do not have any of the Adobe Creative Cloud apps.
I understand the first step will probably be to use Adobe Scan to create a PDF of the book, chapter by chapter, but I don't know what happens next. Do I pass the scans through Photoshop and convert them to greyscale to help with the subsequent OCR, or do I just pass the chapter PDFs from Adobe Scan into something like Adobe Acrobat Pro? I am not averse to paying for a single month of the right tool to get this work done properly.
My ultimate aim is to have an editable text copy of each chapter so that I can proofread and typeset it before printing.
Thank you for your advice as well as corrections to my assumptions.
Daniel.
Noble idea, but you've got a lot of hurdles to clear.
Create a setup where your phone is placed perfectly square to the page so that there is no distortion from an angle. I'd also set up some sort of "stop" so that every page sits in almost exactly the same position. Some phones can be triggered with a remote shutter; see if yours can. Do plan on experimenting with how far your phone needs to be from the text. You do not want to introduce errors from blurred images, nor do you want to introduce errors from distortion.
You were dead-on to get rid of the binding. That will help tremendously.
You can try Adobe Scan on its own and see what its quality and accuracy are like. To compare it against ordinary photos, scan a few pages with Adobe Scan and then photograph the same pages.
It's good you have PS-CS6; it's good enough to do what you'll need (you do not need AI). I'll put a link below to a blog I wrote for Adobe on how to get the best scanned image. Since you do not own a scanner, you cannot follow the blog's suggestions exactly, but you can make the same corrections in PS-CS6. From PS, save the resulting images in TIF format, numbered consecutively. When you drag these onto the Adobe Acrobat Pro icon in the Dock (Mac) or onto the app itself (PC), the TIF images will automatically be converted into PDF AND automatically OCR'd.
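Consecutive numbering only keeps the pages in order if the names also sort correctly: in a plain alphabetical sort, page_10 lands before page_2. A minimal sketch, assuming Python 3.9+ and that the exported TIFFs already sort in shooting order; the `page_` prefix, the folder arguments and both function names are hypothetical:

```python
import shutil
from pathlib import Path


def padded_name(page: int, width: int = 3, prefix: str = "page_", ext: str = ".tif") -> str:
    """Return a zero-padded name such as page_007.tif.

    Zero padding makes alphabetical order match page order, so page_010
    sorts after page_009 rather than between page_001 and page_002.
    """
    return f"{prefix}{page:0{width}d}{ext}"


def copy_in_page_order(src_folder: str, dst_folder: str, width: int = 3) -> list[str]:
    """Copy every *.tif in src_folder to dst_folder as page_001.tif, page_002.tif, ...

    Assumes the source names already sort in shooting order (e.g. the
    phone's IMG_4501, IMG_4502, ...). Copying into a separate folder
    avoids overwriting an original if a new name clashes with an old one.
    """
    dst = Path(dst_folder)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    # On case-sensitive file systems this only matches lower-case ".tif".
    for page, src in enumerate(sorted(Path(src_folder).glob("*.tif")), start=1):
        target = dst / padded_name(page, width)
        shutil.copy2(src, target)
        written.append(target.name)
    return written
```

For 175 pages, three digits are enough.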
____
Wait, do you own Acrobat Pro? What do you have to work with other than a phone and an iPad? Do you have Adobe Scan? If you do not have Acrobat Pro, how do you expect to access the many pages of documents you're about to create?
Meanwhile, here's the link I mentioned.
Good luck!
Thank you - that is very helpful.
Funnily enough, I have already created a "tripod" for the iPad: a sheet of plain paper with a border drawn to the paperback's page size for alignment, a piece of bookboard to help line it up, and four tins of tomatoes supporting a heavy chopping board at the perfect height. When the iPad is laid flat on it, it is absolutely stable and level at the correct height, so that the page edges sit just inside the border of the camera view.
At the moment I have not downloaded Adobe Scan for iOS but am about to. I already own PS CS6 but I do not (yet) own a license for Adobe Acrobat Pro. I am glad to know it is the appropriate tool so will probably subscribe for a single month.
While I do own a digital camera, it is substantially older than either my iPhone or iPad and had difficulty focussing at the edges of the frame due to distortion. Even its built-in focus stacking could not help much. Both the iPhone and iPad cameras gave much crisper results.
I am happy to pay for the app if it will simplify the workflow. When I was processing textures for a Skyrim mod, I bought nDo from Quixel for making normal maps and it saved a TON of effort and time! Well worth it.
Here is a sample page showing the extent of the yellowing I have to deal with. The paper was so soft that it took only five strokes of a craft knife (Olfa) to cut through the entire book! The paper at the lower edge of the spine also just crumbled away. The show-through is not too bad, but it is there. There is also a lot of texture in the paper, as it was definitely a cheap book! Annoyingly, the running heads alternate between the author and the book title across each pair of facing pages, so there will be a lot of editing to do. The book is 175 pages long, over 23 chapters.
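Once the pages are OCR'd, the alternating author/title running heads could be stripped per page rather than by hand. A minimal sketch, assuming each page's OCR text is a separate string and the head is its first non-blank line; the author and title in `HEADS` are placeholders, and the function names are hypothetical:

```python
import re


def _normalise(line: str) -> str:
    """Upper-case a line, drop digits and collapse whitespace.

    Digits are dropped because OCR often puts the page number on the
    same line as the running head (e.g. "12   THE BOOK TITLE").
    """
    line = re.sub(r"\d+", " ", line.upper())
    return " ".join(line.split())


def strip_running_head(page_text: str, heads: set[str]) -> str:
    """Remove the first non-blank line of a page if it is a known running head."""
    wanted = {_normalise(h) for h in heads}
    lines = page_text.splitlines()
    # Skip blank lines that OCR sometimes emits above the head.
    while lines and not lines[0].strip():
        lines.pop(0)
    if lines and _normalise(lines[0]) in wanted:
        lines.pop(0)
    return "\n".join(lines)


# Hypothetical author and title; substitute the book's real running heads.
HEADS = {"A. N. Author", "The Book Title"}
```

A first line that does not match either head (a chapter title, say) is left untouched, so chapter openings survive.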
Thank you for sending me your sample page; it told me a lot. The one thing you HAVE to overcome is your lighting. Your light is coming from one side, and that throws the paper texture into strong relief. That will kill your attempts to do what you want.
You will need something like the light linked below. I'm not saying you need THIS one, just something like it:
https://www.amazon.com/Magnectic-Compatible-Cellphones-Recording-Photography/dp/B0BVMG4SVZ/ref=sr_1_...
Meanwhile, after you've taken the photos, you'll need to open them in PS, go to Image (menu) -> Adjustments -> Desaturate, and then to Image (menu) -> Adjustments -> Levels.
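To make it clearer what those two steps do to each pixel, here is a rough per-pixel sketch in Python. It is an approximation, not Photoshop's own code: the channel weights in `desaturate` are the Rec. 601 luma weights (an assumption; Photoshop's Desaturate may weight channels differently), and `levels` uses a common black point / white point / gamma formula for the Levels input sliders.

```python
def desaturate(rgb: tuple[int, int, int]) -> int:
    """Convert one RGB pixel to a 0-255 grey value.

    Assumption: Rec. 601 luma weights. Photoshop's Desaturate
    command may weight the channels differently.
    """
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def levels(value: int, black: int = 0, white: int = 255, gamma: float = 1.0) -> int:
    """Apply a Levels-style input adjustment to one grey value.

    Values at or below `black` become 0, values at or above `white`
    become 255, and everything in between is stretched across the full
    range. gamma > 1 lightens the midtones; gamma < 1 darkens them.
    """
    if white <= black:
        raise ValueError("white point must be above the black point")
    t = (value - black) / (white - black)
    t = min(1.0, max(0.0, t))      # clip to the 0..1 range
    t = t ** (1.0 / gamma)         # midtone (gamma) adjustment
    return round(t * 255)
```

For example, yellowed paper around RGB (230, 215, 170) desaturates to grey 214, and with illustrative black and white points of 60 and 200 (not values measured from the sample page) it is pushed to pure white, while dark ink around (60, 55, 50) goes to 0. Running Levels twice amounts to feeding the output of one `levels` call into another, which steepens the contrast further.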
I've never had to do this on any image before, but I found that running the image through Levels twice did give me a "better" image. The resulting OCR, however, was still dreadful in both cases.
First run: [screenshot]
2nd run: [screenshot]
But I repeat, the OCR was beyond dreadful. This is because the paper texture I mentioned leaves stray pixels inside the letterforms, which the OCR turns into nonsense text.
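This was not tried in the thread, but a common way to suppress isolated texture speckle before OCR is a median filter; Photoshop has one under Filter > Noise > Median. A minimal pure-Python sketch, assuming a greyscale image stored as a list of rows of 0-255 values (a representation chosen only for illustration):

```python
from statistics import median_low


def median_filter(img: list[list[int]], radius: int = 1) -> list[list[int]]:
    """Replace each grey pixel with the median of its (2r+1) x (2r+1) neighbourhood.

    Near the image edges the window is simply cropped. median_low is
    used so every output value is an actual pixel value from the window.
    """
    h = len(img)
    w = len(img[0]) if h else 0
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            row.append(median_low(window))
        out.append(row)
    return out
```

The trade-off matters here: at radius=1 a lone speck vanishes and strokes two or more pixels wide survive, but a stroke only one pixel wide is erased too. The photo therefore needs enough resolution that the letter strokes are several pixels across.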
To be honest, consider this whole project a learning experience to see what can be done. If you're successful, GREAT. If not, you've had a lot of fun seeing what can be done. Win-win. Meanwhile, if all you want is a better-quality copy to read, I strongly suggest that you go to https://www.abebooks.com/ and see if you can find a copy of your book in better condition.
Good luck!
Now that we have some more diffuse evening light, I can get better pictures, but the paper texture still comes through strongly despite my efforts in Photoshop. I have downloaded Adobe Scan and got a couple of pages into a PDF, but I have not subscribed to Adobe Acrobat Pro yet; I want to be sure first that it can make sense of what I will be feeding it from Photoshop (as numbered TIFFs, of course).
I know what you mean by the OCR being thrown by the paper texture. If you want a laugh, give that image to FreeOCR and see the total garbage it comes up with!
The issue with going to AbeBooks is that there is no way to check the yellowing of a copy before purchase. Given that the book was cheap and printed in 1985, every copy will be just as deteriorated. I have a feeling that my copy is just too far gone.
Oddly enough, the "Find Text" feature in the iOS Photos app seems to be very accurate, but I am struggling to get the text onto the clipboard and then into a text file because I am not that familiar with the swipe gestures for switching between apps. I think the gestures differ between the iPad and the iPhone too, so I am a bit out of date at using the devices effectively. It is a very clumsy way of getting single pages into text, but it would still be preferable to sitting down and typing out the whole book!
OK, one last thing for you: I was very surprised by the results.
One of the products I buy for various purposes is Snagit (https://www.techsmith.com/). It's not horribly expensive, and I use it for screenshots, both personally and in much of my work (the earlier screenshots were made with Snagit).
One of its functions is "Grab Text." I tried it on the 2nd screenshot above, copied the text and pasted it into Word. There were two issues. First, every line ends with a paragraph mark, and each following line is preceded by a run of spaces that need to be removed. Second, the font size changes all over the place. But both are very easy to correct globally in Word (or any other word processor).
Since I got this result from the poorly lit sample you sent me, it's worth looking into.
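The line-break clean-up described for the Snagit output is typically a find-and-replace in Word; the same idea can be sketched in Python. This assumes, hypothetically, that a blank line separates real paragraphs in the grabbed text; every other line break is treated as a wrap, leading spaces are stripped, and a word hyphenated across a line break is rejoined. The function name `unwrap` is illustrative only.

```python
import re


def unwrap(text: str) -> str:
    """Join hard-wrapped OCR lines back into paragraphs.

    Assumes a blank line marks a real paragraph break; every other
    line break is treated as a soft wrap.
    """
    out = []
    for para in re.split(r"\n\s*\n", text):
        # Drop the leading/trailing spaces OCR leaves on each line.
        lines = [ln.strip() for ln in para.splitlines() if ln.strip()]
        joined = ""
        for ln in lines:
            if joined.endswith("-") and ln[:1].islower():
                joined = joined[:-1] + ln   # rejoin a word split across lines
            elif joined:
                joined += " " + ln
            else:
                joined = ln
        if joined:
            out.append(joined)
    return "\n\n".join(out)
```

The hyphen rule will also fuse a genuine compound that happens to break at a line end (e.g. "well-" / "known" becomes "wellknown"), so those still need catching at the proof-reading stage.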
Good luck!

