If you are not an expert in Acrobat Pro DC, stop reading at this point. I have been chatting online for 3 hrs to no avail, but I have what I think should be a common problem. Please read carefully before I get to the questions.
Background:
I have used cloudHQ to convert Gmail labels to PDFs. There are many options, so I will focus on two.
Method 1. I convert all the emails in a label to a "single, combined" PDF. Let's say it's 580 pgs long and was built from 316 emails. I know it is 316 because (a) Gmail says the label had 316 emails, and (b) if you search for a unique header text like "Date Received:" you will find 316 occurrences in the PDF.
Method 2. I convert all the emails in a label to individual PDFs. And here is where the problem starts. I only end up with 310 PDFs. And if I merge those PDFs into a new single combined PDF, I get 539 pgs.
So clearly the application is screwy. Can't fix that. My problem is that I must find the 16 extra emails in the Method 1 PDF, which account for the 41 extra pages compared to Method 2.
I don't really have time to write a script, so I was hoping Acrobat might have a workaround.
On the surface, and only on the surface, the PDFs look identical. If I search, I can eventually find the 16 emails, but I need to automate this process for some 50000 emails.
When I use the Compare tool, it does not work. It ends up highlighting all sorts of things that are not perceptible to the eye. Not a surprise. The two methods create PDFs that "look the same," but I presume there are slight spacing differences and so on. Acrobat picks up all of this, which makes the Compare tool not viable.
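One way around layout-only differences is to compare the text of each email rather than the rendered pages. The following is a minimal sketch, not an Acrobat feature: it assumes the text of each email has already been extracted into plain strings by some other tool, and the names `fingerprint` and `extra_items` are hypothetical. It strips all whitespace and case before hashing, so spacing differences between the two methods should not register.

```python
import hashlib
import re
from collections import Counter


def fingerprint(text: str) -> str:
    """Hash the text with all whitespace removed and case folded.

    Assumption: the two conversion methods differ only in spacing and
    line breaks, not in the characters of the email itself.
    """
    normalized = re.sub(r"\s+", "", text).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def extra_items(combined: list[str], individual: list[str]) -> list[int]:
    """Return indexes of emails in `combined` with no match in `individual`."""
    # Count fingerprints so that genuine duplicate emails are matched one-to-one.
    remaining = Counter(fingerprint(t) for t in individual)
    extras = []
    for i, text in enumerate(combined):
        key = fingerprint(text)
        if remaining[key] > 0:
            remaining[key] -= 1
        else:
            extras.append(i)
    return extras
```

Called with the email texts from Method 1 as `combined` and those from Method 2 as `individual`, this would return the positions of the emails that only Method 1 produced. If the two methods also differ in characters (for example, different header wording), the normalization would need to be loosened accordingly.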
Suggestions?
1) Convert the PDFs to image PDFs and then use the OCR tool within Acrobat to create new PDFs?
2) Other? I tried flattening, but no cigar.
3) Or is there a way to tell Acrobat to batch print pages 14-16, 52-52, 106-122, etc., to individual PDFs? That way, if I search by headers, I can write down the page numbers and then print the 16 emails to PDFs. Not great, but a bit better.
4) Or I can determine the page number where each email begins, and then batch print the Method 1 PDF to individual PDFs, ending up with 316 PDFs. Maybe I can then put these in a folder and find the 16 emails...
5) Can Acrobat break up the PDF on a query term to find each first page?
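For options 3 and 4, the page ranges follow directly from the list of first pages. Here is a minimal sketch of that arithmetic, assuming 1-based page numbers and that every email starts on a new page; `page_ranges` and `format_ranges` are hypothetical helper names, not Acrobat functions.

```python
def page_ranges(start_pages: list[int], total_pages: int) -> list[tuple[int, int]]:
    """Turn the first page of each email into (first, last) page ranges.

    Assumes 1-based page numbers and that each email begins on a new page,
    so an email ends on the page before the next one starts.
    """
    starts = sorted(set(start_pages))
    ranges = []
    for i, first in enumerate(starts):
        # The last email runs to the end of the document.
        last = starts[i + 1] - 1 if i + 1 < len(starts) else total_pages
        ranges.append((first, last))
    return ranges


def format_ranges(ranges: list[tuple[int, int]]) -> str:
    """Format ranges the way a print dialog usually accepts them."""
    return ", ".join(f"{first}-{last}" for first, last in ranges)
```

For example, `format_ranges(page_ranges([1, 14, 17], 20))` gives `1-13, 14-16, 17-20`, and a single-page email comes out as a range like `52-52`.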
thanks
Almost all of these things require a script. For example, #5 can certainly be done with a script. I've developed various tools that do that.
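To give a rough idea of what a script for #5 involves, here is a minimal sketch (not one of the tools mentioned above). It assumes the text of each page has already been extracted into a list of strings, for instance with a PDF library such as pypdf, and that the marker "Date Received:" appears on the first page of every email. The function names are hypothetical.

```python
def find_start_pages(page_texts: list[str], marker: str = "Date Received:") -> list[int]:
    """Return the 1-based page numbers whose text contains the marker."""
    return [n for n, text in enumerate(page_texts, start=1) if marker in text]


def group_pages(page_texts: list[str], marker: str = "Date Received:") -> list[str]:
    """Join consecutive pages into one text per email.

    A page containing the marker starts a new email; any other page is
    treated as a continuation of the email before it.
    """
    emails: list[list[str]] = []
    for text in page_texts:
        if marker in text or not emails:
            emails.append([text])
        else:
            emails[-1].append(text)
    return ["\n".join(pages) for pages in emails]
```

On the 580-page file, `find_start_pages` should return 316 page numbers if the header count from the search holds, which is a quick sanity check before splitting anything.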
You say you don't have time to write one, but that is the most likely solution to your issues.
You can always hire someone to write it for you, of course.
I believe you’ve made some scripts, try6767, yes? I sent you an email. I saw your scroll script.
What compare settings do you use? Here are the default settings in Acrobat DC, but you may need to play with them.
Nope. That does not help unless you can be more specific about which settings work. For two files identical to the eye, one with 16 of 316 extra pages, I get 8500+ differences. The Compare tool is way too sensitive to be of practical utility on large projects.
I’m suggesting you experiment with settings. Have you tried it at all?
Yes, of course. I don’t mean to be so abrupt. I just don’t find the Compare tool very useful. It is not well thought out and is too sensitive to be of practical use. If I need to compare 10 pos, sure, it works great and tells me every difference, 98% of which are of little significance unless I am doing something forensic.
If I need to compare a 900 page document and only scroll the pages or find missing pages, it does not work. I even tried printing to image and re-running OCR to remove content, but that did not help. The case above showed 8500 differences for what was essentially 16 extra pages, making them impossible to find.
But thanks. I spent a day trying settings. Adobe is supposed to get back to me, but it does not sound promising.