I have a batch of PDF documents that were created from scans and OCR'd. As a next step I need to save them as text, but before I do I'd like to add page numbering (e.g. 'page 1 of 5') that will appear in the text output to identify the location of the text within the larger document.
I did try placing page numbers in the PDF header, but so far I haven't found an export format that retains the header content. (I tried exporting as txt, rtf, and PDF-A. I also tried saving the PDF using the "with comments and document stamp" setting.)
I'd appreciate any advice on how to retain the page numbers in the text output.
Hi @kathy_7334,
Hope you are doing well. Thanks for reaching out!
I have a suggestion:
Instead of using Header & Footer, use the Edit PDF > Add Text tool to manually insert page numbers onto each page as actual text objects. This method embeds the text in the content layer, so Acrobat will pick it up when exporting to .txt.
Steps:
1. Open the PDF.
2. Go to Tools > Edit PDF.
3. Use Add Text to manually insert "Page X of Y" at the top or bottom of each page.
4. Save and export as .txt.
Let me know if this works for you.
Regards,
Souvik.
Thanks for your quick reply, @S. S. Since I'll be batch processing many files, is it possible to automate the process of inserting and populating the text objects?
You can specify the settings once in your Action, and they will be applied the same for all files you process with it.
You can specify the settings once in your Action, and they will be applied the same for all files you process with it.
By @try67
I appreciate this suggestion, @try67 -- and I do see the option to Add Text as an action, but don't know how to set up an action that refers to the global page number variable.
Having worked with JavaScript actions in forms, my first instinct was to add a field to the top of each page that would take on the page number value. But I've never added a field to a non-form document (I assume I can do that by going to Tools > Forms and just dropping in a text field), and I don't know where the global field names are documented.
I'd very much appreciate detailed advice. TIA!
You can use a script for this, but you don't have to. Under Tools - Action Wizard you can create an Action with the Add Header & Footer command (and a Save command) that has the same settings as when you run it on a single file, and then run that Action on multiple files.
@try67 The header text doesn't persist when the document is exported, at least not using the methods I've tried thus far: "Saving as" plain txt, rtf, and PDF-A. That's why I'm searching for either a different export method or a different method of inserting the page numbers into the document.
Strange... I would expect it to export, too. Maybe because it's on a layer it doesn't get included.
So yes, you can use a field, but you'll need to flatten it after adding it to convert it into "static" page contents. And then it will export for sure.
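If it helps, here is a rough sketch of the field-then-flatten approach as it might look in an Execute JavaScript step of an Action. The function name, field names, margins, and font size are placeholders I made up. `numPages`, `getPageBox`, `addField`, and `flattenPages` are Acrobat Doc members as I understand them from the JavaScript API reference, so please try it on a copy of a file first.

```javascript
// Sketch only: stamp "Page X of Y" on every page as a text field, then flatten.
// `doc` is assumed to be an Acrobat Doc object; inside an Action you would
// call addPageNumberFields(this). All names and measurements are illustrative.
function addPageNumberFields(doc) {
  var total = doc.numPages;
  for (var p = 0; p < total; p++) {
    // Crop box in rotated user space: [left, top, right, bottom].
    var box = doc.getPageBox("Crop", p);
    var left = box[0], top = box[1], right = box[2];
    // A 20-point-high strip starting 10 points below the top edge,
    // with 36-point side margins (assumed values).
    var rect = [left + 36, top - 10, right - 36, top - 30];
    var field = doc.addField("pageLabel." + p, "text", p, rect);
    field.value = "Page " + (p + 1) + " of " + total;
    field.textSize = 10;
    field.alignment = "center";
    field.readonly = true;
  }
  // Convert the fields into static page content before exporting.
  doc.flattenPages();
  return total;
}

// In Acrobat (not runnable outside it):
// addPageNumberFields(this);
```

Because the document is passed in as a parameter, the same function can be used unchanged in every file the Action processes.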
Hi @try67 , I unsuccessfully tried flattening and then exporting to plain text before posting here and that method does not preserve the header information. But I played around some more with export types and what DOES seem to preserve the header content is flattening and then exporting to accessible text, so that might be a good solution for me.
Thanks again.
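For anyone who wants to script the accessible-text export rather than use a Save command, a minimal sketch follows. It assumes the script runs in a privileged context (Action scripts normally do, as far as I know) and that `com.adobe.acrobat.accesstext` is the conversion ID for accessible text, which is how I recall it from the JavaScript API reference. The helper names are hypothetical.

```javascript
// Sketch only: export a document as accessible text next to the source PDF.
// textPathFor and exportAccessibleText are illustrative helper names.
function textPathFor(pdfPath) {
  // Swap a trailing .pdf (any letter case) for .txt.
  return pdfPath.replace(/\.pdf$/i, "") + ".txt";
}

function exportAccessibleText(doc) {
  var target = textPathFor(doc.path);
  // Assumed conversion ID; verify it against your Acrobat version.
  doc.saveAs({ cPath: target, cConvID: "com.adobe.acrobat.accesstext" });
  return target;
}

// In Acrobat (not runnable outside it):
// exportAccessibleText(this);
```

The path handling is kept in its own function so it can be checked without Acrobat.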
This seems to be working well, but the process is interrupted between each file with a prompt asking "Do you want to save changes to xxx.pdf before closing?" Note that my action already contains a Save step to export to text, and I don't want to save the intermediate version to a PDF document. See screenshots below.
Any suggestions?
OK! In case it helps someone else: I solved the problem with popups interrupting the save process by following the advice here: LINK. Stopping the popups requires resetting the document's dirty property, which flags that the document has unsaved changes, to false. You can do that by executing a JavaScript command (this.dirty = false;) at the end of your action.
Specifically:
this.dirty = false;
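Wrapped as a small function, in case you want to reuse it as the last step of several Actions (the function name is just a placeholder of mine):

```javascript
// Sketch only: clear the unsaved-changes flag so Acrobat does not prompt
// on close. `doc` is assumed to be an Acrobat Doc object.
function markClean(doc) {
  doc.dirty = false;
  return doc.dirty;
}

// In Acrobat (not runnable outside it):
// markClean(this);
```

Note that this discards the intermediate PDF changes without saving them, which is what I wanted here.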
That should not be necessary after saving the file...
Make sure the Prompt User check-box is NOT ticked under any of the commands you added to your Action.
I just double-checked and I don't see the Prompt User check-box ticked anywhere. But I'm wondering if just opening the PDF triggers some kind of revision. Maybe by converting the PDF version?
It's not under all commands, but some:
Opening the file can modify it, if it contains a script, for example, or if it's corrupt and is fixed by the application. But saving it should make it safe to close without being prompted with a Save dialog.
Anyway, if using the script works for you, that's fine.