Hi all,
I have a PDF generated in Apple Pages. The PDF has several data tables and text boxes per page.
Wondering if the following is possible:
Detect all text content on page, whether in tables or text boxes, etc.
Extract the text and replace it with other text, thus generating the same doc with entirely new text.
I've looked at some of the OCR and text extraction samples/docs. Extracting the data seems pretty straightforward. But can I replace it? The use case is translating the doc to another language.
Adobe doesn't have a Document Services API to do this. That said, it's not really something you'll want to do. Text in a PDF is generally laid down at precise coordinates, so it can't be replaced with text of a different length without causing overlaps.
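To make the overlap problem concrete, here is a rough sketch. It is not based on any Adobe API. It assumes each run of text sits in a fixed-width slot and estimates rendered width from character count and font size. The `TextBox` type and the 0.5 average glyph width factor are illustrative assumptions; real widths come from the font's own metrics.

```python
from dataclasses import dataclass

# Assumption: an average glyph is about half as wide as the font size.
# A real PDF needs per-glyph widths taken from the embedded font.
AVG_GLYPH_WIDTH_FACTOR = 0.5


@dataclass
class TextBox:
    """Hypothetical stand-in for one positioned run of text on a page."""
    x: float          # left edge, in points
    width: float      # space available before the next element, in points
    font_size: float  # in points
    text: str


def estimated_width(text: str, font_size: float) -> float:
    """Crude width estimate in points."""
    return len(text) * font_size * AVG_GLYPH_WIDTH_FACTOR


def overflow(box: TextBox, replacement: str) -> float:
    """Points by which the replacement spills past the box (0.0 if it fits)."""
    return max(0.0, estimated_width(replacement, box.font_size) - box.width)


box = TextBox(x=72, width=60, font_size=12, text="Price")
print(overflow(box, "Price"))              # original text fits
print(overflow(box, "Preis pro Einheit"))  # longer translation spills over
```

Translations are often longer than the source string, so a check along these lines would flag many boxes in a typical document.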
If you want to create new PDF files with different text, I suggest looking at the Document Generation API, which lets you start with a Word template plus some JSON and output a PDF where the JSON is merged into tagged "fields" in the document.
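As a minimal sketch of the merge idea only: the snippet below fills tagged fields in a plain-text template from JSON. It does not call the Document Generation API, which works on Word templates through a web service. The `{{field}}` tag syntax, the `merge` and `lookup` helpers, and the sample data are assumptions made for illustration.

```python
import json
import re

# Assumption: fields are tagged as {{name}} or {{parent.child}}.
TAG = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def lookup(data: dict, dotted_key: str):
    """Walk nested dicts using a dotted key such as 'client.name'."""
    value = data
    for part in dotted_key.split("."):
        value = value[part]  # raises KeyError if the JSON lacks the field
    return value


def merge(template: str, data: dict) -> str:
    """Replace every tag in the template with the matching JSON value."""
    return TAG.sub(lambda m: str(lookup(data, m.group(1))), template)


template = "Quote for {{client.name}}: {{total}} due on {{due_date}}."
data = json.loads(
    '{"client": {"name": "Acme"}, "total": "1,200 EUR", "due_date": "2024-05-01"}'
)
print(merge(template, data))
```

Because the layout engine reflows the document after the merge, longer replacement text moves the content around it instead of overlapping it.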
Do you control the authoring process of these documents before they get converted to PDF, or is this a situation where you need to take what you are given?
Yes, I create them in Apple Pages.
So you have the source files? Why do you need the Extract service then? I'm missing something.
The PDFs change depending on our clients' needs: we use tables and formulas to show/hide different options, and we are always making customizations to them on the fly (adding notes, etc.). And they change over time. I want to completely automate translating new features in the file.
The end goal is to translate the PDFs. The translation part I have covered (Google Translate API).
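For what it's worth, the translate step of such a pipeline could be sketched roughly like this, assuming the extracted text has already been collected into a flat mapping of field name to string. `translate_fields`, `fake_translate`, and the field names are hypothetical; a real `translate` callable would wrap whichever translation service is in use.

```python
from typing import Callable, Dict


def translate_fields(
    fields: Dict[str, str], translate: Callable[[str], str]
) -> Dict[str, str]:
    """Translate every field value, calling the translator once per unique string."""
    cache: Dict[str, str] = {}
    result: Dict[str, str] = {}
    for key, text in fields.items():
        if not text.strip():
            result[key] = text  # leave empty or whitespace-only fields alone
            continue
        if text not in cache:
            cache[text] = translate(text)
        result[key] = cache[text]
    return result


# Stand-in translator so the sketch runs offline (assumption, not a real API).
FAKE_DICTIONARY = {"Price": "Preis", "Notes": "Notizen"}


def fake_translate(text: str) -> str:
    return FAKE_DICTIONARY.get(text, text)


fields = {
    "header.price": "Price",
    "footer.notes": "Notes",
    "table.col1": "Price",  # duplicate string, translated only once
    "spacer": "",
}
translated = translate_fields(fields, fake_translate)
print(translated)
```

The resulting mapping keeps the original keys, so it can be handed to a template-based generator as JSON rather than written back into the existing PDF.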