Extract and replace text in PDF

June 10, 2021

Hi all,

I have a PDF generated in Apple Pages. The PDF has several data tables and text boxes per page.

Wondering if the following is possible:

Detect all text content on page, whether in tables or text boxes, etc.

Extract text and replace w/ other text, thus generating the same doc w/ entirely new text.

I've looked at some of the OCR and text extraction samples/docs. Extracting the data seems pretty straightforward. But can I replace it? The use case is translating the doc into another language.


Joel Geraci

Community Expert

Adobe doesn't have a Document Services API to do this. That said, it's not really something you'll want to do. Text in a PDF is generally laid down with precise coordinates and isn't able to be replaced without causing overlaps.
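To make the overlap problem concrete, here is a rough sketch of checking whether replacement text would still fit in the box the original text occupied. Everything in it is an assumption for illustration: the box values are made up, the helper names are hypothetical, and the width estimate (half the font size per character) is a crude stand-in for real font metrics.

```python
from __future__ import annotations


def estimate_width(text: str, font_size: float, avg_char_em: float = 0.5) -> float:
    """Rough width in points, assuming each glyph is avg_char_em of the font size."""
    return len(text) * font_size * avg_char_em


def fits(bounds: tuple[float, float, float, float], text: str, font_size: float) -> bool:
    """True if the estimated width of the text fits inside the original box."""
    left, _bottom, right, _top = bounds
    return estimate_width(text, font_size) <= (right - left)


# Hypothetical box: 60 pt wide, given as (left, bottom, right, top).
box = (72.0, 700.0, 132.0, 712.0)
original = "Total price"
translated = "Prix total hors taxes"

print(fits(box, original, 10.0))    # True: about 55 pt needed, 60 pt available
print(fits(box, translated, 10.0))  # False: about 105 pt needed, 60 pt available
```

Translated text is often longer than the source, so a check like this tends to fail for many boxes on a dense page, which is the overlap being described.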

If you want to create new PDF files with different text, I suggest looking at the Document Generation API which will allow you to start with a Word template plus some JSON and output a PDF where the JSON is merged into tagged "fields" in the document.
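The idea of merging JSON into tagged fields can be sketched in a few lines. This is not the Document Generation API itself, only an illustration of the concept on a plain string; the double-brace tag syntax, the `merge` helper, and the sample data are assumptions.

```python
import re


def merge(template: str, data: dict) -> str:
    """Replace each {{tag}} with the matching value from data; leave unknown tags as-is."""
    def lookup(match: re.Match) -> str:
        key = match.group(1)
        return str(data.get(key, match.group(0)))

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lookup, template)


# Hypothetical template text and JSON-like data.
template = "Invoice for {{customer}}: total due {{amount}}"
merged = merge(template, {"customer": "Acme", "amount": "120 EUR"})
print(merged)  # Invoice for Acme: total due 120 EUR
```

For translation, the same template could be merged once per language, with one data set for each.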

adrian5FDF (Author)

Participant

Hi Joel, thanks for the feedback.

That makes sense. Looks like using a template is the way to go. We don't use Word though, and manually tagging our complex documents would be really time-consuming.

I wonder if something like this would work:
> extract text using the Extract API, thus getting content and coordinates
> populate a new blank PDF with the data, using the coordinate info to insert text in the blank document (I would use another PDF JS library for this part, if needed)

I downloaded the sample JSON output from the Extract API and see a bunch of coordinate data. Could I use that to position text in a new document?
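A minimal sketch of the proposed approach, turning extracted elements into a list of draw instructions for a new document. The JSON shape is an assumption that should be checked against the actual sample output: an `elements` array whose entries carry `Text`, `Page`, `TextSize`, and `Bounds` as `[left, bottom, right, top]` in points with a bottom-left origin. The `placement_plan` helper and the translation lookup are hypothetical.

```python
import json

# Hypothetical stand-in for the extracted JSON; field names are assumptions.
SAMPLE = json.loads("""
{"elements": [
  {"Page": 0, "Text": "Hello", "TextSize": 12.0, "Bounds": [72.0, 700.0, 110.0, 714.0]},
  {"Page": 0, "Path": "//Document/Figure", "Bounds": [72.0, 500.0, 300.0, 650.0]},
  {"Page": 1, "Text": "World", "TextSize": 10.0, "Bounds": [72.0, 650.0, 105.0, 662.0]}
]}
""")


def placement_plan(extracted: dict, translations: dict) -> list:
    """Build one draw instruction per text element, skipping non-text elements."""
    plan = []
    for element in extracted.get("elements", []):
        text = element.get("Text")
        bounds = element.get("Bounds")
        if not text or not bounds:
            continue  # figures and other elements without text
        left, bottom, _right, _top = bounds
        plan.append({
            "page": element.get("Page", 0),
            "x": left,
            "y": bottom,  # bottom of the box, only an approximation of the baseline
            "size": element.get("TextSize", 12.0),
            # Fall back to the original text when no translation is known.
            "text": translations.get(text.strip(), text),
        })
    return plan


plan = placement_plan(SAMPLE, {"Hello": "Bonjour", "World": "Monde"})
for item in plan:
    print(item)
```

Each item could then be handed to whichever PDF library draws the text. The plan only carries position and size, so fonts, line wrapping, and table borders would still need separate handling.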

Joel Geraci

Community Expert

Do you control the authoring process of these documents before they get converted to PDF, or is this a situation where you need to take what you are given?
