Participant
June 10, 2021
Question

Extract and replace text in PDF


Hi all,

I have a PDF generated in Apple Pages. The PDF has several data tables and text boxes per page.

I'm wondering whether the following is possible:

Detect all text content on each page, whether it is in tables, text boxes, etc.

Extract the text and replace it with other text, producing the same document with entirely new text.

I've looked at some of the OCR and text extraction samples and docs. Extracting the data seems pretty straightforward, but can I replace it? The use case is translating the document into another language.


    Joel Geraci
    Community Expert
    June 10, 2021

    Adobe doesn't have a Document Services API to do this. That said, it's not really something you'll want to do. Text in a PDF is generally laid down at precise coordinates, so it can't be swapped out without risking overlaps: the new text will rarely occupy the same width as the original.
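The overlap problem can be made concrete with a rough fit check: estimate how wide the replacement string would be and compare it with the width the original text occupied. This is a minimal sketch assuming an average glyph width of 0.5 em, a crude stand-in for real font metrics; `estimateWidth` and `fitsInBox` are hypothetical helpers, not part of any Adobe API.

```typescript
// ASSUMPTION: every glyph is about 0.5 em wide. Real fonts vary per glyph,
// so a real tool would read advance widths from the embedded font instead.
const AVG_GLYPH_EM = 0.5;

// Approximate rendered width of `text` in points at `fontSize`.
function estimateWidth(text: string, fontSize: number): number {
  return text.length * fontSize * AVG_GLYPH_EM;
}

// Hypothetical helper: does `replacement` fit in a box `boxWidth` points wide?
// If not, report the font size that would squeeze it into the same width.
function fitsInBox(
  boxWidth: number,
  replacement: string,
  fontSize: number,
): { fits: boolean; shrinkTo: number } {
  const needed = estimateWidth(replacement, fontSize);
  if (needed <= boxWidth) {
    return { fits: true, shrinkTo: fontSize };
  }
  // Scale the font down proportionally so the text spans the original width.
  return { fits: false, shrinkTo: fontSize * (boxWidth / needed) };
}

// A 65 pt wide label at 10 pt, replaced with a longer German phrase.
const result = fitsInBox(65, "Gesamtumsatz im Geschäftsjahr", 10);
console.log(result); // fits is false; shrinkTo is well below 10
```

Translations that run long either overflow into neighboring text or need a noticeably smaller font, which is why in-place replacement rarely looks right.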

    If you want to create new PDF files with different text, I suggest looking at the Document Generation API, which lets you start with a Word template plus some JSON and output a PDF in which the JSON data is merged into tagged "fields" in the document.
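The template idea can be illustrated with a toy merge function that swaps `{{key}}` placeholders for values from a JSON object. This is only a sketch of the concept, not Adobe's Document Generation engine, whose tag syntax and features are documented separately; `mergeTemplate` and the placeholder pattern here are assumptions for illustration. For translation, the appeal is that the layout lives in one template while each language gets its own data object.

```typescript
// Toy illustration of template merging: replace {{key}} placeholders with
// values from a flat JSON object. NOT Adobe's merge engine; it only shows
// the idea of keeping layout in a template and text in data.
type MergeData = Record<string, string>;

function mergeTemplate(template: string, data: MergeData): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match: string, key: string) =>
    // Leave unknown placeholders untouched so missing translations are visible.
    Object.prototype.hasOwnProperty.call(data, key) ? data[key] : match,
  );
}

// One data object per language; the template (layout) stays the same.
const template = "{{title}}: {{amount}}";
const en: MergeData = { title: "Total revenue", amount: "1,200" };
const de: MergeData = { title: "Gesamtumsatz", amount: "1.200" };

console.log(mergeTemplate(template, en)); // Total revenue: 1,200
console.log(mergeTemplate(template, de)); // Gesamtumsatz: 1.200
```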

    Participant
    June 15, 2021
    Hi Joel, thanks for the feedback.

    That makes sense. It looks like using a template is the way to go. We don't
    use Word, though, and manually tagging our complex documents would also be
    really time-consuming.

    I wonder if something like this would work:
    > extract text using the Extract API, thus getting content and coordinates
    > populate a new blank PDF with the data, using the coordinate info to
    insert text into the blank document
    (I would use another PDF JS library for this part, if needed)

    I downloaded the sample JSON output from the Extract API and see a lot
    of coordinate data.
    Could I use that to position text in a new document?
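The proposed pipeline can be sketched roughly as follows: walk the Extract API's JSON output, keep the text elements, run each string through a translation lookup, and produce positioned draw operations for whatever PDF library writes the new file. The JSON field names (`elements`, `Text`, `Bounds`, `Page`, `TextSize`, `pages`, `height`) and the assumption that `Bounds` is `[left, bottom, right, top]` in PDF points with a bottom-left origin are assumptions to check against a real output file; `toDrawOps` and `DrawOp` are hypothetical names. PDF measures y upward from the bottom of the page, so the y axis only needs flipping for libraries that use a top-left origin.

```typescript
// ASSUMED shape of the Extract API JSON output; verify against your own file.
interface ExtractElement {
  Text?: string;                             // absent for figures, paths, etc.
  Bounds?: [number, number, number, number]; // assumed [left, bottom, right, top], points
  Page: number;                              // assumed 0-based page index
  TextSize?: number;
}
interface ExtractPage {
  page_number: number;
  width: number;
  height: number;
}
interface ExtractOutput {
  elements: ExtractElement[];
  pages: ExtractPage[];
}

// Hypothetical positioned string to hand to a PDF-writing library.
interface DrawOp {
  page: number;
  text: string;
  x: number;        // left edge of the original box, points
  y: number;        // bottom edge (bottom-left mode) or top edge measured down from the page top (top-left mode)
  maxWidth: number; // width of the original box, for fit checks
  fontSize: number;
}

function toDrawOps(
  data: ExtractOutput,
  translate: (s: string) => string,
  origin: "bottom-left" | "top-left" = "bottom-left",
): DrawOp[] {
  const heights = new Map(
    data.pages.map((p): [number, number] => [p.page_number, p.height]),
  );
  const ops: DrawOp[] = [];
  for (const el of data.elements) {
    if (el.Text === undefined || el.Bounds === undefined) continue; // skip non-text elements
    const [left, bottom, right, top] = el.Bounds;
    let y = bottom;
    if (origin === "top-left") {
      const pageHeight = heights.get(el.Page);
      if (pageHeight === undefined) throw new Error(`no page size for page ${el.Page}`);
      y = pageHeight - top; // flip the y axis
    }
    ops.push({
      page: el.Page,
      text: translate(el.Text.trim()),
      x: left,
      y,
      maxWidth: right - left,
      fontSize: el.TextSize ?? 10, // ASSUMPTION: fall back to 10 pt when no size is given
    });
  }
  return ops;
}

// Usage with a tiny hand-written sample (made-up values, not real API output).
const sample: ExtractOutput = {
  pages: [{ page_number: 0, width: 612, height: 792 }],
  elements: [
    { Text: "Total revenue ", Bounds: [72, 700, 137, 712], Page: 0, TextSize: 10 },
    { Bounds: [72, 600, 300, 650], Page: 0 }, // e.g. a figure: no Text, skipped
  ],
};
const dict = new Map([["Total revenue", "Gesamtumsatz"]]);
const ops = toDrawOps(sample, (s) => dict.get(s) ?? s, "top-left");
console.log(ops);
```

With a bottom-left library such as pdf-lib, `x`, `y` and `fontSize` would map onto the `x`, `y` and `size` options of `page.drawText`. Each op still needs a font and a fit check against `maxWidth`, since long translations will overlap neighboring content.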
    Joel Geraci
    Community Expert
    June 15, 2021

    Do you control the authoring process of these documents before they get converted to PDF, or is this a situation where you need to take what you are given?