
Extract and replace text in PDF

New Here, Jun 10, 2021


Hi all, 

 

I have a PDF generated in Apple Pages. The PDF has several data tables / text boxes per page.

Wondering if the following is possible:

 

Detect all text content on page, whether in tables or text boxes, etc. 

Extract text and replace w/ other text, thus generating the same doc w/ entirely new text. 

 

I've looked at some of the OCR and text extraction samples/docs. Extracting the data seems pretty straightforward. But can I replace it? The use case is translating the doc to another language.

Community Expert, Jun 10, 2021


Adobe doesn't have a Document Services API to do this. That said, it's not really something you'll want to do. Text in a PDF is generally laid down at precise coordinates, so it can't be replaced with text of a different length without causing overlaps.
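
To make the overlap problem concrete, here is a rough sketch in Python. It is not an Adobe API; the function names and the 0.5 em average glyph width are assumptions for illustration only, and real fonts need real glyph metrics. It estimates how wide a string is at a given font size and how far the font would have to shrink to fit the slot the original text occupied.

```python
def estimated_width(text: str, font_size: float, avg_char_em: float = 0.5) -> float:
    """Rough width in points: character count x average glyph width (em) x font size.

    avg_char_em is an assumed average; real fonts need real glyph metrics.
    """
    return len(text) * avg_char_em * font_size


def fit_font_size(text: str, box_width: float, font_size: float,
                  avg_char_em: float = 0.5) -> float:
    """Return font_size if the text fits in box_width, else the largest size that fits."""
    width = estimated_width(text, font_size, avg_char_em)
    if width <= box_width:
        return font_size
    return font_size * box_width / width


# A 60 pt wide slot that was laid out for "Total price" at 10 pt.
original = "Total price"
translated = "Precio total con impuestos"

print(estimated_width(original, 10.0))    # 55.0, fits in 60 pt
print(estimated_width(translated, 10.0))  # 130.0, more than twice the slot
print(fit_font_size(translated, 60.0, 10.0))
```

Under these assumptions the translated string would have to drop to less than half its size to stay inside the original box, which is why swapping text in place tends to overlap neighbouring content or become unreadable.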

If you want to create new PDF files with different text, I suggest looking at the Document Generation API which will allow you to start with a Word template plus some JSON and output a PDF where the JSON is merged into tagged "fields" in the document.

New Here, Jun 15, 2021


Hi Joel, thanks for the feedback.

That makes sense. Looks like using a template is the way to go. We don't use Word though, and manually tagging our complex documents would be really time-consuming.

I wonder if something like this would work:
> Extract text using the Extract API, thus getting content and coordinates.
> Populate a new blank PDF with the data, using the coordinate info to insert the text in the blank document (I would use another PDF JS library for this part, if needed).

I downloaded the sample JSON output from the Extract API, and see a bunch of coordinate data.
Could I use that to position text in a new document?
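
A minimal sketch of the first half of that idea, in Python with only the standard library. The field names used here (`elements`, `Text`, `Bounds` as `[left, bottom, right, top]`, `Page`, `TextSize`) are assumptions about the shape of the Extract JSON and should be checked against the downloaded sample. `Placement` and `placements_from_extract` are hypothetical names, and the sample data is made up. Actually drawing the text is left to whichever PDF library is used afterwards.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Placement:
    page: int
    x: float      # left edge of the text box, in points
    y: float      # bottom edge of the text box, in points
    width: float  # horizontal space the original text occupied
    size: float   # font size, in points
    text: str


def placements_from_extract(doc: dict) -> List[Placement]:
    """Collect every element that has both text and a bounding box."""
    out = []
    for el in doc.get("elements", []):
        text = el.get("Text")
        bounds = el.get("Bounds")
        if not text or not bounds or len(bounds) != 4:
            continue  # skip figures, containers, anything without text and a box
        left, bottom, right, _top = bounds  # assumed order
        out.append(Placement(
            page=el.get("Page", 0),
            x=left,
            y=bottom,
            width=right - left,
            size=el.get("TextSize", 10.0),  # fallback size is an assumption
            text=text.strip(),
        ))
    return out


# Made-up sample in the assumed shape, not real Extract output.
sample = json.loads("""
{"elements": [
  {"Path": "//Document/H1", "Page": 0, "Text": "Price list ",
   "TextSize": 18.0, "Bounds": [72.0, 700.0, 240.0, 722.0]},
  {"Path": "//Document/Figure", "Page": 0,
   "Bounds": [72.0, 400.0, 300.0, 600.0]},
  {"Path": "//Document/Table/TR/TD/P", "Page": 1, "Text": "Total ",
   "TextSize": 10.0, "Bounds": [90.0, 510.0, 130.0, 522.0]}
]}
""")

for p in placements_from_extract(sample):
    print(p)
```

Each `Placement` carries what a drawing call would need: page, position, font size, and the width available for the replacement text.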

Community Expert, Jun 15, 2021


Do you control the authoring process of these documents before they get converted to PDF, or is this a situation where you need to take what you are given?

New Here, Jun 16, 2021


Yes, I create them in Apple Pages.

Community Expert, Jun 16, 2021


So you have the source files? Why do you need the Extract service then? I'm missing something.

New Here, Jun 16, 2021


The PDFs change depending on our clients' needs: we use tables and formulas to show/hide different options, and we are always customizing them on the fly (adding notes, etc.). They also change over time. I want to completely automate translating new features in the file.

The end goal is to translate the PDFs. The translation part I have covered (Google Translate API).
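
For the automation side, a small Python sketch of how the translation step could be wired in without tying it to any one service. Table-heavy documents repeat a lot of labels, so this translates each distinct string once and reuses the result. `translate_all` and `fake_translate` are hypothetical names; the stand-in translator would be swapped for a real call to whatever translation service is used.

```python
from typing import Callable, Dict, List


def translate_all(texts: List[str],
                  translate: Callable[[str], str]) -> List[str]:
    """Translate each distinct non-empty string once, preserving input order."""
    cache: Dict[str, str] = {}
    result = []
    for t in texts:
        key = t.strip()
        if not key:
            result.append(t)  # leave blank cells untouched
            continue
        if key not in cache:
            cache[key] = translate(key)  # one call per distinct string
        result.append(cache[key])
    return result


# Stand-in translator for the demo; a real one would call a translation service.
calls = []

def fake_translate(s: str) -> str:
    calls.append(s)
    return {"Total": "Gesamt", "Notes": "Notizen"}.get(s, s)


print(translate_all(["Total", "Notes", "", "Total"], fake_translate))
# ['Gesamt', 'Notizen', '', 'Gesamt']
print(len(calls))  # 2
```

Passing the translator in as a function keeps the rest of the pipeline testable offline and makes it easy to switch services later.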

 
