June 19, 2025
Question

Acrobat Services API in translation workflow


Hi everyone,


I'm working on a pipeline that translates the text content of a PDF into another language and then generates a new PDF that keeps the original formatting and layout as closely as possible. I’m trying to do this with the Adobe PDF Services API.

 

Here’s what I want to do:

  1. Extract structured content from a PDF (ideally as JSON, preserving style/structure).
  2. Translate the extracted text (using my translation model).
  3. Rebuild the PDF, either by modifying the original or by creating a new PDF with the translated content. If that isn't possible, could I build a DOCX instead?

 

I’ve successfully used the Extract API to get a JSON from my PDF, and I can parse and translate the text with Python. However, I’m unsure about the best way to go from the translated JSON back to a PDF.
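
To make the parse-and-translate step concrete, here is a minimal sketch of what it might look like. It assumes the Extract output is a JSON object with a top-level `elements` list whose entries may carry `Text` and `Path` keys; `translate_elements` and the `translate` callable are hypothetical names standing in for the actual translation model.

```python
import json
from typing import Callable


def translate_elements(data: dict, translate: Callable[[str], str]) -> dict:
    """Return a copy of the Extract JSON with each element's text translated.

    Assumes (not verified against a specific Extract schema version) that
    text-bearing elements store their content under the "Text" key.
    """
    out = json.loads(json.dumps(data))  # simple deep copy for JSON-like data
    for element in out.get("elements", []):
        text = element.get("Text")
        # Skip elements without text (figures, empty containers, etc.).
        if isinstance(text, str) and text.strip():
            element["Text"] = translate(text)
    return out


if __name__ == "__main__":
    sample = {"elements": [
        {"Path": "//Document/H1", "Text": "Hello"},
        {"Path": "//Document/Figure"},
    ]}
    # str.upper is only a stand-in for a real translation function.
    print(translate_elements(sample, str.upper))
```

Keeping every non-text key untouched means the style and position metadata stays attached to each translated element, which is what any later rebuild step would need.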

Questions:

  • Is there a recommended way to rebuild a PDF (or DOCX) from the translated JSON using Adobe's tools?

  • Should I convert the translated text into a DOCX and then use the Create PDF API?

  • How can I preserve font size, positioning, and formatting while rebuilding the PDF?

  • When extracting text, is it possible to avoid breaking sentences across lines (i.e., prevent line breaks from splitting phrases unnaturally)? I’ve noticed that sometimes sentences are split mid-way due to line breaks in the PDF—can this be controlled or post-processed?
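
On the line-break question, one possible post-processing approach is sketched below. It is a plain-Python heuristic, not an Adobe feature: the hypothetical helper `unwrap_lines` treats blank lines as paragraph boundaries, rejoins words hyphenated across a line break, and collapses the remaining single line breaks into spaces.

```python
import re


def unwrap_lines(text: str) -> str:
    """Join hard-wrapped lines within paragraphs; keep blank-line paragraph breaks."""
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = []
    for para in paragraphs:
        # Rejoin words hyphenated across a line break: "trans-\nlation" -> "translation".
        para = re.sub(r"(\w)-\n(\w)", r"\1\2", para)
        # Replace remaining single line breaks (and surrounding spaces) with one space.
        para = re.sub(r"\s*\n\s*", " ", para)
        cleaned.append(para.strip())
    return "\n\n".join(p for p in cleaned if p)


if __name__ == "__main__":
    raw = "This is a sen-\ntence split\nacross lines.\n\nNext paragraph."
    print(unwrap_lines(raw))
```

One caveat with this heuristic: a genuine compound that happens to break at its hyphen (for example "well-" at the end of one line and "known" on the next) would be merged into a single word, so the de-hyphenation rule may need a dictionary check for real documents.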

 

Any guidance, tips, or working examples would be really helpful!