Acrobat Services API in translation workflow

Jun 19, 2025

Hi everyone,

I'm working on a pipeline that translates the text content of a PDF into another language and then generates a new PDF that keeps the original formatting and layout as closely as possible. I'm trying to do this with the Adobe PDF Services API.

Here’s what I want to do:

1. Extract structured content from a PDF (ideally as JSON, preserving style and structure).
2. Translate the extracted text (using my own translation model).
3. Rebuild the PDF, either by modifying the original or by creating a new PDF with the translated content. If that isn't possible, could I build a DOCX instead?

I've successfully used the Extract API to get JSON output from my PDF, and I can parse and translate the text with Python. However, I'm unsure about the best way to go from the translated JSON back to a PDF.
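A minimal sketch of that parse-and-translate step might look like the following. It assumes the Extract output has a top-level `elements` list whose entries carry a `Text` field next to layout keys such as `Path` and `TextSize`. The names `translate_elements` and `translate` are hypothetical placeholders, not part of any Adobe SDK, and `str.upper` stands in for a real translation model.

```python
import copy
import json
from typing import Callable


def translate_elements(data: dict, translate: Callable[[str], str]) -> dict:
    """Return a copy of the extracted JSON with each element's Text translated.

    Assumption: text lives under data["elements"][i]["Text"]. All other keys
    (Path, Bounds, Font, TextSize, ...) are copied through unchanged so the
    layout information stays available for a later rebuild step.
    """
    result = copy.deepcopy(data)  # leave the original extraction untouched
    for element in result.get("elements", []):
        text = element.get("Text")
        # Skip elements without text (figures, empty containers, etc.).
        if isinstance(text, str) and text.strip():
            element["Text"] = translate(text)
    return result


if __name__ == "__main__":
    # Tiny hand-written stand-in for a real structuredData.json file.
    sample = {
        "elements": [
            {"Path": "//Document/H1", "Text": "Introduction ", "TextSize": 18.0},
            {"Path": "//Document/P", "Text": "Hello world. ", "TextSize": 11.0},
            {"Path": "//Document/Figure"},
        ]
    }
    translated = translate_elements(sample, str.upper)
    print(json.dumps(translated, indent=2))
```

Keeping the translated text inside the same structure means each string stays paired with its path and size metadata, whichever rebuild route ends up being used.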

Questions:

1. Is there a recommended way to rebuild a PDF (or DOCX) from the translated JSON using Adobe's tools?
2. Should I convert the translated text into a DOCX and then use the Create PDF API?
3. How can I preserve font size, positioning, and formatting while rebuilding the PDF?
4. When extracting text, is it possible to keep sentences from being split across lines? I've noticed that sentences are sometimes broken mid-way by line breaks in the PDF. Can this be controlled at extraction time, or does it have to be post-processed?
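On the line-break question, one possible post-processing approach is sketched below, assuming the broken lines arrive as newline characters inside a text string. `unwrap_lines` is a hypothetical helper. It rejoins words hyphenated at a line end, treats blank lines as paragraph breaks, and collapses the remaining line breaks into spaces. One caveat: it will also merge genuine hyphenated compounds that happen to wrap (for example "well-" followed by "known"), so it is a heuristic rather than a complete fix.

```python
import re


def unwrap_lines(text: str) -> str:
    """Undo hard line wrapping while keeping paragraph breaks."""
    # Rejoin words hyphenated across a single line break:
    # "transla-\ntion" -> "translation". Only fires when the next line
    # starts with a lowercase letter, to avoid touching list dashes.
    text = re.sub(r"(\w)-[ \t]*\n[ \t]*(?=[a-z])", r"\1", text)
    # Blank lines separate paragraphs; inside a paragraph, any run of
    # whitespace (including single newlines) becomes one space.
    paragraphs = re.split(r"\n\s*\n", text)
    return "\n\n".join(" ".join(p.split()) for p in paragraphs if p.strip())


if __name__ == "__main__":
    wrapped = "This is a sen-\ntence split\nacross lines.\n\nNext paragraph."
    print(unwrap_lines(wrapped))
```

Running this before translation gives the model whole sentences instead of fragments, which usually matters more for translation quality than preserving the original line boundaries.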

Any guidance, tips, or working examples would be really helpful!