Hi everyone,
I'm working on a pipeline that translates the text content of a PDF into another language and then generates a new PDF that keeps the original formatting and layout as closely as possible. I'm trying to do this with the Adobe PDF Services API.
Here's where I am so far:
I've successfully used the Extract API to get a JSON file from my PDF, and I can parse and translate the text with Python. However, I'm unsure about the best way to go from the translated JSON back to a PDF.
Is there a recommended way to rebuild a PDF (or DOCX) from the translated JSON using Adobe's tools?
Should I convert the translated text into a DOCX and then use the Create PDF API?
How can I preserve font size, positioning, and formatting while rebuilding the PDF?
When extracting text, is it possible to keep sentences from being broken across lines? I've noticed that sentences are sometimes split midway because of line breaks in the PDF. Can this be controlled during extraction, or does it have to be post-processed?
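To make the last question concrete, here is a minimal sketch of the kind of post-processing I have in mind. It assumes the text pieces have already been pulled out of the Extract JSON into a plain list of strings in reading order; the function name `merge_fragments` and the punctuation heuristic are my own illustration, not anything provided by Adobe's API.

```python
import re

# A fragment "closes" a sentence if it ends with ., ! or ?,
# optionally followed by a closing quote or bracket.
SENTENCE_END = re.compile(r'[.!?]["\')\]]?\s*$')

def merge_fragments(fragments):
    """Join consecutive text fragments until one ends a sentence."""
    merged = []
    buffer = ""
    for fragment in fragments:
        text = fragment.strip()
        if not text:
            continue  # skip empty pieces
        if buffer.endswith("-") and text[0].islower():
            # Likely a word hyphenated across a line break:
            # drop the hyphen and join without a space.
            buffer = buffer[:-1] + text
        elif buffer:
            buffer = buffer + " " + text
        else:
            buffer = text
        if SENTENCE_END.search(buffer):
            merged.append(buffer)
            buffer = ""
    if buffer:
        # Keep any trailing text that never reached sentence-final punctuation.
        merged.append(buffer)
    return merged

if __name__ == "__main__":
    pieces = ["The quick brown", "fox jumps.", "Hyphen-", "ated word here."]
    for sentence in merge_fragments(pieces):
        print(sentence)
```

This is only a heuristic: a heading without final punctuation would be glued onto the paragraph that follows it, and abbreviations such as "e.g." would end a sentence too early, so it would probably need to be combined with whatever structural information the extraction provides.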
Any guidance, tips, or working examples would be really helpful!