Multiple paragraphs merged into one paragraph <p> when converting PDF to docx

Jan 25, 2022

Hello, we use the Adobe PDF Services API heavily to convert PDF to DOCX and are looking to expand our usage. However, we have an issue with how the XML is structured in the DOCX files produced from our PDFs, and we would like some advice on the logic you use to generate the XML paragraphs and runs.

In some cases, what look like multiple paragraphs in the PDF are merged into a single paragraph element (<w:p>) in the XML. From my investigation, I cannot find anything in that XML paragraph that would cause it to be displayed as two or more paragraphs. For example, the PDF shows two separate paragraphs:

This is the first paragraph.

This is the second paragraph.

These would normally appear as two separate paragraph elements (<w:p>) in word/document.xml, but here everything is in one paragraph element. It looks something like this:

<w:p>
    <w:pPr>
        ...
        <w:rPr> ... </w:rPr>
    </w:pPr>
    <w:r>
        <w:rPr> ... </w:rPr>
        <w:t>This is the first paragraph.</w:t>
    </w:r>
    <w:r>
        <w:rPr> ... </w:rPr>
        <w:t> </w:t>
    </w:r>
    <w:r>
        <w:rPr> ... </w:rPr>
        <w:t>This is the second paragraph.</w:t>
    </w:r>
</w:p>

The only thing that "divides" the two paragraphs is a run whose only text is a single space. Rendered that way, both paragraphs would normally appear on the same line:
This is the first paragraph. This is the second paragraph.
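One way to confirm what actually separates the two paragraphs is to list the runs of every <w:p> in word/document.xml. Below is a minimal sketch using only Python's standard library (zipfile and xml.etree.ElementTree). The function names and the sample XML are illustrative. The sample adds xml:space="preserve" to make the space run explicit. Only runs that are direct children of <w:p> are listed, so runs nested inside hyperlinks or tracked changes are skipped.

```python
import xml.etree.ElementTree as ET
import zipfile

# Main WordprocessingML namespace used in word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"

# Illustrative stand-in for the converted output described above
# (run/paragraph properties omitted).
SAMPLE_XML = f"""<w:document xmlns:w="{W_NS}"><w:body>
  <w:p>
    <w:r><w:t>This is the first paragraph.</w:t></w:r>
    <w:r><w:t xml:space="preserve"> </w:t></w:r>
    <w:r><w:t>This is the second paragraph.</w:t></w:r>
  </w:p>
</w:body></w:document>""".encode("utf-8")


def read_document_xml(docx_path: str) -> bytes:
    """Return the word/document.xml part of a .docx (which is a ZIP package)."""
    with zipfile.ZipFile(docx_path) as zf:
        return zf.read("word/document.xml")


def runs_per_paragraph(document_xml: bytes) -> list[list[str]]:
    """Group run text by <w:p>, in document order (table-cell paragraphs included)."""
    root = ET.fromstring(document_xml)
    result = []
    for p in root.iter(f"{W}p"):
        # Direct-child runs only; each run's text is the concatenation of its <w:t> elements.
        result.append(
            ["".join(t.text or "" for t in r.iter(f"{W}t")) for r in p.findall(f"{W}r")]
        )
    return result


if __name__ == "__main__":
    for i, runs in enumerate(runs_per_paragraph(SAMPLE_XML)):
        print(f"paragraph {i}: {runs!r}")
```

For the sample, this reports a single paragraph containing three runs, the middle one holding only a space. On a real file, the call would be `runs_per_paragraph(read_document_xml("converted.docx"))`, where the path is a placeholder.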

We post-process the XML and have done so for years with XML generated by Microsoft Word. However, PDF Services consistently produces a variation of that format, and we want to understand why and how, so we can make adjustments on our side.
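Until the conversion logic is known, one possible adjustment on the post-processing side is to split such paragraphs back apart. The sketch below is a hypothetical heuristic. It is not Adobe's rule and not something the API documents. It splits a <w:p> wherever a whitespace-only run follows a run whose text ends in sentence-final punctuation, and it copies the <w:pPr> into each resulting paragraph. The names split_paragraph, split_body, and SENTENCE_END are illustrative, and the punctuation set is an assumption. A real paragraph that happens to have a separate space run between sentences would also be split, so the rule should be checked against actual converted output.

```python
import copy
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"
ET.register_namespace("w", W_NS)  # keep the w: prefix when serializing

# Assumption: a space-only run after one of these characters marks a paragraph gap.
SENTENCE_END = (".", "!", "?", ":")

SAMPLE_XML = f"""<w:document xmlns:w="{W_NS}"><w:body>
  <w:p>
    <w:pPr><w:spacing w:after="120"/></w:pPr>
    <w:r><w:t>This is the first paragraph.</w:t></w:r>
    <w:r><w:t xml:space="preserve"> </w:t></w:r>
    <w:r><w:t>This is the second paragraph.</w:t></w:r>
  </w:p>
</w:body></w:document>"""


def run_text(r: ET.Element) -> str:
    return "".join(t.text or "" for t in r.iter(f"{W}t"))


def split_paragraph(p: ET.Element) -> list[ET.Element]:
    """Split one <w:p> at space-only runs that follow sentence-final punctuation.

    Returns [p] unchanged when no split point is found; the space run itself is dropped.
    """
    ppr = p.find(f"{W}pPr")
    groups, current, last_text = [], [], ""
    for child in p:
        if child is ppr:
            continue
        if child.tag == f"{W}r":
            text = run_text(child)
            if text and not text.strip() and last_text.rstrip().endswith(SENTENCE_END):
                groups.append(current)  # close the paragraph before the gap
                current, last_text = [], ""
                continue
            if text.strip():
                last_text = text
        current.append(child)
    groups.append(current)
    groups = [g for g in groups if g]  # avoid empty paragraphs from a trailing gap
    if len(groups) < 2:
        return [p]
    pieces = []
    for group in groups:
        new_p = ET.Element(f"{W}p")
        if ppr is not None:
            new_p.append(copy.deepcopy(ppr))  # each piece keeps the original paragraph properties
        new_p.extend(group)
        pieces.append(new_p)
    return pieces


def split_body(body: ET.Element) -> None:
    """Apply split_paragraph to top-level paragraphs of <w:body> (table cells are not visited)."""
    for idx in range(len(body) - 1, -1, -1):  # walk backwards so inserts don't shift pending indices
        child = body[idx]
        if child.tag != f"{W}p":
            continue
        pieces = split_paragraph(child)
        if len(pieces) > 1:
            body.remove(child)
            for offset, piece in enumerate(pieces):
                body.insert(idx + offset, piece)


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE_XML)
    split_body(root.find(f"{W}body"))
    print(ET.tostring(root, encoding="unicode"))
```

Copying <w:pPr> into every piece is a simplifying choice. Attributes on the original <w:p> element itself (such as revision or paragraph IDs) are deliberately not carried over, to avoid duplicating them.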

TOPICS
PDF Services API