API Adobe PDF Extract text JSON missed result

New Here, Jan 05, 2023

Sorry to repost here, I can't delete the other post in the general discussion forum. 

I am using the Adobe PDF Extract API through the Node.js SDK, require('@adobe/pdfservices-node-sdk'). It converts a PDF into a readable JSON file.

 

I have a simple 2-page PDF with a short label in each corner of every page, so 4 labels per page. The output should contain 8 text results in total, but I only find 5 elements when I examine the output JSON file.

How is this API missing such a simple test case?

If it can't extract information accurately from a basic example, how much confidence can I have in it for much larger, more complex PDFs?

 

The output should contain: top left, top right, bottom left, bottom right, top left 2, top right 2, bottom left 2, bottom right 2.
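
One way to narrow this down is to look for each expected label page by page instead of counting elements, since a lower element count could also mean that several labels were merged into one element rather than dropped. The sketch below assumes the output JSON has already been unzipped and parsed, that it has a top-level `elements` array, and that each text element carries a `Text` string and a zero-based `Page` number. `findMissingLabels` and the sample data are hypothetical and only illustrate the check.

```javascript
// Hypothetical helper: return the expected labels that do not appear in the
// combined Text of any element on their page.
// Assumption: structuredData.elements[*] has Text (string) and Page (0-based).
function findMissingLabels(structuredData, expected) {
  const elements = Array.isArray(structuredData.elements)
    ? structuredData.elements
    : [];

  // Concatenate all text found on each page.
  const textByPage = new Map();
  for (const el of elements) {
    if (typeof el.Text !== 'string') continue; // skip elements with no text
    const page = el.Page ?? 0;
    textByPage.set(page, (textByPage.get(page) || '') + ' ' + el.Text);
  }

  // Collapse whitespace and ignore case so merged or padded text still matches.
  const normalize = (s) => s.replace(/\s+/g, ' ').trim().toLowerCase();

  return expected.filter(({ page, text }) => {
    const pageText = normalize(textByPage.get(page) || '');
    return !pageText.includes(normalize(text));
  });
}

// Made-up sample in the assumed shape: two labels merged into one element.
const sample = {
  elements: [
    { Path: '//Document/P', Page: 0, Text: 'top left top right ' },
    { Path: '//Document/P[2]', Page: 0, Text: 'bottom left ' },
    { Path: '//Document/P[3]', Page: 1, Text: 'top left 2 ' },
  ],
};

const expected = [
  { page: 0, text: 'top left' },
  { page: 0, text: 'top right' },
  { page: 0, text: 'bottom left' },
  { page: 0, text: 'bottom right' },
  { page: 1, text: 'top left 2' },
  { page: 1, text: 'top right 2' },
];

// Prints the labels that are truly absent from the sample.
console.log(findMissingLabels(sample, expected));
```

Checking per page matters here because "top left" is a substring of "top left 2", so a search across the whole document could report a page 1 label as present when only the page 2 label was extracted.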

[Attached image: jason27819072sr0h_0-1672948457131.png]

Community Expert, Jan 05, 2023

In the future, to find the best place to post your message, use the list here: https://community.adobe.com/

P.S. I don't think the Adobe website, and the forums in particular, are easy to navigate, so don't spend a lot of time searching that forum list. Do your best and we'll move the post if it helps you get responses.

 

<moved from using the community>

New Here, Nov 04, 2024

I have a similar problem; one very important field in the document header is not exported in the JSON output AT ALL. It's the only missing element on the entire document.

 

Does anybody know what factors cause this API to recognize fields vs. not recognize them?

Adobe Employee, Feb 19, 2025

Hi @defaultv1dp7jvc7y89,

 

Hope you are doing well. Sorry for the trouble, and the delayed response.

 

If you are still looking for a solution, here are a few things I would check:

  1. Check text positioning to ensure it's within the printable area of the page.
  2. Verify text is machine-readable (not part of an image or embedded in a graphic).
  3. Inspect PDF layers and content structure to ensure all text elements are properly placed.
  4. If the PDF comes from a scan, consider improving its quality (a higher DPI, or running OCR so the text becomes machine-readable).
  5. Review the raw API output to check if the missing elements are in a different part of the result.
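
As a rough illustration of point 5, the sketch below flattens the parsed output into one row per element so that everything the API returned can be scanned, including elements that carry no text. It assumes a top-level `elements` array whose entries may have `Path`, `Page`, `Text` and `Bounds`. The helper names `summarizeElements` and `countByType` and the sample data are hypothetical, not part of the SDK.

```javascript
// Hypothetical helper: one summary row per element in the parsed output.
// Assumption: elements may carry Path, Page, Text and Bounds.
function summarizeElements(structuredData) {
  const elements = Array.isArray(structuredData.elements)
    ? structuredData.elements
    : [];
  return elements.map((el, index) => ({
    index,
    path: el.Path || '(no Path)',
    page: el.Page ?? null,
    hasText: typeof el.Text === 'string' && el.Text.trim().length > 0,
    text: typeof el.Text === 'string' ? el.Text.trim() : '',
    bounds: Array.isArray(el.Bounds) ? el.Bounds : null,
  }));
}

// Count rows by the last Path segment with any trailing [n] index removed,
// e.g. "//Document/P[2]" is counted under "P".
function countByType(rows) {
  const counts = {};
  for (const row of rows) {
    const type = row.path.split('/').pop().replace(/\[\d+\]$/, '');
    counts[type] = (counts[type] || 0) + 1;
  }
  return counts;
}

// Made-up sample in the assumed shape; the last element has no Text.
const rawOutput = {
  elements: [
    { Path: '//Document/P', Page: 0, Text: 'top left ', Bounds: [72, 700, 120, 712] },
    { Path: '//Document/P[2]', Page: 0, Text: 'top right ', Bounds: [480, 700, 540, 712] },
    { Path: '//Document/Figure', Page: 1, Bounds: [72, 72, 200, 120] },
  ],
};

const rows = summarizeElements(rawOutput);
console.table(rows.map(({ index, path, page, hasText, text }) => ({ index, path, page, hasText, text })));
console.log(countByType(rows));
```

Rows with `hasText` set to false are worth a closer look in connection with point 2: if an expected label sits where such an element's bounds are, the text may be part of an image rather than machine-readable text.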

Hope this gives you better clarity on what to look for.


-Souvik
