How does the Adobe PDF Extract API extract text from PDF (to go from PDF to CSV)?
Does it extract the native text from the PDF and convert it to CSV, the way we would do it ourselves in Acrobat desktop? Or does it always use OCR and Sensei AI to extract and structure the text?
Basically, I am trying to understand how much of this relies on AI versus Adobe's native ability to convert a PDF to CSV based on the actual text/characters in the file.
We use both AI and algorithms, but we only run OCR when we get an image-only PDF. Most of the time we operate on native PDFs.
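To illustrate the routing described above (OCR only for image-only PDFs, native text otherwise), here is a minimal Python sketch. It is not Adobe's implementation: the `Page` model, its `text_layer` field, and the `choose_strategy` function are hypothetical names chosen for illustration, and a real check of a PDF's text layer would need an actual PDF parser.

```python
from dataclasses import dataclass


@dataclass
class Page:
    # Hypothetical page model: characters found in the PDF's native text layer
    text_layer: str
    has_images: bool


def needs_ocr(pages: list[Page]) -> bool:
    """Return True only when no page carries native text (an image-only PDF)."""
    return all(not p.text_layer.strip() for p in pages)


def choose_strategy(pages: list[Page]) -> str:
    # Hypothetical routing: native PDFs use the existing characters directly,
    # and OCR is reserved for documents that have no text layer at all.
    return "ocr" if needs_ocr(pages) else "native"


if __name__ == "__main__":
    scanned = [Page(text_layer="", has_images=True)]
    digital = [Page(text_layer="Invoice 1042", has_images=False)]
    print(choose_strategy(scanned), choose_strategy(digital))
```

Treating a document as image-only only when every page lacks text is one possible reading of "image-only PDF"; a per-page decision would be an equally plausible design.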
So Export / Convert PDF converts a PDF to XLSX using the native PDF content, as if I were doing it in Acrobat desktop, with no AI or OCR.
And Extract PDF uses AI/algorithms to extract text, images (OCR), and tables.
Is this the correct way to understand it?
Correct. The AI in Extract does a much better job of "understanding" complex tables, for example tables with merged cells and rows with vertically and horizontally centered cells.
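Merged cells are hard for a PDF-to-CSV conversion because CSV has no notion of spans: each merged cell has to be resolved onto a plain grid. The Python sketch below shows one way to do that once a table's cells and spans are known. The `Cell` structure and `cells_to_csv` function are hypothetical and do not reflect the Extract API's actual output format; they only illustrate the flattening step.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Cell:
    # Hypothetical cell model: position of the top-left corner plus span sizes
    text: str
    row: int
    col: int
    rowspan: int = 1
    colspan: int = 1


def cells_to_csv(cells: list[Cell]) -> str:
    """Flatten cells with row/column spans into CSV text."""
    n_rows = max(c.row + c.rowspan for c in cells)
    n_cols = max(c.col + c.colspan for c in cells)
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    # Copy each merged cell's text into every grid position it covers
    for c in cells:
        for r in range(c.row, c.row + c.rowspan):
            for k in range(c.col, c.col + c.colspan):
                grid[r][k] = c.text
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(grid)
    return buf.getvalue()


if __name__ == "__main__":
    table = [
        Cell("Region", 0, 0, rowspan=2),   # merged vertically
        Cell("Sales", 0, 1, colspan=2),    # merged horizontally
        Cell("Q1", 1, 1), Cell("Q2", 1, 2),
        Cell("North", 2, 0), Cell("10", 2, 1), Cell("12", 2, 2),
    ]
    print(cells_to_csv(table), end="")
```

Repeating the merged value in every covered position keeps each CSV row self-describing; leaving the extra positions blank is the other common choice and may suit spreadsheets better.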