Acrobat JavaScript: technique for extracting footnotes

Question

Hi,

I'm investigating how to extract footnotes from PDFs that were converted from Word files. One approach I'd like to validate is writing a search that looks for footnotes in the text (understanding that "text" is not a straightforward concept in a PDF). I've been looking at:

ADOBE PDF LIBRARY SDK
Acrobat DC SDK

for scripting options.

I'm wondering if I could first search for a number (e.g. 1) and then decide whether it is a footnote reference, either from the shape and relative offset of its bounding rectangle or, if text properties are available, from a superscript property (if there is one). If the search finds a footnote reference, it would then go on to find the actual footnote content at the bottom of the page and extract that.

Thanks

try67 · Answer

You can search for words and their location.

To do that you would need to use the getPageNthWord method (which returns the text of a word) and the getPageNthWordQuads method (which returns its bounding quads).

However, Acrobat JavaScript can't give you any additional information about those words, like the font used, color, size, or whether they are superscript, underlined, italic, bold, etc.
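
Since there is no superscript property, the only thing to go on is geometry. Here is a rough sketch of that idea, not a tested solution: it walks the words on a page with getPageNumWords, getPageNthWord and getPageNthWordQuads, and flags purely numeric words whose bounding box is noticeably shorter than the median word on the page. The function names (wordBox, findFootnoteCandidates) and the 0.8 height ratio are my own assumptions for illustration, and the heuristic only works if Acrobat reports the footnote number as its own word rather than gluing it to the preceding word.

```javascript
// Bounding box of a word from its quads. Each quad is 8 numbers
// (four x,y corner pairs); min/max is used so corner order does not matter.
function wordBox(quads) {
  var xs = [], ys = [];
  for (var q = 0; q < quads.length; q++) {
    for (var i = 0; i < 8; i += 2) {
      xs.push(quads[q][i]);
      ys.push(quads[q][i + 1]);
    }
  }
  return {
    left: Math.min.apply(null, xs),
    right: Math.max.apply(null, xs),
    bottom: Math.min.apply(null, ys),
    top: Math.max.apply(null, ys)
  };
}

// Hypothetical helper: returns numeric words on a page whose box height is
// below heightRatio * (median word height). heightRatio is an assumed
// threshold (default 0.8) and will need tuning per document.
function findFootnoteCandidates(doc, pageNum, heightRatio) {
  var ratio = heightRatio || 0.8;
  var words = [];
  var count = doc.getPageNumWords(pageNum);
  for (var w = 0; w < count; w++) {
    var quads = doc.getPageNthWordQuads(pageNum, w);
    if (!quads || quads.length === 0) continue; // skip words with no geometry
    var box = wordBox(quads);
    words.push({
      index: w,
      text: doc.getPageNthWord(pageNum, w, true), // true = strip punctuation
      box: box,
      height: box.top - box.bottom
    });
  }
  if (words.length === 0) return [];

  // Median height stands in for the body text size on this page.
  var heights = [];
  for (var h = 0; h < words.length; h++) heights.push(words[h].height);
  heights.sort(function (a, b) { return a - b; });
  var median = heights[Math.floor(heights.length / 2)];

  var candidates = [];
  for (var k = 0; k < words.length; k++) {
    if (/^\d+$/.test(words[k].text) && words[k].height < median * ratio) {
      candidates.push(words[k]);
    }
  }
  return candidates;
}
```

From the Acrobat JavaScript console you would call it as findFootnoteCandidates(this, 0) for the first page. Each result carries the word index and its box, so a second pass could look for the same number near the bottom of the page (small box.bottom values) and read the words that follow it as the footnote text.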
