Participant
May 17, 2017
Question

Extract text location


Dear all,

Is it possible to extract all the text from a page of a PDF, together with its location or coordinates, by writing a script? Thanks.


2 replies

Legend
May 19, 2017

To create annotations, markup, etc., the coordinates are needed. Examine what a "quad" contains, and bear in mind that text does not exist at a single point and so does not have a single coordinate; rather, it occupies space. The baseline is not provided.
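To make the "occupies space" point concrete, here is a minimal sketch that reduces one quad to a bounding box. It assumes a quad is an array of eight numbers (four x,y corner points) in default user space, as the Annotation quads property is documented; the helper name quadToBox is hypothetical and not part of the Acrobat API. It does not rely on any particular corner order.

```javascript
// Sketch: reduce one quad (8 numbers = 4 corner points) to a bounding box.
// quadToBox is a hypothetical helper, not an Acrobat API call.
// The corner order is not assumed; min/max works for any order.
function quadToBox(quad) {
  if (!quad || quad.length !== 8) {
    throw new Error("expected a quad of 8 numbers");
  }
  var xs = [quad[0], quad[2], quad[4], quad[6]];
  var ys = [quad[1], quad[3], quad[5], quad[7]];
  return {
    left: Math.min.apply(null, xs),
    right: Math.max.apply(null, xs),
    bottom: Math.min.apply(null, ys),
    top: Math.max.apply(null, ys)
  };
}
```

The bottom edge of this box is the lower edge of the quad, not the text baseline, so it should not be treated as one.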

Legend
May 17, 2017

Yes. Look up getPageNthWord and getPageNthWordQuads.
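A minimal sketch of how these two methods might be combined, together with getPageNumWords to get the word count. The function name listWordsWithQuads and the shape of the returned objects are assumptions made for illustration; doc is expected to be an Acrobat Doc object (for example, this in the JavaScript console), and page numbers are 0-based.

```javascript
// Sketch: collect every word on one page together with its quads.
// listWordsWithQuads is a hypothetical helper; the three doc.* calls
// are Acrobat JavaScript Doc methods.
function listWordsWithQuads(doc, pageNum) {
  var results = [];
  var count = doc.getPageNumWords(pageNum);
  for (var i = 0; i < count; i++) {
    results.push({
      index: i,
      // third argument true: strip punctuation and whitespace
      word: doc.getPageNthWord(pageNum, i, true),
      // list of quads for this word, 8 numbers per quad
      quads: doc.getPageNthWordQuads(pageNum, i)
    });
  }
  return results;
}

// Possible use from the Acrobat JavaScript console (first page):
//   var words = listWordsWithQuads(this, 0);
//   for (var k = 0; k < words.length; k++) {
//     console.println(words[k].word + ": " + words[k].quads.toSource());
//   }
```

A word may come back with more than one quad, for example when it is hyphenated across two lines, so the quads are kept as a list rather than flattened.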

May 19, 2017

Please check out this link: http://help.adobe.com/livedocs/acrobat_sdk/9.1/Acrobat9_1_HTMLHelp/wwhelp/wwhimpl/common/html/wwhelp.htm?context=Acrobat…

getPageNthWordQuads only gives the quads property of the Annotation object, which can be used for constructing text markup, underline, strikeout, highlight, and squiggly annotations. How will this return the coordinates of the word?

Bernd Alheit
Community Expert
May 19, 2017

getPageNthWordQuads returns the coordinates of the word.