Get text/number co-ordinates in a pdf

New Here ,
Nov 12, 2017

Hi. I'm trying to extract the x,y coordinates of any word/number matches in the PDF. Below is the code I am using. This works great for words like "California", but when I use numbers with symbols like 3.0-234 it doesn't work. Please help, as this is very urgent for me. (I have tried passing 'false' as the 3rd parameter of getPageNthWord, but it still doesn't work.)

for (var p = 0; p < this.numPages; p++)
{
  var numWords = this.getPageNumWords(p);
  for (var i = 0; i < numWords; i++)
  {
    var ckWord = this.getPageNthWord(p, i, true);
    var num = 'California';
    var n = num.toString();
    if (ckWord == n)
    {
      app.alert("Mouse position is: " + this.mouseX + "," + this.mouseY, 3);
    }
  }
}

TOPICS
Acrobat SDK and JavaScript, Macintosh, Windows

Most Valuable Participant ,
Nov 12, 2017

Try printing out all the words in the file to the console and then you'll see what the issue is.

Most Valuable Participant ,
Nov 12, 2017

By the way, your code does not extract the coordinates of the word you're looking for, but of the mouse cursor...

New Here ,
Nov 12, 2017

Hi, thanks for the quick response. Do you think it's a good/right approach to get coordinates from the mouse location? If not, what is the preferred way to get them?

Most Valuable Participant ,
Nov 12, 2017

No, it's not. There's no relation between the mouse's location and the location of the word.

You need to use the getPageNthWordQuads method to get an array that defines the location(s) of the word on the page.
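As a rough illustration of that approach, here is a minimal sketch that swaps the mouse lookup for getPageNthWordQuads. The names findWordRects and quadToRect are made up for this example and are not part of the Acrobat API. The sketch assumes each quad is an array of eight numbers (four x,y corner pairs) and takes the extremes, so it does not rely on any particular corner order.

```javascript
// Sketch only: findWordRects and quadToRect are hypothetical helper names.
// `doc` is assumed to be an Acrobat Doc object (e.g. `this` in the console).

// Turn one quad (8 numbers, four x,y corner pairs) into
// [left, bottom, right, top] by taking the min/max of the corners.
function quadToRect(quad) {
  var xs = [quad[0], quad[2], quad[4], quad[6]];
  var ys = [quad[1], quad[3], quad[5], quad[7]];
  return [
    Math.min.apply(null, xs),
    Math.min.apply(null, ys),
    Math.max.apply(null, xs),
    Math.max.apply(null, ys)
  ];
}

// Collect one rectangle per quad for every word equal to `target`.
function findWordRects(doc, target) {
  var hits = [];
  for (var p = 0; p < doc.numPages; p++) {
    var numWords = doc.getPageNumWords(p);
    for (var i = 0; i < numWords; i++) {
      // bStrip = true: compare against the word without punctuation/whitespace
      if (doc.getPageNthWord(p, i, true) !== target) continue;
      var quads = doc.getPageNthWordQuads(p, i);
      for (var q = 0; q < quads.length; q++) {
        hits.push({ page: p, word: i, rect: quadToRect(quads[q]) });
      }
    }
  }
  return hits;
}
```

In the Acrobat console this could be called as findWordRects(this, "California"), and each returned rect printed with console.println. A word can have more than one quad (for example when it is hyphenated across lines), which is why the result has one entry per quad rather than per word.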

Adobe Community Professional ,
Nov 12, 2017

Searching for "words" or text strings gets tricky since there are many types of characters that may or may not need to be accounted for. The biggest issue will be white space and punctuation: non-printing characters like space, new line, carriage return, horizontal tab, vertical tab, and form feed, plus punctuation such as ";", ":", ".", etc.

You should review the getPageNthWord method and pay close attention to the "bStrip" parameter. I expect you will need to search for single words and then also write code to search for multiple word combinations.

Without the "bStrip" parameter or with it set to "false" I get a sample output of:

0 word: |Word | length: 5

1 word: |30.24-| length: 6

2 word: |0 | length: 2

3 word: |California. | length: 12

4 word: |test | length: 5

5 word: |word

| length: 6

With the "bStrip" parameter set to true I get a sample output of:

0 word: |Word| length: 4

1 word: |30.24| length: 5

2 word: |0| length: 1

3 word: |California| length: 10

4 word: |test| length: 4

5 word: |word| length: 4
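For anyone wanting to produce a listing like the ones above, here is one way it might be done. This is a sketch, not the exact script used for those outputs. describeWords is a hypothetical helper name, and the vertical bars are only there to make trailing spaces and line breaks visible.

```javascript
// Sketch only: describeWords is a hypothetical helper, not an Acrobat API.
// `doc` is assumed to be an Acrobat Doc object; `strip` is passed straight
// through as the bStrip argument of getPageNthWord.
function describeWords(doc, pageNum, strip) {
  var lines = [];
  var numWords = doc.getPageNumWords(pageNum);
  for (var i = 0; i < numWords; i++) {
    var w = doc.getPageNthWord(pageNum, i, strip);
    // Wrap the word in "|" so trailing white space shows up in the listing
    lines.push(i + " word: |" + w + "| length: " + w.length);
  }
  return lines;
}
```

In the console, something like console.println(describeWords(this, 0, false).join("\n")) would print the unstripped listing for the first page. Comparing it with the strip = true listing shows which characters are being dropped from the word you are searching for.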

Adobe Community Professional ,
Nov 13, 2017

There are also different types of coordinate systems for a PDF. Here is an article that explains them:

https://acrobatusers.com/tutorials/auto_placement_annotations

New Here ,
Nov 15, 2017

Hi all,

I have successfully extracted the coordinates using getPageNthWordQuads. But I have a problem with extracting special characters like "-".

For example, 13-jul-2011 is extracted as 3 words. Is there any possibility of extracting quads with the special characters also included in the word?

Thanks in advance,

Adobe Community Professional ,
Nov 15, 2017

The quad always includes the associated punctuation. Try this.

Find the index of a word that contains punctuation with this code:

var pageNum = 0; // zero-based page to inspect (page 0 is assumed here)
var len = this.getPageNumWords(pageNum);
for (var i = 0; i < len; i++)
  console.println(i + ": " + this.getPageNthWord(pageNum, i, false));

Run it in the console window.

Then run this code on the word that includes a comma or dash. In this case it's word number 3:

var qds = this.getPageNthWordQuads(pageNum, 3);
// Build [left, bottom, right, top] from the first quad of the word
var rect = [qds[0][0], qds[0][5], qds[0][2], qds[0][1]];
this.addAnnot({page: pageNum, rect: rect, type: "Square"});

You'll see that the added rectangle surrounds the punctuation.
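For a string such as 13-jul-2011 that comes back as several words, one possible approach is to join consecutive unstripped words and collect the quads of every word that contributed. The sketch below assumes the pieces come back with their punctuation attached when bStrip is false (like the "30.24-" sample earlier in the thread). findPhraseQuads is a hypothetical name, not an Acrobat method.

```javascript
// Sketch only: findPhraseQuads is a hypothetical helper, not an Acrobat API.
// It joins consecutive raw words (bStrip = false) on one page and reports
// every run whose text, minus trailing white space, equals `target`.
function findPhraseQuads(doc, pageNum, target) {
  var matches = [];
  var numWords = doc.getPageNumWords(pageNum);
  for (var start = 0; start < numWords; start++) {
    var acc = "";
    for (var end = start; end < numWords; end++) {
      acc += doc.getPageNthWord(pageNum, end, false);
      var trimmed = acc.replace(/\s+$/, "");
      if (trimmed === target) {
        // Gather the quads of every word from start to end
        var quads = [];
        for (var k = start; k <= end; k++) {
          var wordQuads = doc.getPageNthWordQuads(pageNum, k);
          for (var q = 0; q < wordQuads.length; q++) quads.push(wordQuads[q]);
        }
        matches.push({ first: start, last: end, quads: quads });
        break;
      }
      // Stop extending this run once it is no longer a prefix of the target
      if (target.indexOf(trimmed) !== 0) break;
    }
  }
  return matches;
}
```

Because only trailing white space is trimmed, a target followed directly by a period or comma in the text would not match as written. That case would need the comparison loosened, for example by also stripping trailing punctuation before comparing.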
