
Get text/number coordinates in a PDF

New Here
Nov 12, 2017

Hi. I'm trying to extract the x,y coordinates of any word/number matches in a PDF. Below is the code I am using. It works well for words like "California", but not for numbers with symbols like 3.0-234. Please help, as this is very urgent for me. (I have tried passing false as the third parameter of getPageNthWord, but it still doesn't work.)

for (var p = 0; p < this.numPages; p++)
{
  var numWords = this.getPageNumWords(p);
  for (var i = 0; i < numWords; i++)
  {
    var ckWord = this.getPageNthWord(p, i, true);
    var num = 'California';
    var n = num.toString();
    if (ckWord == n)
    {
      app.alert("Mouse position is: " + this.mouseX + "," + this.mouseY, 3);
    }
  }
}

TOPICS
Acrobat SDK and JavaScript, Windows

Community Expert
Nov 12, 2017

Try printing out all the words in the file to the console, and then you'll see what the issue is.

On Sun, Nov 12, 2017 at 6:52 PM, syedu35318304 <forums_noreply@adobe.com>
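When printing every word, a trailing space or line break inside a word is easy to miss in the console. Below is a minimal sketch that makes such characters visible, assuming the words come from getPageNthWord with bStrip set to false. showHidden and describeWord are hypothetical helpers, not Acrobat API functions, and the sample strings are illustrative only.

```javascript
// Hypothetical helper (not an Acrobat API function): replace invisible
// control characters with visible escape sequences.
function showHidden(word) {
  return String(word)
    .replace(/\r/g, "\\r")
    .replace(/\n/g, "\\n")
    .replace(/\t/g, "\\t")
    .replace(/\f/g, "\\f")
    .replace(/\v/g, "\\v");
}

// One line per word: index, word between pipes, and its raw length.
function describeWord(index, word) {
  return index + " word: |" + showHidden(word) + "| length: " + String(word).length;
}

// Inside Acrobat's JavaScript Console the words would come from the
// document, for example (page 0, punctuation and white space kept):
//   for (var i = 0; i < this.getPageNumWords(0); i++)
//     console.println(describeWord(i, this.getPageNthWord(0, i, false)));

// Stand-alone demo (Node.js or a browser) with sample strings:
var samples = ["Word ", "30.24-", "0 ", "word\r\n"];
for (var i = 0; i < samples.length; i++) {
  console.log(describeWord(i, samples[i]));
}
```

The pipes show trailing spaces, and the escapes keep a word that ends in a line break on a single console line.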

Community Expert
Nov 12, 2017

By the way, your code does not extract the coordinates of the word you're looking for, but those of the mouse cursor...

New Here
Nov 12, 2017

Hi, thanks for the quick response. Is getting the coordinates from the mouse location a good/right approach? If not, what is the preferred way to get them?

Community Expert
Nov 12, 2017

No, it's not. There's no relation between the mouse's location and the location of the word.

You need to use the getPageNthWordQuads method to get an array that defines the location(s) of the word on the page.
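A quad from getPageNthWordQuads can be turned into a bounding box and a center point with a little arithmetic. This is a minimal sketch, assuming each quad is the 8-number array of corner coordinates that the method returns per quad; quadToRect and rectCenter are hypothetical helper names, and the numbers in the demo are invented.

```javascript
// Hypothetical helpers (not Acrobat API functions).
// A quad is an 8-number array [x1, y1, x2, y2, x3, y3, x4, y4] holding the
// four corners of a word. Taking min/max over the corners gives the box
// without depending on the order in which the corners are listed.
function quadToRect(quad) {
  var xs = [quad[0], quad[2], quad[4], quad[6]];
  var ys = [quad[1], quad[3], quad[5], quad[7]];
  // Same layout as an annotation rect: [left, bottom, right, top]
  return [Math.min.apply(null, xs), Math.min.apply(null, ys),
          Math.max.apply(null, xs), Math.max.apply(null, ys)];
}

// Midpoint of a [left, bottom, right, top] rectangle.
function rectCenter(rect) {
  return { x: (rect[0] + rect[2]) / 2, y: (rect[1] + rect[3]) / 2 };
}

// Demo with made-up numbers for one word on a page:
var quad = [72, 700, 140, 700, 72, 688, 140, 688];
var rect = quadToRect(quad);   // [72, 688, 140, 700]
var c = rectCenter(rect);      // { x: 106, y: 694 }
console.log(rect, c);

// In Acrobat, the quads come from the document. The method returns an
// array of quads, so [0] takes only the first one:
//   var q = this.getPageNthWordQuads(p, i)[0];
//   var center = rectCenter(quadToRect(q));
```

Using min/max instead of fixed indices is a deliberate choice: it gives the same box whichever corner ordering the quad uses.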

LEGEND
Nov 12, 2017

Searching for "words" or text strings gets tricky, since there are many types of characters that may or may not need to be accounted for. The biggest issues will be white space and punctuation: non-printing characters such as space, new line, carriage return, horizontal tab, vertical tab and form feed, plus punctuation such as ";", ":" and ".".

You should review the getPageNthWord method and pay close attention to the "bStrip" parameter. I expect you will need to search for single words and then also write code to search for multiple word combinations.

With the "bStrip" parameter set to false (it defaults to true), I get a sample output of:

0 word: |Word | length: 5
1 word: |30.24-| length: 6
2 word: |0 | length: 2
3 word: |California. | length: 12
4 word: |test | length: 5
5 word: |word
| length: 6

With the "bStrip" parameter set to true, I get a sample output of:

0 word: |Word| length: 4
1 word: |30.24| length: 5
2 word: |0| length: 1
3 word: |California| length: 10
4 word: |test| length: 4
5 word: |word| length: 4
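The "multiple word combinations" idea can be sketched as follows. It assumes the words of one page have already been collected with bStrip set to false, and it counts a match when consecutive words, with white space removed, spell the target exactly (so "30.24-" followed by "0 " matches "30.24-0"). findWordRuns is a hypothetical helper, and the demo list is modeled on the unstripped sample output (the final line break is assumed to be "\r\n").

```javascript
// Hypothetical helper (not an Acrobat API function).
// words:  the page's words as returned by getPageNthWord(p, i, false)
// target: the text to look for, e.g. "30.24-0" or "13-jul-2011"
// Returns [first, last] word-index pairs whose combined text, with white
// space removed, equals the target (punctuation must match exactly).
function findWordRuns(words, target) {
  var wanted = target.replace(/\s+/g, "");
  var runs = [];
  if (wanted === "") return runs;
  for (var first = 0; first < words.length; first++) {
    // Do not start a run on a word that is only white space.
    if (words[first].replace(/\s+/g, "") === "") continue;
    var joined = "";
    for (var last = first; last < words.length; last++) {
      joined += words[last].replace(/\s+/g, "");
      if (joined === wanted) {
        runs.push([first, last]);
        break;
      }
      // Stop as soon as the joined text is no longer a prefix of the target.
      if (wanted.indexOf(joined) !== 0) break;
    }
  }
  return runs;
}

// Demo with a word list modeled on the unstripped sample output:
var sampleWords = ["Word ", "30.24-", "0 ", "California. ", "test ", "word\r\n"];
console.log(JSON.stringify(findWordRuns(sampleWords, "30.24-0")));

// In Acrobat, the list for page p could be built like this:
//   var words = [];
//   for (var i = 0; i < this.getPageNumWords(p); i++)
//     words.push(this.getPageNthWord(p, i, false));
```

Because punctuation is kept, a plain single-word search such as "California" is still simpler with bStrip set to true; this sketch is for targets that Acrobat splits into several words.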

Community Expert
Nov 13, 2017

There are also different types of coordinate systems for a PDF. Here is an article that explains them:

https://acrobatusers.com/tutorials/auto_placement_annotations
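As a small illustration of why the coordinate system matters: PDF page coordinates are measured in points (1/72 inch) with y increasing upward, while many other tools measure from the top-left corner. The sketch below converts a point under stated assumptions (no page rotation, and page edges given in the same space as the point). toTopLeftOrigin and pointsToMm are hypothetical helpers, and the page size is just an example.

```javascript
// Hypothetical helpers (not Acrobat API functions).
// Assumptions: (x, y) are in a bottom-up PDF coordinate space measured in
// points (1/72 inch), pageLeft/pageTop are the left and top edges of the
// page box in that same space, and the page is not rotated.
var POINTS_PER_INCH = 72;
var MM_PER_INCH = 25.4;

// Re-express a point relative to the page's top-left corner, y growing downward.
function toTopLeftOrigin(x, y, pageLeft, pageTop) {
  return { x: x - pageLeft, y: pageTop - y };
}

function pointsToMm(points) {
  return points / POINTS_PER_INCH * MM_PER_INCH;
}

// Demo: a US Letter page (612 x 792 pt) whose origin is its lower-left corner.
var p = toTopLeftOrigin(106, 694, 0, 792);
console.log(p.x, p.y, pointsToMm(p.y).toFixed(1) + " mm from the top");

// In Acrobat the page edges might be read from this.getPageBox("Crop", pageNum);
// check the reference for that array's element order and coordinate space.
```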

Thom Parker - Software Developer at PDFScripting
Use the Acrobat JavaScript Reference early and often

New Here
Nov 15, 2017

Hi all,

I have successfully extracted the coordinates using getPageNthWordQuads, but I have a problem with special characters like "-".

For example, 13-jul-2011 is extracted as 3 words. Is there any way to get quads with the special characters included in a word?

Thanks in advance,

Community Expert
Nov 15, 2017

The quad always includes the associated punctuation. Try this.

First, find the index of a word that contains punctuation with this code:

var pageNum = 0; // 0-based index of the page to inspect; change as needed
var len = this.getPageNumWords(pageNum);
for (var i = 0; i < len; i++)
  console.println(i + ": " + this.getPageNthWord(pageNum, i, false));

Run it in the JavaScript Console window.

Then run this code on the word that includes a comma or dash. In this case, it's word number 3:

var qds = this.getPageNthWordQuads(pageNum, 3);
// Annotation rect [left, bottom, right, top] built from the first quad
var rect = [qds[0][0], qds[0][5], qds[0][2], qds[0][1]];
this.addAnnot({page: pageNum, rect: rect, type: "Square"});

You'll see that the added rectangle surrounds the punctuation.
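For the 13-jul-2011 case, where the date comes back as three words, one option is to merge the quads of the consecutive words into a single rectangle. This is a sketch: unionOfQuads is a hypothetical helper, the word indices are assumed to be known already (for example, from the console listing), and both the split into "13-", "jul-" and "2011" and the quad values in the demo are made up.

```javascript
// Hypothetical helper (not an Acrobat API function). quadLists holds one
// getPageNthWordQuads() result per word, i.e. an array of 8-number quads
// for each word. Returns one [left, bottom, right, top] box around all of them.
function unionOfQuads(quadLists) {
  var left = Infinity, bottom = Infinity, right = -Infinity, top = -Infinity;
  for (var w = 0; w < quadLists.length; w++) {
    for (var q = 0; q < quadLists[w].length; q++) {
      var quad = quadLists[w][q];
      for (var k = 0; k < 8; k += 2) {   // even slots are x, odd slots are y
        left = Math.min(left, quad[k]);
        right = Math.max(right, quad[k]);
        bottom = Math.min(bottom, quad[k + 1]);
        top = Math.max(top, quad[k + 1]);
      }
    }
  }
  return [left, bottom, right, top];
}

// Demo with made-up quads for "13-", "jul-" and "2011" on one line:
var box = unionOfQuads([
  [[100, 500, 118, 500, 100, 488, 118, 488]],
  [[118, 500, 140, 500, 118, 488, 140, 488]],
  [[140, 500, 166, 500, 140, 488, 166, 488]]
]);
console.log(box);

// In Acrobat, for words first..last on page pageNum:
//   var lists = [];
//   for (var n = first; n <= last; n++)
//     lists.push(this.getPageNthWordQuads(pageNum, n));
//   this.addAnnot({page: pageNum, rect: unionOfQuads(lists), type: "Square"});
```

The union only makes sense when the words sit on the same line; a date broken across two lines would produce one tall box spanning both.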

New Here
Nov 19, 2017

Thanks, Thom. One more question: how do I export the output values to a text file (anywhere on the local drive)?

Here is my code for finding a word and computing its x,y coordinates:

for (var p = 0; p < this.numPages; p++)
{
  var numWords = this.getPageNumWords(p);
  for (var i = 0; i < numWords; i++)
  {
    var ckWord = this.getPageNthWord(p, i, true);
    if (ckWord == "Adobe")
    {
      // First quad of the word: [x1, y1, x2, y2, x3, y3, x4, y4],
      // assumed to be upper-left, upper-right, lower-left, lower-right
      var q = this.getPageNthWordQuads(p, i)[0];
      var x1 = q[0];
      var y1 = q[1];
      var x4 = q[6];
      var y4 = q[7];
      // Center of the word: average of two opposite corners
      var x = (x1 + x4) / 2;
      var y = (y1 + y4) / 2;
    }
  }
}

My question is: how do I write the x and y values to a text file?

Community Expert
Nov 19, 2017

There is no way, with JavaScript alone, to write an arbitrary file to the local file system. There are a couple of workarounds.

There used to be a way to do this with the "doc.exportDataObject" function, but it has since been restricted.

1. The easy way: write the text data to a file attachment with the "doc.createDataObject" function. I do this a lot. Although the data is not written to the file system, it's very easy for the user to drag and drop it anywhere they want. And there is an advantage to having the data attached to the PDF where it was created.

2. Create a new PDF with the "Report" object. Write the xy text to the "Report", then save it as text. This writes the text data to a specific location on the file system.

There are several variations on this theme. For example, you could create a blank PDF with "app.newDoc" then add one large form field or text annotation and write all the text data to the field/annot. Then flatten and save as text.

3. There are other tricks if you can write a plug-in or an IAC app. I once wrote a VBA add-in for Excel that sucked data out of a PDF, then saved the Excel file.
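Whichever option is used, the coordinates first have to become text. A minimal sketch, assuming each hit has been collected as an object with page, word, x and y fields (formatCoordinates and those field names are hypothetical) and that no word contains a comma. The createDataObject call in the comment only illustrates option 1; verify its parameters in the Acrobat JavaScript reference.

```javascript
// Hypothetical helper (not an Acrobat API function). Each hit is assumed to
// look like { page: 0, word: "Adobe", x: 106, y: 694 }, with a 0-based page
// index and a word that contains no commas.
function formatCoordinates(hits) {
  var lines = ["page,word,x,y"];              // header row
  for (var k = 0; k < hits.length; k++) {
    var h = hits[k];
    lines.push([h.page, h.word, h.x.toFixed(2), h.y.toFixed(2)].join(","));
  }
  return lines.join("\r\n");                  // Windows-style line endings
}

// Demo with made-up hits:
var text = formatCoordinates([
  { page: 0, word: "Adobe", x: 106, y: 694 },
  { page: 2, word: "Adobe", x: 310.5, y: 122.25 }
]);
console.log(text);

// Option 1 in Acrobat could then attach the text to the PDF, e.g.:
//   this.createDataObject({cName: "coordinates.txt", cValue: text});
```

Comma-separated lines open directly in Excel, which also fits the IAC/VBA route in option 3.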
