Participant
September 19, 2024
Question

Detecting new Line with Acrobat SDK


I'm using the JavaScript plugin with the following code:

var j = 3;
for (var i = 0; i < 10; i++) {
    var line = this.getPageNthWord(0, j + i, false);
    var final2 = line.slice(-2);
    if (final2 == "") {
        console.println("I AM A NEW LINE");
    }
    //console.println(this.getPageNthWord(0, j + i, false));
    console.println(line);
    //console.println(line.slice(-2));
}

The output looks like this, for example:

word1 

word2

word3

 

word4

word5

 

word6

word7

word8

word9

word10

 

This is as expected. However, I want to find out which word is the last one on each line. The blank lines in the console output appear in the right places, but I've tried checking if line == "" and "\n" etc., and nothing tells me where that blank is. Any suggestions?

 


2 replies

try67
Community Expert
September 19, 2024

Try this:

console.println(line.toSource());
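toSource() prints the string as a source literal, so hidden characters such as \r or \n show up as escapes. It is a non-standard method, so here is a rough sketch of the same idea using character codes. The helper names showCodes and endsWithLineBreak are made up for illustration and are not part of the Acrobat API.

```javascript
// Hypothetical helper: list the character code of every character in a word,
// so trailing whitespace such as \r (13), \n (10) or a space (32) becomes visible.
function showCodes(word) {
    var codes = [];
    for (var k = 0; k < word.length; k++) {
        codes.push(word.charCodeAt(k));
    }
    return codes.join(" ");
}

// Hypothetical helper: true if the word ends in a carriage return or line feed.
function endsWithLineBreak(word) {
    return /[\r\n]$/.test(word);
}

// In the Acrobat console, with bStrip set to false as in the question:
// console.println(showCodes(this.getPageNthWord(0, 3, false)));
```

Whether a given PDF actually carries such a trailing character depends on the application that created it, so a check like this will only work for some files.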

Thom Parker
Community Expert
September 19, 2024

Sometimes the app that created the PDF will leave a \r or \n at the end of a line, but this is not guaranteed. It's also not guaranteed that the words will be returned in the order in which you see them on the page. The only way to know for sure is to get the bounding boxes of all the words and sort them into lines. Of course, you have to be aware that not all lines of text span the entire page. Text can appear in blocks, as well as columns.
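As a rough sketch of the bounding-box step: the document methods getPageNumWords and getPageNthWordQuads return the word count and the quads for a word. The helper below, quadsToRect (a made-up name), collapses those quads into one rectangle. It assumes each quad is a flat array of eight numbers (four x,y corner pairs) and takes the min/max of the coordinates, so it does not depend on the corner order.

```javascript
// Hypothetical helper: reduce the quads of one word to a single bounding box.
// Assumes each quad is an array of eight numbers (four x,y corner pairs).
function quadsToRect(quads) {
    var xs = [], ys = [];
    for (var q = 0; q < quads.length; q++) {
        for (var k = 0; k < quads[q].length; k += 2) {
            xs.push(quads[q][k]);
            ys.push(quads[q][k + 1]);
        }
    }
    return {
        left: Math.min.apply(null, xs),
        right: Math.max.apply(null, xs),
        bottom: Math.min.apply(null, ys),
        top: Math.max.apply(null, ys)
    };
}

// Acrobat usage (not runnable outside Acrobat):
// var n = this.getPageNumWords(0);
// for (var i = 0; i < n; i++) {
//     var rect = quadsToRect(this.getPageNthWordQuads(0, i));
// }
```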

 

 

Thom Parker - Software Developer at PDFScripting
Use the Acrobat JavaScript Reference early and often
Participant
September 19, 2024

Can you provide some of the methods that are used for sorting them into lines?

Thom Parker
Community Expert
September 19, 2024

There are two skills necessary to solve this issue. 

1) An understanding of 2D geometry.

2) JavaScript programming skills. 

 

The idea is quite simple. Create an array where each entry is another array representing a line. The line array contains objects, where each object contains the word and the word rectangle. Then write a loop just like the one you have above, only save the word and its rectangle to a line array based on the rectangle. The meat of this method is a function that determines whether or not a word rectangle is on the same line as another rectangle, i.e., do the vertical limits of the rectangle overlap the vertical limits of another rectangle. A 50% overlap is enough to say they are on the same line. If a word doesn't match any existing line, then it is the first entry on a new line.

The last word on a line is the one with the right-most coordinate.
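Here is a minimal sketch of that grouping, not a finished solution. All function names are invented for illustration. Each entry is assumed to look like {word: "...", rect: {left, right, bottom, top}} with top greater than bottom. Two choices are my own assumptions: the 50% overlap is measured against the shorter of the two rectangles, and each word is compared with the first word of every existing line.

```javascript
// Fraction of the shorter rectangle's height that the two rectangles share vertically.
function verticalOverlap(a, b) {
    var shared = Math.min(a.top, b.top) - Math.max(a.bottom, b.bottom);
    var shorter = Math.min(a.top - a.bottom, b.top - b.bottom);
    return shorter > 0 ? Math.max(0, shared) / shorter : 0;
}

// Group {word, rect} entries into lines (an array of arrays).
// A word joins the first line whose first word it overlaps by at least 50%;
// otherwise it starts a new line.
function groupIntoLines(entries) {
    var lines = [];
    for (var i = 0; i < entries.length; i++) {
        var placed = false;
        for (var j = 0; j < lines.length && !placed; j++) {
            if (verticalOverlap(entries[i].rect, lines[j][0].rect) >= 0.5) {
                lines[j].push(entries[i]);
                placed = true;
            }
        }
        if (!placed) {
            lines.push([entries[i]]);
        }
    }
    return lines;
}

// The last word on a line is the entry whose rectangle reaches furthest right.
function lastWordOfLine(line) {
    var last = line[0];
    for (var i = 1; i < line.length; i++) {
        if (line[i].rect.right > last.rect.right) {
            last = line[i];
        }
    }
    return last.word;
}
```

This treats everything at the same height as one line, so text in columns or separate blocks would still need to be split horizontally before calling lastWordOfLine.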

 

 

 
