I have a set of PDF pages where getPageNthWordQuads returns the wrong coordinates. The coordinates appear to be offset 15 pts up and to the left. Has anybody else seen this, or does anyone have a suggestion for how to detect that a page has this issue?
I checked all the values returned by getPageBox, and nothing seemed different from pages that return correct results.
Every word on the page is offset by the same amount, so it's a translation error, not a scaling error.
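The translation-versus-scaling reasoning can be checked mechanically. Below is a minimal sketch, not an Acrobat API: `pureTranslation` is a hypothetical helper that takes, for several words, the rectangle reported by the script and the rectangle measured on screen (both as [llx, lly, urx, ury]). It returns the common shift if every corner moved by the same dx, dy, or null otherwise. It sticks to ES3-style JavaScript so it should also paste into the Acrobat console, but the measured values would have to be entered by hand (e.g. read off the rulers).

```javascript
// Hypothetical helper (not an Acrobat API).
// reported[i] and measured[i] are [llx, lly, urx, ury] for the same word.
// Returns {dx, dy} if all words are shifted by one common offset
// (within tol points), or null if the mismatch is not a pure translation.
function pureTranslation(reported, measured, tol) {
  if (reported.length === 0 || reported.length !== measured.length) {
    return null;
  }
  tol = tol === undefined ? 0.5 : tol;
  // Candidate shift: how far the first word's lower-left corner moved.
  var dx = measured[0][0] - reported[0][0];
  var dy = measured[0][1] - reported[0][1];
  for (var i = 0; i < reported.length; i++) {
    for (var k = 0; k < 4; k++) {
      // Even indices are x values, odd indices are y values.
      var expected = reported[i][k] + (k % 2 === 0 ? dx : dy);
      if (Math.abs(measured[i][k] - expected) > tol) {
        return null; // not one uniform shift (e.g. a scaling error)
      }
    }
  }
  return { dx: dx, dy: dy };
}
```

If every word comes back with the same {dx, dy}, the problem is an origin offset rather than a wrong scale or rotation.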
It's hard to say what's wrong without looking at the actual file.
Here's a sample bad page: http://plummer.us/BadPage.pdf
I'm not seeing any problems. When I ran a script (using Acrobat 9.5.5) to add a strikeout markup for every word using the same quads, they were all correctly placed. Can you give an example of a word in that document and the corresponding quad that you believe isn't correct?
Perhaps you're assuming that (0,0) is a corner of the visible page, rather than just a relative measure.
Thanks for your interest.
If this were the case, wouldn't some value of getPageBox show this? If not, how do I determine that the origin is offset? And why doesn't Adobe's Matrix2D class take this into account?
You're right. I appear to have misstated the issue. I'm trying to add a link. The example from Adobe's JavaScript Reference that uses the Matrix2D class draws the link box offset from the word. Since the page properties seem to indicate that the coordinate systems are the same, I also tried creating a link using the coordinates from the quads directly and got the same result.
However, creating an annotation type that takes the quads directly seems to work.
So my question becomes: how do I tell that the two coordinate systems are different? And why doesn't Matrix2D work?
Here's the code based on Adobe's example:
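// Quads for word index 200 on the first page (both indexes are 0-based).
var q = this.getPageNthWordQuads(0, 200);
// Convert quads in default user space to the rotated
// user space used by links.
var m = (new Matrix2D).fromRotated(this, 0);
var mInv = m.invert();
var r = mInv.transform(q);
// Flatten the transformed quads into one list of values.
r = r.toString().split(",");
// Use the first quad's lower-left (r[4], r[5]) and upper-right (r[2], r[3])
// corners as the link rectangle [llx, lly, urx, ury].
var l = this.addLink(0, [r[4], r[5], r[2], r[3]]);
l.borderColor = color.red;
l.borderWidth = 1;
l.setAction("this.getURL('http://www.adobe.com/');");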
Here's a link to a good tutorial that might help: https://acrobatusers.com/tutorials/auto_placement_annotations
Thanks for the suggestion. I understand the geometry and what the Matrix2D class does. I can't figure out why it's not working for a handful of pages out of hundreds.
The crop box would give you the effective, visible, origin. But I'd expect the APIs to use the same coordinate system. I can't say because I don't know what Matrix2D is.
The problem may be that a quad is not a rect; that's why there are two types. A rect is identified by lower-left x, lower-left y, upper-right x, and upper-right y. A quad, however, is identified by the four corners of a quadrilateral. Crucially:
(a) a quadrilateral may not be a rectangle;
(b) a quadrilateral may be a rotated rectangle, e.g. at 45 degrees;
(c) the corners of a quadrilateral may belong to a rotated object, e.g. one that is upside down, so the lower-left corner of the object is not the lowest or the leftmost in the page coordinate system.
You have to decide how to convert, if going to an annotation type that doesn't accept quads. One way is to get the enclosing axis-aligned rectangle, by taking min(x1,x2,x3,x4), min(y1,y2,y3,y4), max(x1,x2,x3,x4), max(y1,y2,y3,y4).
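A minimal sketch of that min/max conversion in plain JavaScript (ES3-style, so it should also run in Acrobat's console). `quadToRect` is a hypothetical helper name, not part of the Acrobat API; it takes one quad as eight numbers and returns [llx, lly, urx, ury]. It only changes the shape of the data and does nothing about an origin offset.

```javascript
// Hypothetical helper: enclosing axis-aligned rectangle of one quad.
// quad is [x1, y1, x2, y2, x3, y3, x4, y4]; the corner order does not
// matter because we take the min and max over all four points.
function quadToRect(quad) {
  var xs = [quad[0], quad[2], quad[4], quad[6]];
  var ys = [quad[1], quad[3], quad[5], quad[7]];
  return [
    Math.min.apply(null, xs), // lower-left x
    Math.min.apply(null, ys), // lower-left y
    Math.max.apply(null, xs), // upper-right x
    Math.max.apply(null, ys)  // upper-right y
  ];
}
```

In Acrobat, something like `quadToRect(this.getPageNthWordQuads(0, 200)[0])` would give a rectangle for the first quad of a word; a word may have more than one quad, in which case each would need converting (or their rectangles merging).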
Thanks, I know the quads are horizontal rectangles from examining them. I considered the possibility that the quads were upside down, which might explain the vertical offset (since the vertical offset might equal the height of the rectangle), but that couldn't explain the horizontal offset.
I'm back to my original issue. I look at the values returned by getPageNthWordQuads and, from my measurements, they don't correspond to the position of the word on the page. My guess is that the origin of certain pages is not in the corner of the page. Adobe's Matrix2D class doesn't seem to take this into account either. The values from getPageBox aren't any different between pages that have this problem and pages that don't.
I'm happy to live with this issue if somebody can tell me how to identify these pages programmatically.
The code creates correct links when I create a new document from your document by printing to Adobe PDF.
Thanks for responding.
I'm sure the code works for you. It works for probably 99% of PDF pages. It's that other 1%, e.g., http://plummer.us/BadPage.pdf
If you can tell me why the code doesn't work on my example page, I'd be grateful.
You certainly must not assume the origin is the corner of the page. You should consider:
1. The Crop Box. If there is one, the visible corner comes from the Crop Box, relative to the Media Box.
2. The Media Box. This defines the corner of the original media. For example, if its bottom left is 72,72, then 0,0 is one inch below and to the left of the page.
3. The Rotate value, which rotates the viewed page after all of the above is applied.
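To make points 1 and 2 concrete, here is a minimal sketch of the arithmetic. `toVisibleCorner` is a hypothetical helper, not part of the Acrobat API: it subtracts the visible lower-left corner (the crop box clipped to the media box if there is a crop box, else the media box) from a point in PDF user space. It assumes Rotate is 0 and leaves rotation out. Whether a particular Acrobat call reports coordinates before or after such a shift is exactly what this thread is trying to pin down, so treat this as the geometry only, not as a fix.

```javascript
// Hypothetical helper (not an Acrobat API).
// Boxes are [llx, lly, urx, ury] in PDF user space; cropBox may be omitted.
// Assumes the page's Rotate value is 0; rotation is not handled here.
function toVisibleCorner(point, mediaBox, cropBox) {
  // Lower-left corner of the media box.
  var originX = Math.min(mediaBox[0], mediaBox[2]);
  var originY = Math.min(mediaBox[1], mediaBox[3]);
  if (cropBox) {
    // The effective crop box is clipped to the media box, so its
    // lower-left corner is the larger of the two lower-left corners.
    originX = Math.max(originX, Math.min(cropBox[0], cropBox[2]));
    originY = Math.max(originY, Math.min(cropBox[1], cropBox[3]));
  }
  // Express the point relative to the visible lower-left corner.
  return [point[0] - originX, point[1] - originY];
}
```

With the media box's bottom left at 72,72, `toVisibleCorner([72, 72], [72, 72, 684, 864])` gives [0, 0], i.e. that point is the visible corner.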
Thanks for your answer.
Crop and Media have exactly the same values, which are also the same as on pages where I can draw link boxes correctly.
If I show rulers, I can see that addLink is drawing a box at the position I specify based on the quads returned for the word. There's no value returned by getPageBox that tells me why getPageNthWordQuads returns coordinates for a box that's offset from the ruler measurements.
The problem is that Doc.getPageBox() will not give you the actual media or crop box; it does some cleanup and then gives you something that, in this case, is different from the actual media/crop box. When you bring up the Preflight tool and browse the PDF contents, you can see the actual values of the page boxes.
Neither the media box nor the crop box starts at (0,0); they have an offset of almost ±12 pt. I assume that's also the offset you see between the word you want to place the link on and the link that's actually placed on the page.
I don't see any way you can get the true coordinates from this document (or any other document with the same type of page boxes) in JavaScript. A plug-in can do this - or an application based on the Adobe PDF library.
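For the detection question, the check itself is simple once the raw box values are available; since Doc.getPageBox() normalizes them, they would have to come from somewhere else, such as the Preflight browser or a plug-in. `hasShiftedOrigin` is a hypothetical helper and only a heuristic: it flags pages whose raw MediaBox or CropBox lower-left corner is away from (0, 0), which matches the kind of page described above.

```javascript
// Hypothetical helper (not an Acrobat API).
// mediaBox and cropBox are RAW [llx, lly, urx, ury] values obtained outside
// JavaScript for Acrobat; cropBox may be omitted.
// Returns true if either box's lower-left corner is not at (0, 0).
function hasShiftedOrigin(mediaBox, cropBox, tol) {
  tol = tol === undefined ? 0.01 : tol;
  var boxes = cropBox ? [mediaBox, cropBox] : [mediaBox];
  for (var i = 0; i < boxes.length; i++) {
    var llx = Math.min(boxes[i][0], boxes[i][2]);
    var lly = Math.min(boxes[i][1], boxes[i][3]);
    if (Math.abs(llx) > tol || Math.abs(lly) > tol) {
      return true; // origin is not the visible lower-left corner
    }
  }
  return false;
}
```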
That certainly makes sense. So the problem is that getPageBox returns results (whether correct or not) that cause Adobe's Matrix2D class and the rulers in Acrobat to give incorrect results. When I get a chance, I'll see whether using setPageBoxes to clear them fixes the page.