Context: I am rewriting the content-acquisition mechanism of the Adobe Acrobat Core API.
I am having trouble recalculating the line width with the CTM. How should it be done?
From the PDF Reference:
The effect produced in device space depends on the current transformation matrix (CTM) in effect at the time the path is stroked. If the CTM specifies scaling by different factors in the horizontal and vertical dimensions, the thickness of stroked lines in device space shall vary according to their orientation.
As I understand it, when I encounter the w operator, I should only store the operand's fixed value in the graphics state's line width attribute, and then recalculate the line width just before setting the current path's graphics state (PDEElementSetGState). Is that right?
I tried this approach first:
ASFixedRect r;
r.bottom = fixedZero;
r.left = fixedZero;
r.right = graphic_state.lineWidth;
r.top = graphic_state.lineWidth;
ASFixedMatrixTransformRect(&r, &CTM, &r);
double x1 = ASFixedToFloat(r.right - r.left);
double y1 = ASFixedToFloat(r.top - r.bottom);
double width = sqrt((x1 * x1 + y1 * y1) / 2);
graphic_state.lineWidth = ASFloatToFixed(width);
However, this approach gives results that are close but not correct. Does anyone have an idea how I can approach this?
Thank you in advance,
In general there is no line width in device space. There is only a line width in the (common and normal) case where the CTM has equal X and Y scaling. In general a line width is used as follows:
- a stroked line is constructed in user space, of the required width
- the shape of the line is transformed to device space
- adjustments are made for subjective improvement (undocumented and outside the scope of the standard); this may include pixel snapping, thin line removal or widening, antialiasing and other effects
- the transformed shape is rendered to the device
To illustrate what it means that there is no line width in device space as @Test Screen Name said - this is a curve (well, a pair of curves) drawn without changing the width, merely distorted by transformation:
The content stream:
5 0 0 1 25 25 cm 50 25 m 50 38.81 38.81 50 25 50 c 11.19 50 0 38.81 0 25 c S
I think there was a misunderstanding. I am getting the w operand (line width) in user space units. I must process all operators in order to collect all content correctly from the page data stream. My problem is recalculating the line width just before stroking a path and adding it to the content.
You don't calculate line width. You calculate a stroked path as defined, in User Space. There is nothing to recalculate.
If you feel it is unavoidable, you need to say how you plan to use this value, which as we have said may not be a single value in device space.
What do you mean, I don't recalculate?
Again, I am recreating the PDPageAcquirePDEContent and PDEFormGetContent functions so that they don't raise an error on large values.
I am using a PDEGraphicState object inside my class and setting its lineWidth attribute when w is encountered.
For example, if I use the same code (without recalculating) on two different PDFs, the first one displays correctly, while in the second one I get a line that covers the whole page. First one:
q 0.03 0 0 0.03 0 0 cm 20.7667 w 3.25 M 1 j .... Q
Second one (note that the line width is very large and should be recalculated appropriately with the CTM (how?)):
64.80859 97.64236 712.3828 400.7153 re W n 15875 w 1 j /Cs1 CS 0.7019608 0.8313726 0.9254902 SC q 0.00005843034 0 0 -0.00005843034 535.8676 167.7913 cm 0 0 m 1280989 0 l 1280989 504056 l 0 504056 l h S Q
Do not recalculate coordinates, widths etc. This is wrong - though it will work most of the time, it will fail for line widths and some other things, for the reason already given.
I now see that you are not rendering but trying to convert a page stream to PDFEdit calls. Do NOT apply the CTM to any coordinates or numbers in the parsed stream. Instead, set the CTM with PDEElementSetMatrix. You must still calculate the effective CTM by concatenating matrices and honouring q/Q. I am not sure why you would convert a page stream to PDFEdit calls - there are already good APIs to do that.
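A minimal sketch of that bookkeeping, assuming a small interpreter of your own: cm concatenates onto the current matrix, q and Q save and restore the state, and w stores its operand untouched. All names here (Matrix, GState, Interp, op_cm and so on) are hypothetical and the types are plain doubles rather than ASFixed; the matrix and width found in the current state at stroke time are what you would then hand to calls such as PDEElementSetMatrix and PDEElementSetGState.

```c
#include <assert.h>
#include <stddef.h>

/* PDF-style matrix [a b c d e f]; hypothetical stand-in types. */
typedef struct { double a, b, c, d, e, f; } Matrix;
typedef struct { Matrix ctm; double line_width; } GState;

#define MAX_DEPTH 64
typedef struct { GState stack[MAX_DEPTH]; int top; } Interp;

void interp_init(Interp *in)
{
    assert(in != NULL);
    GState g = { { 1, 0, 0, 1, 0, 0 }, 1.0 };  /* identity CTM, default width 1 */
    in->top = 0;
    in->stack[0] = g;
}

GState *current(Interp *in) { return &in->stack[in->top]; }

/* cm: new CTM = operand matrix x old CTM (operand is applied first). */
void op_cm(Interp *in, Matrix m)
{
    Matrix *c = &current(in)->ctm;
    Matrix r;
    r.a = m.a * c->a + m.b * c->c;
    r.b = m.a * c->b + m.b * c->d;
    r.c = m.c * c->a + m.d * c->c;
    r.d = m.c * c->b + m.d * c->d;
    r.e = m.e * c->a + m.f * c->c + c->e;
    r.f = m.e * c->b + m.f * c->d + c->f;
    *c = r;
}

/* w: store the operand as given, in user space units; no CTM applied. */
void op_w(Interp *in, double width) { current(in)->line_width = width; }

/* q: push a copy of the current state. Returns 0 on stack overflow. */
int op_q(Interp *in)
{
    if (in->top + 1 >= MAX_DEPTH) return 0;
    in->stack[in->top + 1] = in->stack[in->top];
    in->top++;
    return 1;
}

/* Q: pop back to the saved state. Returns 0 on an unbalanced Q. */
int op_Q(Interp *in)
{
    if (in->top == 0) return 0;
    in->top--;
    return 1;
}
```

For a stream such as "15875 w q ... cm ... S Q", the state at S holds line width 15875 together with the small-scale matrix, and the two are passed on as a pair instead of being multiplied together.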
I am not sure why you would convert a page stream to PDFEdit calls - there are already good APIs to do that.
As I said, when using PDPageAcquirePDEContent, when large path values are being parsed it fails with Number out of range error, specifically PDPageStmGetToken fails (there are lots of examples of such PDF files). Naturally, a need arises for a better tokenizer and better page acquiring mechanism, which can handle large values. I have been following Document management — Portable document format — Part 1: PDF 1.7 while constructing this and in many cases my constructed program can easily get page content while PDPageAcquirePDEContent simply won't.
Do NOT apply the CTM to any coordinates or numbers in the parsed stream.
I must apply the CTM to other coordinates. I am constructing all the objects from page data streams. How else can I get the correct relative position of text, path objects, etc.?
You should check out Xpdf source code in order to get a better idea of what I am constructing.
I also wouldn't go so far as to ask you never to transform certain coordinates from current user space to e.g. default user space. As you mentioned, such coordinate transformations actually are necessary to check where stuff eventually ends up on a page.
What doesn't make sense, though, is trying to transform all graphics state properties, as some properties simply cannot be so transformed. Consider the curve example from my post above: you can clearly see that it no longer has a uniform line width in the default user space. Similarly, a line dash pattern cannot be universally transformed. And if you want to transform not only a single content stream but also form XObjects drawn from there, you might get into quite a hell.
What you can do, at least inside a single content stream, is pull scalar factors from the transformation matrices and apply them to all applicable current properties and following arguments.
Taking your example
15875 w q 0.00005843034 0 0 -0.00005843034 535.8676 167.7913 cm 0 0 m 1280989 0 l 1280989 504056 l 0 504056 l h S Q
you can normalize it a bit by pulling a factor of 0.0001 out of it to get
15875 w q 0.5843034 0 0 -0.5843034 535.8676 167.7913 cm 1.5875 w 0 0 m 128.0989 0 l 128.0989 50.4056 l 0 50.4056 l h S Q
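The normalization above can be sketched as follows. This is only an illustration under assumptions: Matrix, pull_factor, rescale and apply are hypothetical helper names, and only coordinates and the line width are rescaled here. Pulling a factor k out of the matrix means dividing its a, b, c, d entries by k while leaving e and f alone, then multiplying every coordinate and width used under that matrix by k.

```c
#include <assert.h>
#include <math.h>

/* PDF-style matrix [a b c d e f]; hypothetical stand-in type. */
typedef struct { double a, b, c, d, e, f; } Matrix;

/* Divide the linear part of m by k; the translation (e, f) is unchanged. */
Matrix pull_factor(Matrix m, double k)
{
    assert(k != 0.0);
    m.a /= k; m.b /= k; m.c /= k; m.d /= k;
    return m;
}

/* Coordinates and widths used under the rewritten matrix are multiplied by k. */
double rescale(double value, double k) { return value * k; }

/* Map a user-space point through m, to compare results before and after. */
void apply(const Matrix *m, double x, double y, double *ox, double *oy)
{
    *ox = m->a * x + m->c * y + m->e;
    *oy = m->b * x + m->d * y + m->f;
}
```

With k = 0.0001 the matrix entry 0.00005843034 becomes 0.5843034, the corner 1280989 504056 becomes 128.0989 50.4056, and the width 15875 becomes 1.5875. The new width has to be set inside the q ... Q pair, as in the rewritten stream, because outside it the old matrix and the old width still apply.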
Yes, I was wondering about the dash pattern as well. Thank you for the answer!