Participating Frequently
February 9, 2022
Question

Number out of range error when gathering content from PDF content stream


Good day,

 

For quite some time I have been writing a PDF content parser that can handle faulty PDF files (files whose content streams contain integers or real numbers that overflow when processed with PDFEdit). I was able to parse some PDFs for which Adobe Acrobat's PDPageAcquirePDEContent() reported an error. It turned out that PDPageStmGetToken was not handling large values in pageStmToken.iVal, so I used the xpdf tokenizer instead. This resolved the issue only partially: the tokenizer now works properly, but the large values are still passed on as operands to operators. Clamping a value to SHRT_MAX whenever a larger number is encountered does not seem to be the correct fix, because it alters the relative positioning and scale of objects.
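To illustrate the distortion described above, here is a minimal Python sketch (plain arithmetic, not Acrobat SDK code). The sample points and the helper names `clamp` and `scale_to_fit` are made up for illustration. It compares clamping each coordinate to SHRT_MAX with dividing all coordinates by one common factor.

```python
SHRT_MAX = 32767  # the limit mentioned in the post


def clamp(value, limit=SHRT_MAX):
    """Clamp a single coordinate into [-limit, limit]."""
    return max(-limit, min(limit, value))


def scale_to_fit(points, limit=SHRT_MAX):
    """Divide every coordinate by one common factor so the largest one fits."""
    largest = max(abs(c) for point in points for c in point)
    factor = max(1.0, largest / limit)
    return [(x / factor, y / factor) for x, y in points], factor


# Hypothetical sample points; two of them exceed SHRT_MAX in x.
points = [(10000, 10000), (50000, 20000), (100000, 40000)]

clamped = [(clamp(x), clamp(y)) for x, y in points]
scaled, factor = scale_to_fit(points)

# Clamping collapses the two large x values onto the same number,
# so the second point is no longer halfway to the third one.
print(clamped)

# Uniform scaling keeps every ratio between coordinates intact,
# but the factor has to be compensated elsewhere (for example in the CTM).
print(scaled, factor)
```

Clamping moves distinct points onto the same x value, whereas uniform scaling preserves the layout at the cost of needing a compensating transform.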

 

Furthermore, I needed an alternative to PDEFormGetContent() (in some cases it gave the Number out of range error as well), so I wrote one using the same logic. Now I am having issues with this method: I do not know how to handle large numbers within a form content stream. Should I use an alternative coordinate space to transform the coordinates of objects and the CTM? If so, how should I approach it?
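One possible reading of "alternative coordinate space" is to divide the coordinate operands by a common factor and prepend a compensating `cm`, so that the painted geometry stays the same while the numbers in the stream get smaller. The Python sketch below shows only the arithmetic. It is not Acrobat SDK code, the names `rescale_stream`, `COORD_OPS` and `PASS_OPS` are hypothetical, and it assumes a simple stream of whitespace-separated numbers and path or colour operators (no strings, names, arrays, text, nested `cm` or inline images). Whether Acrobat accepts the result is untested.

```python
# Operators whose numeric operands are lengths or coordinates in user space.
COORD_OPS = {"m", "l", "c", "v", "y", "re", "w"}
# Operators that are copied through unchanged (painting, colour, state).
PASS_OPS = {"S", "s", "f", "F", "f*", "B", "B*", "b", "b*", "n", "h",
            "W", "W*", "RG", "rg", "G", "g", "K", "k", "J", "j", "M", "q", "Q"}


def fmt(value):
    """Format a number without exponent notation or trailing zeros."""
    return f"{value:.6f}".rstrip("0").rstrip(".")


def rescale_stream(stream, factor):
    """Divide coordinate operands by `factor` and wrap the result in
    'q factor 0 0 factor 0 0 cm ... Q' to compensate."""
    lines, operands = [], []
    for token in stream.split():
        try:
            operands.append(float(token))
            continue
        except ValueError:
            pass  # not a number, so treat it as an operator
        if token in COORD_OPS:
            operands = [v / factor for v in operands]
        elif token not in PASS_OPS:
            raise ValueError(f"operator {token!r} is not handled by this sketch")
        lines.append(" ".join([fmt(v) for v in operands] + [token]))
        operands = []
    header = f"q {fmt(factor)} 0 0 {fmt(factor)} 0 0 cm"
    return "\n".join([header] + lines + ["Q"])


original = "1 0 0 RG\n10 w\n100000 100000 m\n-100000 -100000 l\nS"
print(rescale_stream(original, 100))
```

For a form XObject, the compensating scale could presumably go into the form's /Matrix entry instead of a leading `cm`. Note that this only shrinks the operands; the effective coordinates after the transform are as large as before.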

 

It turns out that large numbers in the content stream are not the only problem. Sometimes, when form content refers to other resources, such as Shading elements, they cannot be handled. The error occurs when PDEShadingCreateFromCosObj() is called. In such cases the Coords array in the Shading dictionary contains values far beyond SHRT_MAX (e.g. 10332000). Could this be the problem? I tried transforming the Coords values inside DURING/HANDLER blocks, but I doubt this is the correct approach.
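As a rough illustration of what transforming Coords could mean: in axial (type 2) shadings Coords is [x0 y0 x1 y1], and in radial (type 3) shadings it is [x0 y0 r0 x1 y1 r1], so under a uniform scale every entry, radii included, is divided by the same factor. The Python sketch below is plain arithmetic with hypothetical helper names (`factor_for`, `scale_coords`) and a hypothetical Coords array built around the value quoted above. The matching scale still has to be applied wherever the shading is painted, and whether PDEShadingCreateFromCosObj() then accepts the dictionary is untested.

```python
import math

SHRT_MAX = 32767  # the limit mentioned in the post


def factor_for(coords, limit=SHRT_MAX):
    """Smallest whole-number divisor that brings every entry within the limit."""
    largest = max(abs(v) for v in coords)
    return max(1, math.ceil(largest / limit))


def scale_coords(coords, factor):
    """Uniformly scale an axial (4 entries) or radial (6 entries) Coords array."""
    if len(coords) not in (4, 6):
        raise ValueError("expected 4 (axial) or 6 (radial) Coords entries")
    return [v / factor for v in coords]


# Hypothetical axial Coords array using the value quoted in the post.
coords = [0, 0, 10332000, 0]
factor = factor_for(coords)
scaled = scale_coords(coords, factor)

print(factor, scaled)
```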

 

Thank you for your time. I would appreciate any help.


2 replies

try67
Community Expert
February 15, 2022

[Question moved to the Acrobat SDK forum]

Legend
February 9, 2022

Not a direct answer, but: before Acrobat 7, Acrobat did all coordinate arithmetic in fixed point (-32768 to +32767). Later versions do all matrix and coordinate work in floating point. However, the actual device coordinates cannot exceed 200 inches / 5080 mm.

 

In other words a content stream with

100000 100000 m

and no cm will fail but 

0.01 0 0 0.01 0 0 cm

100000 100000 m

is OK. In Acrobat, that is; other apps may vary.
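The arithmetic behind this can be sketched in a few lines of Python. This assumes the default user space unit of 1/72 inch, so 200 inches is 14400 units, and it takes the limit described above at face value. The helper names `apply_cm` and `within_limit` are made up for illustration.

```python
UNITS_PER_INCH = 72              # default user space unit (assumption: no UserUnit)
LIMIT = 200 * UNITS_PER_INCH     # 200 inches expressed in default units


def apply_cm(matrix, point):
    """Apply a PDF matrix [a b c d e f] to a point (x, y)."""
    a, b, c, d, e, f = matrix
    x, y = point
    return (a * x + c * y + e, b * x + d * y + f)


def within_limit(point, limit=LIMIT):
    """True if both coordinates stay inside [-limit, limit]."""
    return all(abs(v) <= limit for v in point)


identity = (1, 0, 0, 1, 0, 0)
scale_down = (0.01, 0, 0, 0.01, 0, 0)
point = (100000, 100000)

# Without a cm the point stays at 100000, far beyond 14400 units.
print(apply_cm(identity, point), within_limit(apply_cm(identity, point)))

# With the 0.01 scale the point lands at about (1000, 1000), well inside.
print(apply_cm(scale_down, point), within_limit(apply_cm(scale_down, point)))
```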

MikelKlink
Participating Frequently
February 10, 2022
Quote:

In other words a content stream with

100000 100000 m

and no cm will fail

 

Well, not by itself; see the attached file CoordinateAt100000SmallBox.pdf. Its page has this content stream:

1 0 0 RG
10 w
100000 100000 m
-100000 -100000 l
S

and works just fine in the current Adobe Reader.

(Apparently the important part is that the MediaBox coordinates are small, [-250 -250 250 250] in this case.)
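For anyone who wants to repeat the experiment without the attachment, a one-page PDF with this content stream and MediaBox can be generated with a short Python script. This is a reconstruction under assumptions, not the attached file itself: the object layout, the PDF version header and the function name `build_pdf` are my own choices.

```python
CONTENT = b"1 0 0 RG\n10 w\n100000 100000 m\n-100000 -100000 l\nS"


def build_pdf(content, media_box=(-250, -250, 250, 250)):
    """Build a minimal one-page PDF whose page uses `content` as its stream."""
    box = " ".join(str(v) for v in media_box)
    page = (f"<< /Type /Page /Parent 2 0 R /MediaBox [{box}] "
            f"/Resources << >> /Contents 4 0 R >>").encode("ascii")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        page,
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj" for the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # each xref entry is exactly 20 bytes
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


pdf_bytes = build_pdf(CONTENT)
```

Writing `pdf_bytes` to a file opened in binary mode gives a document to test with; passing a larger `media_box` makes it possible to check whether the box size really is the deciding factor.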