"Number out of range" error when gathering content from a PDF content stream
Good day,
For quite some time I have been working on a PDF content parser that can handle faulty PDF files (files whose content streams contain integers or real numbers that overflow when processed with PDFEdit). I was able to parse some PDFs for which Adobe Acrobat's PDPageAcquirePDEContent() reported an error. It turned out that PDPageStmGetToken was not handling large values in pageStmToken.iVal, so I used the xpdf tokenizer to resolve this. That fixed the problem only partially: the tokenizer now works properly, but the large values are still passed on as operands to operators. Replacing a value with SHRT_MAX whenever a larger number is encountered does not seem to be the correct approach, because it alters the relative positioning and scale of objects.
Furthermore, I needed an alternative to PDEFormGetContent() (because in some cases it also gave a "Number out of range" error), so I wrote one using the same logic. Now I am experiencing issues with this method: I do not know how to handle large numbers within a form's content stream. Should I use an alternative coordinate space to transform the coordinates of objects and the CTM? If so, how should I approach it?
It turns out that large numbers in the content stream are not the only problem. Sometimes, when the form content references other resources, such as shadings, these cannot be handled either: the error occurs when PDEShadingCreateFromCosObj() is called. In such cases the Coords array in the shading dictionary contains values far beyond SHRT_MAX (e.g. 10332000). Could this be the cause? I tried transforming the Coords values in DURING/HANDLER blocks, but I doubt this is the correct approach.
Thank you for your time; I would appreciate any help.
