Verifying and validating PDF files

New Here, Apr 14, 2020

Hi,

I'm working on an application that ingests PDF files, parses them, and produces metrics. I ran into issues with these files because they are generated by different applications, and I have no control over how they are generated. The content of each file is well defined, but the applications produce the PDFs via third-party libraries, tools, or APIs. Some of these applications are old and may be using outdated PDF-generating libraries.

I'm looking for a good way to verify and validate PDF files. Could anyone point me to a tool, library, or API that I can use to get some insight into each PDF file? I'd like to include a step in my workflow where I extract some metadata from each PDF file and determine whether the file is valid or could cause problems later in the workflow. It would be even better if there were a way to pass a PDF file through a "cleaning" process that would make the file more up to date.

Any help would be greatly appreciated.

-M

Most Valuable Participant, Apr 15, 2020

Old libraries in themselves should not be a problem. There is no such thing as an outdated PDF: the original PDF 1.0 files are still completely valid today. Likewise, there is no such thing as an "up to date" PDF, and hence no tools to make one. However, libraries, both old and new, can have bugs. You say you have problems, though. What sort of problems? Are they limitations in your reading software, or violations of the rules of PDF?
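
To help tell those two cases apart, a very rough first check is to look at the markers every PDF is expected to carry: the `%PDF-M.m` header at the start, and `startxref` followed by `%%EOF` near the end. The sketch below uses only the Python standard library. The function name `quick_pdf_check` and the 1024-byte search windows are my own assumptions for illustration, and this is a sanity check only, not a validator: a file can pass it and still break the rules of PDF elsewhere.

```python
import re


def quick_pdf_check(data: bytes) -> dict:
    """Cheap structural sanity check on raw PDF bytes (not real validation)."""
    # Look for the "%PDF-M.m" header. It should sit at offset 0, but we
    # search the first 1024 bytes (an assumed tolerance) in case of leading junk.
    header = re.search(rb"%PDF-(\d)\.(\d)", data[:1024])
    version = None
    if header:
        version = header.group(1).decode() + "." + header.group(2).decode()

    # The trailer markers should appear near the end of the file.
    tail = data[-1024:]
    return {
        "version": version,                            # e.g. "1.4", or None
        "header_at_offset_0": data.startswith(b"%PDF-"),
        "has_startxref": b"startxref" in tail,
        "has_eof_marker": b"%%EOF" in tail,
    }
```

You could call it with the bytes of each incoming file, for example `quick_pdf_check(open(path, "rb").read())`, and log the result alongside whatever error your parser reports. A missing `%%EOF` usually points at a truncated or badly written file, while a file that passes and still fails to parse is more likely hitting a limitation in the reading software.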
