I've searched the forums, and so far this error has only been discussed in the context of how to deal with a corrupted PDF file.
My question is a bit different. The company I currently work for has a homegrown XML authoring tool for creating what we call "Knowledge base articles", i.e. product help topics. They've been using this tool since 2008. Files can be attached to a topic. The XML files and attached PDFs are stored in an SQL database. Relevant topics are retrieved using the TOC panel or the search function.
When there's an attached PDF, users can either view the PDF in their browser or download the file and then open it locally.
The knowledge base was moved from a Windows Server 2003, IIS 6 and SQL 2005 database environment to a Windows Server 2008 R2, IIS 7, SQL 2008 R2 database environment.
Now we are experiencing the following issue: in some topics the downloaded PDF cannot be opened, and the error "There was an error opening this document. The file is damaged and could not be repaired." is displayed. The puzzling part is that some PDFs trigger this error, while others open with no problem. All source PDFs open without a hitch before they are uploaded.
Any advice or pointers will be much appreciated.
As you've probably read in many of the discussions about this issue, this error message is usually correct, and the PDF file is damaged. As to how it gets damaged, that's something that you will need to find out by poking around in your system. Here are some general things that always apply when you are dealing with corrupt files:
The first thing I would do is open such a corrupt file in a text editor like Notepad to see what it actually contains. Any valid PDF file needs to start with "%PDF-" followed by a version number. If you do not see that in the first line, then you are not dealing with a PDF file, and you will have to find out why you are not getting the file that was saved in your knowledge base.
If that string is there, then you know that it's a PDF file, and the issue is somewhere within that file, and that's where it gets a lot more complex. You will need a good understanding of the PDF specification to analyze the file to see what is wrong with it.
If you are lucky, you will actually have a zero length file (which also implies no PDF header). No valid PDF file can be zero bytes long. Your second lucky case would be that you actually find an HTML error message in that file (instead of PDF content), that points you to the reason for the problem (but again, that depends on your system).
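The three checks above (zero-length file, HTML instead of PDF content, "%PDF-" header) can also be done on the raw bytes rather than by eye in a text editor. What follows is only a minimal sketch in Python; the function name `classify_download` and its return labels are made up for illustration, and the HTML test is a rough heuristic that only looks at the first 1024 bytes.

```python
def classify_download(data: bytes) -> str:
    """Rough first look at a file that is supposed to be a PDF.

    The labels returned here are hypothetical, chosen for this sketch only.
    """
    # A zero-byte file can never be a valid PDF.
    if len(data) == 0:
        return "empty"

    # A PDF file starts with "%PDF-" followed by a version number.
    if data.startswith(b"%PDF-"):
        return "pdf"

    # Heuristic: an HTML error page usually announces itself near the top.
    head = data[:1024].lower()
    if b"<html" in head or b"<!doctype html" in head:
        return "html"

    # Neither empty, nor PDF, nor obviously HTML.
    return "unknown"
```

To try it on a real file, read the file in binary mode first, e.g. `classify_download(open("downloaded.pdf", "rb").read())`, where `downloaded.pdf` stands in for whatever file fails to open. A result of "pdf" only says the header is present; it says nothing about the rest of the file.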
Many thanks Karl--excellent explanation and practical advice on how to proceed.
Even though I don't think this affects any of the above guidance, I'd like to add that the file opens in the browser while the downloaded file (saved locally) is frequently corrupted.
So if the file is retrieved from the KB for displaying in the browser, it makes the second part even more puzzling.
I opened both the original source file and the corrupted file (the same PDF after it was uploaded and then downloaded) in Notepad, and compared the two using an online text comparison tool.
Both start with the PDF version (e.g. %PDF-1.5%âãÏÓ).
There were quite a lot of differences between the files, but unfortunately I didn't detect an HTML error message. I've forwarded these findings and your advice on to the IT team, but they too lack knowledge of the PDF specification, so I am not confident in their ability to resolve the issue. If we do manage to find what the problem is, I'll post the answer here.
Your prompt, professional and extremely helpful answer made my week!
Took a while, but eventually our R&D looked at the differences between the files and found that the KB application was adding an HTML snippet to the end of the file. That added HTML seemed to cause the problem. They fixed the KB application code to prevent changes to uploaded/downloaded files, and afterwards we spot checked ~10 topics and it seems to be good.
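For anyone who wants to check for this kind of appended content without reading the whole file by eye: a well-formed PDF normally ends with a "%%EOF" marker, so anything other than whitespace after the last marker is worth a look. This is a rough sketch in Python, not what our R&D actually ran; the function name `trailing_bytes_after_eof` is made up for illustration.

```python
def trailing_bytes_after_eof(data: bytes) -> bytes:
    """Return whatever follows the last %%EOF marker, minus surrounding whitespace.

    An empty result means nothing suspicious was found after the marker.
    """
    marker = b"%%EOF"
    # rfind: a PDF saved incrementally can contain several %%EOF markers,
    # so only the last one is of interest here.
    pos = data.rfind(marker)
    if pos == -1:
        raise ValueError("no %%EOF marker found")
    # Line endings after the marker are normal, so strip them off.
    return data[pos + len(marker):].strip(b"\r\n \t")
```

Read the file in binary mode before passing it in. A non-empty result does not prove that this is what breaks the file, but it points at where to look.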
Not sure how helpful this information will be to others, but I highly recommend following Karl's advice of tinkering under the lid of the corrupted files.
It was lucky that we had the source, uncorrupted file to compare to. The highlighted differences made it easier to check what might be wrong.
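One caveat on the comparison: Notepad and online text comparison tools interpret the file as text, which can show differences that are not really there in a binary file. If you have both the source file and the downloaded copy, comparing the raw bytes avoids that. A minimal sketch in Python, with the made-up function name `first_difference`:

```python
from typing import Optional


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Return the offset of the first byte where a and b differ.

    Returns None when both are identical. If one is a prefix of the other,
    the offset returned is the length of the shorter one, i.e. the point
    where extra data begins.
    """
    limit = min(len(a), len(b))
    for i in range(limit):
        if a[i] != b[i]:
            return i
    if len(a) != len(b):
        return limit
    return None
```

If the offset returned equals the size of the source file, the downloaded copy is the source plus extra data at the end, which is the pattern we ran into.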
It remains a mystery why this issue cropped up only after moving to the Windows Server 2008 R2, IIS 7, SQL 2008 R2 database environment.
We are experiencing the same issue here, and are desperate for some technical advice. Unfortunately, telephone support isn't available. Similar to Donna's original post, we have also stored our PDFs in a Server 2008 R2 environment / web server host. We have, however, opened in Notepad the documents that fail to open with the error listed above.
The finding is that the version in the header is correct and the document looks to be intact. The strange thing is that, while diagnosing this issue, we found that the documents open in Adobe 8.3.0 or earlier, which further suggests that the documents are not corrupted.
Could someone please advise?