How can I retrieve a PDF form from code?

Jun 29, 2016

I'm having trouble retrieving a PDF from my database using code. I've tried different approaches, to no avail. Searching the web hasn't helped much either, since no one else seems to have run into what I'm experiencing.

In code, I'm using BinaryWrite(blob) to output my PDF. My method is similar to the one in the post "Convert Binary data to PDF file in C# and VB.Net". This works when the PDF is a plain PDF file. However, once it has been prepared as a PDF form, the output is corrupt.

To clarify: the original PDF is not corrupt; the problem lies in how BinaryWrite writes it out. I know this because when I open the PDF through a database client, it opens fine, which means the blob stored in the DB is correct. It appears that BinaryWrite is not producing the PDF form properly, and hence I get a corrupt file.

What is the best way to get this form out of the DB without breaking the form portion? Am I missing a parameter? Should I avoid BinaryWrite altogether? Or do I need to set a particular header?

Jun 30, 2016

Compare the retrieved PDF with the PDF in the database.

Jun 30, 2016

Compare in what sense? The file sizes appear to be equal. Since I can't open the retrieved file, I can't compare them any further than that. Is there a tool that can do a better comparison?

Jun 30, 2016

I think your best approach is to forget, for now, that this is a PDF, and not try to open the files in a PDF viewer. Instead, consider the job of the database: to preserve the exact bytes of a binary file. That would be its job for a PDF, a JPEG, an MP3, anything. PDF forms contain a few extra structures compared with a regular PDF, but nothing radically different, so the fact that it's a form may well be a red herring.

So, compare the raw bytes. It is trivial to write a program that reads two binary files byte by byte and compares them for equality. By looking inside the files and reporting what you find, you may uncover clues to the nature of the damage, which could lead to a solution.

Really, this is a question for forums related to the DBMS you are using. If you cannot preserve binary data, you have a deeper problem than PDF. (If, on the other hand, you find that the files are actually identical, you have a different mystery.)

Jul 01, 2016

TSN is right: this is not about retrieving PDF files from your DBMS, it's about retrieving binary data. If the file sizes are identical (down to the last byte), the next thing I would check for is byte-order problems. Depending on how the data was stored and retrieved, you may have to swap each pair of bytes. You can easily find out whether that's the case by doing a hex dump of both data objects.
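If a hex-dump tool such as `xxd` is not at hand, the check can be sketched in a few lines of Python. This is an illustrative sketch with hypothetical helper names: `hex_dump` prints the familiar offset / hex / ASCII layout, `swap_byte_pairs` swaps each adjacent pair of bytes, and `looks_byte_swapped` tests whether the retrieved bytes become the original once every pair is swapped back.

```python
def swap_byte_pairs(data: bytes) -> bytes:
    """Swap each adjacent pair of bytes (b'%P' -> b'P%'); an odd trailing byte is kept."""
    out = bytearray(data)
    even_len = len(data) - len(data) % 2
    out[0:even_len:2] = data[1:even_len:2]  # odd positions move to even slots
    out[1:even_len:2] = data[0:even_len:2]  # even positions move to odd slots
    return bytes(out)


def looks_byte_swapped(original: bytes, retrieved: bytes) -> bool:
    """True if `retrieved` differs from `original` only by pairwise byte swapping."""
    return original != retrieved and swap_byte_pairs(retrieved) == original


def hex_dump(data: bytes, width: int = 16, max_lines: int = 4) -> str:
    """Offset / hex / printable-ASCII dump of the first `max_lines` lines of data."""
    lines = []
    for start in range(0, min(len(data), width * max_lines), width):
        chunk = data[start:start + width]
        hex_part = " ".join(f"{byte:02x}" for byte in chunk)
        # Non-printable bytes are shown as "." in the text column.
        text_part = "".join(chr(byte) if 32 <= byte < 127 else "." for byte in chunk)
        lines.append(f"{start:08x}  {hex_part:<{width * 3}}  {text_part}")
    return "\n".join(lines)
```

In a dump of an intact file, the first bytes read `25 50 44 46` (`%PDF`); a pair-swapped copy would instead start with `50 25 46 44` (`P%FD`), which is easy to spot side by side.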

The first test of whether you have a valid PDF file is to open the resulting file in a text editor. If you don't see %PDF on the first line, you are already dealing with something that cannot be a PDF file. If you do find %PDF, the next thing I would check is the end of both files (the Xref table, a crucial part of the PDF format, is stored at the end of a PDF document).
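The same checks can be scripted and run on both copies. The Python sketch below is a quick heuristic, not a PDF validator, and the function name is hypothetical. It looks for the `%PDF-` header near the start. At the end of the file it looks for the `%%EOF` marker and any non-whitespace bytes after it. It also follows the last `startxref` offset to see whether that offset still lands on a cross-reference section: either a classic table beginning with `xref`, or an xref stream object beginning with its object number. If bytes were inserted, dropped, or translated anywhere in the file, that offset usually no longer points at the right place.

```python
import re


def pdf_sanity_report(data: bytes, tail_size: int = 1024) -> dict:
    """Cheap structural checks on raw bytes. This is not a full PDF validator."""
    tail = data[-tail_size:]
    eof = tail.rfind(b"%%EOF")
    # The last "startxref <offset>" in the tail points at the newest xref section.
    offsets = re.findall(rb"startxref\s+(\d+)", tail)
    xref_offset = int(offsets[-1]) if offsets else None
    xref_target_ok = None
    if xref_offset is not None:
        target = data[xref_offset:xref_offset + 4]
        # Classic xref tables start with "xref"; xref streams start with an object number.
        xref_target_ok = target == b"xref" or target[:1].isdigit()
    return {
        # Offset of the "%PDF-" header within the first 1024 bytes; -1 if absent.
        "header_offset": data.find(b"%PDF-", 0, 1024),
        "has_eof_marker": eof != -1,
        # Count of non-whitespace bytes after the last %%EOF (None if no marker).
        "bytes_after_eof": len(tail[eof + 5:].strip()) if eof != -1 else None,
        "xref_offset": xref_offset,
        "xref_target_ok": xref_target_ok,
    }
```

Running it on the copy that opens and on the copy that doesn't, and comparing the two reports, narrows down whether the damage is at the start, at the end, or somewhere in between (a shifted xref offset).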