CF9 Solr Hangs on Corrupt PDFs

Report · Feb 25, 2015

I am indexing 34,000+ documents stored on the server's local hard drive.

Windows Server 2008 SP2

CF9

Oracle

Thanks to advice in another thread I started, I am indexing the folders one at a time, followed by an update after each. Some of the PDFs can be huge (130 MB), but the average is closer to 1 MB. On occasion I get to a PDF that is corrupt (if I copy it to my desktop and attempt to open it, Acrobat Pro says it is corrupt).

I have attempted using cfpdf to read the header info inside a cftry block, with the catch creating a log entry. That should work, but it hangs trying to read the doc (I assume that is what is happening with Solr too). I get no log entry, and it continues to hang until the request times out.

Can anyone think of a way to break out of a hung file and continue to index the remaining files?

Thanks

Report · Feb 25, 2015

If you are running the version of CF 9 that you mentioned in a previous post (9.0.0.251028), then you are going to need to patch your server!

I'm thinking the issues you are having are related to a bug (described not so well here: Bug#3040314 - Bug 80390: I have some corrupt PDFs). It was fixed in build 9.0.0.263374.

Unless you really have a good reason not to, I would recommend updating your CF9 instance to at least the latest build of 9.0.2 anyway. If you are having trouble finding the downloads, Gavin Pickin maintains some here - http://www.gpickin.com/cfrepo/
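
Until the patch is in place, one thing that might be worth trying is pre-checking each PDF inside a cfthread and giving up on it after a timeout, so a hung read does not take the whole request down with it. This is only a rough sketch and I have not tested it against your files: the path, the collection name "docs", the log file name "pdfIndex" and the 30 second limit are all made up for illustration, and I can't promise that terminating the thread will actually release a read that is stuck deep inside the PDF library. It also only screens files before they reach Solr; it does not change how Solr itself handles them.

```cfml
<!--- Hypothetical values: adjust the path, collection name and timeout to suit --->
<cfset pdfPath = "C:\docs\example.pdf">
<cfset threadName = "pdfCheck_" & createUUID()>

<!--- Try to read the PDF info in a separate thread so a hang is isolated --->
<cfthread action="run" name="#threadName#" pdfPath="#pdfPath#">
    <cfpdf action="getinfo" source="#attributes.pdfPath#" name="pdfInfo">
</cfthread>

<!--- Wait up to 30 seconds (timeout is in milliseconds) for the thread to finish --->
<cfthread action="join" name="#threadName#" timeout="30000" />

<cfif cfthread[threadName].status is "RUNNING">
    <!--- Still reading after the timeout: treat the file as bad and move on --->
    <cfthread action="terminate" name="#threadName#" />
    <cflog file="pdfIndex" text="Skipped (timed out reading): #pdfPath#">
<cfelseif structKeyExists(cfthread[threadName], "error")>
    <!--- cfpdf threw an error inside the thread --->
    <cflog file="pdfIndex" text="Skipped (error reading): #pdfPath#">
<cfelse>
    <!--- The PDF was readable, so hand it to Solr --->
    <cfindex action="update" collection="docs" type="file" key="#pdfPath#">
</cfif>
```

You would run this per file inside your folder loop instead of indexing the whole folder in one go, which will be slower, so it may only be worth doing for the folders you already know contain bad files.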

Report · Feb 26, 2015

THANK YOUUU.

We are in the process of ordering CF11 Enterprise. I will see if I can apply the patch in the meantime (nothing moves quite as slow as the speed of Government).

Will mark this as answered as soon as I know.
