CF9 Solr Hangs on Corrupt PDFs

Question

I am indexing 34,000+ documents stored on the server's local hard drive.

Windows Server 2008 SP2

CF9

Oracle

Thanks to advice in another thread I started, I am indexing the folders one at a time, running an update after each. Some of the PDFs are huge (130 MB), but the average is closer to 1 MB. Occasionally I hit a PDF that is corrupt (if I copy it to my desktop and try to open it, Acrobat Pro reports it as corrupt).

I have tried using cfpdf to read the header info inside a cftry block, with the catch creating a log entry. That should work, but it hangs while trying to read the document (I assume the same thing is happening with Solr). I get no log entry, and it stays hung until the request times out.

Can anyone think of a way to break out of a hung file and continue to index the remaining files?

Thanks

haxtbh · Answer

If you are running the version of CF 9 that you mentioned in your previous post (9.0.0.251028), then you will need to patch your server!

I think the issues you are having are related to a bug (described, not very well, here: Bug#3040314 - Bug 80390: I have some corrupt PDFs). It was fixed in build 9.0.0.263374.

Unless you have a really good reason not to, I would recommend updating your CF9 instance to at least the latest build of 9.0.2 anyway. If you have trouble finding the downloads, Gavin Pickin maintains some here: http://www.gpickin.com/cfrepo/
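
If patching has to wait, one possible stopgap for the "break out of a hung file" question is to probe each PDF in its own cfthread and join with a timeout, so the main request can move on when a file never returns. The following is only a minimal sketch under stated assumptions, not a tested fix: the folder path, collection name, log file name, and 30-second timeout are hypothetical placeholders, and it indexes file by file rather than folder by folder, which will likely be slower. Note also that terminating a thread is not guaranteed to stop code that is stuck deep inside the PDF library, and that cfindex itself may still hang on a file that cfpdf happened to read successfully.

```cfm
<!--- Hypothetical values: adjust folder, collection name, and timeout. --->
<cfset docFolder = "D:\docs\folder1">
<cfset collectionName = "myCollection">
<cfset timeoutMs = 30000>

<cfdirectory action="list" directory="#docFolder#" filter="*.pdf" name="pdfFiles">

<cfloop query="pdfFiles">
    <cfset filePath = pdfFiles.directory & "\" & pdfFiles.name>
    <cfset threadName = "pdfCheck_" & pdfFiles.currentRow>

    <!--- Probe the file in its own thread so a hang cannot block this request. --->
    <cfthread action="run" name="#threadName#" filePath="#filePath#">
        <cfpdf action="getinfo" source="#attributes.filePath#" name="pdfInfo">
    </cfthread>

    <!--- Wait at most timeoutMs milliseconds for the probe to finish. --->
    <cfthread action="join" name="#threadName#" timeout="#timeoutMs#" />
    <cfset probe = cfthread[threadName]>

    <cfif probe.status eq "COMPLETED">
        <!--- The probe read the file, so hand it to the Solr collection. --->
        <cfindex action="update" collection="#collectionName#"
                 type="file" key="#filePath#">
    <cfelse>
        <!--- Still running (hung) or ended with an error: stop it, log, move on. --->
        <cfif probe.status neq "TERMINATED">
            <cfthread action="terminate" name="#threadName#" />
        </cfif>
        <cflog file="pdfIndexSkipped"
               text="Skipped #filePath# (thread status: #probe.status#)">
    </cfif>
</cfloop>
```

A thread whose cfpdf call throws an error ends with status TERMINATED, so corrupt files that fail fast land in the same log as files that hang. The skipped files can then be reviewed by hand from the log.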
