Inspiring
August 28, 2009
Question

CF 7 Verity and pdf files - problem


I have a directory of .pdf files that I have indexed with Verity for several years so that full-text searches can be run against all of the PDF documents. This has worked well for some time, but recently we have noticed that newer PDF documents no longer work correctly with the text search. They are indexed in the collection and show up when I search with a blank text string, which returns all documents in the collection, but any attempt to search for a string of text from a document returns an empty result.

To test this I created a new collection containing several PDF files that I know work and several that I know do not. I used a simple form to supply keywords to my cfsearch tag and then dumped the results to see what happens. Searching by strings of text from the PDF works for some files and not others. I examined the files in question and the only difference I can see is the creator version. The working files list the creator as Adobe InDesign CS3 (5.0.2) and the non-working files show the creator as Adobe InDesign CS3 (5.0.4). Has anyone else noticed this issue and found a solution or workaround? I really don't want to migrate to some other search function at this time, as this was an unexpected problem.

I found a few threads on the internet suggesting problems with Acrobat 9 files, but they describe those as a version 5 / version 7 issue. I used cffile read to look at the beginning of both a working and a non-working file, and both list PDF-1.4 at the beginning of the file. I hope someone else has found a way around this issue.
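For anyone who wants to repeat the header check outside ColdFusion, here is a minimal Python sketch of the same idea. The function name `pdf_header_version` is hypothetical, and the script only reads the `%PDF-x.y` marker near the start of each file. Note that from PDF 1.4 onward the document catalog may carry a `/Version` entry that overrides the header, so two files with the same header are not guaranteed to be the same effective version.

```python
import re
import sys


def pdf_header_version(data: bytes):
    """Return the 'x.y' from a '%PDF-x.y' header, or None if no header is found."""
    # The header is expected near the start of the file, so only the
    # first 1024 bytes are scanned.
    match = re.search(rb"%PDF-(\d\.\d)", data[:1024])
    return match.group(1).decode("ascii") if match else None


if __name__ == "__main__":
    # Usage: python pdf_version.py working.pdf nonworking.pdf
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            print(path, pdf_header_version(fh.read(1024)))
```

If both files report the same version, as they did here, the header is probably not what separates the working files from the non-working ones.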


    Inspiring
    September 1, 2009

    I'm wondering if there are any other items I should look at with these PDF files or log entries. I do not see errors in the Verity log when indexing or optimizing these files, and they do show up in the results if a blank search string is passed to the cfsearch tag, so they are in the index. No combination of words from the text will be found, though. It seems that all of the files created in the last year fall into the non-working category. I don't mind recreating the PDF files if I understood what needed to be done differently. Are there updates to Verity for ColdFusion 7 that I'm not aware of? Would attaching a working and a non-working PDF file to this post be helpful?

    Inspiring
    September 9, 2009

    I've been able to isolate this problem to the fonts used and embedded in the PDF files. Previous PDF files that we have in the index use Interstate and Truesdell as fonts. At some point they switched to Berthold Akzidenz Grotesk and Apollo MT, and the content is no longer searchable with Verity. I tested this by building a small collection with versions of the PDF files in both the old and new fonts. If I send cfsearch a blank search criteria it finds all of the documents, but if I send any text string from the documents to cfsearch as the criteria it only finds the PDF files with the Interstate and Truesdell fonts. Any ideas as to why, and a way around this issue?
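One way to compare the two font schemes is to list the font names each file declares and count how often a `/ToUnicode` mapping is referenced. The Python sketch below is only a rough heuristic: `summarize_fonts` is a hypothetical helper, it scans the raw bytes rather than parsing the PDF, and it will miss font dictionaries stored in compressed object streams. The idea that a missing `/ToUnicode` map is behind the failed text extraction is an assumption, not something established in this thread.

```python
import re
import sys


def summarize_fonts(data: bytes):
    """Heuristically summarize font declarations found in raw PDF bytes."""
    # Collect the distinct names that follow a /BaseFont key.
    names = re.findall(rb"/BaseFont\s*/([^\s/<>\[\]()]+)", data)
    return {
        "base_fonts": sorted({n.decode("latin-1") for n in names}),
        # Count of /ToUnicode keys; zero may mean the text cannot be mapped
        # back to characters (assumption, see the note above).
        "to_unicode_refs": data.count(b"/ToUnicode"),
        # Identity-H encoded fonts usually depend on a ToUnicode map.
        "identity_h_refs": data.count(b"/Identity-H"),
    }


if __name__ == "__main__":
    # Usage: python pdf_fonts.py old_fonts.pdf new_fonts.pdf
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            print(path, summarize_fonts(fh.read()))
```

Running it against one file from each font scheme and comparing the output side by side may show whether the newer files declare their fonts differently.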

    Inspiring
    September 10, 2009

    Can you create an example PDF using each font scheme, and attach them here?

    --

    Adam