Inspiring
April 9, 2012
Question

Can you find a string inside a PDF using cfpdf in ColdFusion?


Is it possible to determine whether a string, for example "Not Possible", exists in a PDF using CFPDF or another function in ColdFusion? If so, any suggestions on how to do this would be appreciated.

Thanks!


BKBK
Community Expert
April 10, 2012

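Yes, it is possible. In the following example, the two files (textFromPDF.cfm and myDDX.ddx) and the PDF (myDoc.pdf) are in the same directory. The DDX file tells cfpdf to extract the document's text as XML, which you can then search.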

textFromPDF.cfm

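<!--- Convert the PDF to text (XML) via DDX, then search the text --->
<cfset currentDir = getDirectoryFromPath(expandPath('*.*'))>
<cfset ddxfile = "#currentDir#myDDX.ddx">

<!--- Input: the key Doc1 matches <PDF source="Doc1"/> in the DDX file --->
<cfset inputStruct = structNew()>
<cfset inputStruct.Doc1 = "#currentDir#myDoc.pdf">

<!--- Output: the key Out1 matches <DocumentText result="Out1"> in the DDX file. ColdFusion saves the extracted text as an XML file --->
<cfset outputStruct = structNew()>
<cfset outputStruct.Out1 = "#currentDir#my_PDF_doc_as_text.xml">

<cfpdf action="processddx" ddxfile="#ddxfile#" inputfiles="#inputStruct#" outputfiles="#outputStruct#" name="myDDXVar">

<cfif myDDXVar.Out1 is "successful">
    <!--- Read the extracted text and report where the search string first occurs (0 means not found) --->
    <cffile action="read" file="#currentDir#my_PDF_doc_as_text.xml" variable="my_PDF_doc_as_text">
    <cfoutput>Position of search text "Not Possible": #findNoCase("Not Possible", my_PDF_doc_as_text)#</cfoutput>
<cfelse>
    <!--- Without this branch, the search would fail on an undefined variable when DDX processing fails --->
    <cfoutput>DDX processing failed: #myDDXVar.Out1#</cfoutput>
</cfif>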

myDDX.ddx

<?xml version="1.0" encoding="UTF-8"?>
<DDX xmlns="http://ns.adobe.com/DDX/1.0/"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation="http://ns.adobe.com/DDX/1.0/ coldfusion_ddx.xsd">
   <DocumentText result="Out1">
      <PDF source="Doc1"/>
   </DocumentText>
</DDX>
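
The original question asks whether the string exists, rather than where it is. A minimal sketch of a yes/no check follows, assuming the XML file named in the example has already been written by textFromPDF.cfm. The template name searchPDFText.cfm and the variable names xmlPath, searchText, pdfTextXml, plainText and found are hypothetical, chosen only for illustration. Stripping the tags first is a design choice, not part of the answer above: it keeps XML element and attribute names from producing false matches.

searchPDFText.cfm

```cfml
<!--- Assumption: my_PDF_doc_as_text.xml was already written by textFromPDF.cfm in this directory --->
<cfset xmlPath = getDirectoryFromPath(getCurrentTemplatePath()) & "my_PDF_doc_as_text.xml">
<cfset searchText = "Not Possible">

<cfif fileExists(xmlPath)>
    <cffile action="read" file="#xmlPath#" variable="pdfTextXml">
    <!--- Replace every XML tag with a space so that only the document text is searched --->
    <cfset plainText = reReplace(pdfTextXml, "<[^>]*>", " ", "all")>
    <!--- findNoCase returns 0 when the substring is absent --->
    <cfset found = findNoCase(searchText, plainText) gt 0>
    <cfoutput>"#searchText#" found in PDF: #yesNoFormat(found)#</cfoutput>
<cfelse>
    <cfoutput>Extracted text file not found: #xmlPath#</cfoutput>
</cfif>
```

Note that a phrase which is split across elements or line breaks in the extracted XML, or which contains characters stored as XML entities, may not match with this simple approach.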

Paiz
Author
Inspiring
April 10, 2012

BKBK,

Thanks for your help.  You've gotten me off to a great start! 

I keep getting a "DDX is invalid" error: "Check for invalid construct or restricted keywords." Is it possible that your DDX is somehow malformed?
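
One way to narrow down an error like this is to validate the DDX file on its own, before calling cfpdf. The sketch below uses ColdFusion's isDDX function, which accepts either a path to a DDX file or a string of DDX instructions. It assumes myDDX.ddx sits in the same directory as the template; the template name checkDDX.cfm is hypothetical.

checkDDX.cfm

```cfml
<!--- Assumption: myDDX.ddx sits next to this template, as in the example above --->
<cfset ddxfile = getDirectoryFromPath(getCurrentTemplatePath()) & "myDDX.ddx">

<cfif isDDX(ddxfile)>
    <cfoutput>#ddxfile# is valid DDX.</cfoutput>
<cfelse>
    <!--- isDDX returns false for invalid DDX and for a path that does not resolve to a DDX file --->
    <cfoutput>#ddxfile# is not valid DDX. Check the xsi:schemaLocation value and the element names.</cfoutput>
</cfif>
```

If the check fails, the problem lies in the DDX file (or its path) and not in the cfpdf call or the PDF.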

BKBK
Community Expert
April 10, 2012

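I suspect you made the same mistake I did in the beginning. The xsi:schemaLocation value is a pair, the namespace URI followed by the schema file, separated by whitespace. Note that there is a space before the word coldfusion in: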

"http://ns.adobe.com/DDX/1.0/ coldfusion_ddx.xsd"