July 10, 2008
Answered

Can I read the content of a PDF file into a ColdFusion variable?

  • July 10, 2008
  • 3 replies
  • 659 views
Hi,
I have a question about evaluating PDF files.

Is it possible to read a PDF file into a ColdFusion variable so that I can evaluate its content, e.g. with a regex, extract some information as a string, and store it in a database?

I have to evaluate lots of PDF files that contain specific strings.
I have to find these strings and check whether they are already present in the database.

I have seen that I can read the properties of a PDF file with <cfpdf action="read">, but I do not understand how to get the body of the PDF file into a variable.

Any hint is highly appreciated!
Malvina.
This topic has been closed for replies.
3 replies

BKBK
Community Expert
July 13, 2008
I'm glad the problem is solved. A nice weekend to you, too.

BKBK
Community Expert
Correct answer
July 11, 2008
sorry, this does not work.

With this code:

<cfpdf action = "read" source = "dok_1.pdf" name = "mypdf">
<cfdump var="#mypdf#"/>


You shouldn't expect it to work. You can't dump a binary like that.

That was just a part answer to a part question. To peek into the text of a PDF, you may use the utility Daverms recommends.

If you choose to do it yourself, use ColdFusion 8's DDX functionality. Here is an example to illustrate. The folder ddxTest contains the files myDDX.ddx, textFromPDF.cfm and myDoc.pdf, an arbitrary PDF that contains the text.
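
The contents of the example files are not preserved in this thread. As a rough sketch of what the two files could look like, here is a version modelled on the DocumentText example in the ColdFusion 8 documentation. The output file name myDocText.xml, the struct keys Doc1 and Out1, and the regex pattern are assumptions made for illustration, not part of the original answer.

myDDX.ddx:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<DDX xmlns="http://ns.adobe.com/DDX/1.0/"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://ns.adobe.com/DDX/1.0/ coldfusion_ddx.xsd">
    <!-- Extract the text of the input PDF into the output named Out1 -->
    <DocumentText result="Out1">
        <PDF source="Doc1"/>
    </DocumentText>
</DDX>
```

textFromPDF.cfm:

```cfml
<!--- Absolute paths to the files in ddxTest.
      myDocText.xml is a hypothetical name for the XML file that DDX writes. --->
<cfset ddxFile    = ExpandPath("myDDX.ddx")>
<cfset sourceFile = ExpandPath("myDoc.pdf")>
<cfset textFile   = ExpandPath("myDocText.xml")>

<cfif IsDDX(ddxFile)>
    <!--- Map the names used in the DDX file (Doc1, Out1) to real files --->
    <cfset inputStruct = StructNew()>
    <cfset inputStruct.Doc1 = sourceFile>
    <cfset outputStruct = StructNew()>
    <cfset outputStruct.Out1 = textFile>

    <!--- Run the DDX instructions; ddxResult reports the status of each output --->
    <cfpdf action="processddx"
           ddxfile="#ddxFile#"
           inputfiles="#inputStruct#"
           outputfiles="#outputStruct#"
           name="ddxResult">
    <cfoutput>Status: #ddxResult.Out1#<br></cfoutput>

    <!--- Read the generated XML (page text wrapped in XML elements) into a variable --->
    <cffile action="read" file="#textFile#" variable="pdfText" charset="utf-8">

    <!--- Hypothetical pattern: two capital letters, a hyphen, four digits --->
    <cfset matches = REMatch("[A-Z]{2}-[0-9]{4}", pdfText)>
    <cfdump var="#matches#">
<cfelse>
    <cfoutput>myDDX.ddx is not valid DDX.</cfoutput>
</cfif>
```

Because the result is XML, characters such as & appear escaped in pdfText. Parsing the file with XmlParse before applying the regex is probably the cleaner route if the strings you are looking for can contain such characters.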

Malvina (Author)
Inspiring
July 12, 2008
Great!
Thank you. I did it as you said, and it worked.
DDX is not as complicated as it seems at first glance.
Nice weekend.

Malvina

BKBK
Community Expert
July 10, 2008
... how to get the body of the pdf-file into a variable
<cfpdf action = "read" source = "mydoc.pdf" name = "mypdf">
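
As far as I know, action="read" only loads the document into a PDF variable for further cfpdf operations; it does not expose the page text. If you only need the document properties, a minimal sketch using action="getinfo", which returns an ordinary structure, could look like this (the variable name pdfInfo is an assumption):

```cfml
<!--- Read the PDF's metadata into a structure --->
<cfpdf action="getinfo" source="mydoc.pdf" name="pdfInfo">

<cfoutput>
    Title: #pdfInfo.Title#<br>
    Pages: #pdfInfo.TotalPages#
</cfoutput>
```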


Malvina (Author)
Inspiring
July 10, 2008
Thank you for your immediate reply, but
sorry, this does not work.

With this code:

<cfpdf action = "read" source = "dok_1.pdf" name = "mypdf">
<cfdump var="#mypdf#"/>

I get this result:
everything except the text of the document.

PDFDocument
Application: name of application
Author: bimbam Verlag GmbH
CenterWindowOnScreen: [empty string]
ChangingDocument: Allowed
Commenting: Allowed
ContentExtraction: Allowed
CopyContent: Allowed
Created: D:20080710
DocumentAssembly: Allowed
Encryption: No Security
FilePath: [empty string]
FillingForm: Allowed
FitToWindow: [empty string]
HideMenubar: [empty string]
HideToolbar: [empty string]
HideWindowUI: [empty string]
Keywords: [empty string]
Language: [empty string]
Modified: [empty string]
PageLayout: SinglePage
Printing: Allowed
Producer: [empty string]
Properties: [empty string]
Secure: Allowed
ShowDocumentsOption: [empty string]
ShowWindowsOption: [empty string]
Signing: Allowed
Subject: [empty string]
Title: Rheinische Angler-Zeitschrift
TotalPages: 1
Trapped: [empty string]
Version: 1.3

Maybe I do not understand the cfpdf tag correctly.
What I want is a kind of PDF-to-text conversion.
Do I have to use the processddx action? I do not think so. But is there a DocumentText property?
Inspiring
July 11, 2008
Hi,

Try Ray's "PDFUtils" utility.

A nice little blog on this can be found here

HTH