September 24, 2008
Question

Parsing PDF Files

Hi everyone. I have a bunch of PDF files that contain text data which I need to retrieve. Is there any way to parse them? In each file, the first line holds the column names, delimited by commas. Every line after that is one row of data.

ex:
id,fname,lname,age
532,Tom,Stevens,33
42,John,Baldwin,38
...

I've tried using the cfpdf tag, but as far as I know that tag is mainly used for pdf creation, not extraction.
It does have a read action: <cfpdf action="read" source="data.pdf" name="mypdf"> but I don't know how to use the 'mypdf' variable after I read the file. If I dump it out, it just shows details about the PDF file, not the text it contains.

Thanks for any help you can provide.
This topic has been closed for replies.

1 reply

September 24, 2008
Magikaru wrote:
> I've tried using the cfpdf tag, but as far as I know that tag is mainly used
> for pdf creation, not extraction.

It can be used for extraction; you need to combine it with its DDX capability. Here is a
blog post with some example code in a handy-dandy PDF utility CFC.

http://www.coldfusionjedi.com/index.cfm/2007/7/25/Reading-text-from-a-PDF-in-ColdFusion-8
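
For reference, here is a rough, untested sketch of the general idea, not the code from that blog post. It writes a small DDX file that asks for the document text, runs it with cfpdf action="processddx", then splits the extracted text into rows. The file names (extract.ddx, data.xml), the struct keys (doc1, Out1), and the variable names are placeholders of my own. It also assumes the result XML wraps each page's text in a Page element and that the line breaks from the PDF survive extraction, so dump the result and check before relying on it.

```cfml
<!--- Placeholder paths: data.pdf is the file from the question, the rest are made up. --->
<cfset pdfPath = ExpandPath("data.pdf")>
<cfset ddxPath = ExpandPath("extract.ddx")>
<cfset xmlPath = ExpandPath("data.xml")>

<!--- DDX instructions: extract the text of input "doc1" into result "Out1". --->
<cfsavecontent variable="ddx"><?xml version="1.0" encoding="UTF-8"?>
<DDX xmlns="http://ns.adobe.com/DDX/1.0/"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://ns.adobe.com/DDX/1.0/ coldfusion_ddx.xsd">
  <DocumentText result="Out1">
    <PDF source="doc1"/>
  </DocumentText>
</DDX>
</cfsavecontent>
<cffile action="write" file="#ddxPath#" output="#Trim(ddx)#">

<!--- Map the names used in the DDX to real files. --->
<cfset inputFiles = StructNew()>
<cfset inputFiles.doc1 = pdfPath>
<cfset outputFiles = StructNew()>
<cfset outputFiles.Out1 = xmlPath>

<cfpdf action="processddx" ddxfile="#ddxPath#"
       inputfiles="#inputFiles#" outputfiles="#outputFiles#" name="ddxResult">
<!--- ddxResult reports, per output name, whether the DDX run worked. --->
<cfdump var="#ddxResult#">

<!--- Read the result XML and collect the text of every Page element (assumed element name). --->
<cffile action="read" file="#xmlPath#" variable="rawXml" charset="utf-8">
<cfset pages = XmlSearch(XmlParse(rawXml), "//*[local-name()='Page']")>
<cfset text = "">
<cfloop array="#pages#" index="page">
  <cfset text = text & page.XmlText & Chr(10)>
</cfloop>

<!--- Split into lines on CR or LF; the first line holds the column names. --->
<cfset lines = ListToArray(text, Chr(13) & Chr(10))>
<cfset rows = ArrayNew(1)>
<cfif ArrayLen(lines) GT 0>
  <cfset columns = ListToArray(Trim(lines[1]))>
  <cfloop from="2" to="#ArrayLen(lines)#" index="i">
    <cfset values = ListToArray(Trim(lines[i]))>
    <!--- Keep only lines with one value per column. ListToArray drops empty
          fields, so a row with a blank value is skipped here. --->
    <cfif ArrayLen(values) EQ ArrayLen(columns)>
      <cfset row = StructNew()>
      <cfloop from="1" to="#ArrayLen(columns)#" index="c">
        <cfset row[columns[c]] = values[c]>
      </cfloop>
      <cfset ArrayAppend(rows, row)>
    </cfif>
  </cfloop>
</cfif>
<cfdump var="#rows#">
```

With the sample data from the question, each entry in rows would be a struct keyed by id, fname, lname, and age. If the extracted text comes back without line breaks, the splitting step will need a different approach.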