Inspiring
June 20, 2007
Question

data scraping

Hi guys/gals,

It's been a while, but I have a small ColdFusion project where I am authenticating to a website using cfhttp and trying to download files whose names are a datestamp followed by an extension. The way it works in the real world is that I log into the site and get redirected to a page that has hyperlinks to the files, and I have to click each one to download it. There are about 9-10 files and I have to do this every day, so naturally I want to automate it.

I use

<cfhttp method="Get"
url="https://samplesite.com/cehttp/servlet/MailboxServlet?operation=LOGON&user=something&password=something&mailbox_server=something"
resolveurl="Yes">

I get: " Logon is successful."

Which is good

Then I loop through a list of the files, and it works like this:

<!--- List kept on one line so that no element picks up a stray line break --->
<cfset Possiblities = "#alterTime#.blah,#alterTime#.bleh,#alterTime#.tart,#alterTime#.high,#alterTime#.slip,#alterTime#.cord,#alterTime#.need,#alterTime#.very">
<cfloop list="#Possiblities#" index="i">
<cfoutput>
<cfhttp method="get" url="https://samplesite.com/cehttp/servlet/MailboxServlet?operation=DOWNLOAD&mailbox_id=something&batch_num=something&data_format=A&batch_id=#i#" path="c:\temp" file="#urlString##i#">

#urlString##i#<br />
</cfoutput>
</cfloop>

So I loop through and attempt to download these files.

As part of the real logon process I get redirected to this:

https://samplesite.com/cehttp/servlet/MailboxServlet?operation=something&mailbox_id=&Submit=something

I would like to just zip through the available files and download them to a directory on my computer...

But when I test what is actually produced as a hyperlink and click on it, it gives me:

"You are not logged on. Please logon first."

I am bypassing the redirect portion, but I think I should incorporate it somehow.

Or maybe I don't know what I am doing with cfhttp...

Does anyone have tips or clues to my problem?

Thanks everyone!

    2 replies

    Inspiring
    June 20, 2007
    <CFHTTP> does not keep cookies between calls on its own. No cookies = no session = no login. There are a couple of ways around this:

    1) CFX_HTTP5 has received a lot of mention on these forums for being able to support sessions

    2) Roll your own solution by parsing the session cookie out of the variables returned by the logon CFHTTP call and sending it back in every additional CFHTTP call. You can find examples of this on this forum. Try searching for CFHTTP session.
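
A minimal sketch of option 2, under some assumptions: the site tracks the session with cookies set on the logon response, the result attribute of cfhttp is available (ColdFusion MX 7 or later), and the Set-Cookie response header comes back as a plain string for one cookie and as a struct for several (some engines may return an array instead, which this sketch does not handle). The variable names logonResult, rawCookies and cookieHeader are made up for illustration, and the shortened file list stands in for the full one from the question.

```cfml
<!--- Yesterday's date stamp and a shortened file list, as in the question --->
<cfset alterTime = dateFormat(dateAdd("d", -1, now()), "mmddyy")>
<cfset Possiblities = "#alterTime#.blah,#alterTime#.bleh">

<!--- Step 1: log on without following the redirect, so the headers of the
      logon response itself (including Set-Cookie) are what we get back --->
<cfhttp method="get"
        url="https://samplesite.com/cehttp/servlet/MailboxServlet?operation=LOGON&user=something&password=something&mailbox_server=something"
        redirect="no"
        result="logonResult">
</cfhttp>

<!--- Step 2: keep only the name=value part of each Set-Cookie value
      (everything before the first semicolon) --->
<cfset cookieHeader = "">
<cfif structKeyExists(logonResult.responseHeader, "Set-Cookie")>
    <cfset rawCookies = logonResult.responseHeader["Set-Cookie"]>
    <cfif isSimpleValue(rawCookies)>
        <cfset cookieHeader = trim(listFirst(rawCookies, ";"))>
    <cfelseif isStruct(rawCookies)>
        <cfloop collection="#rawCookies#" item="key">
            <cfset cookieHeader = listAppend(cookieHeader, trim(listFirst(rawCookies[key], ";")), ";")>
        </cfloop>
    </cfif>
</cfif>

<!--- Step 3: send the cookies back as a Cookie header on every download call --->
<cfloop list="#Possiblities#" index="i">
    <cfhttp method="get"
            url="https://samplesite.com/cehttp/servlet/MailboxServlet?operation=DOWNLOAD&mailbox_id=something&batch_num=something&data_format=A&batch_id=#trim(i)#"
            path="c:\temp"
            file="#trim(i)#">
        <cfhttpparam type="header" name="Cookie" value="#cookieHeader#">
    </cfhttp>
</cfloop>
```

The cookies go back as a raw Cookie header rather than through cfhttpparam type="cookie" because the latter URL-encodes the value, which can mangle some session IDs. If the server only finishes setting up the session on the page the logon redirects to, that URL would probably need to be requested once with the same Cookie header before the download loop.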
    Inspiring
    June 20, 2007
    oops forgot my date stuff

    <cfset alterTime = dateformat(DateAdd("d", -1, now()), "mmddyy")>