BurtleEd
Known Participant
January 19, 2010
Answered

Using CFHTTP to access a site and click on a download button...


Does anyone know how to create a script that will go to a web page and download a file? The issue is that the last part of the URL changes: for example, it may be http://www.thesite.com/0101.html one time and http://www.thesite.com/0102.html the next. The page contains a download button that, when pressed, downloads a zip file that I need saved on my local PC.

    Correct answer Owainnorth

    Can you please assist with the coding a little more, if possible? I am a little new to CF.


    You, sir, are lucky I'm bored and the boss isn't looking

    <!--- Create a connection to your Exchange server --->

    <cfexchangeconnection connection="con" protocol="HTTPS" server="#serverIp#" action="open" username="#username#" password="#password#" />
    <!--- Get a query object of any emails that match the subject line --->
    <cfexchangemail action="get" connection="con" name="qEmails">
        <cfexchangefilter name="subject" value="Whatever your Subject Line is" />
    </cfexchangemail>
    <cfexchangeconnection connection="con" action="close" />

    <!--- If we found emails matching the subject line, --->
    <cfif qEmails.recordcount >
        <!--- Loop through each email found, as there may be more than one --->
        <cfloop query="qEmails">
            <!--- Get the content of the message, assuming there's a plain text part --->
            <cfset thisContent = qEmails.message />
           
            <!--- Find the index of the string starting "http://" --->
            <cfset httpIndexChar = find("http://",thisContent) />
            <!--- Find the index of the first ".php" after the "http://" and add four to *include* the ".php" --->
            <cfset firstPhpIndexChar = find(".php",thisContent,httpIndexChar) + 4 />
            <!--- Do the same again from that point, as the URL has two ".php" parts --->
            <cfset secondPhpIndexChar = find(".php",thisContent,firstPhpIndexChar) + 4 />
           
            <!--- Calculate the length of the URL, so we can strip it out --->
            <cfset urlLength = secondPhpIndexChar - httpIndexChar >
           
            <!--- Strip down the message content to only the characters between those two values,
                    this should now be the valid URL in the email --->
            <cfset theURL = mid(thisContent,httpIndexChar,urlLength) />

            <!--- Create an HTTP connection to the URL and get back a struct of its content --->         
            <cfhttp url="#theUrl#" result="httpContent" />
           
            <!--- You now have a struct called httpContent whose fileContent key contains the HTML from the page.
                    You need to strip this down using methods similar to above, except the service
                    in question is currently down, so I can't test it --->
           
        </cfloop>
    </cfif>

    As I noted in the code, the service is down at the moment, so I can't actually do the second part, but it's more or less the same as the first: find some simple and consistent rules for where on the page the link will be, strip it out, do another CFHTTP to get that file, then use CFFILE to write it to disk.
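
    A rough, untested sketch of that second part might look like the following. It rests on assumptions that are not confirmed anywhere in this thread: that the download page exposes the file as an absolute link ending in ".zip" inside an href attribute, and that C:\downloads\data.zip is a writable path (both the pattern and the path are placeholders). It is meant to sit inside the loop, right after the first CFHTTP call.

    ```cfml
    <!--- Assumption: the page HTML contains something like href="http://.../file.zip" --->
    <cfset pageHtml = httpContent.fileContent />
    <cfset zipLinks = reMatchNoCase('href="[^"]+\.zip"', pageHtml) />

    <cfif arrayLen(zipLinks)>
        <!--- Drop the leading href=" (6 characters) and the trailing quote --->
        <cfset zipUrl = mid(zipLinks[1], 7, len(zipLinks[1]) - 7) />

        <!--- Fetch the file as binary so the zip is not mangled as text --->
        <cfhttp url="#zipUrl#" method="get" getAsBinary="yes" result="zipResponse" />

        <!--- statusCode looks like "200 OK", so compare only the first three characters --->
        <cfif left(zipResponse.statusCode, 3) eq "200">
            <!--- Hypothetical target path; change it to wherever the file should land --->
            <cffile action="write" file="C:\downloads\data.zip" output="#zipResponse.fileContent#" />
        </cfif>
    </cfif>
    ```

    If the link on the real page turns out to be relative, or the button submits a form instead of linking to the file, the pattern and the request will need adjusting once the page can be inspected.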

    Hope that helps, I'm off for a monster coffee.

    O.


    ilssac
    Inspiring
    January 19, 2010

    What business rule(s) define how the script will know when the URL changes and/or what the URL will be for each request?

    Owainnorth
    Inspiring
    January 19, 2010

    As Ian points out, unless you know what the page is going to be called, I don't really see how you can do this reliably. The one exception is if the page name literally increments by one, in which case you can loop through potential URLs until you get an HTTP 200, but that's pretty weak.
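
    For illustration only, the loop-until-200 idea could be sketched as below. The range 101 to 131 and the four-digit zero-padded page names are assumptions based on the two example URLs in the question, not a known rule of the site.

    ```cfml
    <!--- Hypothetical: page names run 0101.html, 0102.html, ... --->
    <cfset foundUrl = "" />
    <cfloop from="101" to="131" index="n">
        <cfset candidate = "http://www.thesite.com/" & numberFormat(n, "0000") & ".html" />

        <!--- A HEAD request is enough to learn whether the page exists --->
        <cfhttp url="#candidate#" method="head" result="probe" />

        <!--- statusCode looks like "200 OK", so compare only the first three characters --->
        <cfif left(probe.statusCode, 3) eq "200">
            <cfset foundUrl = candidate />
            <cfbreak />
        </cfif>
    </cfloop>
    ```

    After the loop, foundUrl is either the first candidate that answered with a 200 or an empty string. This only helps if older pages are removed or the numbering is predictable, which is exactly why the approach is fragile.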

    One thing I'm also wary of with questions like this: if the URL is constantly changing without your knowledge, are you really meant to be downloading the file?

    O.

    BurtleEd (Author)
    Known Participant
    January 20, 2010

    Yes, we are able to download the file; we are just trying to automate the process. This is what happens:

    1. We go to a weather site, select a region of interest, and type in our email address to request data for that area. The email below shows what is sent to us.

    Thank you for your data request submission to the NSSL OnDemand Data Request website submitted on ::  12/09/2009 08:02:05 CST

    You data request is being processed and you will be able to retrieve your data at:

    http://ondemand.nssl.noaa.gov/index.php?category=tmp/36XW8XM1X61W/index.php

    2. You then click on the link, which takes you to a download page, and download your file.

    What we are trying to do:

    1. We would like this email sent to an Exchange email account and have ColdFusion check the account, go to the web address in the email, and download the file to a folder.
    2. We are then going to take the data and overlay it on Google Earth.

    The issue, I think, is that the "36XW8XM1X61W" part changes depending on the requested data.

    I hope all of this makes sense.
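
    Since only the token in the middle of the link changes, one option is to pull the link out of the message body with a pattern. A minimal sketch, assuming the token is always letters and digits and the rest of the URL keeps the shape shown in the sample email (emailBody here is a stand-in for the real message text):

    ```cfml
    <!--- Stand-in for the message body; in practice this would come from the mailbox query --->
    <cfset emailBody = "You will be able to retrieve your data at: http://ondemand.nssl.noaa.gov/index.php?category=tmp/36XW8XM1X61W/index.php" />

    <!--- Assumption: the changing token is alphanumeric; everything else is fixed --->
    <cfset urlPattern = "http://ondemand\.nssl\.noaa\.gov/index\.php\?category=tmp/[A-Za-z0-9]+/index\.php" />
    <cfset matches = reMatch(urlPattern, emailBody) />

    <cfif arrayLen(matches)>
        <!--- The first match is the download page address --->
        <cfset theURL = matches[1] />
    </cfif>
    ```

    reMatch returns an empty array when nothing matches, so the arrayLen check also covers emails that carry no link.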