La_Salamandra
Participating Frequently
October 11, 2010
Question

Looking for a script that gets image URLs from a website

  • October 11, 2010
  • 2 replies
  • 1489 views

Hi,

I have a website where users can enter their hotel information.

Most of them do not insert photos; I guess the procedure is too long and difficult for the average user's skills.

I'm looking for a script that takes the URL of a website as input and returns the URLs of the images on that site, so the user can decide which ones to upload to his tab.

Can someone help me?

This topic has been closed for replies.

2 replies

BKBK
Community Expert
October 12, 2010

La_Salamandra wrote:

I have a website where users can enter their hotel information.

Most of them do not insert photos; I guess the procedure is too long and difficult for the average user's skills.

I'm looking for a script that takes the URL of a website as input and returns the URLs of the images on that site, so the user can decide which ones to upload to his tab.

Can someone help me?

You have to improve your design. Even if you can find the ColdFusion code, your design will still fall short in two ways.

First, it is unreliable, because you're depending on some arbitrary site to be available and responsive. Secondly, it is ethically questionable to collect pictures, especially large numbers of them, dynamically from someone else's site. Think of their copyright and bandwidth.

Fortunately, there are simple solutions. First, identify, by eye, the web pages containing the pictures you're interested in. Ask for permission from the owner.

You could indeed use ColdFusion's cfhttp or any other script to download the JPGs, PNGs, and so on. But then, why waste your time reinventing the wheel? It is infinitely better to use a web crawler!

With most crawlers, you only have to supply the URL of the site, and the file extensions it has to grab (in your case, jpg, png, bmp, and so on).

One click of the button, and you have them reeling in, automatically. Some crawlers are considerate enough to let you adjust the download bandwidth. (We can learn from a million years of evolutionary wisdom: the vampire bat is known to inject a painkiller before sucking!)

Now that you've downloaded the images to your site, the links you display to your users are all yours. You may choose to resize some of the images and display them as you wish, and the issues of reliability and bandwidth are now in your own hands.
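For illustration, the cfhttp download step mentioned above might look something like this minimal sketch. The URL and file name are hypothetical placeholders; cfhttp's getAsBinary attribute returns the raw bytes, which cffile can then write to disk:

```cfml
<!--- Sketch only: download one remote image and save it locally.
      "http://example.com/photos/room1.jpg" is a placeholder URL. --->
<cfhttp url="http://example.com/photos/room1.jpg"
        method="get"
        getAsBinary="yes"
        result="imgResult">

<!--- imgResult.fileContent holds the binary image data --->
<cffile action="write"
        file="#expandPath('./images/room1.jpg')#"
        output="#imgResult.fileContent#">
```

In practice you would also want to check imgResult.statusCode before writing, and only fetch images you have permission to copy.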

La_Salamandra
Participating Frequently
October 12, 2010

Hi BKBK,

Users would only see photos and images from their own websites, so there would not be any copyright violation.

Yes, you're right. What I am looking for is just a crawler: one that takes a URL as input and returns a list of the images contained in that website. Users will decide which ones to upload to their hotel tabs.

Do you have in mind some ColdFusion crawler I could use to achieve that?

ilssac
Inspiring
October 12, 2010

I got this from Google for "ColdFusion web crawler", after I passed up the top couple of sites that seem to just be search spam, returning results only tangentially related to what I wanted.

http://ketanjetty.com/coldfusion/useful-code/web-crawler/

ilssac
Inspiring
October 11, 2010

<cfhttp ...> could be used to return the HTML of a URL. Note that you will want to use the resolveurl option to make all the internal relative links in the HTML absolute.

reFind() could be used to find all the <img...> tags, or other desired content.

E.g., I think this is a pretty good start at the regular expression: <img[^>]*>. With a little work from regex people more skilled than me, that could be refined to find the src inside that tag.

Just note that not all images will be in <img ...> tags on all websites; they may be Flash-based, for example.

Also, not all the images you return will be very attractive: spacer GIFs, or a single image divided up into several <img ...> tags, for example.

The basic parsing is pretty easy to do; deciding what to do with the extracted data, not so easy.
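Putting those steps together, a rough sketch of the fetch-and-extract idea could look like the following. This is illustrative only: the URL is a placeholder, the regex handles the common quoted-src case but not every way an image can appear in markup, and tags without a src attribute would need extra filtering:

```cfml
<!--- Sketch: fetch a page and list the src of every <img> tag.
      "http://example.com" is a placeholder URL. --->
<cfhttp url="http://example.com" method="get" resolveurl="yes" result="page">

<!--- Grab every <img ...> tag from the returned HTML --->
<cfset imgTags = reMatchNoCase("<img[^>]*>", page.fileContent)>

<cfset srcList = []>
<cfloop array="#imgTags#" index="tag">
    <!--- Pull out the src attribute value (naive: assumes one src per tag) --->
    <cfset src = reReplaceNoCase(tag, '.*src\s*=\s*["'']?([^"'' >]+).*', "\1")>
    <cfset arrayAppend(srcList, src)>
</cfloop>

<cfdump var="#srcList#">
```

From there you would still face the hard part discussed above: filtering out spacer images and deciding which results are actually worth showing to the user.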

La_Salamandra
Participating Frequently
October 11, 2010

Thanks ilssac,

> The basic parsing is pretty easy to do; deciding what to do with the extracted data, not so easy.

I agree. It's not easy to program a professional parser. That's why perhaps I should clarify that I'm asking for a professional script, not necessarily a free one.

ilssac
Inspiring
October 11, 2010

Define professional?

Something that is going to be smart enough to handle all the ways that images are included in web pages, and all the ways images are used in web pages, is going to be a complicated, messy, and hard-to-control product. I have never heard of anybody who has tried to make something like this.

As I said before, parsing the images out of HTML is pretty easy to do; knowing what that image is and how to use it, not so much, and pretty much requires Google-level engineering and/or human intervention.