I'm trying to read a page with cfhttp and extract code from it using find(). I've done this before and it works fine. The problem is that the page I'm trying to read has lots of embedded AJAX mechanisms. When I use my browser's "inspect" function on particular elements I can see all the code, but when I simply say "show page code" (or use cfhttp) I just get the top-level static code, without the sub-components that are loaded dynamically as the page loads.
Is there a way to tell ColdFusion to read from a specific HTTP location and load the entire page the way a browser does, rather than just retrieving the top-level file and ignoring the JavaScript and other mechanisms that fill the page with code as it loads?
Thanks in advance for anyone's help or advice.
Jordan
Not really. That is what the browser is designed to do: it has all the gear in it to parse and run the JS so the page works as intended.
CFHTTP just makes a single HTTP request, and you get back whatever that request returns, just like cURL and similar tools.
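To make that concrete, here is a minimal Python sketch (Python rather than CFML, purely for illustration) of what a plain HTTP client sees. The markup, the `loadResults` call, and the `/api/results` URL are all made up for the example; the point is only that the results container arrives empty and the data location is buried in the scripts.

```python
from html.parser import HTMLParser

# Hypothetical static markup, as a raw HTTP client (cfhttp, cURL) would receive it.
# The "results" div is empty because a script fills it in after the page loads.
STATIC_HTML = """
<html><body>
  <div id="results"></div>
  <script src="/static/app.js"></script>
  <script>loadResults("/api/results?page=1");</script>
</body></html>
"""


class ScriptCollector(HTMLParser):
    """Collects external script URLs and inline script bodies from static HTML."""

    def __init__(self):
        super().__init__()
        self.script_srcs = []
        self.inline_scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            src = dict(attrs).get("src")
            if src:
                self.script_srcs.append(src)

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Only keep non-empty text that sits inside a <script> element.
        if self._in_script and data.strip():
            self.inline_scripts.append(data.strip())


collector = ScriptCollector()
collector.feed(STATIC_HTML)
print(collector.script_srcs)
print(collector.inline_scripts)
```

Nothing here executes the scripts. It only lists them, which is often enough to spot where the page fetches its data from.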
Thanks for replying. That's a bummer.
So maybe load a page locally, and then read it? Or is that kind of client-side browser manipulation beyond coldfusion's capabilities?
I'm trying to scrape a big bunch of data from a web resource.
Thanks again,
Jordan
It's beyond CF's capacities in that CF simply wasn't designed to do that. JavaScript is a different programming language. CF doesn't know how to execute JavaScript, or do anything dynamic except generate HTML (or occasionally other file formats) from CFML.
Dave Watts, Eidolon LLC
Thanks for responding.
According to some discussion I've since found, it's possible to use jQuery ("in JSON format") from within CF, or to use Python or other tools to scrape AJAX pages. I realize this is on the boundary of what counts as a ColdFusion discussion topic.
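For anyone trying the Python route, a rough sketch of the idea: rather than rendering the page, request the data endpoint that the page's own JavaScript calls (it usually shows up in the Network tab of the browser's debug tools). The URL, the `items` key, and the `id`/`name` fields below are hypothetical, and the `X-Requested-With` header is only a common convention that some servers check, so treat all of it as assumptions to adjust for the real site.

```python
import json
from urllib.request import Request, urlopen


def fetch_json(url, timeout=10):
    """Request a JSON endpoint directly, the way the page's AJAX call would."""
    req = Request(
        url,
        headers={
            "Accept": "application/json",
            # Convention used by many AJAX libraries; not required everywhere.
            "X-Requested-With": "XMLHttpRequest",
        },
    )
    with urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def extract_rows(payload, key="items"):
    """Pull (id, name) pairs out of a decoded payload; field names are assumed."""
    return [(item["id"], item["name"]) for item in payload.get(key, [])]


# Offline example with a made-up payload shaped like a typical AJAX response.
sample = json.loads('{"items": [{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}]}')
rows = extract_rows(sample)
print(rows)

# Against a real site it would look something like this (hypothetical URL):
# rows = extract_rows(fetch_json("https://example.com/api/results?page=1"))
```

The same approach should carry over to cfhttp, since it is just a request to a different URL followed by decoding the JSON that comes back.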
@jordano79881478, thanks for sharing that. I would also recommend the AJAX method that WolfShade suggests.