Do bots send CFID/CFTOKEN in the request headers? Is that a reliable way to detect if a bot is visiting? User agent testing leads to hundreds of strings to test against, and is an ever-growing list. Is there a more reliable way of detecting bots in 2014 with CF?
Sending CFID/CFToken in request headers? Do you mean as a FORM or URL parameter? If so, yes, some crawlers & bots will attempt to maintain sessions if required to perform additional requests.
Regarding question #2: you could always try the Browscap CFC. This CFC will parse the browser string, perform a lookup and return a struct of browser features (including "Crawler").
Another method to detect bots would be to use the rules from the Bad Behavior PHP script and write your own ColdFusion filter:
Thanks Jamo. If I look at the headers of each incoming web request, say using FusionReactor, Googlebot does not seem to send CFID/CFTOKEN, so I was wondering whether other bots do the same. But if you are saying some bots maintain sessions, then clearly this is no good. Scanning user agents seems to be the most common way of identifying a bot, or perhaps counting the number of requests per IP in a given time frame. I'm not keen on either of those ideas. The "2013 User Agent Blacklist" post on Perishable Press has a really good rewrite rule that can be converted to a REFind() call, by the way; the PHP code less so.
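For anyone weighing up the "requests per IP in a given time frame" idea, here is a minimal sketch of a sliding-window counter. It is written in Python purely for illustration, not as ColdFusion code, and the class name, method name and thresholds are all hypothetical. The caller passes in the current time in seconds so the logic stays easy to test.

```python
from collections import defaultdict, deque


class RequestRateTracker:
    """Counts requests per IP inside a sliding time window (illustrative only)."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests      # hypothetical threshold
        self.window_seconds = window_seconds  # hypothetical window length
        self._hits = defaultdict(deque)       # ip -> timestamps of recent requests

    def is_over_limit(self, ip, now):
        """Record one request from `ip` at time `now` and report whether
        that IP has exceeded max_requests within the window."""
        hits = self._hits[ip]
        hits.append(now)
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests
```

In a real application the timestamps would have to live somewhere shared across requests (application scope, a cache and so on), and a well-behaved crawler or a busy office behind one NAT address can trip the same limit, which is one reason this check is usually combined with other signals.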
Which OS are you using? How much traffic do you get? I recently installed Aqtronix WebKnight, a third-party web application firewall for IIS, for a client. It has lots of blocking rules/filters and provides protection before the request makes it to the ColdFusion layer.
Session IDs are normally passed via FORM, URL or COOKIE parameters. Many vulnerability scanning services will generate their own and randomize the session variables in an attempt to get the web application to hand them an existing session or throw an error. Some bots will retain a session that they initiated in order to access multiple pages, but they can opt not to send the tokens at any time (or send bad tokens). If you ever passed CFTokens in the URL, Google and other search engines would inadvertently follow and index them. (I've seen many people share links on Facebook that contain their personal session URL... if you click on it fast enough, you can usurp their session.)
I don't provide application sessions to bots... it's a waste of resources. I block many of the default user agents used by scripts. It's not 100% effective since they can be changed, but it keeps out many of the script kiddies.
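As a rough illustration of blocking default script user agents, here is a minimal sketch in Python (not ColdFusion; the same pattern could be fed to a case-insensitive regex function in CF). The function name and the list of agents are assumptions chosen for the example, not the actual list used above, and treating a blank user agent as a script is also an assumption.

```python
import re

# Illustrative list only: default user-agent prefixes sent by common
# HTTP libraries and command-line tools when nobody overrides them.
SCRIPT_AGENT_PATTERN = re.compile(
    r"^(curl|wget|python-requests|python-urllib|libwww-perl|java)/",
    re.IGNORECASE,
)


def is_default_script_agent(user_agent):
    """Return True when the user agent is blank or starts with a known
    default library/tool signature (hypothetical helper)."""
    if not user_agent or not user_agent.strip():
        return True  # assumption: browsers always send a user agent
    return SCRIPT_AGENT_PATTERN.search(user_agent.strip()) is not None
```

A request that matches can be refused outright or simply served without a session. As noted, the string is trivial to change, so this only filters out scripts left on their defaults.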
Here's a technique I've documented regarding using ColdFusion to block fake Googlebots. This same method can be used to block fake BingBot & YahooSlurp user agents too.
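The linked write-up is not reproduced in this thread, so what follows is not necessarily that technique. It is a minimal Python sketch of the general reverse-then-forward DNS check that Google describes for verifying Googlebot: look up the host name for the IP, confirm it belongs to a Google crawler domain, then resolve that host name and confirm it maps back to the same IP. The function names are hypothetical, and the resolvers are passed in as parameters only so the logic can be exercised without network access.

```python
import socket

# Domains Google documents for Googlebot host names; for other crawlers
# this tuple would be swapped for the domains their operators publish.
GOOGLEBOT_SUFFIXES = (".googlebot.com", ".google.com")


def _reverse_lookup(ip):
    """IP address -> primary host name (PTR lookup)."""
    return socket.gethostbyaddr(ip)[0]


def _forward_lookup(hostname):
    """Host name -> list of IPv4 addresses."""
    return socket.gethostbyname_ex(hostname)[2]


def is_genuine_googlebot(ip, reverse=_reverse_lookup, forward=_forward_lookup):
    """Return True only if `ip` reverse-resolves to a Googlebot host name
    that forward-resolves back to the same IP (hypothetical helper)."""
    try:
        hostname = reverse(ip)
    except OSError:  # no PTR record or lookup failure
        return False
    if not hostname.lower().rstrip(".").endswith(GOOGLEBOT_SUFFIXES):
        return False
    try:
        return ip in forward(hostname)
    except OSError:
        return False
```

This check is only worth running when the user agent claims to be Googlebot, and DNS lookups are slow enough that caching the result per IP is advisable.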
Thanks for the great tips. That IIS filter looks interesting; we are on IIS 8 and will give it a try. Best to nip the problem in the bud before it gets to ColdFusion, which saves coding more and more anti-bot routines.