June 3, 2010
Question

query string read as part of file name, throwing not found errors

Hi all, I host a number of Web sites under a CF7 installation, Win2003.

One site in particular is throwing not-found errors in response to certain search bot requests.

In the IIS log, I noticed that for these requests the query part of the URL appears in the cs-uri-stem field value rather than in the cs-uri-query field where it belongs:

cs-uri-stem=/index.cfm?template=24hour5.cfm
cs-uri-query=<blank>

instead of

cs-uri-stem=/index.cfm
cs-uri-query=template=24hour5.cfm

Evidently something somewhere is interpreting the entire URL as a file name, instead of a file name plus a query string. When CF tries to locate the file, it throws a not-found error.
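To make the expected split concrete, here is a minimal Python sketch (Python rather than CFML, purely for illustration) of how a URL parser normally separates the stem from the query. The second input is a hypothetical request in which the question mark arrives percent-encoded as %3F. That is only one possible way to end up with the whole string in the stem, and nothing in this thread confirms it is what the bots are sending.

```python
from urllib.parse import urlsplit, unquote


def split_request_target(target):
    """Split a request target into (stem, query) as a URL parser normally would."""
    parts = urlsplit(target)
    return parts.path, parts.query


# The request as it should arrive: a literal "?" separates stem and query.
normal = split_request_target("/index.cfm?template=24hour5.cfm")

# Hypothetical malformed request (an assumption, not taken from the logs):
# the "?" is percent-encoded, so the parser sees no query separator at all.
encoded = split_request_target("/index.cfm%3Ftemplate=24hour5.cfm")

print(normal)
print(encoded)
# Decoding the stem afterwards yields an ordinary-looking "?" inside the path.
print(unquote(encoded[0]))
```

If something like this were happening, the decoded stem would contain a perfectly normal question mark, which would fit with the character looking fine on inspection.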

Maybe there is something weird about the question mark, but it looks normal to me.

I can't seem to stop this error, since it is occurring at the OS, IIS, CF or JRun layer. Does anyone have any idea what is going on here, and what I can do about it?

Thanks in advance.

Joe

    1 reply

    June 8, 2010

    Since the GET being sent to your server is coming from a bot, it is possible that the bot garbled part of the URL while constructing the HTTP request. Are all GETs from that bot like this, or just some of them? Maybe it has stored (in error) a non-printing character as part of your domain name in its database.

    Try using a code editor that will display all characters, and use it to look at the IIS log file and examine what is actually in the stem field, particularly what immediately precedes the question mark. You could do the same in a short CF script that reads the IIS log file and then dumps out that line of the log file character by character along with the ASCII value. As you say, there must be something that is causing IIS to mis-parse the URL, and maybe this will give you a lead.

    Is it always the same bot? If this is creating problems for your app and the bot's access isn't important to you, then you could try blocking it in robots.txt and see if that keeps it away.
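The character-by-character dump suggested above could look something like the following. This is a rough Python sketch rather than the CF script the reply has in mind; the function name `dump_chars` and the sample string, with a zero-width space planted before the question mark, are made up for illustration.

```python
def dump_chars(text):
    """Return one (position, shown form, code point, suspicious) row per character."""
    rows = []
    for pos, ch in enumerate(text):
        code = ord(ch)
        # Treat anything outside printable ASCII (32..126) as suspicious.
        suspicious = not (32 <= code <= 126)
        # Show suspicious characters in escaped form so they are visible.
        shown = repr(ch) if suspicious else ch
        rows.append((pos, shown, code, suspicious))
    return rows


# Hypothetical stem value with an invisible character hidden before the "?".
sample = "/index.cfm\u200b?template=24hour5.cfm"

for pos, shown, code, suspicious in dump_chars(sample):
    marker = "  <-- non-printing or non-ASCII" if suspicious else ""
    print(f"{pos:3d}  {shown:10s} {code:6d}{marker}")
```

When reading a real log file for this purpose, opening it with `encoding="latin-1"` maps every byte to exactly one character, so no byte is silently dropped or merged during decoding.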

    Hope this gives you some debugging ideas.

    joecopleyAuthor
    June 9, 2010

    Hey Reed, thanks for responding.

    I have a CF utility that parses logs, so I modified it to print out the ASCII codes for each character. They look normal, as far as I can tell. The question mark has a code of 63, which is correct, and no non-alphabetic characters precede or follow it.

    One interesting thing is that the stem being called is an index.cfm file, and the query string argument happens to be a template name, and it ends in .cfm. That's why it is making it all the way to CF, which chokes on it, instead of IIS logging a 404 error.

    Often an identifiable bot is requesting these bad URLs, though I have spotted another request with agent 'Mozilla/4.0.' I suspect that is some kind of automated scan. (I also see other requests with the same agent name, though a different IP, that look like erroneously URL-encoded requests. These get filtered by URLScan.)

    What I don't know for sure is whether the specific clients that make these bad calls always make them the wrong way. They appear to. Most clients that access the site do so normally.
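One way to check whether particular clients always make these calls the wrong way is to tally, per client IP, how many requests have a question mark inside cs-uri-stem and how many do not. Below is a minimal Python sketch, assuming a W3C extended log whose `#Fields:` line includes `c-ip` and `cs-uri-stem`. The function name `tally_bad_stems` and the sample log lines (including the IPs and status codes) are invented for illustration and are not taken from the real logs.

```python
from collections import defaultdict


def tally_bad_stems(lines):
    """Count, per client IP, requests whose cs-uri-stem contains a '?'."""
    fields = []
    counts = defaultdict(lambda: {"bad": 0, "ok": 0})
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("#Fields:"):
            # The directive names the space-separated columns that follow.
            fields = line.split()[1:]
            continue
        if not line or line.startswith("#") or not fields:
            continue
        values = line.split(" ")
        if len(values) != len(fields):
            continue  # skip rows that do not match the declared columns
        row = dict(zip(fields, values))
        stem = row.get("cs-uri-stem", "")
        client = row.get("c-ip", "unknown")
        counts[client]["bad" if "?" in stem else "ok"] += 1
    return dict(counts)


# Made-up sample data; "-" marks an empty field in this log format.
sample_log = [
    "#Fields: date time c-ip cs-method cs-uri-stem cs-uri-query sc-status",
    "2010-06-03 10:00:00 192.0.2.10 GET /index.cfm?template=24hour5.cfm - 404",
    "2010-06-03 10:00:05 192.0.2.10 GET /index.cfm?template=24hour5.cfm - 404",
    "2010-06-03 10:01:00 192.0.2.20 GET /index.cfm template=24hour5.cfm 200",
]

for client, tally in tally_bad_stems(sample_log).items():
    print(client, tally)
```

A client that shows only "bad" counts would suggest the problem lies in how that client builds its requests; a mix of both would point elsewhere.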

    I wonder if there could be something in the request header, perhaps something that instructs IIS to expect a different charset than the one actually used.