spiders, robots.txt and /includes folder

Question

I am unsure whether or not I should put my /includes folder
into my robots.txt file. When search engine spiders crawl through a
site, how do they deal with files that are in the /includes folder?
Do they only see those files when they are "cfincluded" by a
calling page or do the spiders also see them as independent pages?

I don't want to see pages of mine showing up on a search
engine's rankings that are devoid of sibling content. (For example,
I wouldn't want just content from "column 1" without the pages'
header, footer, column 2 and sidebar also being displayed.) This
could give users a poor (and obviously misleading) impression of my
site and its content.

So, should I put my /includes folder into my robots.txt file
(ex. "Disallow: /includes/") or not? And would this prevent a
spider from following a <cfinclude>? I definitely don't want
that.

But if spiders ONLY crawl files within the /includes folder
when they are called from another file, then I wouldn't have to
worry about page components showing up in rankings under the guise
of complete pages.

Any information on this topic would be greatly appreciated.

PS. On a separate, but slightly related note, can search
engine spiders crawl JavaScript and CSS files?

Newsgroup_User · Answer

Spiders can read any content that sits in a web-accessible folder.
The big, well-behaved crawlers are not going to bother with your
includes folder; they only get the pages as they are presented by you.
But if I can guess your folder structure, and /includes/ is a very easy
guess, I can request the files in your includes folder directly, and I
can have a spider do it as well.
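
As for the robots.txt rule from your question, a minimal sketch, assuming robots.txt sits at the top of your web root, would look like this:

```text
# robots.txt: ask all compliant crawlers to skip URLs under /includes/
User-agent: *
Disallow: /includes/
```

This only asks crawlers not to request URLs under /includes/ directly. It cannot stop a spider from "following" a <cfinclude>, because the include is resolved on the server and the spider only ever receives the finished page. Keep in mind that robots.txt is itself public and that badly behaved spiders ignore it, so it is a hint and not protection.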

The best way to protect this content is to keep it outside of the web
root. ColdFusion does not need the files to be in the web root in order
to include them in the pages it returns for a proper request, but if
they are outside the web root then neither I nor my spider can get at
them so easily.

DO
/includes/
/wwwroot/
/wwwroot/css/
/wwwroot/javascript/

DO NOT
/wwwroot/
/wwwroot/includes/
/wwwroot/css/
/wwwroot/javascript/
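
With the DO layout above, one way to keep <cfinclude> working is a per-application mapping. This is only a sketch under assumptions: it supposes an Application.cfc in /wwwroot/, and the application name and header.cfm are hypothetical placeholders, not names from your site.

```cfml
<!--- /wwwroot/Application.cfc (assumed location) --->
<cfcomponent output="false">
    <!--- Hypothetical application name --->
    <cfset this.name = "exampleSite">

    <!--- Point the logical path /includes at the folder that sits beside
          the web root, resolved relative to this file's own directory. --->
    <cfset this.mappings["/includes"] = getDirectoryFromPath(getCurrentTemplatePath()) & "../includes">
</cfcomponent>
```

```cfml
<!--- /wwwroot/index.cfm: header.cfm is a hypothetical include file --->
<cfinclude template="/includes/header.cfm">
```

After a change like this it is worth requesting /includes/header.cfm in a browser to confirm that your server setup does not serve the file by URL.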
