Prevent search engines from crawling help output

New Here ,
Feb 04, 2016

I'm trying to figure out how we can prevent our HTML webhelp from being crawled by search engines like Google. I found these instructions while digging through some of the discussions on this forum: Stop search engine robots indexing Your private folders by ‘robots.txt’. | Internet marketing Blog

However, you would think we could add some code into the project itself in order to stop the search engines from crawling the help. We tried adding this code into our master page since the master page is applied on all topics, but the code didn't remain after the output was generated:

<meta name="robots" content="NOINDEX, NOFOLLOW" />

Does anyone know how we can prevent search engines from crawling our help?

LEGEND ,
Feb 04, 2016

A meta tag or a robots.txt file is only a request, and honoring it is a courtesy. A search engine *may* decide to skip your site, but there is no guarantee.

If you really don't want your content to be indexed, you have to cut off access to it. If you require authentication (for example, by using a .htaccess file; see Htaccess Authentication - Htaccess Tools), search engines can no longer index your content.
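
To illustrate what requiring authentication means at the HTTP level, here is a minimal Python sketch of a Basic authentication check. It is not .htaccess itself (that is Apache configuration), and the function names and the "docs"/"secret" credentials are made up for the example. The idea is only that a request without valid credentials, which is what a crawler sends, gets a 401 instead of the page.

```python
import base64
import binascii
import hmac


def is_authorized(authorization_header, expected_user, expected_password):
    """Return True only if the header carries the expected Basic credentials."""
    if not authorization_header:
        return False
    scheme, _, encoded = authorization_header.partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return False
    user, separator, password = decoded.partition(":")
    if not separator:
        return False
    # compare_digest avoids leaking information through comparison timing.
    user_ok = hmac.compare_digest(user.encode("utf-8"), expected_user.encode("utf-8"))
    password_ok = hmac.compare_digest(
        password.encode("utf-8"), expected_password.encode("utf-8")
    )
    return user_ok and password_ok


def status_for_request(authorization_header):
    """Hypothetical handler: 200 for valid credentials, otherwise 401."""
    # "docs" / "secret" are placeholder credentials for this sketch only.
    return 200 if is_authorized(authorization_header, "docs", "secret") else 401
```

A crawler that sends no Authorization header would get 401 from `status_for_request(None)`, so there is nothing for it to index.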

New Here ,
Feb 05, 2016

I need to do this as well. There doesn't seem to be a way to do it within RoboHelp. Several sources have suggested Find/Replace to add the <meta> tag to each .htm file. Arduous and error prone. Any other thoughts?

Adobe Community Professional ,
Feb 07, 2016

A weird "feature" that might work for you.

Make sure there is a RoboHelp header section in your master page.

Switch to HTML view and paste the meta code between the "?rh_region_start type=header" and "?rh_region_end type=header" tags.

Save. RH automagically moves the code between the master page "head" tags.

When you generate, the meta tag will be in each page, but not within the "head" tags. You will find it further down the page, just above the first content in the topic (e.g. the topic H1). I'm not sure if the placement affects the webcrawlers, though.

Participant ,
Feb 10, 2016

Hi Amebr

Thanks for the info. I tried it, and it looks good...the meta tag moves up into the header section of the Master page. But, when I publish the webhelp, it is not in the <head> section of the .htm files. It is in the <body> section and appears as:

<div style="width: 100%; position: relative;" id="header">

<meta name="robots" content="noindex, nofollow" />

  <p>&#160;</p>

</div>

When I look at the topics in the help, there is extra space at the top of the topic, above the breadcrumbs, so clearly something is there. But, it's not between the <head> and </head> tags in the .htm.

Too bad. That would have been easy.
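
For anyone who wants to check where the tag ended up without opening each generated file by hand, here is a minimal Python sketch using the standard library's html.parser. The class and function names are my own, and it only looks at which of head or body was opened most recently, so treat it as a rough check rather than a validator.

```python
from html.parser import HTMLParser


class RobotsMetaLocator(HTMLParser):
    """Record whether each robots meta tag appears in the head or the body."""

    def __init__(self):
        super().__init__()
        self.section = None   # "head", "body", or None before either opens
        self.found_in = []    # section name for every robots meta tag seen

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag in ("head", "body"):
            self.section = tag
        elif tag == "meta":
            name = dict(attrs).get("name") or ""
            if name.lower() == "robots":
                self.found_in.append(self.section)


def robots_meta_locations(html):
    """Return a list such as ["head"] or ["body"] for the given page source."""
    parser = RobotsMetaLocator()
    parser.feed(html)
    parser.close()
    return parser.found_in
```

Run against the generated output shown above, this should report the tag under "body" rather than "head".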

This is what I did to get the meta tag in the right place:

  1. Publish the help to a designated folder (as usual).
  2. In RH, select Edit -> Find and Replace in Files.
  3. Specify </head> in the Find what field.
  4. Specify <meta name="robots" content="noindex, nofollow"/> </head> in the Replace with field.
  5. Specify the folder with the published webhelp output in the Look in field.
  6. Select Text file types (*.htm ; *.html ; *.txt) in the Files of type field.
  7. Check the Include Subfolders option.
  8. Click Find Next, and then Replace All.

I chose to do the Find/Replace at the top level, that is, the folder that contains all of the output (the resource folder, whdata folder, whgdata folder, etc.). This means that the meta tag is in all of the .htm files, not just the ones with the topic content. I don't think there's any harm in that.
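
The same Find/Replace could also be scripted so it can be rerun after every publish. This is a minimal Python sketch, not a RoboHelp feature: the function names are my own, and it assumes the output files are UTF-8 (or at least ASCII-compatible). It inserts the tag before the first </head> and skips any file that already contains a robots meta tag anywhere, so running it twice does no harm.

```python
import re
from pathlib import Path

META_TAG = '<meta name="robots" content="noindex, nofollow" />'
HEAD_CLOSE = re.compile(r"</head\s*>", re.IGNORECASE)
ROBOTS_META = re.compile(r"<meta[^>]*name=[\"']robots[\"']", re.IGNORECASE)


def add_robots_meta(html):
    """Insert the robots meta tag before the first </head>.

    Pages that already contain a robots meta tag (anywhere in the file),
    or that have no </head>, are returned unchanged.
    """
    if ROBOTS_META.search(html):
        return html
    return HEAD_CLOSE.sub(lambda m: META_TAG + "\n" + m.group(0), html, count=1)


def process_folder(output_dir):
    """Patch every .htm/.html file under output_dir; return how many changed."""
    changed = 0
    for path in Path(output_dir).rglob("*"):
        if not path.is_file() or path.suffix.lower() not in {".htm", ".html"}:
            continue
        # Work on bytes so line endings and any odd characters round-trip.
        original = path.read_bytes().decode("utf-8", errors="surrogateescape")
        patched = add_robots_meta(original)
        if patched != original:
            path.write_bytes(patched.encode("utf-8", errors="surrogateescape"))
            changed += 1
    return changed
```

Calling `process_folder` with the path of the published output folder covers the subfolders too, like the Include Subfolders option. Note that a file where the tag is only in the body (from the master page attempt) would be skipped, so remove that first.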

Now I need to get the meta tag in the head section of the .htm files of the responsive HTML5 output from FrameMaker. Any thoughts on that?

Adobe Community Professional ,
Feb 10, 2016

Yeah, as I said, not in the head, but I don't know enough about the web side to know how much of a problem that is/isn't.

You can add the code into the screen layout, although that can be a little hairy. It would need to go into every .slp file, I believe. Willam van Weelden might be able to offer more advice.

New Here ,
Feb 11, 2016

Thanks for the suggestions! Nice to have a place to knock ideas around.

I put the meta tag into the head area of the Screen Layout for topics (Topic.slp). In RH HTML view, the tag is in the correct place. When I open Topic.slp in Notepad, it's in the correct place. But, when I generate the webhelp, it is inserted in the body as:

<div style="width: 100%; position: relative;" id="header">

  <p>&#160;</p>

<meta name="robots" content="noindex, nofollow" />

</div>

Perhaps Willam van Weelden will have another idea.

Adobe Community Professional ,
Feb 11, 2016

Ah oops. I missed the bit about webhelp. The screen layouts are for Multiscreen or Responsive HTML5 output so updating them won't result in a change in webhelp. What you are seeing would be the code you added to the master page before.

I don't know if you can update the webhelp skin in the same way as the screen layouts, sorry.

LEGEND ,
Feb 17, 2016

The masterpage header won't work for this. Personally, I would also do a find and replace in the output. That's the fastest way.

Just remember that search engines honor meta tags as a courtesy; the tags don't block bots. Only the nice guys such as Google will listen. Not even a robots.txt will block crawlers. (For example, see: Learn about robots.txt files - Search Console Help.) If you really don't want unauthorised access, you have to force authentication on your server.
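
To show why robots.txt is only a courtesy, here is a minimal Python sketch using the standard library's urllib.robotparser. The rules, the bot name, and the example.com URLs are made up for illustration. The thing to notice is that the check runs inside the crawler: a polite crawler asks before fetching, and one that never asks is not stopped by anything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking every crawler to stay out of /help/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /help/
"""


def may_fetch(robots_txt, user_agent, url):
    """Return what a well-behaved crawler would conclude from robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A polite crawler performs this check itself and then skips the page.
# The server does not enforce the answer in any way.
polite_answer = may_fetch(ROBOTS_TXT, "ExampleBot", "https://example.com/help/index.htm")
```

Here `polite_answer` is False, but that only matters to a crawler that bothers to call something like `may_fetch` in the first place.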

mnx2
New Here ,
Feb 18, 2016

Thanks! This helps confirm that my company needs to work on forcing an authentication, which we're trying to do.
