Known Participant
February 4, 2016
Answered

Prevent search engines from crawling help output


I'm trying to figure out how we can prevent our HTML webhelp from being crawled by search engines like Google. I found these instructions while digging through some of the discussions on this forum: "Stop search engine robots indexing your private folders by 'robots.txt'" (Internet marketing Blog).

However, I would expect that we could add some code to the project itself to stop search engines from crawling the help. We tried adding the following code to our master page, since the master page is applied to all topics, but the code was no longer there after the output was generated:

<meta name="robots" content="NOINDEX, NOFOLLOW" />

Does anyone know how we can prevent search engines from crawling our help?

    Correct answer by Willam van Weelden

    Ah oops. I missed the bit about webhelp. The screen layouts are for Multiscreen or Responsive HTML5 output so updating them won't result in a change in webhelp. What you are seeing would be the code you added to the master page before.

    I don't know if you can update the webhelp skin in the same way as the screen layouts, sorry.


    The master page header won't work for this. Personally, I would do a find and replace in the generated output instead. That's the fastest way.

    Just remember that honouring a meta tag is a courtesy on the search engine's part; the tag doesn't block bots. Only the nice guys such as Google will listen. A robots.txt file won't block crawlers either (see, for example, "Learn about robots.txt files" in Search Console Help). If you really don't want unauthorised access, you have to force authentication on your server.
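For reference, the robots.txt mentioned above is only a published request. The sketch below uses Python's standard `urllib.robotparser` to show how a compliant crawler would read a "disallow everything" file. The host `help.example.com` and the helper name `is_allowed` are placeholders for illustration, not anything from RoboHelp, and a crawler that ignores robots.txt is unaffected by any of this.

```python
from urllib.robotparser import RobotFileParser

# Two-line robots.txt asking every crawler to skip the whole site.
# It only has an effect when served from the root of the site.
ROBOTS_TXT = "User-agent: *\nDisallow: /\n"


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler that honours robots_txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Calling `is_allowed(ROBOTS_TXT, "Googlebot", "https://help.example.com/index.htm")` shows what a well-behaved crawler would conclude; it says nothing about crawlers that never read the file.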


    Willam van Weelden
    Inspiring
    February 4, 2016

    Adding a meta tag and a robots.txt file is only a courtesy. A search engine *may* decide to skip your site, but there is no guarantee.

    If you really don't want your content to be indexed, you have to cut off access to your content. If you require authentication (for example, by using a .htaccess file; see "Htaccess Authentication" at Htaccess Tools), the search engines are no longer able to index your content.

    Participant
    February 5, 2016

    I need to do this as well. There doesn't seem to be a way to do it within RoboHelp. Several sources have suggested using Find/Replace to add the <meta> tag to each .htm file, which is arduous and error-prone. Any other thoughts?
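One way to make the Find/Replace approach less error-prone is to script it as a post-generation step. The following is a minimal sketch, not a RoboHelp feature: it assumes the generated output is a folder of `.htm` files that each contain a `<head>` tag, and the function names `add_robots_meta` and `process_output` are made up for this example. It works on raw bytes so that file encoding and line endings are left untouched.

```python
import re
from pathlib import Path

# The tag from the original question.
META_TAG = b'<meta name="robots" content="NOINDEX, NOFOLLOW" />'
HEAD_OPEN = re.compile(rb"<head\b[^>]*>", re.IGNORECASE)
HAS_ROBOTS = re.compile(rb"<meta[^>]*name=[\"']robots[\"']", re.IGNORECASE)


def add_robots_meta(html: bytes) -> bytes:
    """Insert META_TAG right after the opening <head> tag.

    The input is returned unchanged if it already has a robots meta tag
    or if no <head> tag is found.
    """
    if HAS_ROBOTS.search(html):
        return html
    match = HEAD_OPEN.search(html)
    if match is None:
        return html
    return html[:match.end()] + b"\n" + META_TAG + html[match.end():]


def process_output(output_dir: str) -> int:
    """Rewrite every .htm file under output_dir; return the number changed."""
    changed = 0
    for path in Path(output_dir).rglob("*.htm"):
        original = path.read_bytes()
        updated = add_robots_meta(original)
        if updated != original:
            path.write_bytes(updated)
            changed += 1
    return changed
```

Running `process_output` on the generated output folder after each build should be safe to repeat, since files that already carry a robots meta tag are skipped. Try it on a copy of the output first.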

    Community Expert
    February 7, 2016


    A weird "feature" that might work for you.

    Make sure there is a Robohelp header section in your master page.

    Switch to HTML view and paste the meta code between the "?rh_region_start type=header" and "?rh_region_end type=header" tags.

    Save. RH automagically moves the code between the master page "head" tags.

    When you generate, the meta tag will be in each page, but not within the "head" tags. You will find it further down the page, just above the first content in the topic (e.g. the topic H1). I'm not sure whether the placement affects the web crawlers, though.
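Since the HTML specification expects `<meta name=...>` elements inside the head, a tag that lands in the body may not be honoured, so it can be worth checking where it ended up in the generated pages. The sketch below uses Python's standard `html.parser` for that; the class and function names are hypothetical, and it only reports placement, it does not tell you how any particular crawler reacts.

```python
from html.parser import HTMLParser


class RobotsMetaLocator(HTMLParser):
    """Record whether each robots meta tag sits inside or outside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.locations = []  # one entry per robots meta tag found

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and (dict(attrs).get("name") or "").lower() == "robots":
            self.locations.append("head" if self.in_head else "outside head")

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def robots_meta_locations(html: str) -> list:
    """Return "head" or "outside head" for every robots meta tag in html."""
    locator = RobotsMetaLocator()
    locator.feed(html)
    locator.close()
    return locator.locations
```

Feeding a generated topic's HTML to `robots_meta_locations` returns an empty list when the tag is missing, which also covers the case where the tag was dropped during generation.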