LeeTramp
Participant
September 23, 2016
Question

noindex tags in a PDF


Greetings,

I'm looking for a way to encourage search engines to not index a PDF by placing a 'noindex' or similar tag in the PDF document.

I work with an educational organization that shares copyrighted PDF documents with members who are teachers, and those teachers want to share the documents with their students on websites. Only members can view our document repository, so there is no security problem there. But when teachers post these documents to their own websites (against our acceptable use policies), search engines often find them, and students can then search for and find these documents (often including documents with solutions added).

If we could embed a noindex tag in the actual PDF, this should help decrease the number of indexed documents on the web (we're a small organization without the budget to hire someone to search for and follow up on violators of our policy).

Does anyone know if this is possible?

Thanks 🙂


2 replies

LeeTramp (Author)
Participant
September 24, 2016

Thanks. That sounds like a good option!

Maybe someday they'll add a 'noindex' option in PDFs. It seems like something that should be easy to implement in tags or other meta content that search engines can read.

Legend
September 24, 2016

The interesting question is who the "they" is who would do that. It wouldn't need any specific changes to PDF to add more metadata, but people would prefer to see something simple in the UI (or a simple tool). But how do you persuade all of the makers of indexing tools that this is something they want to do? Each indexing tool would need to invest in it separately. Adobe don't control PDF any more; it is done by ISO, and they can take years to change anything at all. Anyone could invent a tag, but would it help, or would it in fact give a false sense of security?

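In fact, noindex already exists as an HTTP header (X-Robots-Tag): every PDF served over the web comes with HTTP data that sits outside the file itself. (HTML can carry it both inside, as a meta tag, and outside.) But most web curators don't have the power to set these headers. Google introduced the X-Robots-Tag header, so they would be the people to persuade.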
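A minimal sketch of the header approach, assuming a Python WSGI server: the app below attaches `X-Robots-Tag: noindex` to the response that delivers a PDF. The names `pdf_app` and `PDF_BYTES` are hypothetical, and the PDF body is a placeholder. It illustrates the point above: the setting lives with whoever runs the web server, not inside the PDF, so it cannot be applied to copies hosted on someone else's site.

```python
# Placeholder standing in for real PDF contents; a real server would
# read the file from disk.
PDF_BYTES = b"%PDF-1.7\n% placeholder body\n%%EOF\n"


def pdf_app(environ, start_response):
    """Minimal WSGI app that serves one PDF with a noindex header.

    The X-Robots-Tag header travels in the HTTP response, outside the
    PDF itself, so the file bytes are left unchanged.
    """
    headers = [
        ("Content-Type", "application/pdf"),
        ("Content-Length", str(len(PDF_BYTES))),
        # Asks crawlers that honor this header not to index the response.
        ("X-Robots-Tag", "noindex"),
    ]
    start_response("200 OK", headers)
    return [PDF_BYTES]


if __name__ == "__main__":
    from wsgiref.simple_server import make_server

    # Serve on port 8000 until interrupted.
    with make_server("", 8000, pdf_app) as httpd:
        httpd.serve_forever()
```

In practice this is usually a line or two of web server configuration rather than application code; the WSGI version only shows where the header sits relative to the file.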

Karl Heinz Kremer
Community Expert
September 23, 2016

There is no "no index" tag in PDF; what you need to do is prevent the search engine from indexing the file. The most straightforward method is to use a robots.txt file on your web server and then hope that the search engine's spider program actually honors the information in that file. In your case, that will not help, because the files end up on other people's web servers and you don't know in advance who is breaking the rules and making the files available.

To prevent content extraction, you can assign a permissions (owner) password that blocks it. To do that, open the PDF file in Acrobat and bring up the Document Properties dialog (Ctrl-D or Cmd-D, or via the menu item in the File menu). Then go to the Security tab and choose to add password security. Make sure that "Enable copying of text, images and other content" is not enabled.

This should prevent a well-behaved PDF indexer from accessing your content. However, if somebody's software does not play by the rules imposed by the PDF format, there is nothing you can do that would not also severely restrict the usefulness of the PDF documents.
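To make the robots.txt option concrete, here is a small sketch using Python's standard `urllib.robotparser`. The `/docs/` path, the example URLs and the helper `is_crawl_allowed` are hypothetical. It shows that the rules only apply to the site that publishes the robots.txt, and that compliance is up to the crawler asking `can_fetch`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site that keeps its PDFs under /docs/.
# Python's parser does plain prefix matching, so this sketch avoids
# wildcard rules such as "Disallow: /*.pdf$".
ROBOTS_TXT = """\
User-agent: *
Disallow: /docs/
"""


def is_crawl_allowed(url: str, user_agent: str = "*") -> bool:
    """Return True if a crawler that honors robots.txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.org/docs/worksheet.pdf",
                "https://example.org/index.html"):
        print(url, is_crawl_allowed(url))
```

Note that a Disallow rule stops compliant crawlers from fetching the file, which is not quite the same as a noindex instruction, and it does nothing for copies posted on another site with its own robots.txt.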