The Robots Exclusion Protocol defines the format of the robots.txt file and of the “robots” meta tag. It was standardized and approved on June 30, 1994. Various extensions have been proposed since that date. We will cover both, indicating which parts belong to the original standard and which are extensions that may not be understood by all robots.
How the webmaster sends a message to robots
The Robots Exclusion Protocol defines two complementary techniques of communication between the manager of a website and the robots that visit it: the robots.txt file and the “robots” meta tag. They allow the webmaster to inform robots of his wishes. It is important to understand that robots remain free to do whatever they want with this information.
If they are “polite” (like those of Google, Yahoo or Microsoft), they will do their utmost to honor your requests not to visit certain parts of your site. If they are malicious (like spam bots and hackers’ robots), they can use the contents of your robots.txt to find out exactly where to do their dirty work.
At the end of this article, we present a complementary technique that can effectively block malicious robots.
The robots.txt file consists mainly of a series of instructions indicating which pages you do not want web robots to visit. The file may include one set of instructions for all robots and specific instructions for any one particular robot.
It is merely a message addressed to robots, not a device that would make the robot’s visit technically impossible.
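As a sketch of this structure, here is a minimal robots.txt combining a group of rules for all robots with a group for one particular robot (the directory names are hypothetical examples):

```
# Rules for all robots
User-agent: *
Disallow: /private/
Disallow: /tmp/

# Rules for one particular robot; a robot obeys the most
# specific matching group, so this replaces the rules above
# for Googlebot rather than adding to them
User-agent: Googlebot
Disallow: /no-google/
```

Note that an empty `Disallow:` line means “nothing is forbidden”, and `Disallow: /` forbids the whole site.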
See also our pages of detailed information about the robots.txt file.
The “robots” meta tag
The “robots” meta tag is a line of HTML code placed in the source code of a page. It tells search engine robots what they may or may not do with the content of that page. Again, this is a mechanism that relies on the cooperation of the search engine.
While the contents of the robots.txt file can apply to all robots, the “robots” meta tag is intended only for search engine robots.
See also our detailed information on the meta tag “robots”.
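As an illustration, here is what the tag looks like in a page’s source code (the `noindex, nofollow` values are one possible choice among several):

```html
<!-- Placed in the <head> of the page: asks search engines not to
     index this page and not to follow the links it contains -->
<meta name="robots" content="noindex, nofollow">
```

Other common values include `index`, `follow` and `noarchive`; values are combined in a comma-separated list.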
X-Robots-Tag
The “robots” meta tag can only be placed in HTML pages. To obtain an equivalent result with PDF files, images, videos or other files, you can use the X-Robots-Tag in the HTTP header returned by the web server. This is an extension of the original protocol, introduced by Google in 2007.
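On an Apache server with mod_headers enabled, the header can be added like this (a minimal sketch; the file pattern and directive values are examples):

```apache
# Send X-Robots-Tag for all PDF files: ask search engines
# not to index them and not to follow links found in them
<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>
```

The header accepts the same values as the “robots” meta tag, since it plays the same role for non-HTML files.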
The .htaccess file
If your site is hosted on an Apache server, you can place a file called .htaccess in the root directory of your site. This file itself has nothing to do with the Robots Exclusion Protocol, but it is especially effective because it can be used to block access to the website for certain IP addresses or certain “user agents”. Here you are no longer asking something of the robots, but imposing a ban. This is the only technique presented here that is effective in stopping malicious robots.
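A minimal sketch of such a .htaccess, blocking one hypothetical user agent and one example IP address (the agent name and address here are placeholders, not real offenders; the `Order`/`Deny` syntax is for Apache 2.2, while Apache 2.4 uses `Require` directives instead):

```apache
# Refuse (HTTP 403) any request whose user agent contains "BadBot"
<IfModule mod_rewrite.c>
  RewriteEngine On
  RewriteCond %{HTTP_USER_AGENT} BadBot [NC]
  RewriteRule .* - [F,L]
</IfModule>

# Refuse all requests coming from one IP address
Order allow,deny
Allow from all
Deny from 203.0.113.42
```

Unlike robots.txt, these rules are enforced by the server itself, so a robot cannot simply ignore them.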
Special case: requesting removal of content
Google, through its Webmaster Tools, allows you to urgently request the removal of one or more of your pages from its index. Consider this request as practically irreversible, so use it with great caution.