A robots.txt file is a simple text file that tells search engine robots which parts of a site they should not visit and index. It can only ask robots to stay away; it cannot tell them what they must do. The file should be placed in the root directory of your web server, and each website should have only one. The main purposes of a robots.txt file are to save bandwidth on your web server, to provide a very basic level of protection, to keep your logs clean, and to avoid spam and penalties related to duplicate content.
If you want the search engines to visit and index your whole site, use the following instructions. Some people simply leave the file blank instead:
    User-agent: *    # "*" means all robots
    Disallow:
If you do not want the search engines to visit and index your site at all, use the following instructions instead:
    User-agent: *    # "*" means all robots
    Disallow: /
Notice the difference: the slash after Disallow: reverses the meaning. An empty Disallow: blocks nothing, while Disallow: / blocks the whole site.
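To see the effect of that single slash, here is a minimal sketch that uses Python's standard urllib.robotparser module to evaluate both rule sets. The helper name can_crawl and the robot name ExampleBot are made up for illustration, and the result only shows how a well-behaved parser reads the rules, not what any particular search engine will do.

```python
from urllib.robotparser import RobotFileParser

# Rule sets as described above: an empty Disallow vs. "Disallow: /".
ALLOW_ALL = "User-agent: *\nDisallow:\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"


def can_crawl(robots_txt: str, url: str, agent: str = "ExampleBot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text.

    `can_crawl` and "ExampleBot" are hypothetical names used for this sketch.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly, no network access
    return parser.can_fetch(agent, url)


page = "http://yoursitename.com/index.html"
print("empty Disallow:", can_crawl(ALLOW_ALL, page))
print("Disallow: /   :", can_crawl(BLOCK_ALL, page))
```

The first call reports that the page may be crawled and the second reports that it may not, even though the two files differ by one character.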
If you only want to keep the search engines away from certain pages, list those pages. See an example below:
    User-agent: *
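As an illustration of a partial block, the sketch below adds one Disallow line per path to a User-agent: * record and checks a few URLs with Python's standard urllib.robotparser. The paths /drafts/ and /print-version.html, the helper allowed and the robot name ExampleBot are assumptions invented for this example; substitute the pages you actually want to exclude.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one Disallow line per directory or page to exclude.
PARTIAL_BLOCK = """\
User-agent: *
Disallow: /drafts/
Disallow: /print-version.html
"""

parser = RobotFileParser()
parser.parse(PARTIAL_BLOCK.splitlines())  # parse text directly, no network access


def allowed(path: str) -> bool:
    """Return True if the made-up robot "ExampleBot" may fetch `path`."""
    return parser.can_fetch("ExampleBot", "http://yoursitename.com" + path)


for path in ("/index.html", "/drafts/post.html", "/print-version.html"):
    print(path, "->", "allowed" if allowed(path) else "blocked")
```

A directory rule ending in a slash covers everything beneath it, so /drafts/post.html is blocked while /index.html stays crawlable.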
Note that robots.txt is the tool for controlling the entire website. For a single web page, you may want to use a robots meta tag to limit visiting and indexing; for a single link, consider the “nofollow” attribute to ask search engines not to follow that link.
Anybody can read your robots.txt file by visiting http://yoursitename.com/robots.txt, so you should not list secret directories in it: some ill-behaved people use robots to harvest exactly this kind of information from robots.txt files.
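To show how little effort that harvesting takes, the sketch below pulls every Disallow path out of a robots.txt text with a few lines of plain Python. The function listed_paths and the sample paths are hypothetical, and the parsing is deliberately simplistic (it ignores which User-agent record a rule belongs to).

```python
def listed_paths(robots_txt: str) -> list[str]:
    """Collect every non-empty Disallow path, in order of appearance.

    `listed_paths` is a hypothetical helper written for this sketch.
    """
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()      # drop comments and whitespace
        field, sep, value = line.partition(":")   # split "Field: value"
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Made-up example file: the "secret" directory is plainly visible to anyone.
SAMPLE = "User-agent: *\nDisallow: /admin-secret/\nDisallow: /images/\n"
print(listed_paths(SAMPLE))
```

Anything that must stay private should be protected on the server itself (for example with a password) rather than merely hidden from robots.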
Robots.txt Checker is a useful tool for checking whether your robots.txt file is valid.