How to Use a Robots.txt File to Control Your Website?

A robots.txt file is a simple text file that tells search engine crawlers (robots) which parts of your site should not be visited and indexed. Keep in mind that these instructions are only a request: well-behaved robots follow them, but you cannot force a robot to obey. The robots.txt file should be placed in the root directory of your web server, and each website should have only one robots.txt file.

The main purposes of having a robots.txt file are to save your web server's bandwidth, to give you a very basic level of protection, to keep your logs cleaner, and to help you avoid spam and penalties related to duplicate content.

If you want the search engines to visit and index your entire site, use the following instructions (some people simply leave the file blank, which has the same effect):

User-agent: *    # the asterisk (*) means all robots
Disallow:

If you don’t want the search engines to visit and index your site at all, use the following instructions:

User-agent: *    # the asterisk (*) means all robots
Disallow: /

Can you see the difference? The single slash after Disallow: reverses the meaning: instead of allowing everything, it blocks the entire site.

If you only want to keep the search engines away from certain directories or pages, list each of them on its own Disallow line. See the example below:

User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /editor/
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /media/
Disallow: /about/
Sitemap: http://www.yourdomain.com/sitemap.xml

Please also note the scope of each tool: a robots.txt file controls crawling for your entire website; for a single web page, you can use a robots meta tag to keep it from being visited and indexed; and for a single link, you can add a “nofollow” attribute to tell search engines not to follow that link, as shown below.
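For illustration, here are the standard forms of both (the page and URL are just placeholders):

<!-- robots meta tag, placed in the <head> of a single page you want kept out of the index -->
<meta name="robots" content="noindex, nofollow">

<!-- nofollow attribute on a single link you do not want search engines to follow -->
<a href="http://www.yourdomain.com/some-page/" rel="nofollow">Some page</a>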

Anyone can read your robots.txt file by typing http://yoursitename.com/robots.txt into a browser, so you should not list your secret directories in it: some ill-behaved robots deliberately fetch robots.txt files to harvest exactly this kind of information.

Robots.txt Checker is a useful tool that you can use to check whether your robots.txt file is valid.

Would you like to use a robots.txt to control your website? Why or why not? Please leave a comment below.


Related articles
Using The Robots Meta Tag
Preventing Comment Spams


7 responses to “How to Use a Robots.txt File to Control Your Website?”

  1. Pingback: How to Promote Your Website/Blog? « My Internet Stuff

  2. You should mention a sitemap in your article.

    Hose Reels

  3. Hi… your website is very informative, with a lot of good stuff. Keep it up. One question: how do I fix the ‘file not found, error 404’ errors that Google finds when it crawls my site?

    Thank you very much.

    • Thanks for your kind words. As for fixing the ‘file not found, error 404’ errors from Google’s crawl: you can use a 301 redirect to send that traffic to the file’s new location, or you can redirect it to your sitemap page instead. There is a sketch of the redirect below. Hope it helps.
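      For example, assuming an Apache server, one line in the site’s .htaccess file handles the redirect (the file names here are just placeholders):

      # .htaccess (Apache): permanently redirect the old URL to its new location
      Redirect 301 /old-page.html http://www.yourdomain.com/new-page.html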

  4. I’m assuming that nothing much has changed about the robots in the last two years? I found this extremely informative, as I’ve been mystified by the asterisk in Google’s webmaster tools. As usual, it’s so simple once explained. Thanks!
