
ROBOTS.TXT AND SITEMAPS

 

How to make a robots.txt file that includes search engine directions to your sitemap

One of the latest no-brainer advancements is that Ask, Google, Microsoft and Yahoo now support autodiscovery of sitemaps via robots.txt files: a crawler that reads your robots.txt also learns where your sitemap is. This largely eliminates the need to submit your sitemap to each search engine separately.

 

To take advantage of this, you first need a sitemap. You can make one by following the instructions on our Automated Sitemap Generator page.

 

Well-behaved search crawlers recognise and obey robots.txt files. A robots.txt is a plain-text file placed on your server to tell search engine spiders which sections or pages of your site they should not crawl. It is very handy for keeping the honest crawlers away from private pages and sensitive files. You can use it to keep crawlers out of your site entirely, to keep them out of certain areas, or to give individual instructions to specific search engines. With a Sitemap line added, it also acts as a sort of invitation into your site, pointing crawlers to the pages you do want indexed.
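To make the "honest crawler" idea concrete, here is a minimal sketch of the check a well-behaved bot performs before fetching a page. It assumes Python 3 and its standard `urllib.robotparser` module, which is one parser among many (search engines use their own). The rules, the bot name `ExampleBot` and the helper `may_crawl` are hypothetical, made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, used only for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a rule-following crawler may fetch `url` under these rules."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://www.your.domain.com/index.html",
                "http://www.your.domain.com/private/notes.html"):
        print(url, may_crawl(ROBOTS_TXT, "ExampleBot", url))
```

Nothing forces a bot to run this check; it is a courtesy that honest crawlers extend.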

There are a number of situations where you may wish to exclude spiders from some or all of your site.

  1. You are still building the site, or certain pages, and do not want the unfinished work to appear in search engines.
  2. You have information that, while not sensitive enough to bother password protecting, is of no interest to anyone but those it is intended for, and you would prefer it did not appear in search engines.
  3. Most people will have some directories they would prefer were not crawled. For example, do you really need your cgi-bin indexed? Or a directory that simply contains thank-you or error pages?
  4. If you are using doorway pages (similar pages, each optimized for an individual search engine), you may wish to ensure that individual robots do not have access to all of them. This is important to avoid being penalized for spamming a search engine with a series of overly similar pages.
  5. You would like to exclude some bots or spiders altogether, for example those from search engines you do not want to appear in, or those whose chief purpose is collecting email addresses.

The very fact that search engines look for a robots.txt file is reason enough to put one on your site.
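Crawlers look for the file in one fixed place per host: the path /robots.txt at the root. A minimal sketch, using only Python's standard `urllib.parse`, of how a crawler might work out that address from any page URL. The function name `robots_url` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the address where a crawler looks for robots.txt for `page_url`.

    Only the root of each host is checked, so every subdomain has its own file.
    """
    parts = urlsplit(page_url)
    # Keep the scheme and host (netloc also carries any port); replace the
    # path with /robots.txt and drop any query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_url("http://www.your.domain.com/folder/page.html"))
    print(robots_url("http://blog.your.domain.com/2007/post.html?id=3"))
```

A robots.txt placed in a subdirectory, such as /folder/robots.txt, is simply never requested.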

To include the location of your sitemap, your file should look something like this:

Sitemap: http://www.your.domain.com/sitemap.xml

User-Agent: *
Disallow: /cgi-bin/
Disallow: /image/
Disallow: /privatedirectory/
Disallow: /private.html
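One way to sanity-check a file like the one above before uploading it is to feed it to a robots.txt parser. This is a minimal sketch assuming Python 3.8 or later, whose standard `urllib.robotparser` gained `site_maps()` in that version. Search engines use their own parsers, so treat the result as a rough check rather than a guarantee of how any particular engine will read the file.

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt from above, including the Sitemap line.
EXAMPLE = """\
Sitemap: http://www.your.domain.com/sitemap.xml

User-Agent: *
Disallow: /cgi-bin/
Disallow: /image/
Disallow: /privatedirectory/
Disallow: /private.html
"""

parser = RobotFileParser()
parser.parse(EXAMPLE.splitlines())

# site_maps() returns the list of Sitemap URLs found (or None if there are none).
print(parser.site_maps())  # ['http://www.your.domain.com/sitemap.xml']

# Any path beginning with a Disallow value is blocked; everything else is allowed.
print(parser.can_fetch("Googlebot", "http://www.your.domain.com/cgi-bin/form.cgi"))  # False
print(parser.can_fetch("Googlebot", "http://www.your.domain.com/about.html"))        # True
```

The Sitemap line is independent of the User-Agent groups, which is why it can sit at the top of the file.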

 

Allow all spiders to index everything

Use the wildcard '*' in the User-agent line to address all spiders at once, and leave the Disallow line empty. An empty Disallow means nothing is disallowed, so the whole site is open.

User-agent: *
Disallow: 

Allow no spiders to index any part of your site

This requires just a one-character change from the file above: a single slash after Disallow blocks your entire site, so be careful!

User-agent: *
Disallow: /
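Because the two files above differ by a single slash, it is worth checking which one you have before uploading. A minimal sketch assuming Python 3 and its standard `urllib.robotparser` (not the parser any search engine actually uses); the helper name `check` is hypothetical.

```python
from urllib.robotparser import RobotFileParser

ALLOW_ALL = "User-agent: *\nDisallow: \n"
DISALLOW_ALL = "User-agent: *\nDisallow: /\n"


def check(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


url = "http://www.your.domain.com/index.html"
print(check(ALLOW_ALL, url))     # True: an empty Disallow blocks nothing
print(check(DISALLOW_ALL, url))  # False: "/" matches every path on the site
```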

Now suppose you want to keep a particular search engine, e.g. Google, away from your images. Google fetches images with a separate bot, called Googlebot-Image, from the Googlebot crawler that indexes pages generally. Replace the wildcard * with the name of that bot:

User-Agent: Googlebot-Image
Disallow: /images/
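To see that such a rule singles out one bot, here is a minimal sketch assuming Python 3 and its standard `urllib.robotparser`. Google matches user agents with its own parser, so this only illustrates the general idea: the group applies to the bot it names, and a crawler with no matching group is not restricted.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-Agent: Googlebot-Image
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

img = "http://www.your.domain.com/images/photo.jpg"
print(parser.can_fetch("Googlebot-Image", img))  # False: this bot is named in the rule
print(parser.can_fetch("Googlebot", img))        # True: no group applies to the page crawler
```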

Just edit the file in a text editor to suit your needs and save it as plain text. Change the Sitemap line to reflect your website and/or subdomain and the correct path to your sitemap.xml file. Add one "Disallow: /xyz/" line for each file or directory you wish to exclude. The file must be named robots.txt and placed in the root directory of your website, or in the top-level directory of a subdomain, so that it is served at an address such as http://www.your.domain.com/robots.txt. Each subdomain can have its own separate robots.txt file.
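If you maintain a long list of excluded paths, you could generate the file instead of editing it by hand. A minimal sketch assuming Python 3.9 or later; the function `build_robots_txt` and the script name `make_robots.py` are hypothetical. Running `python make_robots.py > robots.txt` would save the output, which you then upload to your web root.

```python
def build_robots_txt(sitemap_url: str, disallowed: list[str], user_agent: str = "*") -> str:
    """Build robots.txt text: a Sitemap line, then one Disallow line per path."""
    for path in disallowed:
        # Disallow values are paths from the site root, so they should start with "/".
        if not path.startswith("/"):
            raise ValueError(f"Disallow path must start with '/': {path!r}")
    lines = [f"Sitemap: {sitemap_url}", "", f"User-Agent: {user_agent}"]
    # One Disallow line for each file or directory you want to exclude.
    lines += [f"Disallow: {path}" for path in disallowed]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(
        "http://www.your.domain.com/sitemap.xml",
        ["/cgi-bin/", "/privatedirectory/", "/private.html"],
    ), end="")
```

Rejecting paths without a leading slash is a deliberate choice: it catches a common typo that would otherwise leave the rule silently ineffective.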

Robots.txt is not a security method. It may stop your specified pages from appearing in search engines, but it will not make them unavailable. There are many hundreds of bots and spiders crawling the Internet, and while most will respect your robots.txt file, some will not, and some are even designed specifically to visit the very pages you have marked as out of bounds.
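Part of the reason is that robots.txt itself is public: anyone can request it and read the list of paths you asked crawlers to avoid. A minimal sketch in plain Python (the function name `listed_paths` is hypothetical) shows how little effort it takes to pull those paths out, which is exactly what a badly behaved bot can do.

```python
def listed_paths(robots_txt: str) -> list[str]:
    """Return every non-empty path named in a Disallow line."""
    paths = []
    for line in robots_txt.splitlines():
        # Drop any comment, then surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


if __name__ == "__main__":
    public_file = """\
User-Agent: *
Disallow: /cgi-bin/
Disallow: /privatedirectory/
Disallow: /private.html
"""
    print(listed_paths(public_file))
    # ['/cgi-bin/', '/privatedirectory/', '/private.html']
```

Anything that genuinely needs protecting should sit behind a password or other access control, not merely be listed here.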


Copyright 2007 domain.e-pond.info. All Rights Reserved
