ROBOTS.TXT AND SITEMAPS
How to make a robots.txt file that includes search engine directions to your sitemap
One of the most useful recent advancements is that Ask, Google, Microsoft and Yahoo now support autodiscovery of sitemaps via robots.txt files. This virtually eliminates the need to submit your sitemap to each search engine separately.
To take advantage of this you first need a sitemap, which you can make by following the instructions on our Automated Sitemap Generator page.
All the major search crawlers recognise robots.txt files. A robots.txt is a file placed on your server to tell the various search engine spiders not to crawl or index certain sections or pages of your site. It is very handy for keeping the honest crawlers away from private pages and sensitive files. You can use it to prevent indexing totally, to prevent certain areas of your site from being indexed, or to issue individual indexing instructions to specific search engines. It can also act as a sort of invitation into your site. There are a number of situations where you may wish to exclude spiders from some or all of your site.
The very fact that search engines are looking for them is reason enough to put one on your site. To include information on how to find your sitemap, your file needs to look like this:
    Sitemap: http://www.your.domain.com/sitemap.xml
    User-Agent: *
    Disallow: /xyz/
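The file described above can be sanity-checked before you upload it. The following is only a minimal sketch, assuming Python 3.8 or later and its standard urllib.robotparser module. The domain is the article's placeholder, "/xyz/" is the placeholder directory the article uses for an excluded area, and "AnyBot" and the helper name load_rules are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Placeholder domain and "/xyz/" directory, as used in the article's example.
ROBOTS_TXT = """\
Sitemap: http://www.your.domain.com/sitemap.xml
User-agent: *
Disallow: /xyz/
"""


def load_rules(text):
    """Parse robots.txt text (hypothetical helper, not part of any library)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# site_maps() lists every Sitemap: line it found (Python 3.8+).
sitemaps = rules.site_maps()

# "AnyBot" is an invented crawler name that falls under the "*" wildcard.
blocked = not rules.can_fetch("AnyBot", "http://www.your.domain.com/xyz/page.html")
allowed = rules.can_fetch("AnyBot", "http://www.your.domain.com/index.html")

print(sitemaps, blocked, allowed)
```

Python's parser is a simple prefix matcher, so treat the result as a rough check of your own file rather than a guarantee of how any particular search engine will read it.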
Allow all spiders to index everything
You can use the wildcard, '*', to let all spiders know they are welcome. The second line, the disallow line, you just leave empty: that is your disallow from nowhere.
    User-agent: *
    Disallow:
Allow no spiders to index any part of your site
This requires just a tiny change from the command above - be careful!
    User-agent: *
    Disallow: /
Keep one search engine away from your images
Now suppose you want to keep a single search engine, e.g. Google, away from the images on your site. Google grabs images with a separate bot from the one that indexes pages generally, called Googlebot-Image. Change the wildcard * to the name of the bot and list the image directories you want to exclude on the Disallow lines beneath it.
    User-Agent: Googlebot-Image
Saving and placing the file
Just edit and save the file in a text editor according to your needs. Change the sitemap line to reflect your website and/or subdomain and the correct path to your sitemap.xml file. How many times you repeat the "Disallow: /xyz/" line depends on the number of files or directories you wish to exclude, one line for each.
The file needs to be named robots.txt and placed in the root directory of your server, or in the highest level directory of a subdomain. Each of these areas can have its own separate, appropriate robots.txt file.
Robots.txt is not a security method. It may stop your specified pages from appearing in search engines, but it will not make them unavailable. There are many hundreds of bots and spiders crawling the Internet now, and while most will respect your robots.txt file, some will not, and there are even some designed specifically to visit the very pages you are specifying as being out of bounds.
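The three rule sets described above (allow everything, allow nothing, and rules for Googlebot-Image only) can be compared side by side with Python's standard urllib.robotparser module. This is a hedged sketch, not a definitive test of crawler behaviour: the "/images/" directory, the bot name "SomeBot" and the helper is_allowed are assumptions made up for the example.

```python
from urllib.robotparser import RobotFileParser

# An empty Disallow line blocks nothing; "Disallow: /" blocks everything.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
# Assumption: "/images/" is an illustrative directory, not a path from the article.
BLOCK_IMAGE_BOT = "User-agent: Googlebot-Image\nDisallow: /images/\n"


def is_allowed(robots_txt, agent, path):
    """Return True if `agent` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


checks = {
    "allow all": is_allowed(ALLOW_ALL, "SomeBot", "/page.html"),
    "block all": is_allowed(BLOCK_ALL, "SomeBot", "/page.html"),
    "image bot on images": is_allowed(BLOCK_IMAGE_BOT, "Googlebot-Image", "/images/a.jpg"),
    "other bot on images": is_allowed(BLOCK_IMAGE_BOT, "Googlebot", "/images/a.jpg"),
}

for name, allowed in checks.items():
    print(f"{name}: {'allowed' if allowed else 'blocked'}")
```

The single "/" is the only difference between the first two rule sets, which is why the article warns you to be careful. Keep in mind that a check like this only tells you what a well-behaved crawler should do; it does nothing to stop bots that ignore robots.txt.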
Copyright 2007 domain.e-pond.info. All Rights Reserved