For me, the most common use of robots.txt is to prevent search engine robots from scanning a site while it is still in development. To do that, create a simple text file called robots.txt with the following contents:
User-agent: *
Disallow: /
Once your website goes live, the robots.txt file should no longer be used to disallow indexing; use Meta Robots for that instead. I will, however, still use robots.txt to tell the spiders where to find the sitemap file:
Sitemap: http://www.yourDomain.com/sitemap.xml
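If you don't already have a sitemap file, a minimal one is just an XML document that lists the URLs of your pages. Here is a rough sketch in the standard sitemaps.org format, using the same placeholder domain as above (the about.html entry is only an example):
<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal sitemap: one <url> entry per page you want the spiders to find -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.yourDomain.com/</loc>
  </url>
  <url>
    <loc>http://www.yourDomain.com/about.html</loc>
  </url>
</urlset>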
Check out http://www.robotstxt.org/robotstxt.html if you want to get fancier with your use, such as asking robots to skip only a section of your site (asking, really, since you can't actually prevent anyone from viewing your public site), or asking only certain robots not to visit.
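As a sketch of what that can look like, the robots.txt below asks all robots to skip one directory and asks one particular robot to stay out entirely; the directory and robot names here are placeholders only:
# All robots: please skip the /private/ directory
User-agent: *
Disallow: /private/

# One specific robot: please stay out of the whole site
User-agent: BadBot
Disallow: /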
See also: information about site maps, and an online generator.
You can request specific actions from the search engine spiders by including a meta robots tag in the head section of each web page. Example:
<meta name="robots" content="noindex, nofollow, noarchive">
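For placement, the tag simply goes inside the head element of the page; a minimal sketch (the title is just a placeholder):
<head>
  <title>Example page</title>
  <!-- Ask robots not to index this page, not to follow its links, and not to keep a cached copy -->
  <meta name="robots" content="noindex, nofollow, noarchive">
</head>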
Valid content keywords are: