robots.txt
The robots.txt file is a simple text file placed at the root of your website that tells search engine bots which pages or sections of your site they’re allowed to crawl or should avoid.
This file is part of the Robots Exclusion Protocol and is used to help manage crawl traffic, avoid indexing duplicate or sensitive content, and control how search engines interact with your site.
For example, if you don’t want search engines to crawl your admin or checkout pages, you can block them via robots.txt.
Why robots.txt matters for SEO
While robots.txt doesn’t control indexing directly (that’s handled by meta tags or HTTP headers), it:
- Helps prevent crawling of non-public or low-value pages
- Reduces crawl waste, so search engines focus on important pages
- Keeps well-behaved bots out of sensitive areas of your site (though it is not a security control, since compliance is voluntary)
- Prevents overloading your server with unnecessary requests
- Can be used to allow or block specific bots
Basic syntax of robots.txt
| Directive | What it does |
| --- | --- |
| User-agent | Specifies which crawler the rule applies to (e.g., * = all) |
| Disallow | Blocks access to specified URLs or folders |
| Allow | Permits access to a path, overriding a broader Disallow rule |
| Sitemap | Points search engines to your XML sitemap |
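The Allow directive is easiest to understand as an exception carved out of a broader Disallow rule. Here is a minimal sketch, with purely illustrative paths:
User-agent: *
Disallow: /private/
Allow: /private/press-kit/
With these rules, compliant crawlers skip everything under /private/ except /private/press-kit/, because the longer, more specific Allow path takes precedence.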
Example robots.txt file
User-agent: *
Disallow: /admin/
Disallow: /checkout/
Allow: /blog/
Sitemap: https://example.com/sitemap.xml
This example blocks all bots from crawling the /admin/ and /checkout/ directories, allows access to /blog/, and provides the sitemap URL.
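Rules can also target individual crawlers by name, since each User-agent line starts a group that applies only to the bots it names. In the sketch below, Googlebot is Google’s real crawler, while ExampleBot is a made-up name standing in for any bot you want to keep out entirely; the paths are placeholders.
User-agent: ExampleBot
Disallow: /

User-agent: Googlebot
Disallow: /checkout/
Here ExampleBot is asked to stay off the entire site, while Googlebot is only asked to skip /checkout/; a compliant crawler follows the most specific User-agent group that matches it.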
Best practices for using robots.txt
- Place the file at the root of your domain (e.g., example.com/robots.txt)
- Use Disallow carefully: blocking a page doesn’t mean it won’t be indexed if other pages link to it
- Combine with noindex meta tags for full control over indexing (note that a page must remain crawlable for search engines to see a noindex tag)
- Don’t block pages you want to appear in search results
- Use Google Search Console’s robots.txt report (the successor to the robots.txt Tester) to check for errors
- Keep the file simple and readable for easy maintenance
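On the last point, robots.txt supports comments (lines starting with #), which help keep the file readable and self-documenting. A short sketch along those lines, with illustrative paths and sitemap URL:
# Internal areas with no search value
User-agent: *
Disallow: /admin/
Disallow: /checkout/

# Help crawlers discover the sitemap
Sitemap: https://example.com/sitemap.xml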
In summary, the robots.txt file is a key tool for controlling how search engines crawl your site. Used correctly, it helps optimize crawl efficiency, protect sensitive areas, and guide bots to your most important content.