Build a valid robots.txt file visually, or paste an existing one to validate rules and test if a URL is blocked.
What Is a Robots.txt File?
A robots.txt file is a plain-text file placed at the root of your domain that tells web crawlers which URLs they may fetch and which to skip. It follows the Robots Exclusion Protocol (standardized as RFC 9309) and is read by Googlebot, Bingbot, and the major AI crawlers before they crawl your site.
A correctly configured robots.txt file prevents crawl budget waste on low-value pages such as admin directories, search result pages, and duplicate pagination, while ensuring your most important content is crawled efficiently. In 2025 it also controls which AI systems can include your content in generated answers.
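To see how a crawler interprets such a file, the following minimal sketch uses Python's standard-library `urllib.robotparser`. The rules and the `example.com` URLs are illustrative assumptions, not output from this tool. Note that Python's parser applies the first matching rule and does not understand `*` wildcards in paths, so its answers can differ from Google's on more complex files.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration; the paths are assumptions.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /search/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/blog/post",
        "https://example.com/wp-admin/options.php",
    ):
        print(url, is_allowed(ROBOTS_TXT, "Googlebot", url))
```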
AI Crawlers and Robots.txt in 2025
AI search traffic is a growing referral channel. GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot, and OAI-SearchBot all respect robots.txt disallow directives. If your file blocks these crawlers, your content is excluded from AI-generated answers across ChatGPT, Claude, and Perplexity. Given that AI-referred sessions grew 527 percent in early 2025, blocking AI search crawlers can meaningfully reduce referral traffic.
| Crawler | Owner | Block to prevent | Recommended |
|---|---|---|---|
| Googlebot | Google | Organic search indexing | Allow content pages |
| GPTBot | OpenAI | ChatGPT web citations | Allow content pages |
| ClaudeBot | Anthropic | Claude web citations | Allow content pages |
| PerplexityBot | Perplexity | Perplexity AI citations | Allow content pages |
| CCBot | Common Crawl | AI model training data | Block if desired |
| Bingbot | Microsoft | Bing/Copilot indexing | Allow content pages |
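One way to express the table as rules is a dedicated group for CCBot and a shared group for everyone else. The sketch below is an assumption about how you might lay that out, checked with Python's standard-library parser. The `access_report` helper and the sample paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed policy: block CCBot entirely, keep admin pages private for all others.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /wp-admin/
"""

CRAWLERS = ["Googlebot", "Bingbot", "GPTBot", "ClaudeBot", "PerplexityBot", "CCBot"]


def access_report(robots_txt: str, url: str, crawlers=CRAWLERS) -> dict[str, bool]:
    """Map each crawler name to whether it may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {name: parser.can_fetch(name, url) for name in crawlers}


if __name__ == "__main__":
    for name, ok in access_report(ROBOTS_TXT, "https://example.com/blog/post").items():
        print(f"{name:15} {'allowed' if ok else 'blocked'}")
```

A crawler with its own group follows only that group, which is why CCBot is blocked everywhere while the other crawlers fall through to the `*` rules.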
What to Block and What to Allow
Block admin directories such as /wp-admin/, internal search result pages, login and registration paths, staging or test directories, and duplicate parameter-based URLs. Do not block pages you want indexed or cited by AI systems. Do not block pages that carry noindex meta tags either: a crawler has to fetch a page to see the tag, because crawl access and indexing are controlled separately. Never block your XML sitemap, CSS files, or JavaScript files that affect rendering.
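If you maintain the blocked paths as a list, generating the file is a few lines of code. This is a minimal sketch, not how this generator works internally. The `build_robots_txt` helper and the paths in `BLOCKED` are assumptions chosen to mirror the categories above.

```python
def build_robots_txt(disallow_paths: list[str], user_agent: str = "*") -> str:
    """Render one robots.txt group that disallows the given paths."""
    lines = [f"User-agent: {user_agent}"]
    for path in disallow_paths:
        # Rule paths are matched against the URL path, so they start with "/".
        if not path.startswith("/"):
            raise ValueError(f"path must start with '/': {path!r}")
        lines.append(f"Disallow: {path}")
    return "\n".join(lines) + "\n"


# Hypothetical paths; substitute the ones your site actually uses.
BLOCKED = ["/wp-admin/", "/search/", "/login/", "/staging/"]

if __name__ == "__main__":
    print(build_robots_txt(BLOCKED), end="")
```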
Always add your sitemap URL at the bottom of your robots.txt using the Sitemap: directive. This helps all crawlers including AI bots find your sitemap immediately without a manual submission in Search Console.
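To confirm that a file actually declares a sitemap, you can read the directive back with Python's standard-library parser (`site_maps()` requires Python 3.8 or later). The `declared_sitemaps` helper and the sample file are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# Sample file; the domain is a placeholder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/

Sitemap: https://yourdomain.com/sitemap.xml
"""


def declared_sitemaps(robots_txt: str) -> list[str]:
    """Return every URL listed in a Sitemap: directive, or an empty list."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when the file declares no sitemap.
    return parser.site_maps() or []


if __name__ == "__main__":
    print(declared_sitemaps(ROBOTS_TXT))
```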
The most damaging robots.txt mistake: a single Disallow: / line under User-agent: * blocks every compliant crawler from your entire site, so your pages drop out of search results and AI citations alike. Always validate your robots.txt with the built-in validator after any site change, and check Google Search Console for crawl errors after every update.
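A check for this mistake is easy to automate, for example in a deploy script. The sketch below asks Python's standard-library parser whether a generic crawler may fetch the root path. `blocks_entire_site` and the `ExampleBot` name are hypothetical, and the check is deliberately coarse: it flags any file whose first matching rule for `/` is a disallow, even if later Allow lines reopen parts of the site.

```python
from urllib.robotparser import RobotFileParser


def blocks_entire_site(robots_txt: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if user_agent is not allowed to fetch the root path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, "/")


if __name__ == "__main__":
    dangerous = "User-agent: *\nDisallow: /\n"
    safe = "User-agent: *\nDisallow: /wp-admin/\n"
    print(blocks_entire_site(dangerous), blocks_entire_site(safe))
```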
Common Robots.txt Mistakes
Blocking CSS and JavaScript. Google needs these files to render your pages correctly for indexing. When they are blocked, Googlebot sees an unstyled or incomplete page, which can hurt how the page is evaluated and cause it to appear broken in Search Console's rendering tool. Remove any rules that match .css, .js, or common asset directory paths.
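A simple lint can surface such rules before they ship. This is a rough sketch that scans Disallow lines directly. The `find_asset_blocks` helper and the directory names in `ASSET_DIRS` are assumptions, so adjust them to your site's layout.

```python
import re

# Matches rule values ending in .css or .js, with an optional "$" end anchor.
ASSET_PATTERN = re.compile(r"\.(css|js)\$?$", re.IGNORECASE)
# Assumed asset directory names; not an exhaustive or official list.
ASSET_DIRS = ("/assets/", "/static/", "/css/", "/js/")


def find_asset_blocks(robots_txt: str) -> list[str]:
    """Return Disallow values that look like they block CSS or JS assets."""
    flagged = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        field, _, value = line.partition(":")
        if field.strip().lower() != "disallow":
            continue
        value = value.strip()
        if ASSET_PATTERN.search(value) or value.lower().startswith(ASSET_DIRS):
            flagged.append(value)
    return flagged


if __name__ == "__main__":
    sample = "User-agent: *\nDisallow: /*.css$\nDisallow: /wp-admin/\n"
    print(find_asset_blocks(sample))
```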
Missing the Sitemap directive. Without Sitemap: https://yourdomain.com/sitemap.xml in your robots.txt, crawlers must discover your sitemap through a manual Search Console submission or by following links. Adding it accelerates discovery for all crawlers including AI systems.
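Misspelled user agent names. A group only applies when its User-agent line matches the crawler's product token, so a typo such as GTPBot silently matches nothing. The Robots Exclusion Protocol (RFC 9309) makes this match case-insensitive, but GPTBot and gptbot are still treated differently by some parsers. Always use the exact user-agent string from the crawler's official documentation.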
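The effect of a misspelling can be demonstrated with Python's standard-library parser, which lowercases names before comparing them. Other parsers may behave differently, so treat this as a sketch of one implementation. The `is_blocked` helper and the `/drafts/` path are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if user_agent may not fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, path)


# Same rule, once with different casing and once with a typo in the name.
LOWERCASE = "User-agent: gptbot\nDisallow: /drafts/\n"
MISSPELLED = "User-agent: GTPBot\nDisallow: /drafts/\n"

if __name__ == "__main__":
    print(is_blocked(LOWERCASE, "GPTBot", "/drafts/a"))   # casing still matches here
    print(is_blocked(MISSPELLED, "GPTBot", "/drafts/a"))  # typo: rule never applies
```

With the typo, the group never applies to GPTBot, so the crawler behaves as if no rule had been written for it.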