If you want LLMs to surface your content, allow reputable AI crawlers in robots.txt and guide them with llms.txt; if you need to limit training or access, set bot‑specific rules in robots.txt and only expose high‑signal content via llms.txt.
What each file does (and doesn’t)
- robots.txt: Access control. Allow/Disallow paths by user‑agent and declare sitemaps via the Sitemap directive. A voluntary convention that most reputable AI crawlers honor, not a technical enforcement mechanism.
- llms.txt: Discovery hints. Point LLMs to FAQs, hubs, and stable, machine‑friendly pages. Treat as guidance, not a guarantee.
Decision guide: Allow, throttle, or block
- Maximize AI visibility: Allow major AI bots (e.g., GPTBot) in robots.txt; link sitemaps; publish llms.txt that lists answer hubs and txt/md assets.
- Controlled exposure: Allow crawl for specific paths (e.g., /docs, /faq), disallow private or thin areas; llms.txt lists only evergreen resources.
- Restrictive posture: Disallow AI bots globally, then carve exceptions (if desired) for public docs; llms.txt can still exist but should reference only permissible content.
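Before publishing a restrictive policy, it helps to test the draft rules against the paths you actually care about. The sketch below uses Python's standard urllib.robotparser for that check; the bot names, paths, and rules are illustrative assumptions, and note that this parser applies rules in file order (some crawlers use longest‑match precedence instead), so keep the more specific Allow lines first.

# sanity_check_robots.py: test a draft "restrictive posture" robots.txt
# before deploying it. Bot names, paths, and rules are illustrative.
from urllib.robotparser import RobotFileParser

DRAFT_ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /docs/
Disallow: /

User-agent: *
Disallow: /private/
"""

# Paths we expect each bot to be able to fetch (True) or not (False).
EXPECTATIONS = {
    "GPTBot": {"/docs/implement": True, "/pricing": False, "/private/": False},
    "Googlebot": {"/docs/implement": True, "/private/": False},
}

parser = RobotFileParser()
parser.parse(DRAFT_ROBOTS_TXT.splitlines())

for bot, paths in EXPECTATIONS.items():
    for path, expected in paths.items():
        allowed = parser.can_fetch(bot, f"https://example.com{path}")
        flag = "OK  " if allowed == expected else "FAIL"
        print(f"{flag} {bot:<10} {path:<18} allowed={allowed} expected={expected}")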
Practical examples
robots.txt (selective allow for GPTBot and general allow for others)
User-agent: *
Disallow:
Sitemap: https://example.com/sitemap.xml

User-agent: GPTBot
Allow: /docs/
Disallow: /private/

llms.txt (high-signal hints)
# ExampleCo LLMs Hints
# Canonical hubs and machine-friendly resources
Sitemap: https://example.com/sitemap.xml

[Hubs]
- FAQs: https://example.com/faq
- Pricing explainer: https://example.com/pricing
- Implementation guide: https://example.com/docs/implement

Monitoring and compliance
- Verify bot behavior in your server logs and dashboards; track user‑agent strings to confirm crawlers actually honor your robots.txt rules (a log‑scan sketch follows this list).
- Keep llms.txt and sitemaps current; avoid redirects and JS‑dependent pages in your hint lists (a link‑check sketch also follows this list).
- Review periodically—policies and bot capabilities evolve.
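For the log check, here is a minimal sketch that tallies requests from common AI crawler user agents in a combined‑format access log. The log path, the bot list, and the /private/ prefix are illustrative assumptions to adapt to your own setup.

# scan_ai_crawlers.py: tally requests from known AI crawler user agents
# in a combined-format access log.
import re
from collections import Counter

LOG_PATH = "/var/log/nginx/access.log"  # assumption: point at your own access log
AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot"]  # non-exhaustive list

# Matches the request path and the trailing user-agent of a combined-format line.
LINE_RE = re.compile(r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" \d{3} .*"(?P<ua>[^"]*)"\s*$')

hits = Counter()
disallowed_hits = Counter()

with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        m = LINE_RE.search(line)
        if not m:
            continue
        bot = next((b for b in AI_BOTS if b in m.group("ua")), None)
        if bot is None:
            continue
        hits[bot] += 1
        # Requests to paths your robots.txt disallows deserve a closer look.
        if m.group("path").startswith("/private/"):
            disallowed_hits[(bot, m.group("path"))] += 1

for bot, count in hits.most_common():
    print(f"{bot}: {count} requests")
for (bot, path), count in disallowed_hits.most_common(10):
    print(f"WARNING: {bot} hit disallowed path {path} ({count}x)")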
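For the hint hygiene check, a companion sketch that fetches every URL listed in llms.txt and flags redirects or broken links, since hints that do not resolve directly are wasted signal. The file path is an assumption, and the regex simply pulls any absolute URL out of the file.

# check_llms_hints.py: verify that every URL in llms.txt resolves directly
# (HTTP 200, no redirect hop). File path and URL pattern are illustrative.
import re
import urllib.error
import urllib.request

LLMS_TXT_PATH = "llms.txt"  # assumption: a local copy of the file you publish
URL_RE = re.compile(r"https?://\S+")

class NoRedirect(urllib.request.HTTPRedirectHandler):
    # Return None so redirects surface as HTTPError instead of being followed.
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None

opener = urllib.request.build_opener(NoRedirect)

with open(LLMS_TXT_PATH, encoding="utf-8") as f:
    urls = URL_RE.findall(f.read())

for url in urls:
    try:
        with opener.open(url, timeout=10) as resp:
            print(f"OK       {resp.status} {url}")
    except urllib.error.HTTPError as err:
        label = "REDIRECT" if 300 <= err.code < 400 else "BROKEN  "
        print(f"{label} {err.code} {url}")
    except urllib.error.URLError as err:
        print(f"ERROR    {url} ({err.reason})")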
FAQs
- Do all AI crawlers respect robots.txt? Most reputable ones claim to; always validate in logs and adjust as needed.
- Is llms.txt required? No. It’s a helpful hint layer that complements (not replaces) robots.txt and sitemaps.
- Will blocking a bot reduce my AI visibility? Likely, yes. If a bot can’t crawl, it may rely on older training data or not cite you at all.
Other standards
Other proposals are emerging as the ecosystem evolves. One example is ai.txt, an optional file some publishers adopt to signal AI‑usage preferences; it is not an established standard. Treat it as a convenience, not a substitute for robots.txt.
https://spawning.substack.com/p/aitxt-a-new-way-for-websites-to-set