
AI Crawling Permissions: robots.txt vs llms.txt

If you want LLMs to surface your content, allow reputable AI crawlers in robots.txt and guide them with llms.txt; if you need to limit training or access, set bot‑specific rules in robots.txt and only expose high‑signal content via llms.txt.

What each file does (and doesn’t)

  • robots.txt: Access control. Allow/Disallow rules by user‑agent, plus Sitemap declarations. Compliance is voluntary, but most reputable AI crawlers say they honor it.
  • llms.txt: Discovery hints. Point LLMs to FAQs, hubs, and stable, machine‑friendly pages. Treat as guidance, not a guarantee.

Decision guide: Allow, throttle, or block

  • Maximize AI visibility: Allow major AI bots (e.g., GPTBot) in robots.txt; link your sitemaps; publish an llms.txt that lists answer hubs and plain‑text/Markdown assets.
  • Controlled exposure: Allow crawling of specific paths (e.g., /docs, /faq), disallow private or thin areas; list only evergreen resources in llms.txt.
  • Restrictive posture: Disallow AI bots globally, then carve out exceptions (if desired) for public docs; llms.txt can still exist but should reference only permissible content (a restrictive example appears under Practical examples below).

Practical examples

robots.txt (selective allow for GPTBot and general allow for others)

User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml

User-agent: GPTBot
Allow: /docs/
Disallow: /private/

llms.txt (high-signal hints)

# ExampleCo LLMs Hints
# Canonical hubs and machine-friendly resources
Sitemap: https://example.com/sitemap.xml

[Hubs]
- FAQs: https://example.com/faq
- Pricing explainer: https://example.com/pricing
- Implementation guide: https://example.com/docs/implement
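
robots.txt (restrictive posture: block selected AI bots site‑wide, with a carve‑out for public docs). This is a sketch under assumptions: the bot names and paths are illustrative, so verify each vendor's documented user‑agent token before relying on it. Most crawlers apply the longest‑match rule, so Allow: /docs/ overrides Disallow: / for that bot.

User-agent: GPTBot
Allow: /docs/
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml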

Monitoring and compliance

  • Verify bot behavior in your logs and dashboards; track user‑agent patterns to confirm adherence to robots rules (a log‑scan sketch follows this list).
  • Keep llms.txt and sitemaps current; avoid redirects and JS‑dependent pages in your hint lists.
  • Review periodically—policies and bot capabilities evolve.
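
To make the first point concrete, here is a minimal log‑scan sketch. It assumes a combined‑format access log at /var/log/nginx/access.log; the bot tokens and disallowed prefixes are placeholders to adjust to your own server and robots.txt rules.

import re
from collections import Counter

LOG_PATH = "/var/log/nginx/access.log"  # assumed location; adjust to your server
AI_BOT_TOKENS = ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot"]  # illustrative list
DISALLOWED_PREFIXES = ["/private/"]  # mirror the Disallow rules in your robots.txt

# Combined log format: ... "GET /path HTTP/1.1" status size "referer" "user-agent"
LINE_RE = re.compile(r'"(?:GET|POST|HEAD) (?P<path>\S+)[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"')

hits = Counter()
violations = []

with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        match = LINE_RE.search(line)
        if not match:
            continue
        path, ua = match.group("path"), match.group("ua")
        bot = next((token for token in AI_BOT_TOKENS if token in ua), None)
        if bot is None:
            continue
        hits[bot] += 1
        # Requests to disallowed paths suggest the bot is ignoring robots.txt.
        if any(path.startswith(prefix) for prefix in DISALLOWED_PREFIXES):
            violations.append((bot, path))

print("AI crawler requests by bot:", dict(hits))
print("Requests to disallowed paths:", violations[:20])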

FAQs

  • Do all AI crawlers respect robots.txt? Most reputable ones claim to; always validate in logs and adjust as needed.
  • Is llms.txt required? No. It’s a helpful hint layer that complements (not replaces) robots.txt and sitemaps.
  • Will blocking a bot reduce my AI visibility? Likely, yes. If a bot can’t crawl, it may rely on older training data or not cite you at all.

Other standards

Other proposals are emerging as the ecosystem evolves. One example is ai.txt, an optional file some publishers use to express AI usage preferences; it is not an established standard, so treat it as a convenience rather than a substitute for robots.txt.
https://spawning.substack.com/p/aitxt-a-new-way-for-websites-to-set