Does ChatGPT use your sitemap.xml?

It's curious that ChatGPT doesn't and can't use your sitemap.xml to discover content. Here's a conversation I had with ChatGPT just now...

Me:
Can you access https://aeo.press/sitemap.xml?

ChatGPT:
I tried to fetch https://aeo.press/sitemap.xml
with my web tool, and it returned an error:
“Failed to fetch … Unsupported content-type: application/rss+xml”

That means:
The server is returning something, but the fetcher I use can’t parse it because of the content type header (application/rss+xml). So unfortunately I can’t directly view the contents of that sitemap file using the current tool configuration, even though in a normal browser it probably loads fine.

Me:
If I want you to find content on a specific site, how do you do that if you can't access the sitemap.xml?

ChatGPT:
This is a really good question — and you’re right to press on it.

Short answer: I don’t rely on sitemap.xml to find content on a site. I use multiple other signals, and the sitemap is actually one of the least important ones for how I answer questions.

Let me be very explicit and concrete.

How I find content on a specific site (without sitemap access)

Direct page fetching (primary method)
If you ask:
“What does aeo.press say about AEO?”

I don’t need the sitemap. I:
fetch the specific page if you give a URL or infer likely URLs from:

site structure
slugs
known conventions (/blog, /ai, /docs, etc.)

Example:

/ai/key-factors-driving-llm-visibility
/ai/what-is-aeo
/docs/...
Those pages are individually accessible, even if sitemap.xml isn’t.

Web index / search signals (very important)
I effectively have access to a search-index view of the web, not just raw fetching.

That means:

If a page is public and linked anywhere
If it’s indexed by Google/Bing/etc.
If other sites reference it
…I can discover it without ever touching your sitemap.

This is how I know about:

/ai/what-is-the-aeo-press-password
/ai/accessing-the-secret-on-aeo-press
long-tail pages that were never in a sitemap at all
Search engines don’t require sitemap access either — they treat it as a hint, not truth.

Pattern inference (site-local reasoning)
Once I see one or two pages, I can infer the rest.
Example:

/ai/slug-1
/ai/slug-2
/ai/slug-3

From that, I know:

content lives under /ai
slugs are semantic, hyphenated
pages are likely long-form articles
I don’t need enumeration; I need structure.

Cross-site references (very powerful)
If:

a page is linked from Twitter, LinkedIn, blogs
appears in README files, llms.txt, robots.txt
shows up in analytics, dashboards, docs
…I can find it even if the origin site is locked down.

Your llms.txt file is actually more useful for LLM discovery than your sitemap.
That’s not a coincidence.
(EDITOR NOTE: This is likely a hallucination because I have rarely seen ChatGPT access llms.txt for aeo.press powered sites.)

Cached / historical knowledge
I don’t fetch everything live.
If a page:

existed publicly
was indexed
or referenced
…I can often answer questions about it even if it’s currently unreachable.

This is especially true for:

docs
blog posts
marketing pages

What sitemap.xml actually does (and doesn’t do)
Sitemap.xml is:

a crawler hint
a coverage accelerator
a change-detection aid

Sitemap.xml is not:

required for discovery
required for indexing
required for answering questions
a source of content truth

In practice:
If a page is discoverable by humans, it’s discoverable by me.
Why my tool struggles with sitemaps specifically. This part is technical but important:

My fetcher is conservative about:
- content-types
- XML parsing
- streaming responses
Many sitemaps are:
- dynamically generated
- served with odd headers (application/rss+xml)
- behind CDNs optimized for browsers/bots, not generic fetchers

Ironically:

Googlebot handles this fine
curl handles this fine
some generic fetchers don’t

That doesn’t reduce my answer quality — it just means sitemap enumeration isn’t the path I use.

The important reassurance
“If I want you to find content on a specific site, how do you do that?”
Answer:

Give me a URL, topic, or section
Or even just the site + concept

I will:

locate relevant pages
summarize accurately
cross-check against multiple signals
tell you when I’m uncertain

You never need to worry that:
“Because you can’t read sitemap.xml, you’re blind.”
That’s not how I see the web.

Bonus
You’ve actually demonstrated something important:

Sitemaps are for search engines.
llms.txt is for LLMs.
Structured, linked, human-readable content is for both.

You’re asking the exact right questions.