Log traffic at the edge to see both AI bot searches and human referrals from AI answers—JavaScript analytics alone will miss much of this signal.
Architecture at a glance
- Edge entry point (CDN/proxy/web server) writes access logs with: timestamp, method, status, path, referer, user-agent, ip
- Ship logs to storage/stream (e.g., object store or log pipeline)
- Parse for known bot UAs and likely AI referrals; aggregate into dashboards
Example configurations (generic)
NGINX (access log format)
    log_format aeo '$time_iso8601\t$remote_addr\t$request_method\t$status\t'
                   '$request_uri\t$http_referer\t$http_user_agent';
    access_log /var/log/nginx/access.log aeo;
Cloudflare Workers (conceptual logging)
    export default {
      async fetch(request, env, ctx) {
        const ua = request.headers.get('user-agent') || '';
        const ref = request.headers.get('referer') || '';
        const entry = {
          ts: new Date().toISOString(),
          path: new URL(request.url).pathname,
          ua,
          ref,
        };
        // send to log store (KV/R2/Logs)
        // await env.LOG.put(entry.ts, JSON.stringify(entry));
        return fetch(request);
      }
    };
Parsing patterns (illustrative)
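A first parsing step is splitting each log line into named fields. The sketch below assumes logs use the tab-separated `aeo` format from the NGINX example. The function name `parseAeoLine` and the output field names (`ts`, `ip`, `method`, `status`, `path`, `referer`, `ua`) are illustrative choices, not part of any standard.

```javascript
// Field order mirrors the `aeo` log_format:
// time, client IP, method, status, request URI, referer, user agent.
const AEO_FIELDS = ['ts', 'ip', 'method', 'status', 'path', 'referer', 'ua'];

// Split one tab-separated access-log line into a plain object.
// Returns null when the line does not have the expected number of fields.
function parseAeoLine(line) {
  const parts = line.replace(/\r?\n$/, '').split('\t');
  if (parts.length !== AEO_FIELDS.length) return null;

  const entry = {};
  AEO_FIELDS.forEach((name, i) => { entry[name] = parts[i]; });
  entry.status = Number(entry.status);

  // NGINX logs "-" when a header is absent; normalise that to an empty string.
  if (entry.referer === '-') entry.referer = '';
  if (entry.ua === '-') entry.ua = '';
  return entry;
}
```

Returning `null` for malformed lines lets the caller count and skip them instead of failing the whole batch.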
- Bot UA regex (case-insensitive):
    (gptbot|oai-searchbot|claude|anthropic|perplexitybot|googlebot|bingbot|duckduckbot|baiduspider|yandex|facebookexternalhit|meta-externalagent|twitterbot|linkedinbot|ahrefsbot|semrushbot|mj12bot|\b(bot|spider|crawler)\b)
- Likely AI referral heuristics:
  - Known referrers where available (varies by tool)
  - Sudden spikes on answerable pages after observed AI exposure
  - UTM annotations from your experiments (e.g., utm_source=ai_overview)
Note: Treat these as starting points; adapt to your platform and privacy requirements.
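The patterns above could be combined into a single classifier along the following lines. This is a minimal sketch: the entry shape (`ua`, `referer`, `path`) is an assumption, the host in `AI_REFERRER_HOSTS` is a hypothetical placeholder to be replaced with hosts you actually observe, and the spike heuristic is left out because it needs time-series data rather than a single log entry.

```javascript
// Named bots are checked first so the match can be reported as bot_name.
// The list is copied from the illustrative regex above.
const NAMED_BOT = /(gptbot|oai-searchbot|claude|anthropic|perplexitybot|googlebot|bingbot|duckduckbot|baiduspider|yandex|facebookexternalhit|meta-externalagent|twitterbot|linkedinbot|ahrefsbot|semrushbot|mj12bot)/i;
const GENERIC_BOT = /\b(bot|spider|crawler)\b/i;

// ASSUMPTION: hypothetical placeholder host; use referrer hosts seen in your own logs.
const AI_REFERRER_HOSTS = ['ai-assistant.example'];
// UTM value taken from the experiment example above.
const AI_UTM_SOURCES = ['ai_overview'];

function classify(entry) {
  const ua = entry.ua || '';
  const named = ua.match(NAMED_BOT);
  const isBot = Boolean(named) || GENERIC_BOT.test(ua);
  const botName = named ? named[1].toLowerCase() : (isBot ? 'other' : null);

  let refHost = '';
  try {
    refHost = new URL(entry.referer).hostname;
  } catch (_) { /* empty or malformed referer */ }

  let utmSource = null;
  try {
    // The base URL only exists so a bare path can be parsed for its query string.
    utmSource = new URL(entry.path, 'http://placeholder.invalid').searchParams.get('utm_source');
  } catch (_) { /* unparsable path */ }

  const fromAiHost = AI_REFERRER_HOSTS.some(
    (h) => refHost === h || refHost.endsWith('.' + h)
  );
  const isLlmReferral = !isBot && (fromAiHost || AI_UTM_SOURCES.includes(utmSource));

  return { is_bot: isBot, bot_name: botName, is_llm_referral: isLlmReferral };
}
```

Bot traffic is excluded from `is_llm_referral` on purpose, so a crawler that happens to carry a matching referrer or UTM tag is not counted as a human visit.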
Processing pipeline
- Ingest: rotate and ship logs continuously, hourly, or daily
- Enrich: tag entries with is_bot, bot_name, is_llm_referral
- Store: time-series DB or warehouse
- Visualize: build trends by bot, path, source; overlay conversions from GA4/CRM
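As a rough illustration of the step between enrichment and visualization, the sketch below counts bot hits per day, bot, and path. It assumes entries have already been enriched and carry an ISO 8601 `ts` string plus `path`, `is_bot`, and `bot_name`. The function name `aggregateBotHits` is hypothetical, and in practice this grouping would more likely be a query in your warehouse.

```javascript
// Count bot hits per (day, bot_name, path) from already-enriched entries.
function aggregateBotHits(entries) {
  const counts = new Map();
  for (const e of entries) {
    if (!e.is_bot) continue;
    // First 10 characters of an ISO 8601 timestamp are "YYYY-MM-DD",
    // expressed in whatever timezone the log was written in.
    const day = e.ts.slice(0, 10);
    // JSON-encode the key so paths containing separators cannot collide.
    const key = JSON.stringify([day, e.bot_name, e.path]);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts].map(([key, hits]) => {
    const [day, bot_name, path] = JSON.parse(key);
    return { day, bot_name, path, hits };
  });
}
```

Each output row maps directly onto one point in a "trend by bot and path" chart.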
What to expect
- Bot hits will concentrate on high-signal hubs and sitemaps
- Human LLM referrals often show higher intent and deeper sessions than generic organic (measure lift vs baseline)
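One simple way to express "lift vs baseline" is the relative difference between a segment metric and the same metric for baseline traffic. The helper names below are hypothetical, and which metric to compare (conversion rate, pages per session, and so on) is a choice for your own analysis.

```javascript
// Conversion rate for a segment; null when there are no sessions to divide by.
function conversionRate(conversions, sessions) {
  return sessions > 0 ? conversions / sessions : null;
}

// Relative lift of a segment metric over a baseline metric.
// A result of 0.5 means the segment is 50% higher than the baseline.
function relativeLift(segmentValue, baselineValue) {
  if (!(baselineValue > 0)) return null; // lift is undefined without a positive baseline
  return (segmentValue - baselineValue) / baselineValue;
}
```

For example, if LLM-referred sessions average 3 pages and generic organic sessions average 2, `relativeLift(3, 2)` returns `0.5`.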
FAQs
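- Why not rely on GA4? Most AI bots don't run JS, so GA4 never records their requests; edge logs are the most complete record of that traffic.
- Can I definitively tag every AI referral? No. In most cases the log can include a referer value, but availability varies by tool, so combine referrers with UTM annotations and the heuristics above and treat the result as an estimate.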