
How to Log AI Bot and LLM Referral Traffic at the Edge (Vendor-Agnostic)

Log traffic at the edge to capture both AI bot requests and human referrals from AI answers. JavaScript analytics alone will miss much of this signal.

Architecture at a glance

  • Edge entry point (CDN/proxy/web server) writes access logs with: timestamp, method, status, path, referer, user-agent, ip
  • Ship logs to storage/stream (e.g., object store or log pipeline)
  • Parse for known bot UAs and likely AI referrals; aggregate into dashboards

Example configurations (generic)

NGINX (access log format)

log_format aeo '$time_iso8601\t$remote_addr\t$request_method\t$status\t'
               '$request_uri\t$http_referer\t$http_user_agent';
access_log /var/log/nginx/access.log aeo;
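
To make the format concrete, here is a minimal sketch of reading one such line back into fields, assuming the seven tab-separated values appear in the order defined above. The names `FIELDS` and `parseLine` and the sample line are illustrative, not part of NGINX.

```javascript
// Field order mirrors the tab-separated log_format shown above.
const FIELDS = ["ts", "ip", "method", "status", "path", "referer", "ua"];

// Parse one access-log line into an object.
// Returns null when the line does not have exactly seven fields.
function parseLine(line) {
  const parts = line.replace(/\r?\n$/, "").split("\t");
  if (parts.length !== FIELDS.length) return null;
  const entry = {};
  FIELDS.forEach((name, i) => {
    // NGINX logs "-" when a variable is empty; normalize it to "".
    entry[name] = parts[i] === "-" ? "" : parts[i];
  });
  entry.status = Number(entry.status);
  return entry;
}

// Hypothetical sample line (documentation IP range, simplified user agent).
const sample = [
  "2024-05-01T12:00:00+00:00", "203.0.113.7", "GET", "200",
  "/guides/edge-logging", "-", "Mozilla/5.0 (compatible; GPTBot/1.0)",
].join("\t");

console.log(parseLine(sample));
```

Returning null for malformed lines lets a downstream job count and inspect rejects instead of failing on the first bad record.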

Cloudflare Workers (conceptual logging)

export default {
  async fetch(request, env, ctx) {
    const ua = request.headers.get('user-agent') || '';
    const ref = request.headers.get('referer') || '';
    const entry = {
      ts: new Date().toISOString(),
      path: new URL(request.url).pathname,
      ua, ref,
    };
    // send to log store (KV/R2/Logs)
    // await env.LOG.put(entry.ts, JSON.stringify(entry));
    return fetch(request);
  }
}

Parsing patterns (illustrative)

  • Bot UA regex (case-insensitive):
    (gptbot|oai-searchbot|claude|anthropic|perplexitybot|googlebot|bingbot|duckduckbot|baiduspider|yandex|facebookexternalhit|meta-externalagent|twitterbot|linkedinbot|ahrefsbot|semrushbot|mj12bot|(bot|spider|crawler)\b)
  • Likely AI referral heuristics:
    • Known referrers where available (varies by tool)
    • Sudden spikes on answerable pages after observed AI exposure
    • UTM annotations from your experiments (e.g., utm_source=ai_overview)

Note: Treat these as starting points; adapt to your platform and privacy requirements.
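
The patterns above could be applied roughly as follows. This is a sketch under stated assumptions: the bot list is a subset of the regex above, the referrer hosts in `AI_REFERRER_HOSTS` are placeholders to check against what your own logs actually contain, and `classifyBot` and `isLlmReferral` are hypothetical helper names.

```javascript
// Ordered [label, pattern] pairs; the first match wins.
// Subset of the illustrative regex above, plus a generic fallback.
const BOT_PATTERNS = [
  ["gptbot", /gptbot/i],
  ["oai-searchbot", /oai-searchbot/i],
  ["anthropic", /claude|anthropic/i],
  ["perplexitybot", /perplexitybot/i],
  ["googlebot", /googlebot/i],
  ["bingbot", /bingbot/i],
  ["generic", /(bot|spider|crawler)\b/i],
];

// ASSUMPTION: placeholder hosts; confirm against referers seen in your logs.
const AI_REFERRER_HOSTS = ["chatgpt.com", "perplexity.ai"];
// UTM values you set yourself in experiments (example from the list above).
const AI_UTM_SOURCES = ["ai_overview"];

// Return a bot label for a user-agent string, or null for non-bot traffic.
function classifyBot(ua) {
  for (const [name, pattern] of BOT_PATTERNS) {
    if (pattern.test(ua)) return name;
  }
  return null;
}

// Heuristic only: true when the referer host or utm_source suggests an AI tool.
function isLlmReferral(referer, requestUri) {
  try {
    const host = new URL(referer).hostname.toLowerCase();
    if (AI_REFERRER_HOSTS.some((h) => host === h || host.endsWith("." + h))) {
      return true;
    }
  } catch {
    // Missing or malformed referer: fall through to the UTM check.
  }
  try {
    // The base URL is a dummy so that path-only request URIs can be parsed.
    const url = new URL(requestUri, "https://example.invalid");
    const source = (url.searchParams.get("utm_source") || "").toLowerCase();
    return AI_UTM_SOURCES.includes(source);
  } catch {
    return false;
  }
}

console.log(classifyBot("Mozilla/5.0 (compatible; GPTBot/1.0)"));
console.log(isLlmReferral("", "/pricing?utm_source=ai_overview"));
```

Keeping specific patterns ahead of the generic fallback means a named bot is reported under its own label rather than as "generic".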

Processing pipeline

  1. Ingest: rotate and ship logs in real time, hourly, or daily
  2. Enrich: tag entries with is_bot, bot_name, is_llm_referral
  3. Store: time-series DB or warehouse
  4. Visualize: build trends by bot, path, source; overlay conversions from GA4/CRM
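
As an illustration of the step between storing and visualizing, the sketch below rolls already-enriched entries up into hourly counts by source and path. It assumes each entry carries an ISO 8601 `ts` plus the `is_bot`, `bot_name`, and `is_llm_referral` tags from step 2. The `hourBucket` and `rollup` helpers, the key layout, and the sample entries are all hypothetical.

```javascript
// Truncate an ISO 8601 timestamp to its UTC hour, e.g. "2024-05-01T12:00Z".
// Assumes ts is a valid date string; invalid input would throw.
function hourBucket(ts) {
  return new Date(ts).toISOString().slice(0, 13) + ":00Z";
}

// Count entries per (hour, source, path). Source is the bot label for bot
// traffic, "llm_referral" for tagged human referrals, otherwise "other".
function rollup(entries) {
  const counts = new Map();
  for (const e of entries) {
    const source = e.is_bot
      ? `bot:${e.bot_name}`
      : e.is_llm_referral ? "llm_referral" : "other";
    const key = [hourBucket(e.ts), source, e.path].join("|");
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Hypothetical enriched entries.
const enriched = [
  { ts: "2024-05-01T12:05:00+00:00", path: "/guides/a", is_bot: true, bot_name: "gptbot", is_llm_referral: false },
  { ts: "2024-05-01T12:40:00+00:00", path: "/guides/a", is_bot: true, bot_name: "gptbot", is_llm_referral: false },
  { ts: "2024-05-01T13:10:00+00:00", path: "/pricing", is_bot: false, bot_name: null, is_llm_referral: true },
];

for (const [key, n] of rollup(enriched)) console.log(key, n);
```

In a warehouse the same grouping would normally be a GROUP BY query; the in-memory version only shows which dimensions the dashboard needs.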

What to expect

  • Bot hits will concentrate on high-signal hubs and sitemaps
  • Human LLM referrals often show higher intent and deeper sessions than generic organic traffic; measure the lift against your own baseline rather than assuming it
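
One simple way to express "lift vs baseline" is the relative difference between segment and baseline means, (segment - baseline) / baseline. The sketch below assumes you have a per-session metric (pages per session here) for each group. The numbers are made up for illustration, and `mean` and `lift` are hypothetical helpers.

```javascript
// Arithmetic mean of a non-empty array of numbers.
function mean(xs) {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Relative lift of the segment mean over the baseline mean.
// Returns null when either group is empty or the baseline mean is zero.
function lift(segment, baseline) {
  if (segment.length === 0 || baseline.length === 0) return null;
  const base = mean(baseline);
  if (base === 0) return null;
  return (mean(segment) - base) / base;
}

// Made-up pages-per-session samples.
const llmSessions = [4, 6, 5];        // mean 5
const organicSessions = [2, 3, 1, 2]; // mean 2

console.log(lift(llmSessions, organicSessions)); // 1.5, i.e. +150%
```

With small samples a point estimate like this is noisy, so compare over a long enough window before drawing conclusions.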

FAQs

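  • Why not rely on GA4? Most AI bots don’t run JS, so GA4 never records their requests; edge logs capture every request that reaches your edge.
  • Can I definitively tag every AI referral? No. The log records a Referer value only when the client sends one, and some AI tools and apps omit it, so combine referrer matching with UTM annotations and treat the result as an estimate.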