Practical Lens 04: Crawl access is an identity prerequisite

If one AI tool “knows” your services and another does not, assume uneven access to your core pages—not different “intelligence.”

What this lens means

AI cannot interpret what it cannot reliably fetch. Before “understanding” happens, systems need stable access to the pages that define your identity (about, services, contact, locations). If those surfaces are inconsistently reachable, identity resolution becomes inconsistent too.

Why tools disagree

  • Different systems use different crawlers, fetching policies, and retry logic—so access failures are not uniform.
  • Even small differences (redirects, cookies, bot rules) can change what content is actually retrieved.
  • If core identity pages are missing from what the system can fetch, the model fills gaps with partial context or third-party references.
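The divergence above can be probed directly: fetch the same core page with a browser-like and a crawler-like User-Agent and compare what comes back. A minimal sketch, assuming nothing beyond the Python standard library; the helper names (`fetch`, `responses_diverge`) and the 50% body-length threshold are illustrative choices, not a standard:

```python
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

def fetch(url, user_agent):
    """Fetch url with a given User-Agent; return (status, body)."""
    req = Request(url, headers={"User-Agent": user_agent})
    try:
        with urlopen(req, timeout=10) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except HTTPError as e:
        return e.code, ""   # blocked or missing: a status, but no usable body
    except URLError:
        return None, ""     # network-level failure (DNS, timeout, reset)

def responses_diverge(a, b):
    """Heuristic: do two (status, body) pairs suggest UA-dependent delivery?"""
    (status_a, body_a), (status_b, body_b) = a, b
    if status_a != status_b:
        return True
    # A large body-length gap often means a bot-only interstitial
    # or content stripped for non-browser agents.
    longest = max(len(body_a), len(body_b), 1)
    return min(len(body_a), len(body_b)) < 0.5 * longest
```

Comparing a browser User-Agent against a crawler-style one on pages like /about, /services, and /contact gives a quick read on whether bot rules change what is actually served.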

What this usually indicates

  • Blocked or constrained bots (robots rules, WAF/bot protection, user-agent filtering).
  • Unstable responses (intermittent 403/404/5xx, timeouts, rate limits).
  • Variant delivery (different content by region, language, device, or cookies).
  • Soft-404 patterns (the page body says “not found” but the HTTP status is 200).
  • Discovery gaps (core pages not linked well, missing from sitemap.xml).
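The soft-404 pattern in particular is easy to screen for: an HTTP 200 whose body reads like an error page. A rough heuristic sketch; the phrase list is an illustrative assumption and would need tuning per site and language:

```python
# Illustrative phrase list; real error pages vary widely by site and language.
SOFT_404_PHRASES = ("page not found", "doesn't exist", "no longer available", "error 404")

def looks_like_soft_404(status, html):
    """Flag an HTTP 200 response whose body reads like an error page."""
    if status != 200:
        return False  # a real 404/410 is not a *soft* 404
    text = html.lower()
    return any(phrase in text for phrase in SOFT_404_PHRASES)
```

Run against the fetched bodies of your core identity pages, this catches the case where a crawler records a “successful” fetch of a page that says nothing.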

What to verify (evidence-only)

  • Do core pages return stable HTTP 200 (and only real missing pages return 404)?
  • Are there intermittent 403/429/5xx responses when the same pages are fetched repeatedly?
  • Does robots.txt allow crawling of core pages and reference the sitemap?
  • Is sitemap.xml reachable and does it include core identity pages?
  • Do different user agents reach the same primary surfaces (no bot-only blocks/redirects)?
  • Is important identity content present in the initial HTML (not only after JS execution)?
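The robots.txt and sitemap checks above can be run offline against saved copies of those files, without hammering the live site. A sketch using only the standard library; `robots_allows` and `sitemap_urls` are hypothetical helper names:

```python
from urllib.robotparser import RobotFileParser
import xml.etree.ElementTree as ET

def robots_allows(robots_txt, user_agent, path):
    """Would this user agent be allowed to crawl this path under these rules?"""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, path)

def sitemap_urls(sitemap_xml):
    """Extract the set of <loc> URLs from a standard sitemap.xml document."""
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    root = ET.fromstring(sitemap_xml)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", ns)}
```

Checking each core identity page against both functions answers two checklist items at once: is the page crawlable under the published rules, and is it actually listed in the sitemap?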

What this is not

  • Not a claim that “allowing all bots” is always correct.
  • Not about ranking. This is about reliable identity fetchability.