Practical Lens 25: Error codes are identity signals
AI crawlers infer reliability from HTTP behavior. If reference pages intermittently return 403/404/500 (or soft 404), identity evidence becomes unstable and summaries drift.
What this lens means
AI crawlers treat HTTP behavior as a reliability signal. When key reference pages fail intermittently (403/404/500, or a soft 404), crawlers cannot assemble a stable evidence set about who you are, so different tools end up summarizing the same site differently.
Why this happens
- AI crawlers discover and re-validate pages over time; intermittent failures create inconsistent evidence sets.
- Soft 404 pages look like normal HTML but communicate “not found” semantics, which reduces confidence.
- WAF rules, bot challenges, and misconfigured redirects often affect crawlers differently than browsers.
What this usually indicates
- Intermittent failures: the same URL sometimes returns 200 and sometimes 403/5xx.
- Soft 404 patterns: 200 OK responses that contain “not found” pages or empty shells.
- Uneven bot access: different crawlers see different status codes for the same URL.
- Crawl waste: crawlers spend budget on errors instead of reference pages.
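The first pattern above — the same URL alternating between 200 and errors — is easy to classify once you have a few repeated fetches of each URL. A minimal sketch (the function name and labels are my own, not part of this lens):

```python
def classify_fetches(statuses):
    """Classify repeated fetches of one URL by the status codes observed.

    statuses: list of int HTTP status codes from repeated requests.
    Returns "stable-ok", "stable-error", or "intermittent".
    """
    ok = statuses.count(200)
    if ok == len(statuses):
        return "stable-ok"
    if ok == 0:
        return "stable-error"
    # A mix of 200 and non-200 for the same URL is the unstable-evidence case.
    return "intermittent"

# Example: the same URL answered 200 three times, then a 403.
print(classify_fetches([200, 200, 200, 403]))  # intermittent
```

Anything classified as "intermittent" is the case this lens warns about: the crawler's evidence set changes depending on when it happened to visit.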
What to verify (evidence-only)
- Do homepage/about/services/contact return 200 consistently (repeat the test multiple times)?
- Do crawler user agents receive the same status codes as a normal browser UA?
- Do error pages return correct status codes (a real 404 for not found, not a 200)?
- Do redirects resolve cleanly (301/302 → 200) without loops or long chains?
- Do the same URLs behave consistently across language variants and subpaths?
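The second check above — whether crawler user agents see the same status codes as a browser — reduces to comparing per-UA results for each URL. A sketch of that comparison, assuming you have already collected the status codes (URLs and UA labels below are illustrative):

```python
def find_uneven_access(results):
    """Given {url: {user_agent_label: status_code}}, return only the URLs
    where different user agents saw different status codes."""
    return {
        url: by_ua
        for url, by_ua in results.items()
        if len(set(by_ua.values())) > 1  # more than one distinct status
    }

observed = {
    "https://example.com/about": {"browser": 200, "crawler": 403},
    "https://example.com/":      {"browser": 200, "crawler": 200},
}
print(find_uneven_access(observed))
# {'https://example.com/about': {'browser': 200, 'crawler': 403}}
```

Any URL in the output is evidence of uneven bot access, usually a WAF rule or bot challenge firing for the crawler but not the browser.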
Frequently Asked Questions
Why do HTTP errors affect AI identity?
Because crawlers can only use what they reliably fetch. If reference pages fail, the crawler's evidence set becomes incomplete or inconsistent.
What is a soft 404?
A page that returns 200 OK but is effectively a "not found" page (or empty shell). It confuses discovery and reduces confidence.
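Soft 404s can be flagged with a simple heuristic: a 200 response whose body reads like a "not found" page, or is a near-empty shell. A rough sketch — the phrase list and length threshold are illustrative guesses, not a standard:

```python
import re

# Phrases that commonly appear on "not found" pages (illustrative, not exhaustive).
NOT_FOUND_PATTERNS = re.compile(
    r"page not found|doesn't exist|no longer available", re.IGNORECASE
)

def looks_like_soft_404(status_code, html_body):
    """Heuristic soft-404 detector: True for a 200 response whose body
    signals "not found" or is an almost-empty shell."""
    if status_code != 200:
        return False  # a real error status is not a soft 404
    text = html_body.strip()
    if len(text) < 200:  # near-empty shell; threshold is a guess
        return True
    return bool(NOT_FOUND_PATTERNS.search(text))
```

Run it over your reference pages: any True result is a URL telling crawlers "this exists" while the content says the opposite.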
What's the fastest way to spot uneven access?
Compare status codes for the same URL using a normal UA and a crawler UA (e.g., Googlebot). If they differ, you have inconsistent crawl access.
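That comparison can be scripted with the standard library. A sketch, assuming a site you control (the UA strings are examples; `fetch` is injectable so the logic can be exercised without a network call):

```python
import urllib.error
import urllib.request

def status_for(url, user_agent, fetch=None):
    """Return the HTTP status code seen when requesting `url` with the
    given User-Agent string. HTTP errors (403/404/5xx) are returned as
    their status code rather than raised."""
    if fetch is None:
        def fetch(req):
            try:
                with urllib.request.urlopen(req, timeout=10) as resp:
                    return resp.status
            except urllib.error.HTTPError as err:
                return err.code
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    return fetch(req)

# Example usage (live network calls; URL and UA strings are illustrative):
# status_for("https://example.com/", "Mozilla/5.0")
# status_for("https://example.com/", "Googlebot/2.1 (+http://www.google.com/bot.html)")
```

If the two calls return different codes for the same URL, you have the uneven crawl access described above. Note that verifying a request truly comes from Googlebot requires a reverse-DNS check; spoofing the UA string only tests your own server's rules.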