Practical Lens 48: Outdated PDFs and legacy pages dominate citations

Old PDFs and legacy pages can remain crawlable long after the offer has changed. If they are still easy to access, AI systems may treat them as stable evidence and cite them in place of current pages.

What this lens means

PDFs, archived pages and legacy landing pages often stay online because they still serve a narrow purpose. The AI visibility risk starts when those old assets are easier to crawl, cite or understand than the current source of truth.

Key terms

Legacy page
An old page that remains public after the current website or offer has changed.
Outdated PDF
A downloadable document that may contain old names, offers, prices or positioning.
Citation source
A page or document that an AI system may use as evidence when answering questions.

Why this happens

  • Old PDFs remain linked from blog posts, partner pages or search results.
  • Legacy pages return 200 OK and look like valid current pages.
  • New pages are less explicit than old documents, so old assets appear easier to cite.
  • Sitemaps, internal links or external profiles still expose outdated documents.

What this usually indicates

  • Stale evidence risk: AI may cite old documents when current pages are weaker or harder to find.
  • Version confusion: Crawlers may see both current and legacy information as valid.
  • Authority drift: External links may keep pointing to older PDFs or legacy pages.
  • Weak source control: The website does not clearly mark which information is current.

What to verify (evidence-only)

  • List public PDF links from the homepage, sitemap and key landing pages.
  • Check whether each legacy page returns 200 OK, redirects to a current page or carries a noindex directive.
  • Compare PDF dates, titles and service names with the current website.
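  • Review whether old assets carry a canonical reference, a noindex directive or a clear archive label where appropriate. For PDFs, canonical and noindex signals can only be sent as HTTP response headers (Link with rel="canonical", X-Robots-Tag), not inside the file.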
  • Confirm that current pages explain the up-to-date offer more clearly than old documents.
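
The status check in the list above can be sketched as a small shell helper. This is a minimal sketch, not a complete audit: the function name classify_headers and its output labels are hypothetical, and it only inspects HTTP response headers, so a noindex set through an HTML meta tag would not be detected.

```shell
#!/usr/bin/env bash
# classify_headers (hypothetical helper): read raw HTTP response headers on
# stdin and print one label: redirect, noindex, live-indexable or other:<status>.
classify_headers() {
  local headers status
  headers=$(tr -d '\r')   # strip CR from CRLF line endings
  # Take the code from the first status line, e.g. "HTTP/1.1 200 OK" or "HTTP/2 301"
  status=$(printf '%s\n' "$headers" | awk 'toupper($1) ~ /^HTTP\// { print $2; exit }')
  case "$status" in
    301|302|303|307|308) echo "redirect" ;;
    200)
      # X-Robots-Tag is the header form of noindex (the only form available to PDFs)
      if printf '%s\n' "$headers" | grep -iqE '^x-robots-tag:.*noindex'; then
        echo "noindex"
      else
        echo "live-indexable"
      fi ;;
    *) echo "other:${status:-unknown}" ;;
  esac
}

# Usage, with example.com standing in for the audited domain:
#   curl -sI https://example.com/old-document.pdf | classify_headers
```

A result of live-indexable for a document that no longer matches the current offer is the case this lens is concerned with.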

Terminal check example

Replace example.com with the audited domain. The goal is to verify the specific evidence signals behind this lens.

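# Homepage: show lines that reference PDFs or legacy-looking paths (-L follows redirects)
curl -sL https://example.com/ | grep -iE '\.pdf|legacy|archive|old|download'
# Sitemap: same check against the listed URLs
curl -sL https://example.com/sitemap.xml | grep -iE '\.pdf|legacy|archive|old'
# Headers only: inspect the status code, Location, Last-Modified and X-Robots-Tag
curl -sI https://example.com/old-document.pdf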
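
To check every PDF listed in the sitemap rather than one known file, the sitemap output can be narrowed to PDF URLs first. The following is a rough sketch under stated assumptions: the function name pdf_urls_from_sitemap is hypothetical, and it assumes a standard sitemap in which each URL sits inside a <loc> element.

```shell
#!/usr/bin/env bash
# pdf_urls_from_sitemap (hypothetical helper): read sitemap XML on stdin and
# print each <loc> URL whose path ends in .pdf (case-insensitive), one per line.
pdf_urls_from_sitemap() {
  grep -oiE '<loc>[^<]+</loc>' \
    | sed -E -e 's#</?[Ll][Oo][Cc]>##g' -e 's/^[[:space:]]+//' -e 's/[[:space:]]+$//' \
    | grep -iE '\.pdf([?#].*)?$'
}

# Usage, with example.com standing in for the audited domain:
#   curl -sL https://example.com/sitemap.xml | pdf_urls_from_sitemap |
#     while read -r url; do
#       echo "== $url"
#       curl -sI "$url" | grep -iE '^(HTTP/|location:|last-modified:|x-robots-tag:)'
#     done
```

Matching on the end of the URL avoids false hits on paths that merely contain the letters "pdf", such as /pdf-guide/.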

PowerShell check example

Use this on Windows to inspect the same signals from visible content, sitemap or headers.

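# Avoid $home as a variable name: it is a read-only automatic variable in PowerShell
$homepage = (Invoke-WebRequest -Uri "https://example.com/" -UseBasicParsing).Content
$homepage -split "`n" | Select-String -Pattern '\.pdf|legacy|archive|old|download'

# Fetch the sitemap as raw text so it can be searched line by line
$sitemap = (Invoke-WebRequest -Uri "https://example.com/sitemap.xml" -UseBasicParsing).Content
$sitemap -split "`n" | Select-String -Pattern '\.pdf|legacy|archive|old'

# Headers only: check StatusCode and the Headers collection in the response
Invoke-WebRequest -Uri "https://example.com/old-document.pdf" -Method Head -UseBasicParsing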

Frequently Asked Questions

Why can outdated PDFs affect AI visibility?

Because old documents can remain crawlable and may look like stable evidence.

Should every old PDF be removed?

No. But each old document that stays public should be updated so it is clearly current, redirected to the current source, labeled as archived, or excluded from indexing where appropriate.

What is the fastest check?

Search the homepage and sitemap for PDF and legacy URLs, then verify whether they still represent the current offer.