
Schema Markup Doesn't Help AI Citations (Ahrefs Study)

Ahrefs tracked 1,885 pages adding schema. AI citations barely moved. What actually works for GEO.

7 min

Schema markup does not meaningfully improve AI citations. Ahrefs tracked 1,885 pages that added structured data and found AI citation rates barely moved. The reason: LLMs do not parse JSON-LD or microdata. They consume tokenized text from retrieved documents. What actually drives AI citations is answer-first content format, presence in Google's index, and topical authority.

Why schema does not help LLMs

Schema markup (JSON-LD, RDFa, microdata) is designed for search engine crawlers that parse structured data into knowledge graphs. LLMs work differently. When an LLM retrieves a web page through a search API, the page content is converted to plain text tokens. The JSON-LD block in your page header becomes noise tokens that the model either ignores or confuses with page content. Schema helps Google's traditional crawler understand your page structure. It does not help GPT-4, Claude, or Perplexity understand your content better.
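A minimal sketch makes the point concrete. Using only Python's standard-library HTML parser (the page content is hypothetical), this is roughly what an HTML-to-text step does before tokenization: script bodies, including JSON-LD blocks, are dropped, and only visible copy survives. Real retrieval pipelines vary; a naive one that keeps script contents would pass the JSON through as noise tokens instead.

```python
from html.parser import HTMLParser

class VisibleTextExtractor(HTMLParser):
    """Collect only the text a retrieval pipeline would keep,
    skipping <script> bodies (including JSON-LD blocks)."""
    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == 'script':
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == 'script':
            self.in_script = False

    def handle_data(self, data):
        if not self.in_script and data.strip():
            self.chunks.append(data.strip())

page = """<html><head>
<script type="application/ld+json">{"@type": "FAQPage"}</script>
</head><body><h1>What is GEO?</h1><p>Generative engine optimization.</p></body></html>"""

extractor = VisibleTextExtractor()
extractor.feed(page)
text = ' '.join(extractor.chunks)
print(text)  # the JSON-LD never reaches the model; the visible copy does
```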

What Ahrefs found

The Ahrefs study tracked 1,885 pages across multiple domains that added schema markup (FAQ, HowTo, Article, Product) over a 90-day period. AI Overview citation rates for these pages showed no statistically significant improvement compared to a control group. Rich snippet appearance in traditional Google results did improve, confirming schema's value for classic SEO. But AI Overviews source selection operates on different signals entirely.
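"No statistically significant improvement" has a concrete meaning here. Ahrefs has not published its raw counts, so the numbers below are hypothetical, but a two-proportion z-test shows the shape of the comparison: even a small apparent lift between groups of this size can be indistinguishable from noise.

```python
import math

def two_proportion_z(cited_a, n_a, cited_b, n_b):
    """Two-sided two-proportion z-test: is the citation-rate
    difference between two page groups statistically significant?"""
    p_a, p_b = cited_a / n_a, cited_b / n_b
    pooled = (cited_a + cited_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p from the z score
    return z, p_value

# Hypothetical counts (NOT Ahrefs' unpublished raw data):
# schema group, 156 of 1,885 pages cited; control, 149 of 1,885.
z, p = two_proportion_z(156, 1885, 149, 1885)
print(f"z = {z:.2f}, p = {p:.2f}")  # p well above 0.05: no detectable effect
```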

What actually drives AI citations

Three factors consistently correlate with AI Overview and LLM citations. First: answer-first content format. Pages that lead with a direct answer to the query in the first sentence get cited more often because retrieval systems extract top-of-page content as context. Second: being in Google's index. LLMs use search APIs under the hood. If your page is not indexed, it cannot be retrieved. Third: topical authority. Pages from domains with deep coverage of a topic get preferential retrieval ranking. None of these require schema markup.

Track AI Overview citations via API

Instead of adding schema and hoping, directly monitor whether your pages appear as AI Overview sources. The Scavio search API returns AI Overview data including source citations for any Google query.

Python
import requests, os

H = {'x-api-key': os.environ['SCAVIO_API_KEY']}

def track_ai_citations(keywords: list[str], domain: str) -> dict:
    """Check which keywords cite your domain in AI Overviews."""
    results = {'cited': [], 'not_cited': [], 'no_overview': []}

    for kw in keywords:
        resp = requests.post('https://api.scavio.dev/api/v1/search',
            headers=H,
            json={'query': kw, 'platform': 'google'},
            timeout=30)
        resp.raise_for_status()  # fail fast on auth or quota errors
        data = resp.json()
        ai_overview = data.get('ai_overview', {})

        if not ai_overview:
            results['no_overview'].append(kw)
            continue

        sources = ai_overview.get('sources', [])
        cited = any(domain in s.get('link', '') for s in sources)

        if cited:
            results['cited'].append(kw)
        else:
            results['not_cited'].append(kw)

    return results

# Track 50 keywords at $0.005/query = $0.25
report = track_ai_citations(
    keywords=['best crm for startups', 'crm pricing comparison 2026'],
    domain='yoursite.com'
)

The answer-first format that works

Structure your content so the first paragraph directly answers the title question. No "In this article, we will explore..." preambles. No throat-clearing. LLM retrieval systems extract the top of the page as context. If your answer is buried in paragraph four, a competitor whose answer is in paragraph one gets cited instead. This is the single highest-leverage change for AI citation rates: move your answer to the first sentence.
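One way to operationalize this is a rough lint pass over your drafts. The phrase list below is illustrative, not exhaustive; the point is to flag openings that stall instead of answering.

```python
# Common throat-clearing phrases (an illustrative list, extend as needed)
PREAMBLE_PATTERNS = [
    'in this article', 'in this post', 'we will explore',
    'before we dive in', "let's take a look",
]

def flags_weak_opening(first_paragraph: str) -> bool:
    """Return True if the opening reads like preamble
    rather than a direct answer to the title question."""
    opening = first_paragraph.lower()
    return any(pat in opening for pat in PREAMBLE_PATTERNS)

print(flags_weak_opening("In this article, we will explore CRM pricing."))  # True
print(flags_weak_opening("Most startup CRMs cost $12-$99 per seat/month."))  # False
```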

Daily citation monitoring script

Python
import json, datetime

def daily_citation_check():
    with open('target_keywords.json') as f:
        keywords = json.load(f)
    report = track_ai_citations(keywords, 'yoursite.com')

    date = datetime.date.today().isoformat()
    with open(f'citations_{date}.json', 'w') as f:
        json.dump(report, f, indent=2)

    # Guard against an empty keyword file
    cited_pct = len(report['cited']) / len(keywords) * 100 if keywords else 0.0
    print(f"{date}: {cited_pct:.1f}% citation rate "
          f"({len(report['cited'])}/{len(keywords)} keywords)")

# Run daily via cron. Track trends over weeks, not days.
# Cost: 50 keywords/day x 30 days = 1,500 credits = $7.50/month
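To read trends out of those daily snapshots, a small aggregator helps. This is a sketch that assumes the report shape the script above writes (cited / not_cited / no_overview lists keyed by date); the sample data is hypothetical.

```python
def citation_trend(reports: dict[str, dict]) -> list[tuple[str, float]]:
    """Compute citation rate per day from saved daily reports,
    sorted by date, so week-over-week movement is visible."""
    trend = []
    for date in sorted(reports):
        r = reports[date]
        total = len(r['cited']) + len(r['not_cited']) + len(r['no_overview'])
        rate = len(r['cited']) / total * 100 if total else 0.0
        trend.append((date, round(rate, 1)))
    return trend

# Two hypothetical daily snapshots:
snapshots = {
    '2026-01-05': {'cited': ['a', 'b'], 'not_cited': ['c'], 'no_overview': ['d']},
    '2026-01-12': {'cited': ['a', 'b', 'c'], 'not_cited': [], 'no_overview': ['d']},
}
print(citation_trend(snapshots))
# [('2026-01-05', 50.0), ('2026-01-12', 75.0)]
```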

Where to invest instead of schema

Rewrite your top 20 pages to answer-first format. Ensure all target pages are indexed (use Google Search Console). Build topical clusters: 10+ pages covering different angles of your core topic. Add comparison tables and concrete data points that retrieval systems can extract verbatim. Monitor AI Overview citations weekly via API to measure what actually moves the needle.

Schema markup is not harmful, but treating it as an AI citation strategy is a misallocation of effort. The ROI is in content format and topical depth, not metadata tags.