
Webpage to Markdown for LLMs: Save 40-60% Tokens in 2026

Raw HTML wastes 40-60% of the context window. Three strategies to convert pages to LLM-ready markdown with real token benchmarks.


Three threads in the last week (r/mlops, r/LLMDevs, r/chrome_extensions) all asked variants of the same question: what is the best way to convert a webpage to LLM-ready markdown in 2026? The theme behind the question is always the same: token cost is real, and raw HTML wastes 40 to 60% of the context window.

The Token Waste Problem

A typical API documentation page ships around 8,000 tokens of HTML. Of that, roughly 2,500 tokens are actual content. The other 5,500 are navigation, footer, cookie banners, sidebar links, and JSON-LD metadata. A coding agent that ingests the raw page spends 70% of its context budget on noise. Multiply by 5 to 10 doc fetches per session and the context window is full before real work starts.

The Three Strategies

  1. DIY with BeautifulSoup. Free but brittle. Every site has different structure. Maintenance burden grows with site count.
  2. Browser extension (Mdown, SingleFile). Good for manual one-offs. Not automated, not suitable for agent pipelines.
  3. Hosted API (Scavio extract, Jina Reader, Firecrawl). Pay per conversion, gain a maintained pipeline, works at scale.

An Honest Comparison of the Hosted APIs

Three serious options. Each has a reasonable free tier for evaluation.

  • Scavio extract. Pairs with Scavio's search API, so the same API key covers both "find the doc" and "fetch the doc". Typed markdown output. $0.003 to $0.005 per conversion.
  • Jina Reader. Generous free tier. $0.02 per request above the free tier. Simple URL prefix pattern makes it trivial to test.
  • Firecrawl. Deepest JS rendering for complex sites. More expensive at high volume. $19 to $399 depending on tier.
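Jina Reader's prefix pattern is the quickest of the three to try: prepend `https://r.jina.ai/` to any URL and you get markdown back. A minimal sketch (the helper names are mine, not Jina's):

```python
import requests

JINA_READER = "https://r.jina.ai/"

def reader_url(url: str) -> str:
    # Jina Reader pattern: the target URL is simply appended to the prefix
    return JINA_READER + url

def fetch_markdown(url: str, timeout: int = 30) -> str:
    # GET the prefixed URL; the response body is the page as markdown
    resp = requests.get(reader_url(url), timeout=timeout)
    resp.raise_for_status()
    return resp.text
```

No API key is needed for light evaluation traffic, which is why it is the easiest baseline to benchmark against.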

Measuring Token Savings

Run a quick benchmark before committing. For your specific site set, measure raw HTML tokens versus converted markdown tokens. Savings vary wildly by site (40% on clean docs, 80% on bloated marketing pages).

```python
import os

import requests
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
API_KEY = os.environ["SCAVIO_API_KEY"]

def measure(url: str) -> dict:
    # Raw page as served, tags and all
    raw = requests.get(url, timeout=30).text
    # Same page converted to markdown by the extract endpoint
    md = requests.post(
        "https://api.scavio.dev/api/v1/extract",
        headers={"x-api-key": API_KEY},
        json={"url": url, "format": "markdown"},
        timeout=30,
    ).json().get("markdown", "")
    raw_tokens = len(enc.encode(raw))
    md_tokens = len(enc.encode(md))
    return {
        "raw_tokens": raw_tokens,
        "md_tokens": md_tokens,
        "savings_pct": 100 * (1 - md_tokens / raw_tokens),
    }

for url in ["https://docs.prisma.io/orm",
            "https://fastapi.tiangolo.com",
            "https://nextjs.org/docs"]:
    print(url, measure(url))
```

The Agent Integration

For a coding agent, the right pattern is an MCP tool that combines "find" and "fetch". The user asks "what is the latest Prisma migrate syntax?" and the agent runs SERP to find the doc URL, then extract to pull clean markdown into context.

```python
def find_and_fetch(topic: str, doc_domain: str) -> str:
    # Step 1: find the doc URL via SERP
    serp = requests.post(
        "https://api.scavio.dev/api/v1/search",
        headers={"x-api-key": API_KEY},
        json={"query": f"site:{doc_domain} {topic}",
              "num_results": 3},
        timeout=30,
    ).json()
    # Guard against a missing or empty result list
    results = serp.get("organic_results") or [{}]
    url = results[0].get("link")
    if not url:
        return ""

    # Step 2: fetch the page as clean markdown
    md = requests.post(
        "https://api.scavio.dev/api/v1/extract",
        headers={"x-api-key": API_KEY},
        json={"url": url, "format": "markdown"},
        timeout=30,
    ).json()
    return md.get("markdown", "")
```

When DIY Still Makes Sense

For a fixed set of known sites (your own docs, a customer's portal you integrate with daily), a hand-tuned BeautifulSoup extractor beats any hosted API on quality and cost. The rule: DIY for the 5 sites you touch every day, hosted API for the 500 sites you touch occasionally.
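For the handful of sites you control or know well, the hand-tuned extractor can be very small. A sketch with BeautifulSoup (the tags stripped here are generic; inspect your target site and tune the selectors):

```python
from bs4 import BeautifulSoup

def extract_main(html: str) -> str:
    # Hand-tuned extractor for a known site: strip the noise
    # wholesale, then take the text of the main content region.
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["nav", "footer", "header", "script", "style", "aside"]):
        tag.decompose()
    # Most doc sites wrap content in <main>; fall back to <body>
    main = soup.find("main") or soup.body or soup
    return main.get_text(separator="\n", strip=True)
```

Because the site structure is known and stable, this beats a generic hosted converter on both quality and cost; the trade-off is that every new site means a new round of selector tuning.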

The Cache Layer

Doc pages change slowly. A small Redis or file cache with a 24-hour TTL cuts repeat conversions to near zero. For a coding agent hitting the same Prisma docs 20 times a session, the cache is the biggest single cost savings after the markdown conversion itself.
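A file-backed cache with a 24-hour TTL is enough for most agent workloads. A minimal sketch (the directory name and JSON layout are illustrative, not a standard):

```python
import hashlib
import json
import time
from pathlib import Path

CACHE_DIR = Path(".md_cache")
TTL_SECONDS = 24 * 3600  # doc pages change slowly; a day is safe

def cache_key(url: str) -> Path:
    # Hash the URL so any URL maps to a safe filename
    return CACHE_DIR / (hashlib.sha256(url.encode()).hexdigest() + ".json")

def cached_markdown(url: str, fetch) -> str:
    """Return markdown for url, calling fetch(url) only on a miss or expiry."""
    CACHE_DIR.mkdir(exist_ok=True)
    path = cache_key(url)
    if path.exists():
        entry = json.loads(path.read_text())
        if time.time() - entry["ts"] < TTL_SECONDS:
            return entry["md"]  # fresh hit: no API call
    md = fetch(url)  # miss or stale: convert and store
    path.write_text(json.dumps({"ts": time.time(), "md": md}))
    return md
```

Pass your conversion function (the hosted API call, or a DIY extractor) as `fetch`; the second and later hits within the TTL cost nothing.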

Where to Start

Pick one heavy-use doc site. Run the token benchmark. If savings are above 40%, wire Scavio extract into the agent flow this week. The tutorial at how-to-convert-website-to-llm-ready-markdown has the full working code.