Tutorial

How to Convert a Website to LLM-Ready Markdown

Strip nav, cookie banners, and footers from any page before sending it to an LLM. The extracted markdown typically cuts token count by 40 to 60% versus raw HTML.

LLM-ready markdown matters because wasted tokens are a real cost. A typical API doc page can ship around 8,000 tokens of HTML while carrying only about 2,500 tokens of signal. This tutorial uses Scavio's extract endpoint to produce token-efficient markdown ready for agent context.

Prerequisites

  • Python 3.10+ or Node 20+
  • A Scavio API key

Walkthrough

Step 1: Call the extract endpoint

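POST the page URL to the extract endpoint with format set to markdown. Scavio returns the page as markdown with navigation and other page chrome stripped.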

Python
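import os
import requests

API_KEY = os.environ['SCAVIO_API_KEY']  # raises KeyError if the key is not set

def to_markdown(url):
    # POST the target URL to the extract endpoint and ask for markdown output.
    r = requests.post('https://api.scavio.dev/api/v1/extract',
        headers={'x-api-key': API_KEY},
        json={'url': url, 'format': 'markdown'},
        timeout=30)
    r.raise_for_status()  # surface HTTP errors instead of returning ''
    # Fall back to an empty string if the response has no 'markdown' field.
    return r.json().get('markdown', '')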
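
Network calls to any extraction API can fail transiently, so it may help to wrap the call in a small retry loop. The following is a minimal sketch, not part of Scavio's API: with_retries is a hypothetical helper, and the attempt count and backoff delays are assumptions you should tune.

```python
import time

def with_retries(fetch, url, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url) until it returns non-empty text or attempts run out.

    `fetch` is any callable that takes a URL and returns a string,
    for example the to_markdown function from Step 1.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            result = fetch(url)
            if result:
                return result
            last_error = None  # call succeeded but returned nothing
        except Exception as exc:
            last_error = exc
        if attempt < attempts - 1:
            # Exponential backoff: base_delay, then 2x, then 4x, ...
            sleep(base_delay * 2 ** attempt)
    if last_error is not None:
        raise last_error
    return ''
```

Usage would look like with_retries(to_markdown, url). The sleep parameter is injectable only so the backoff can be exercised in tests without waiting.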

Step 2: Measure token savings

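Fetch the raw HTML yourself and compare its token count against the markdown from Step 1. The cl100k_base encoding is an OpenAI tokenizer, so treat the counts as an estimate for other model families.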

Python
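import requests
import tiktoken

enc = tiktoken.get_encoding('cl100k_base')

def compare(url):
    md = to_markdown(url)  # defined in Step 1
    raw = requests.get(url, timeout=30).text
    # disallowed_special=() treats strings like '<|endoftext|>' as plain text
    # instead of raising if a page happens to contain one.
    return {'raw_tokens': len(enc.encode(raw, disallowed_special=())),
            'md_tokens': len(enc.encode(md, disallowed_special=()))}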
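
To turn the two counts into a single savings figure, a small helper can compute the percentage reduction. This is a sketch: reduction_pct is a hypothetical name, and it assumes the dictionary shape returned by compare above. The sample numbers are the 8,000 and 2,500 token figures quoted in the introduction.

```python
def reduction_pct(counts):
    """Return the percentage of tokens saved by markdown versus raw HTML."""
    raw = counts['raw_tokens']
    md = counts['md_tokens']
    if raw == 0:
        return 0.0  # avoid dividing by zero for an empty page
    return 100 * (raw - md) / raw

print(reduction_pct({'raw_tokens': 8000, 'md_tokens': 2500}))  # 68.75
```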

Step 3: Feed into an LLM agent

The markdown can go straight into a user message. The example truncates it to the first 6,000 characters to keep the prompt small.

Python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def summarize(url):
    md = to_markdown(url)  # defined in Step 1
    msg = client.messages.create(
        model='claude-sonnet-4-6',
        max_tokens=512,
        # md[:6000] caps the input at 6,000 characters (not tokens).
        messages=[{'role': 'user', 'content': f'Summarize in 5 bullets:\n{md[:6000]}'}])
    return msg.content[0].text
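
A hard character slice such as md[:6000] can cut a page mid-sentence or mid-list. One possible refinement, sketched here with a hypothetical helper named truncate_at_paragraph, is to back up to the last blank line inside the limit so the model sees only whole paragraphs.

```python
def truncate_at_paragraph(md, limit=6000):
    """Trim markdown to at most `limit` characters, cutting at a blank line when possible."""
    if len(md) <= limit:
        return md
    head = md[:limit]
    cut = head.rfind('\n\n')  # last paragraph break inside the window
    # Fall back to a hard cut if there is no paragraph break to use.
    return head[:cut] if cut > 0 else head
```

In summarize, this would replace md[:6000] with truncate_at_paragraph(md). The limit is still counted in characters, so it bounds the prompt size only approximately.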

Step 4: Cache frequently fetched pages

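Wrap the call in an in-memory cache so stable doc pages are not re-fetched. lru_cache lives only for the current process and never expires entries on its own.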

Python
from functools import lru_cache

@lru_cache(maxsize=500)
def cached_markdown(url):
    return to_markdown(url)
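
Because lru_cache keeps an entry until it is evicted, a page that changes will stay stale for the life of the process. If that matters, a cache with a time-to-live is one alternative. The class below is a minimal sketch rather than a production cache: TTLCache, its one-hour default, and the injectable clock are all assumptions made for illustration.

```python
import time

class TTLCache:
    """Tiny in-memory cache whose entries expire after `ttl` seconds."""

    def __init__(self, fetch, ttl=3600, clock=time.monotonic):
        self.fetch = fetch    # e.g. the to_markdown function from Step 1
        self.ttl = ttl
        self.clock = clock    # injectable so expiry can be tested without sleeping
        self._store = {}      # url -> (expires_at, value)

    def get(self, url):
        now = self.clock()
        hit = self._store.get(url)
        if hit is not None and hit[0] > now:
            return hit[1]     # still fresh, skip the network call
        value = self.fetch(url)
        self._store[url] = (now + self.ttl, value)
        return value
```

Usage would look like cache = TTLCache(to_markdown, ttl=3600) followed by cache.get(url). Unlike lru_cache, this sketch has no size bound, so it suits a modest, known set of pages.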

Step 5: Batch-convert a site map

Parse the site's sitemap.xml, collect every loc URL, and convert each one. The example caps the run at the first 50 URLs.

Python
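from xml.etree import ElementTree

def bulk(sitemap_url):
    r = requests.get(sitemap_url, timeout=30)
    r.raise_for_status()
    # '{*}loc' matches <loc> in any XML namespace (Python 3.8+).
    root = ElementTree.fromstring(r.content)
    urls = [e.text.strip() for e in root.iter('{*}loc') if e.text]
    # Cap the run at the first 50 URLs.
    return {u: to_markdown(u) for u in urls[:50]}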
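
The loop above converts pages one at a time and stops at the first exception. For larger sitemaps, a thread pool can overlap the network waits. The sketch below uses a hypothetical helper, bulk_concurrent, that accepts any converter function. The worker count of 4 is an assumption: this tutorial does not state Scavio's rate limits, so keep it low until you have checked them.

```python
from concurrent.futures import ThreadPoolExecutor

def bulk_concurrent(urls, convert, workers=4):
    """Convert URLs in parallel; a failed URL maps to None instead of aborting the batch."""
    def safe(url):
        try:
            return convert(url)
        except Exception:
            return None  # record the failure and keep going

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(safe, urls))  # map preserves input order
    return dict(zip(urls, results))
```

Usage would look like bulk_concurrent(urls[:50], to_markdown), after which entries whose value is None can be retried or logged.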

Python Example

Python
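import os
import requests

API_KEY = os.environ['SCAVIO_API_KEY']

def to_markdown(url):
    # POST the target URL to the extract endpoint and ask for markdown output.
    r = requests.post('https://api.scavio.dev/api/v1/extract',
        headers={'x-api-key': API_KEY},
        json={'url': url, 'format': 'markdown'},
        timeout=30)
    r.raise_for_status()  # surface HTTP errors instead of returning ''
    return r.json().get('markdown', '')

# Print the first 500 characters of the extracted markdown.
print(to_markdown('https://docs.prisma.io')[:500])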

JavaScript Example

JavaScript
const API_KEY = process.env.SCAVIO_API_KEY;

export async function toMarkdown(url) {
  const r = await fetch('https://api.scavio.dev/api/v1/extract', {
    method: 'POST',
    headers: { 'x-api-key': API_KEY, 'Content-Type': 'application/json' },
    body: JSON.stringify({ url, format: 'markdown' })
  });
  // Surface HTTP errors instead of silently returning an empty string.
  if (!r.ok) throw new Error(`Extract request failed with status ${r.status}`);
  return (await r.json()).markdown || '';
}

Expected Output

The response is JSON with a markdown field that holds a clean markdown representation of the page, stripped of navigation and cookie banners. Token count typically drops 40 to 60% versus raw HTML.

Frequently Asked Questions

How long does this tutorial take?

Most developers complete this tutorial in 15 to 30 minutes. You will need a Scavio API key (free tier works) and a working Python or JavaScript environment.

What do I need to get started?

Python 3.10+ or Node 20+, plus a Scavio API key. The free tier gives you 500 credits per month.

Can I complete this tutorial on the free tier?

Yes. The free tier includes 500 credits per month, which is more than enough to complete this tutorial and prototype a working solution.

Does Scavio work with my framework?

Scavio has a native LangChain package (langchain-scavio), an MCP server, and a plain REST API that works with any HTTP client. This tutorial uses the raw REST API, but you can adapt it to your framework of choice.
