LLM-ready markdown matters because wasted tokens are a real cost. A typical API doc page ships 8,000 tokens of HTML, of which only 2,500 are signal. This tutorial uses Scavio's extract endpoint to produce token-efficient markdown ready for agent context.
Prerequisites
- Python 3.10+ or Node 20+
- A Scavio API key
Walkthrough
Step 1: Call the extract endpoint
Scavio returns markdown stripped of nav and chrome.
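import os
import requests

API_KEY = os.environ['SCAVIO_API_KEY']

def to_markdown(url):
    # Ask the extract endpoint for a markdown rendering of the page.
    r = requests.post('https://api.scavio.dev/api/v1/extract',
                      headers={'x-api-key': API_KEY},
                      json={'url': url, 'format': 'markdown'},
                      timeout=30)
    r.raise_for_status()  # surface HTTP errors instead of silently returning ''
    return r.json().get('markdown', '')

Step 2: Measure token savings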
Compare raw HTML size against markdown.
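To turn the two counts into a single figure, a small helper can report the percentage reduction. This is a minimal sketch: `reduction_pct` is a hypothetical name, not part of Scavio or tiktoken, and it takes plain integers so it works with token counts however they were measured.

```python
def reduction_pct(raw_tokens: int, md_tokens: int) -> float:
    """Percentage of tokens saved by using markdown instead of raw HTML."""
    if raw_tokens <= 0:
        # Nothing to compare against (empty page or failed fetch).
        return 0.0
    return 100 * (raw_tokens - md_tokens) / raw_tokens


# The figures from the introduction: 8,000 tokens of HTML, 2,500 of signal.
print(reduction_pct(8000, 2500))
```

Returning a percentage rather than a raw difference makes pages of different sizes comparable.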
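import tiktoken

# cl100k_base is an OpenAI encoding; counts are approximate for other models.
enc = tiktoken.get_encoding('cl100k_base')

def compare(url):
    md = to_markdown(url)
    raw = requests.get(url, timeout=30).text
    # disallowed_special=() encodes special-token strings found in the page as ordinary text
    return {'raw_tokens': len(enc.encode(raw, disallowed_special=())),
            'md_tokens': len(enc.encode(md, disallowed_special=()))}

Step 3: Feed into an LLM agent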
Markdown slots directly into a user message.
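Long pages may need trimming before they go into a prompt, and a hard character slice can cut a paragraph mid-sentence. One possible alternative, sketched here under the assumption that a character budget is good enough, is to cut at the last paragraph break inside the budget. `truncate_markdown` is a hypothetical helper, not part of any library.

```python
def truncate_markdown(md: str, max_chars: int = 6000) -> str:
    """Trim markdown to at most max_chars, cutting at a paragraph break when possible."""
    if len(md) <= max_chars:
        return md
    head = md[:max_chars]
    # Prefer the last blank line so a paragraph is not cut mid-sentence.
    cut = head.rfind("\n\n")
    if cut <= 0:
        # No paragraph break in range: try the last line break instead.
        cut = head.rfind("\n")
    # If there is no line break either, fall back to a hard cut.
    return head[:cut].rstrip() if cut > 0 else head


sample = "# Title\n\nFirst paragraph.\n\nSecond paragraph that is longer."
print(truncate_markdown(sample, max_chars=30))
```

The budget is in characters, not tokens, so it only approximates the model's context limit.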
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def summarize(url):
    md = to_markdown(url)
    msg = client.messages.create(
        model='claude-sonnet-4-6',
        max_tokens=512,
        # md[:6000] caps the prompt at 6,000 characters (not tokens)
        messages=[{'role': 'user', 'content': f'Summarize in 5 bullets:\n{md[:6000]}'}])
    return msg.content[0].text

Step 4: Cache frequently fetched pages
Avoid repeat calls for stable doc pages.
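For pages that are stable but not immutable, a time-to-live cache is one alternative to a plain memoizing cache. The sketch below is an assumption-laden illustration, not Scavio functionality: `make_ttl_cache` and `fake_fetch` are hypothetical names, and the fetcher is passed in so the example runs without network access.

```python
import time
from typing import Callable, Dict, Tuple


def make_ttl_cache(fetch: Callable[[str], str], ttl_seconds: float = 3600.0,
                   clock: Callable[[], float] = time.monotonic) -> Callable[[str], str]:
    """Wrap fetch so each URL's result is reused until it is ttl_seconds old."""
    store: Dict[str, Tuple[float, str]] = {}

    def cached(url: str) -> str:
        now = clock()
        hit = store.get(url)
        if hit is not None and now - hit[0] < ttl_seconds:
            return hit[1]  # still fresh: skip the fetch
        value = fetch(url)
        store[url] = (now, value)
        return value

    return cached


# Usage with a stand-in fetcher; in this tutorial it would be to_markdown.
calls = []

def fake_fetch(url: str) -> str:
    calls.append(url)
    return f"# markdown for {url}"

cached_markdown_ttl = make_ttl_cache(fake_fetch, ttl_seconds=60)
cached_markdown_ttl("https://example.com/docs")
cached_markdown_ttl("https://example.com/docs")
print(len(calls))  # the second call is served from the cache
```

Unlike a size-bounded cache, this store grows without limit, so it suits a fixed set of doc pages rather than arbitrary URLs.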
from functools import lru_cache

@lru_cache(maxsize=500)
def cached_markdown(url):
    # Entries have no expiry; they stay until evicted or the process exits.
    return to_markdown(url)

Step 5: Batch-convert a site map
Loop through a sitemap.xml for bulk conversion.
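Sitemap files declare an XML namespace, so a lookup for a bare `loc` tag finds nothing. The sketch below shows one way to pull the URLs out with the `{*}` namespace wildcard, which `findall` supports from Python 3.8. `extract_locs` and the sample document are illustrative only.

```python
from xml.etree import ElementTree

SAMPLE_SITEMAP = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/docs/intro</loc></url>
  <url><loc> https://example.com/docs/api </loc></url>
  <url><loc></loc></url>
</urlset>"""


def extract_locs(sitemap_xml: bytes, limit: int = 50) -> list[str]:
    """Return up to `limit` non-empty <loc> URLs from a sitemap document."""
    root = ElementTree.fromstring(sitemap_xml)
    # '{*}loc' matches <loc> regardless of the namespace it is declared in.
    urls = [e.text.strip() for e in root.findall(".//{*}loc")
            if e.text and e.text.strip()]
    return urls[:limit]


print(extract_locs(SAMPLE_SITEMAP))
```

Parsing bytes rather than decoded text lets the parser honor the encoding declared in the XML header.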
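from xml.etree import ElementTree

def bulk(sitemap_url):
    r = requests.get(sitemap_url, timeout=30)
    root = ElementTree.fromstring(r.content)
    # findall with '{*}loc' matches <loc> in any namespace (Python 3.8+);
    # Element.iter() does not expand the '{*}' wildcard.
    urls = [e.text.strip() for e in root.findall('.//{*}loc') if e.text]
    return {u: to_markdown(u) for u in urls[:50]}  # first 50 URLs only

Python Example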
import os
import requests

API_KEY = os.environ['SCAVIO_API_KEY']

def to_markdown(url):
    r = requests.post('https://api.scavio.dev/api/v1/extract',
                      headers={'x-api-key': API_KEY},
                      json={'url': url, 'format': 'markdown'},
                      timeout=30)
    r.raise_for_status()  # surface HTTP errors instead of silently returning ''
    return r.json().get('markdown', '')

# Print the first 500 characters of the converted page.
print(to_markdown('https://docs.prisma.io')[:500])

JavaScript Example
const API_KEY = process.env.SCAVIO_API_KEY;

export async function toMarkdown(url) {
  const r = await fetch('https://api.scavio.dev/api/v1/extract', {
    method: 'POST',
    headers: { 'x-api-key': API_KEY, 'Content-Type': 'application/json' },
    body: JSON.stringify({ url, format: 'markdown' })
  });
  // fetch does not reject on HTTP errors, so check the status explicitly
  if (!r.ok) throw new Error(`Extract request failed: ${r.status}`);
  return (await r.json()).markdown || '';
}

Expected Output
The result is a clean markdown representation of the page, stripped of navigation and cookie banners. Token count typically drops 40 to 60% versus raw HTML.