How long does this tutorial on converting a website to LLM-ready markdown take?

Most developers complete this tutorial in 15 to 30 minutes. You will need a Scavio API key (free tier works) and a working Python or JavaScript environment.

What do I need before starting?

Python 3.10+ or Node 20+, and a Scavio API key. Signing up gives you 50 free credits.

Can I run this tutorial with the free tier?

Yes. The free tier includes 50 credits on signup, which is more than enough to complete this tutorial and prototype a working solution.

What frameworks does this work with?

Scavio has a native LangChain package (langchain-scavio), an MCP server, and a plain REST API that works with any HTTP client. This tutorial uses the raw REST API, but you can adapt it to your framework of choice.

Website to LLM-Ready Markdown (2026)

LLM-ready markdown matters because wasted tokens are a real cost. A typical API doc page ships 8,000 tokens of HTML but only 2,500 tokens of signal. This tutorial uses Scavio's extract endpoint to produce token-efficient markdown ready for agent context.

Prerequisites

Python 3.10+ or Node 20+
A Scavio API key

Walkthrough

Step 1: Call the extract endpoint

Scavio returns markdown stripped of nav and chrome.

Python

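import os
import requests

API_KEY = os.environ['SCAVIO_API_KEY']

def to_markdown(url):
    # POST the target URL to the extract endpoint and request markdown output
    r = requests.post('https://api.scavio.dev/api/v1/extract',
        headers={'x-api-key': API_KEY},
        json={'url': url, 'format': 'markdown'},
        timeout=30)
    r.raise_for_status()  # surface HTTP errors instead of silently returning ''
    return r.json().get('markdown', '')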
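
A single request can fail for transient reasons such as a dropped connection. The following is a minimal sketch of a retry wrapper with exponential backoff that could be placed around the function above. The helper name with_retries, the attempt count, and the delays are illustrative assumptions, not part of Scavio's API, and retrying only helps when the wrapped function raises on failure.

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Return a wrapper that retries fn when it raises.

    Hypothetical helper: waits base_delay, then 2x, then 4x, ... between tries.
    """
    def wrapper(*args, **kwargs):
        for attempt in range(attempts):
            try:
                return fn(*args, **kwargs)
            except Exception:
                # Out of attempts: re-raise the last error to the caller.
                if attempt == attempts - 1:
                    raise
                sleep(base_delay * 2 ** attempt)
    return wrapper

# Usage (assumes to_markdown from Step 1 is defined):
# safe_to_markdown = with_retries(to_markdown)
```

The sleep function is a parameter so the wrapper can be exercised in tests without actually waiting.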

Step 2: Measure token savings

Compare raw HTML size against markdown.

Python

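import tiktoken
enc = tiktoken.get_encoding('cl100k_base')

def compare(url):
    md = to_markdown(url)
    raw = requests.get(url, timeout=30).text
    # disallowed_special=() encodes strings like '<|endoftext|>' found in a page as plain text
    count = lambda s: len(enc.encode(s, disallowed_special=()))
    return {'raw_tokens': count(raw), 'md_tokens': count(md)}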
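
The two counts are easier to compare as a single ratio. A small sketch of that arithmetic follows; the function name token_savings is an assumption made for illustration, and the sample numbers are the 8,000 and 2,500 token figures quoted in the introduction, treated as if the markdown contained only the signal.

```python
def token_savings(raw_tokens, md_tokens):
    """Return the fraction of tokens removed by converting to markdown."""
    if raw_tokens <= 0:
        return 0.0
    return 1 - md_tokens / raw_tokens

# 8,000 HTML tokens reduced to 2,500 tokens
print(token_savings(8000, 2500))  # 0.6875
```

A result of 0.6875 means roughly 69% fewer tokens. Passing the raw_tokens and md_tokens values returned by compare gives the same ratio for a real page.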

Step 3: Feed into an LLM agent

Markdown slots directly into a user message.

Python

import anthropic
client = anthropic.Anthropic()

def summarize(url):
    md = to_markdown(url)
    msg = client.messages.create(
        model='claude-sonnet-4-6',
        max_tokens=512,
        messages=[{'role': 'user', 'content': f'Summarize in 5 bullets:\n{md[:6000]}'}])
    return msg.content[0].text
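
Note that md[:6000] slices by characters, not tokens, and can cut a paragraph or code sample in half. One possible refinement, sketched below under the assumption that paragraphs are separated by blank lines, is to back up to the last paragraph break inside the budget. The name truncate_markdown is hypothetical.

```python
def truncate_markdown(md, max_chars=6000):
    """Cut md to at most max_chars, preferring a paragraph boundary."""
    if len(md) <= max_chars:
        return md
    head = md[:max_chars]
    # Back up to the last blank line so no paragraph is cut in half.
    cut = head.rfind('\n\n')
    return head[:cut] if cut > 0 else head
```

In summarize, this would replace the slice: truncate_markdown(md) instead of md[:6000].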

Step 4: Cache frequently fetched pages

Avoid repeat calls for stable doc pages.

Python

from functools import lru_cache

@lru_cache(maxsize=500)
def cached_markdown(url):
    return to_markdown(url)
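
lru_cache keeps an entry until it is evicted or the process exits, so even a stable doc page is never refreshed. If freshness matters, a time-bounded cache is one alternative. The class below is a minimal sketch; TTLCache, its parameters, and the one-hour default are assumptions for illustration.

```python
import time

class TTLCache:
    """Hypothetical in-memory cache whose entries expire after ttl seconds."""

    def __init__(self, fetch, ttl=3600, clock=time.monotonic):
        self.fetch = fetch    # e.g. to_markdown from Step 1
        self.ttl = ttl
        self.clock = clock
        self._store = {}      # url -> (expires_at, markdown)

    def get(self, url):
        now = self.clock()
        hit = self._store.get(url)
        if hit and hit[0] > now:
            return hit[1]     # still fresh: no API call
        value = self.fetch(url)
        self._store[url] = (now + self.ttl, value)
        return value

# Usage (assumes to_markdown from Step 1 is defined):
# cache = TTLCache(to_markdown, ttl=3600)
# md = cache.get('https://docs.prisma.io')
```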

Step 5: Batch-convert a sitemap

Loop through a sitemap.xml for bulk conversion.

Python

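from xml.etree import ElementTree

def bulk(sitemap_url):
    r = requests.get(sitemap_url, timeout=30)
    root = ElementTree.fromstring(r.content)
    # Element.iter() matches tags exactly; findall() supports the {*} namespace wildcard
    urls = [e.text.strip() for e in root.findall('.//{*}loc') if e.text]
    # Convert at most the first 50 URLs
    return {u: to_markdown(u) for u in urls[:50]}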
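
Sitemap parsing can be checked offline before spending any credits. The sketch below extracts the URLs from an inline sample; the helper name sitemap_urls and the example.com URLs are made up for illustration. It uses findall with the {*} namespace wildcard, which ElementTree's path syntax supports from Python 3.8, and it also strips whitespace and drops duplicates.

```python
from xml.etree import ElementTree

def sitemap_urls(xml_text, limit=50):
    """Return up to limit unique, stripped <loc> URLs in document order."""
    root = ElementTree.fromstring(xml_text)
    seen = []
    for el in root.findall('.//{*}loc'):
        url = (el.text or '').strip()
        if url and url not in seen:
            seen.append(url)
    return seen[:limit]

SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc> https://example.com/a </loc></url>
  <url><loc>https://example.com/b</loc></url>
  <url><loc>https://example.com/a</loc></url>
</urlset>"""

print(sitemap_urls(SAMPLE))  # ['https://example.com/a', 'https://example.com/b']
```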

Python Example

Python

import os
import requests

API_KEY = os.environ['SCAVIO_API_KEY']

def to_markdown(url):
    r = requests.post('https://api.scavio.dev/api/v1/extract',
        headers={'x-api-key': API_KEY},
        json={'url': url, 'format': 'markdown'},
        timeout=30)
    r.raise_for_status()  # fail loudly on HTTP errors
    return r.json().get('markdown', '')

# Print the first 500 characters of the converted page
print(to_markdown('https://docs.prisma.io')[:500])

JavaScript Example

JavaScript

const API_KEY = process.env.SCAVIO_API_KEY;
export async function toMarkdown(url) {
  const r = await fetch('https://api.scavio.dev/api/v1/extract', {
    method: 'POST',
    headers: { 'x-api-key': API_KEY, 'Content-Type': 'application/json' },
    body: JSON.stringify({ url, format: 'markdown' })
  });
  // Fail loudly on HTTP errors instead of returning an empty string
  if (!r.ok) throw new Error(`Extract request failed: ${r.status}`);
  return (await r.json()).markdown || '';
}

Expected Output

The endpoint responds with JSON whose markdown field holds a clean markdown representation of the page, stripped of navigation and cookie banners. Token count typically drops 40 to 60% versus raw HTML.


