Tutorial

How to Build an MCP Server for HTML Extraction

An r/ClaudeAI post launched PullMD, an MCP server that converts HTML to markdown. This tutorial builds the same pattern with Scavio's hosted endpoint or your own FastMCP server.

There are two paths: Path A uses Scavio's hosted MCP endpoint, and Path B self-hosts a FastMCP server that wraps Scavio's extract API.

Prerequisites

  • Python 3.10+ for the self-hosted path
  • Claude Code or any other MCP client
  • A Scavio API key (the free tier works), exported as SCAVIO_API_KEY

Walkthrough

Step 1: Path A: Use Scavio's hosted MCP

No infrastructure to run: register Scavio's hosted MCP endpoint with Claude Code.

Bash
claude mcp add --transport http scavio https://mcp.scavio.dev/mcp --header "x-api-key: $SCAVIO_API_KEY"

Step 2: Path B: Self-host with FastMCP

Install FastMCP and requests.

Bash
pip install fastmcp requests

Step 3: Wrap Scavio extract

Save the following as server.py. It defines a FastMCP server that exposes a single extract tool.

Python
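import os

import requests
from fastmcp import FastMCP

mcp = FastMCP('html-extractor')

@mcp.tool()
def extract(url: str) -> dict:
    """Fetch a URL through Scavio's extract endpoint and return it as markdown."""
    response = requests.post(
        'https://api.scavio.dev/api/v1/extract',
        headers={'x-api-key': os.environ['SCAVIO_API_KEY']},
        json={'url': url, 'format': 'markdown'},
        timeout=30,  # avoid hanging the MCP client on a slow page
    )
    response.raise_for_status()  # surface HTTP errors instead of returning an error body
    return response.json()

if __name__ == '__main__':
    mcp.run()  # defaults to the stdio transport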
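Because the tool forwards whatever URL the model supplies, it can help to reject obviously invalid input before spending a credit. The following is a minimal sketch, assuming a hypothetical validate_url helper (it is not part of FastMCP or Scavio) that you would call at the top of extract:

```python
from urllib.parse import urlparse


def validate_url(url: str) -> str:
    """Return the stripped URL if it is an absolute http(s) URL, else raise ValueError.

    Hypothetical helper for illustration; not part of FastMCP or Scavio.
    """
    cleaned = url.strip()
    parsed = urlparse(cleaned)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    if not parsed.netloc:
        raise ValueError("URL has no host")
    return cleaned
```

Raising ValueError keeps the check independent of the HTTP call, so a bad URL fails before any request is sent.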

Step 4: Run locally

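Start the server. With no arguments, mcp.run() uses the stdio transport; FastMCP can also serve over SSE if you pass a transport argument.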

Bash
python server.py
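Over stdio, the server sits waiting for newline-delimited JSON-RPC messages from the client, which is why running it directly prints nothing. As a rough illustration of what an MCP client sends when it invokes the tool, the sketch below builds a tools/call request. The tools_call_message helper is hypothetical, and a real session performs an initialize handshake before any tool call.

```python
import json


def tools_call_message(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a single newline-terminated JSON-RPC line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # json.dumps escapes newlines inside strings, so the message stays on one line.
    return json.dumps(message) + "\n"


line = tools_call_message(1, "extract", {"url": "https://example.com"})
print(line, end="")
```

In practice the MCP client builds these messages for you; the sketch is only meant to show the shape of the call that reaches extract.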

Step 5: Attach to Claude Code

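Register the server with Claude Code as a stdio command. The server reads SCAVIO_API_KEY from its environment, so pass the key through with --env.

Bash
claude mcp add html-extractor --env SCAVIO_API_KEY=$SCAVIO_API_KEY -- python /path/to/server.py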
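If you prefer a checked-in config file over the CLI, the same registration can be expressed as a project-level .mcp.json. The sketch below generates one. The mcp_config helper is hypothetical, and the ${SCAVIO_API_KEY} placeholder assumes your client expands environment variables in the file; if it does not, substitute the value another way.

```python
import json


def mcp_config(name: str, script_path: str) -> dict:
    """Build an mcpServers entry that launches the server as a stdio command."""
    return {
        "mcpServers": {
            name: {
                "command": "python",
                "args": [script_path],
                # Assumption: the client expands ${VAR} placeholders at load time.
                "env": {"SCAVIO_API_KEY": "${SCAVIO_API_KEY}"},
            }
        }
    }


config = mcp_config("html-extractor", "/path/to/server.py")
print(json.dumps(config, indent=2))
```

Keeping the key as a placeholder rather than a literal means the file can be committed without leaking the secret.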

Python Example

Python
# Path A: hosted, the simplest option, at $0.0043 per extract.
# Path B: self-hosted, so the MCP layer itself costs nothing,
# but each call still uses the Scavio extract API underneath.
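To put the quoted rate in perspective, here is a small cost estimate. It assumes the $0.0043 per-extract figure above applies uniformly with no volume tiers, which this tutorial does not confirm, and extract_cost is a hypothetical helper.

```python
from decimal import Decimal

# Per-extract price quoted in this tutorial for the hosted path.
PRICE_PER_EXTRACT = Decimal("0.0043")


def extract_cost(count: int) -> Decimal:
    """Estimated spend in dollars for `count` extracts at a flat rate."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return PRICE_PER_EXTRACT * count


print(extract_cost(1000))
```

At that rate, 1,000 extracts come to $4.30. Decimal is used instead of float so the small per-call price does not pick up rounding noise.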

JavaScript Example

JavaScript
// The same server can be written in TypeScript with @modelcontextprotocol/sdk.

Expected Output

The Claude Code agent now has a clean extract tool that returns markdown for any URL. Token usage drops roughly 10x compared with passing raw HTML.
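The saving comes from dropping markup, scripts, and styles that the model never needs. To sanity-check the ratio on your own pages, a rough sketch using only the standard library is shown below. It uses character counts as a crude proxy for tokens, it is not how Scavio extracts content, and the sample page is made up.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect visible text, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def visible_text(html: str) -> str:
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


# Made-up sample page for illustration only.
sample = (
    "<html><head><style>body{margin:0}</style><script>var a=1;</script></head>"
    "<body><div class='nav'><h1>Title</h1><p>Hello world.</p></div></body></html>"
)
text = visible_text(sample)
print(len(sample), len(text))
```

The actual ratio depends heavily on the page, so treat any single number as indicative rather than guaranteed.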

Frequently Asked Questions

Most developers complete this tutorial in 15 to 30 minutes. You will need a Scavio API key (free tier works) and a working Python or JavaScript environment.

Python 3.10+ for self-hosted. Claude Code or any MCP client. A Scavio API key gives you 500 free credits per month.

Yes. The free tier includes 500 credits per month, which is more than enough to complete this tutorial and prototype a working solution.

Scavio has a native LangChain package (langchain-scavio), an MCP server, and a plain REST API that works with any HTTP client. Path A of this tutorial uses the hosted MCP server and Path B uses the raw REST API, but you can adapt either to your framework of choice.
