
How to Build a Web Extraction MCP Server

Build an MCP server that exposes web search and page extraction as tools for AI agents. Uses Scavio search and extract endpoints.

This tutorial wraps Scavio's search and extract endpoints as MCP tools that any compatible AI client can call, giving agents the ability to both find relevant pages and pull structured content from them in a single workflow. The search tool discovers URLs via Google, Reddit, YouTube, Amazon, or Walmart queries; the extract tool returns clean text and metadata for any URL. Both tools live in one MCP server built with the MCP SDK.

Prerequisites

  • Node.js 18+ installed
  • A Scavio API key from scavio.dev
  • Basic understanding of the MCP protocol
  • npm or bun package manager

Walkthrough

Step 1: Scaffold the MCP server

Set up a minimal MCP server project using the MCP SDK that registers two tools: search and extract.

// package.json (zod is imported below, so it must be declared too):
// { "dependencies": { "@modelcontextprotocol/sdk": "latest", "zod": "latest" } }

import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { z } from 'zod';

const API_KEY = process.env.SCAVIO_API_KEY;
const server = new McpServer({ name: 'web-extraction', version: '1.0.0' });
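The server reads its key from the environment. A minimal fail-fast guard avoids sending unauthenticated requests later; `requireApiKey` is a hypothetical helper sketched here, not part of the SDK:

```javascript
// Hypothetical helper: fail fast when the key is missing rather than
// letting every tool call come back as a 401 from the API.
function requireApiKey(env = process.env) {
  const key = env.SCAVIO_API_KEY;
  if (!key) {
    throw new Error('SCAVIO_API_KEY is not set; create a key at scavio.dev');
  }
  return key;
}
```

Calling `requireApiKey()` once at startup turns a misconfigured deployment into an immediate, readable error instead of a string of failed tool calls.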

Step 2: Add the search tool

Register a search tool that queries Scavio and returns structured results the agent can use to decide which pages to extract.

server.tool('search', 'Search Google, Reddit, YouTube, Amazon, or Walmart',
  { query: z.string(), platform: z.string().default('google') },
  async ({ query, platform }) => {
    const resp = await fetch('https://api.scavio.dev/api/v1/search', {
      method: 'POST',
      headers: { 'x-api-key': API_KEY, 'Content-Type': 'application/json' },
      body: JSON.stringify({ platform, query }),
    });
    if (!resp.ok) {
      // Surface API errors to the agent instead of returning malformed JSON.
      return { content: [{ type: 'text', text: `Search failed: HTTP ${resp.status}` }], isError: true };
    }
    const data = await resp.json();
    // Return only the top five organic results so the agent can pick pages to extract.
    return { content: [{ type: 'text', text: JSON.stringify(data.organic_results?.slice(0, 5) || [], null, 2) }] };
  }
);
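The schema above accepts any string for `platform`. A stricter sketch validates the argument against the supported platforms before the request goes out, so typos fail locally instead of burning an API credit; `SUPPORTED_PLATFORMS` and `normalizePlatform` are hypothetical names, and the platform list is assumed from the tool description above:

```javascript
// Platforms the search tool's description advertises.
const SUPPORTED_PLATFORMS = ['google', 'reddit', 'youtube', 'amazon', 'walmart'];

// Hypothetical helper: normalize and validate the platform argument.
function normalizePlatform(platform = 'google') {
  const p = platform.toLowerCase().trim();
  if (!SUPPORTED_PLATFORMS.includes(p)) {
    throw new Error(`Unsupported platform "${platform}"; use one of: ${SUPPORTED_PLATFORMS.join(', ')}`);
  }
  return p;
}
```

With zod, the same constraint fits directly in the tool schema as `z.enum(['google', 'reddit', 'youtube', 'amazon', 'walmart'])`, which also documents the allowed values to the calling agent.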

Step 3: Add the extract tool

Register an extract tool that takes a URL and returns the page's clean text content via Scavio's extract endpoint.

server.tool('extract', 'Extract clean text content from a URL',
  { url: z.string().url() },
  async ({ url }) => {
    const resp = await fetch('https://api.scavio.dev/api/v1/extract', {
      method: 'POST',
      headers: { 'x-api-key': API_KEY, 'Content-Type': 'application/json' },
      body: JSON.stringify({ url }),
    });
    if (!resp.ok) {
      return { content: [{ type: 'text', text: `Extract failed: HTTP ${resp.status}` }], isError: true };
    }
    const data = await resp.json();
    // Prefer the plain-text field; fall back to raw JSON if the response shape differs.
    return { content: [{ type: 'text', text: data.text || data.content || JSON.stringify(data) }] };
  }
);
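Extracted pages can be very long, and everything a tool returns lands in the model's context. A truncation guard the extract handler could apply before returning is sketched below; `truncateForContext` is a hypothetical helper, and the 20,000-character cap is an arbitrary assumption to tune for your model:

```javascript
// Hypothetical helper: cap tool output so one large page cannot
// crowd out the rest of the agent's context window.
function truncateForContext(text, maxChars = 20000) {
  if (typeof text !== 'string') return '';
  if (text.length <= maxChars) return text;
  const omitted = text.length - maxChars;
  return text.slice(0, maxChars) + `\n\n[truncated ${omitted} characters]`;
}
```

The trailing marker tells the agent that content was dropped, so it can call extract again or narrow its question rather than assume it saw the whole page.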

Step 4: Start the server and configure Claude

Connect the server via stdio transport and add it to your Claude Desktop or Claude Code MCP config.

// Start the server:
const transport = new StdioServerTransport();
await server.connect(transport);

// In .mcp.json:
// {
//   "mcpServers": {
//     "web-extraction": {
//       "command": "node",
//       "args": ["server.js"],
//       "env": { "SCAVIO_API_KEY": "your_key" }
//     }
//   }
// }
//
// Now Claude can: search for pages, then extract content from the best results.
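Once connected, the client drives these tools over JSON-RPC. A `tools/call` request for the search tool is shaped roughly like this (per the MCP specification; the `id` and argument values are placeholders):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "search",
    "arguments": { "query": "best crm 2026", "platform": "google" }
  }
}
```

You normally never construct this by hand — the client does — but it is useful to know when debugging a server over stdio.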

Python Example

import requests, os
H = {'x-api-key': os.environ['SCAVIO_API_KEY']}

def search(query, platform='google'):
    return requests.post('https://api.scavio.dev/api/v1/search', headers=H,
        json={'platform': platform, 'query': query}).json().get('organic_results', [])

def extract(url):
    return requests.post('https://api.scavio.dev/api/v1/extract', headers=H,
        json={'url': url}).json()

results = search('best crm 2026')
if results:
    content = extract(results[0]['link'])
    print(content.get('text', '')[:500])

JavaScript Example

const H = {'x-api-key': process.env.SCAVIO_API_KEY, 'Content-Type': 'application/json'};
async function search(query) {
  const r = await fetch('https://api.scavio.dev/api/v1/search', {
    method: 'POST', headers: H, body: JSON.stringify({platform: 'google', query})
  });
  return (await r.json()).organic_results || [];
}
async function extract(url) {
  const r = await fetch('https://api.scavio.dev/api/v1/extract', {
    method: 'POST', headers: H, body: JSON.stringify({url})
  });
  return r.json();
}
const results = await search('best crm 2026');
if (results[0]) console.log((await extract(results[0].link)).text?.slice(0, 500));
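The example above extracts only the first result. A sketch that fans out over the top results concurrently while tolerating individual failures is below; `extractTopResults` is a hypothetical helper that takes the `extract` function above (or any `url => Promise` lookalike) as a parameter:

```javascript
// Hypothetical helper: extract several result URLs in parallel.
// Failed extractions are dropped rather than failing the whole batch.
async function extractTopResults(results, extractFn, limit = 3) {
  const urls = results.slice(0, limit).map(r => r.link).filter(Boolean);
  const settled = await Promise.allSettled(urls.map(url => extractFn(url)));
  return settled
    .filter(s => s.status === 'fulfilled')
    .map(s => s.value);
}
```

Usage: `const pages = await extractTopResults(results, extract);` — `Promise.allSettled` keeps one dead link from sinking the whole batch.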

Expected Output

An MCP server exposing search and extract tools that AI agents can call to discover URLs and extract their content in a single workflow.

Frequently Asked Questions

How long does this tutorial take?
Most developers complete this tutorial in 15 to 30 minutes. You will need a Scavio API key (free tier works) and a working Python or JavaScript environment.

What are the prerequisites?
Node.js 18+, a Scavio API key from scavio.dev, basic understanding of the MCP protocol, and the npm or bun package manager. A Scavio API key gives you 500 free credits per month.

Can I complete this on the free tier?
Yes. The free tier includes 500 credits per month, which is more than enough to complete this tutorial and prototype a working solution.

Does Scavio integrate with agent frameworks?
Scavio has a native LangChain package (langchain-scavio), an MCP server, and a plain REST API that works with any HTTP client. This tutorial calls the REST API from inside an MCP server, but you can adapt it to your framework of choice.
