Build a web extraction MCP server by wrapping Scavio's search and extract endpoints as MCP tools that any compatible AI client can call. This gives agents the ability to both find relevant pages and extract structured content from them in a single workflow. The search tool discovers URLs via Google, Reddit, YouTube, Amazon, or Walmart queries, and the extract tool pulls clean text and metadata from any URL. This tutorial builds both tools as a single MCP server using the MCP SDK.
Prerequisites
- Node.js 18+ installed
- A Scavio API key from scavio.dev
- Basic understanding of the MCP protocol
- npm or bun package manager
Walkthrough
Step 1: Scaffold the MCP server
Set up a minimal MCP server project using the MCP SDK that registers two tools: search and extract.
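Assuming npm, install the two packages the server imports:
npm install @modelcontextprotocol/sdk zod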
// package.json (the code uses ESM imports and top-level await, so "type": "module" is required):
// { "type": "module", "dependencies": { "@modelcontextprotocol/sdk": "latest", "zod": "latest" } }
// server.js:
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { z } from 'zod';
const API_KEY = process.env.SCAVIO_API_KEY;
const server = new McpServer({ name: 'web-extraction', version: '1.0.0' });
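If SCAVIO_API_KEY is unset, the server still starts and every tool call then fails at the API. An optional guard, so misconfiguration fails fast at startup:
// Optional: fail at startup rather than on every tool call if the key is missing.
if (!API_KEY) throw new Error('SCAVIO_API_KEY environment variable is required');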
Step 2: Add the search tool
Register a search tool that queries Scavio and returns structured results the agent can use to decide which pages to extract.
// Tool 1: search — queries Scavio and returns the top organic results as JSON text.
server.tool('search', 'Search Google, Reddit, YouTube, Amazon, or Walmart',
  { query: z.string(), platform: z.string().default('google') },
  async ({ query, platform }) => {
    const resp = await fetch('https://api.scavio.dev/api/v1/search', {
      method: 'POST',
      headers: { 'x-api-key': API_KEY, 'Content-Type': 'application/json' },
      body: JSON.stringify({ platform, query }),
    });
    const data = await resp.json();
    // Return only the top 5 results to keep the tool output compact.
    return { content: [{ type: 'text', text: JSON.stringify(data.organic_results?.slice(0, 5) || [], null, 2) }] };
  }
);
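The handler above doesn't check the HTTP status, so a failed Scavio call comes back as whatever error body the API returns. MCP tool results support an isError flag; a sketch of a guard you could add after each fetch in both tools (the message wording is my own):
// Sketch: report HTTP failures as an explicit MCP tool error.
if (!resp.ok) {
  return {
    content: [{ type: 'text', text: `Scavio request failed: ${resp.status} ${resp.statusText}` }],
    isError: true,
  };
}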
Step 3: Add the extract tool
Register an extract tool that takes a URL and returns the page's clean text content via Scavio's extract endpoint.
// Tool 2: extract — fetches clean text for a single URL via Scavio's extract endpoint.
server.tool('extract', 'Extract clean text content from a URL',
  { url: z.string().url() },
  async ({ url }) => {
    const resp = await fetch('https://api.scavio.dev/api/v1/extract', {
      method: 'POST',
      headers: { 'x-api-key': API_KEY, 'Content-Type': 'application/json' },
      body: JSON.stringify({ url }),
    });
    const data = await resp.json();
    // Fall back through the likely response fields; dump raw JSON if neither is present.
    return { content: [{ type: 'text', text: data.text || data.content || JSON.stringify(data) }] };
  }
);
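Extracted pages can be long enough to crowd an agent's context window. One option is to cap the text before returning it; a minimal sketch (the 20,000-character limit is an arbitrary assumption, not a Scavio or MCP requirement):
// Hypothetical helper: cap tool output so long pages don't flood the context.
function capText(text, max = 20000) {
  return text.length > max ? text.slice(0, max) + '\n[truncated]' : text;
}
// In the extract handler, wrap the returned text:
// text: capText(data.text || data.content || JSON.stringify(data))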
Step 4: Start the server and configure Claude
Connect the server via stdio transport and add it to your Claude Desktop or Claude Code MCP config.
// Start the server:
const transport = new StdioServerTransport();
await server.connect(transport);
// In .mcp.json:
// {
// "mcpServers": {
// "web-extraction": {
// "command": "node",
// "args": ["server.js"],
// "env": { "SCAVIO_API_KEY": "your_key" }
// }
// }
// }
//
// Now Claude can search for pages, then extract content from the best results.
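Before pointing Claude at the server, you can exercise both tools interactively with the MCP Inspector (assuming npx is available):
npx @modelcontextprotocol/inspector node server.js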
Python Example
import requests, os
H = {'x-api-key': os.environ['SCAVIO_API_KEY']}

def search(query, platform='google'):
    return requests.post('https://api.scavio.dev/api/v1/search', headers=H,
                         json={'platform': platform, 'query': query}).json().get('organic_results', [])

def extract(url):
    return requests.post('https://api.scavio.dev/api/v1/extract', headers=H,
                         json={'url': url}).json()

results = search('best crm 2026')
if results:
    content = extract(results[0]['link'])
    print(content.get('text', '')[:500])
JavaScript Example
const H = { 'x-api-key': process.env.SCAVIO_API_KEY, 'Content-Type': 'application/json' };
async function search(query) {
  const r = await fetch('https://api.scavio.dev/api/v1/search', {
    method: 'POST', headers: H, body: JSON.stringify({ platform: 'google', query }),
  });
  return (await r.json()).organic_results || [];
}
async function extract(url) {
  const r = await fetch('https://api.scavio.dev/api/v1/extract', {
    method: 'POST', headers: H, body: JSON.stringify({ url }),
  });
  return r.json();
}
const results = await search('best crm 2026');
if (results[0]) console.log((await extract(results[0].link)).text?.slice(0, 500));
Expected Output
An MCP server exposing search and extract tools that AI agents can call to discover URLs and extract their content in a single workflow.