
How to Extract Structured Data from Any Website

Learn how to use Scavio's extract endpoint to pull structured data from any URL without writing custom scrapers.

Extracting structured data from websites usually means writing and maintaining a custom scraper for each site's HTML layout. Scavio's extract endpoint takes a URL and returns structured content, so you don't write any parsing code yourself. This tutorial shows how to extract data from product pages, articles, and company websites with one API call per URL.

Prerequisites

  • Python 3.8+ or Node.js 18+
  • requests library (Python) or built-in fetch (JS)
  • A Scavio API key from scavio.dev

Walkthrough

Step 1: Extract content from a URL

Send a URL to the extract endpoint in a POST request, authenticating with your API key in the x-api-key header, and receive structured content back.

Python
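import os
import requests

# Read the API key from the environment and send it in the x-api-key header.
H = {'x-api-key': os.environ['SCAVIO_API_KEY']}

def extract(url: str) -> dict:
    """POST a URL to the extract endpoint and return the parsed JSON response."""
    resp = requests.post('https://api.scavio.dev/api/v1/extract',
                         headers=H, json={'url': url}, timeout=30)
    resp.raise_for_status()  # raise on 4xx/5xx instead of returning an error body as data
    return resp.json()

data = extract('https://example.com/product-page')
print(data)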
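Each extract call spends credits, so it can be worth rejecting obviously malformed URLs locally before they reach the API. The sketch below is a hypothetical pre-check (is_extractable_url is not part of Scavio's API) that accepts only absolute http and https URLs with a host, using the standard library's urllib.parse.

```python
from urllib.parse import urlparse

def is_extractable_url(url: str) -> bool:
    """Hypothetical pre-check: accept only absolute http(s) URLs that have a host."""
    try:
        parsed = urlparse(url)
    except ValueError:
        # urlparse raises ValueError for some malformed inputs, e.g. unbalanced IPv6 brackets
        return False
    return parsed.scheme in ('http', 'https') and bool(parsed.netloc)

candidates = [
    'https://example.com/product-page',
    'example.com/product-page',    # missing scheme, so there is no host either
    'ftp://example.com/file.csv',  # unsupported scheme
]
valid = [u for u in candidates if is_extractable_url(u)]
print(valid)  # ['https://example.com/product-page']
```

You would then call extract(url) only for URLs in valid.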

Step 2: Extract multiple URLs in batch

Loop over a list of URLs with a short pause between requests, recording each result as ok or error so that one failed page doesn't stop the whole batch.

Python
import time

# Uses extract() from Step 1.
def extract_batch(urls: list, delay: float = 0.5) -> list:
    results = []
    for url in urls:
        try:
            data = extract(url)
            results.append({'url': url, 'status': 'ok', 'data': data})
        except Exception as e:
            # Record the failure (for example a timeout) and continue with the next URL.
            results.append({'url': url, 'status': 'error', 'error': str(e)})
        time.sleep(delay)  # pause between requests to avoid hammering the API
    return results

urls = ['https://example.com/page1', 'https://example.com/page2']
extracted = extract_batch(urls)
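extract_batch gives each URL a single attempt. If you see transient failures such as timeouts, a common pattern is to retry with exponential backoff. The sketch below is an assumption, not Scavio behavior: extract_with_retry and its parameters are hypothetical, and it retries on any exception for simplicity (in practice you might narrow this to transient errors such as requests.Timeout). Taking the fetch function as an argument lets you pass extract from Step 1, or a stub when testing.

```python
import time
from typing import Callable

def extract_with_retry(url: str, fetch: Callable[[str], dict],
                       retries: int = 3, base_delay: float = 1.0) -> dict:
    """Hypothetical helper: call fetch(url), retrying with exponential backoff.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts and
    re-raises the last error once all retries are used up.
    """
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except Exception:
            if attempt == retries:
                raise  # out of attempts: let the caller record the error
            time.sleep(base_delay * (2 ** attempt))
    raise ValueError('retries must be >= 0')  # only reached when the loop never runs

# Demo with a stub that fails twice, then succeeds (no network needed).
calls = {'n': 0}

def flaky_fetch(url: str) -> dict:
    calls['n'] += 1
    if calls['n'] < 3:
        raise TimeoutError('simulated timeout')
    return {'url': url}

print(extract_with_retry('https://example.com/page1', flaky_fetch, base_delay=0.01))
# {'url': 'https://example.com/page1'}
```

To use it in extract_batch, you could swap data = extract(url) for data = extract_with_retry(url, extract).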

Step 3: Combine search + extract for enrichment

Search for companies through Scavio's search endpoint, then extract structured data from the pages behind the top three organic results.

Python
# Uses requests, H, and extract() from Step 1.
def search_and_extract(query: str) -> list:
    # Search for relevant pages via the search endpoint
    search_resp = requests.post('https://api.scavio.dev/api/v1/search', headers=H,
        json={'platform': 'google', 'query': query}, timeout=10)
    search_resp.raise_for_status()
    results = search_resp.json().get('organic', [])[:3]  # top 3 organic results
    # Extract structured data from each result
    enriched = []
    for r in results:
        try:
            page_data = extract(r['link'])
            enriched.append({'title': r['title'], 'url': r['link'], 'extracted': page_data})
        except (requests.RequestException, KeyError, ValueError):
            # Skip results that fail to extract or lack a title/link (no bare except)
            continue
    return enriched

data = search_and_extract('best CRM software pricing')
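Search results sometimes include several pages from the same site. If you only want one page per company, you could drop repeat domains before calling extract. This is a hypothetical helper (unique_by_domain is not a Scavio function); like the code above, it assumes each result carries a 'link' key, and it treats www.example.com and example.com as the same site.

```python
from urllib.parse import urlparse

def unique_by_domain(results: list) -> list:
    """Hypothetical helper: keep only the first search result per domain."""
    seen = set()
    unique = []
    for r in results:
        link = r.get('link')
        if not link:
            continue  # skip results without a URL
        host = urlparse(link).netloc.lower()
        if host.startswith('www.'):
            host = host[4:]  # fold www.example.com into example.com
        if host not in seen:
            seen.add(host)
            unique.append(r)
    return unique

sample = [
    {'title': 'A', 'link': 'https://www.example.com/pricing'},
    {'title': 'B', 'link': 'https://example.com/blog'},
    {'title': 'C', 'link': 'https://other.example.org/'},
]
print([r['title'] for r in unique_by_domain(sample)])  # ['A', 'C']
```

Inside search_and_extract, you could apply it before slicing: unique_by_domain(search_resp.json().get('organic', []))[:3].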

Step 4: Save extracted data

Write the batch results from Step 2 to a JSON file for downstream processing.

Python
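import json

def save_extracted(data: list, filepath: str) -> None:
    # Write the records as a pretty-printed JSON array, in UTF-8 so non-ASCII text survives.
    with open(filepath, 'w', encoding='utf-8') as f:
        json.dump(data, f, indent=2, ensure_ascii=False)
    print(f'Saved {len(data)} extracted records to {filepath}')

save_extracted(extracted, 'extracted_data.json')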
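A single JSON array is easy to read back, but for large batches some pipelines prefer JSON Lines (one JSON object per line), which can be appended to and streamed record by record. Here is a sketch using only the standard json module; save_jsonl and load_jsonl are hypothetical names, and the records are assumed to be the dicts produced by extract_batch.

```python
import json
import os
import tempfile

def save_jsonl(records: list, filepath: str) -> None:
    """Hypothetical helper: write one JSON object per line."""
    with open(filepath, 'w', encoding='utf-8') as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + '\n')

def load_jsonl(filepath: str) -> list:
    """Hypothetical helper: read the records back, skipping blank lines."""
    with open(filepath, encoding='utf-8') as f:
        return [json.loads(line) for line in f if line.strip()]

# Round-trip two records shaped like extract_batch output.
records = [
    {'url': 'https://example.com/page1', 'status': 'ok', 'data': {}},
    {'url': 'https://example.com/page2', 'status': 'error', 'error': 'timeout'},
]
path = os.path.join(tempfile.mkdtemp(), 'extracted_data.jsonl')
save_jsonl(records, path)
print(load_jsonl(path) == records)  # True
```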

Python Example

Python
import requests, os
H = {'x-api-key': os.environ['SCAVIO_API_KEY']}

def extract(url):
    resp = requests.post('https://api.scavio.dev/api/v1/extract',
        headers=H, json={'url': url}, timeout=30)
    resp.raise_for_status()  # surface HTTP errors instead of returning them as data
    return resp.json()

# Extract structured data from any URL:
data = extract('https://example.com/pricing')
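os.environ['SCAVIO_API_KEY'] raises a bare KeyError when the variable is unset. If you want a clearer message, a small hypothetical helper (require_api_key is not part of any Scavio package) can check for the key up front and build the same x-api-key header.

```python
import os

def require_api_key(name: str = 'SCAVIO_API_KEY') -> dict:
    """Hypothetical helper: build the x-api-key header or fail with a clear message."""
    key = os.environ.get(name, '').strip()
    if not key:
        raise RuntimeError(f'{name} is not set; export your Scavio API key before running.')
    return {'x-api-key': key}
```

With it, the header line becomes H = require_api_key().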

JavaScript Example

JavaScript
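async function extract(url) {
  const resp = await fetch('https://api.scavio.dev/api/v1/extract', {
    method: 'POST',
    headers: {'x-api-key': process.env.SCAVIO_API_KEY, 'Content-Type': 'application/json'},
    body: JSON.stringify({url}),
    signal: AbortSignal.timeout(30000) // 30 s timeout, matching the Python examples
  });
  // fetch only rejects on network errors, so check the HTTP status explicitly
  if (!resp.ok) throw new Error(`Extract failed: HTTP ${resp.status}`);
  return resp.json();
}

extract('https://example.com/pricing').then(data => console.log(data));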

Expected Output

The extract call returns a JSON object with the page's structured content, produced by a single API call with no custom parsing code on your side.
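If you ran the batch from Step 2, each record carries a status of ok or error. A quick tally shows how many pages succeeded and which URLs to retry. summarize_batch is a hypothetical helper written against that record format.

```python
from collections import Counter

def summarize_batch(results: list) -> dict:
    """Hypothetical helper: count statuses and list failed URLs from extract_batch output."""
    counts = Counter(r['status'] for r in results)
    failed = [r['url'] for r in results if r['status'] == 'error']
    # Counter returns 0 for statuses that never occurred
    return {'ok': counts['ok'], 'error': counts['error'], 'failed_urls': failed}

sample = [
    {'url': 'https://example.com/page1', 'status': 'ok', 'data': {}},
    {'url': 'https://example.com/page2', 'status': 'error', 'error': 'timeout'},
]
print(summarize_batch(sample))
# {'ok': 1, 'error': 1, 'failed_urls': ['https://example.com/page2']}
```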


Frequently Asked Questions

How long does this tutorial take?

Most developers complete it in 15 to 30 minutes.

What do I need?

Python 3.8+ with the requests library, or Node.js 18+ with the built-in fetch, plus a Scavio API key from scavio.dev. The free tier works.

Can I complete this tutorial on the free tier?

Yes. The free tier includes 500 credits per month, which is more than enough to complete this tutorial and prototype a working solution.

Does Scavio work with my framework?

Scavio has a native LangChain package (langchain-scavio), an MCP server, and a plain REST API that works with any HTTP client. This tutorial uses the raw REST API, but you can adapt the code to your framework of choice.
