HTML to Markdown Pre-LLM

Overview

A pre-LLM step that sends each URL through Scavio's /extract endpoint and converts the page to Markdown before the LLM sees it. For HTML-heavy tasks this cuts input tokens by roughly 10x.

Trigger

Per-URL processing in any agent loop

Schedule

Per-task

Workflow Steps

Receive URL list

From SERP results or user input.
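A minimal sketch of this step, assuming each SERP result is a dict with a `link` key (a hypothetical shape; adapt it to whatever your search call returns). The function name `collect_urls` is also illustrative. It merges both sources, keeps first-seen order, and drops anything that is not http(s), so no extract credits are spent on duplicates or junk links.

```python
from urllib.parse import urlparse


def collect_urls(serp_results, user_urls=()):
    """Merge SERP links and user-supplied URLs into one de-duplicated list.

    Assumes each SERP result is a dict with a 'link' key (hypothetical shape).
    Keeps first-seen order and drops anything that is not http(s).
    """
    seen = set()
    urls = []
    candidates = [r.get('link', '') for r in serp_results] + list(user_urls)
    for raw in candidates:
        url = raw.strip()
        if urlparse(url).scheme not in ('http', 'https'):
            continue  # skip empty strings, mailto:, javascript:, etc.
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


serp = [{'link': 'https://example.com/a'}, {'link': 'https://example.com/b'}, {'title': 'no link'}]
print(collect_urls(serp, ['https://example.com/a', 'mailto:x@example.com']))
# ['https://example.com/a', 'https://example.com/b']
```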

Scavio /extract per URL

Send a POST request with the JSON body {url, format: 'markdown'}.
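To make the exact wire format visible, here is a standard-library sketch that builds (but does not send) the same POST the implementations below send. The endpoint, `x-api-key` header, and body fields come from this page; the helper name `build_extract_request` is hypothetical.

```python
import json
import os
import urllib.request

EXTRACT_URL = 'https://api.scavio.dev/api/v1/extract'


def build_extract_request(url, api_key):
    """Build (but do not send) the POST request for Scavio /extract."""
    body = json.dumps({'url': url, 'format': 'markdown'}).encode('utf-8')
    return urllib.request.Request(
        EXTRACT_URL,
        data=body,
        headers={'x-api-key': api_key, 'Content-Type': 'application/json'},
        method='POST',
    )


req = build_extract_request('https://example.com', os.environ.get('SCAVIO_API_KEY', 'test-key'))
print(req.get_method(), json.loads(req.data))
# To send it: json.load(urllib.request.urlopen(req, timeout=60)).get('markdown', '')
```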

Optional cache hit

If the URL's Markdown was extracted within the last 24 hours, return the cached copy instead of calling /extract again.
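A minimal in-memory sketch of the 24-hour cache. The class name `MarkdownCache` is hypothetical, and a real deployment would likely use a shared store (Redis, SQLite, etc.) so the cache survives restarts. `fetch` is any callable that maps a URL to Markdown, such as the `extract()` helper in the implementation section; the clock is injectable so expiry can be tested without waiting.

```python
import time

CACHE_TTL_SECONDS = 24 * 60 * 60  # 24 hours, per the step above


class MarkdownCache:
    """In-memory cache of extracted Markdown keyed by URL, with a TTL.

    `fetch` is any callable url -> markdown; `clock` returns seconds.
    """

    def __init__(self, fetch, ttl=CACHE_TTL_SECONDS, clock=time.time):
        self.fetch = fetch
        self.ttl = ttl
        self.clock = clock
        self._store = {}  # url -> (timestamp, markdown)

    def get(self, url):
        now = self.clock()
        hit = self._store.get(url)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # fresh cache hit: no API call, no credit spent
        markdown = self.fetch(url)  # miss or stale: extract again
        self._store[url] = (now, markdown)
        return markdown
```

Usage: `cache = MarkdownCache(extract)`, then call `cache.get(url)` wherever the loop would have called `extract(url)`.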

Pass markdown to LLM

The LLM context is now about 3K tokens per page instead of about 30K.
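To check the savings on your own pages, a rough estimate is often enough. This sketch assumes the common heuristic of about 4 characters per token for English text; it is not a real tokenizer, so use your model's tokenizer for exact counts. The placeholder strings are sized only to mirror the ~30K and ~3K figures above.

```python
def estimate_tokens(text, chars_per_token=4):
    """Rough token count using a ~4 characters-per-token heuristic (approximation only)."""
    return max(1, round(len(text) / chars_per_token)) if text else 0


def token_savings(html, markdown):
    """Return (html_tokens, markdown_tokens, ratio) for one page."""
    h = estimate_tokens(html)
    m = estimate_tokens(markdown)
    return h, m, (h / m if m else float('inf'))


# Placeholder inputs: 120K chars of HTML vs 12K chars of Markdown.
print(token_savings('a' * 120_000, 'a' * 12_000))  # (30000, 3000, 10.0)
```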

LLM produces output

A summary, classification, extraction, or whatever output the task calls for.
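One way to hand several extracted pages to the LLM is to label each with its source URL so the model can attribute what it writes. This is a sketch only: `build_prompt` and the `--- SOURCE: ... ---` delimiter are hypothetical conventions, and the returned string goes to whichever LLM client you already use.

```python
def build_prompt(task, pages):
    """Assemble one prompt from a task instruction and (url, markdown) pairs.

    Each page is prefixed with a source line (a hypothetical delimiter format)
    so the model can cite which URL a statement came from.
    """
    parts = [task.strip()]
    for url, markdown in pages:
        parts.append(f'--- SOURCE: {url} ---\n{markdown.strip()}')
    return '\n\n'.join(parts)


prompt = build_prompt('Summarize each page.', [('https://a.test', '# A\n\nBody A'), ('https://b.test', '# B')])
print(prompt)
```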

Optional second-pass extract

If the Markdown is still too long, re-extract it with summary mode or split it into chunks.
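For the chunking route, a simple approach is to split on blank lines (Markdown paragraph boundaries) and pack paragraphs into chunks under a size budget. This sketch assumes a character budget; `max_chars=8_000` is an arbitrary placeholder, not a Scavio or model limit. Paragraphs longer than the budget are hard-split.

```python
def chunk_markdown(markdown, max_chars=8_000):
    """Split Markdown into chunks of at most `max_chars`, breaking on blank lines.

    Whole paragraphs are packed together when they fit; a single paragraph
    longer than `max_chars` is hard-split into fixed-size pieces.
    """
    chunks, current = [], ''
    for para in markdown.split('\n\n'):
        if not para.strip():
            continue  # skip empty paragraphs from runs of blank lines
        pieces = [para[i:i + max_chars] for i in range(0, len(para), max_chars)]
        for piece in pieces:
            candidate = f'{current}\n\n{piece}' if current else piece
            if len(candidate) <= max_chars:
                current = candidate  # still fits: keep packing
            else:
                chunks.append(current)  # flush the full chunk
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent to the LLM separately (for example, summarize each chunk, then summarize the summaries).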

Python Implementation


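import os, requests
API_URL = 'https://api.scavio.dev/api/v1/extract'
H = {'x-api-key': os.environ['SCAVIO_API_KEY']}

def extract(url):
    """Return the page at `url` as Markdown via Scavio /extract ('' if none)."""
    r = requests.post(API_URL, headers=H, json={'url': url, 'format': 'markdown'}, timeout=60)
    r.raise_for_status()  # fail loudly on HTTP errors instead of returning ''
    return r.json().get('markdown', '')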
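When an agent has many URLs to process, the per-URL calls can run concurrently. This is a sketch using the standard library's thread pool; the name `extract_all` and the default `max_workers=8` are assumptions to tune against your rate limits. Failed URLs map to `''` so one bad page does not abort the batch, which also means errors are swallowed; log them if you need to know which pages failed.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_all(urls, extract_fn, max_workers=8):
    """Run `extract_fn` (e.g. extract()) over many URLs concurrently.

    Returns {url: markdown} in input order. A URL whose call raises maps
    to '' so a single failure does not abort the whole batch.
    """
    def safe(url):
        try:
            return extract_fn(url)
        except Exception:
            return ''  # swallow per-URL failures; add logging here if needed

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(safe, urls))  # map preserves input order
    return dict(zip(urls, results))
```

Usage: `pages = extract_all(urls, extract)`.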

JavaScript Implementation


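const H = { 'x-api-key': process.env.SCAVIO_API_KEY, 'Content-Type': 'application/json' };
// Fetch `url` through Scavio /extract and return its Markdown ('' if absent).
async function extract(url) {
  const res = await fetch('https://api.scavio.dev/api/v1/extract', { method: 'POST', headers: H, body: JSON.stringify({ url, format: 'markdown' }) });
  if (!res.ok) throw new Error(`extract failed: HTTP ${res.status}`);
  const data = await res.json();
  return data.markdown || '';
}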

Platforms Used

Google

Web search with knowledge graph, People Also Ask (PAA), and AI Overviews results.

Frequently Asked Questions

What does the HTML to Markdown Pre-LLM Workflow do?

It adds a pre-LLM step that converts each URL to Markdown via Scavio /extract before the LLM sees it, cutting input tokens by roughly 10x for HTML-heavy tasks.

How is this workflow triggered?

Per URL, inside any agent loop. It runs per task rather than on a fixed schedule.

Which Scavio platforms does this workflow use?

Google. Each platform is called via the same unified API endpoint.

Can I run this workflow on the free tier?

Yes. Scavio's free tier includes 50 credits on signup with no credit card required. That is enough to test and validate this workflow before scaling it.
