Overview
Asynchronous bid-discovery pipeline. A cron job at 4 AM runs Google dorks across government domains, extracts the PDFs it finds, converts them to typed JSON, and caches the results in SQLite. The CrewAI agent reads from the cache rather than hitting the network.
Trigger
Daily 4 AM
Workflow Steps
Cron at 4 AM
Run pipeline before business hours.
Google Dorks across target domains
site:gov.br filetype:pdf 2026, site:europa.eu filetype:pdf, etc.
Filter for fresh PDFs
Date filter or LLM screening.
Scavio extract per PDF
PDF-aware extract returns markdown.
LLM converts to typed JSON
Strict-schema prompt.
Cache in SQLite
Sub-50ms repeat lookups.
CrewAI agent queries cache
Fresh bids available without latency.
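Step 5's strict-schema prompt is only half the story; the LLM's output still needs a validation gate before it is cached. A minimal sketch of that gate (the field names and types here are illustrative, not the pipeline's actual schema):

```python
import json

# Illustrative required fields; the real schema comes from the strict-schema prompt.
REQUIRED = {'title': str, 'agency': str, 'deadline': str, 'value_eur': (int, float)}

def validate_bid(raw: str) -> dict:
    """Reject LLM output that is not valid JSON or has missing/mistyped fields."""
    bid = json.loads(raw)  # raises ValueError on malformed JSON
    for field, typ in REQUIRED.items():
        if not isinstance(bid.get(field), typ):
            raise ValueError(f'bad or missing field: {field}')
    return bid
```

Rows that fail validation can be retried with the LLM or skipped, so only well-typed JSON ever reaches SQLite.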
Python Implementation
import os, requests, sqlite3, json, time
API_KEY = os.environ['SCAVIO_API_KEY']
H = {'x-api-key': API_KEY}
DORKS = ['site:gov.br filetype:pdf 2026 contratos', 'site:europa.eu filetype:pdf AI act']
conn = sqlite3.connect('bids.db')
conn.execute('CREATE TABLE IF NOT EXISTS bids(url TEXT PRIMARY KEY, payload TEXT, ts REAL)')
def crawl():
    for q in DORKS:
        # Run each dork through the search endpoint.
        r = requests.post('https://api.scavio.dev/api/v1/search', headers=H, json={'query': q}).json()
        for o in r.get('organic_results', []):
            if o.get('link', '').endswith('.pdf'):
                # PDF-aware extraction returns markdown for each hit.
                e = requests.post('https://api.scavio.dev/api/v1/extract', headers=H, json={'url': o['link'], 'format': 'markdown'}).json()
                # One row per URL; ts supports freshness filtering on the read side.
                conn.execute('INSERT OR REPLACE INTO bids VALUES (?, ?, ?)', (o['link'], json.dumps(e), time.time()))
    conn.commit()

JavaScript Implementation
const H = { 'x-api-key': process.env.SCAVIO_API_KEY, 'Content-Type': 'application/json' };
const DORKS = ['site:gov.br filetype:pdf 2026 contratos'];
async function crawl() {
  for (const q of DORKS) {
    const r = await fetch('https://api.scavio.dev/api/v1/search', { method: 'POST', headers: H, body: JSON.stringify({ query: q }) }).then(r => r.json());
    // Per result, fetch the extract and write it to the cache.
    for (const o of r.organic_results ?? []) {
      if (!(o.link ?? '').endsWith('.pdf')) continue;
      const extract = await fetch('https://api.scavio.dev/api/v1/extract', { method: 'POST', headers: H, body: JSON.stringify({ url: o.link, format: 'markdown' }) }).then(r => r.json());
      // Persist as in the Python version, e.g. via better-sqlite3:
      // INSERT OR REPLACE INTO bids VALUES (o.link, JSON.stringify(extract), Date.now() / 1000)
    }
  }
}

Platforms Used
Web search with knowledge graph, People Also Ask (PAA), and AI overviews