Government Bid Monitoring Workflow

Cron-driven Google Dorks pipeline for fresh government bid PDFs. An async SQLite cache returns repeat hits in under 50 ms.

Overview

Asynchronous bid-discovery pipeline. Cron at dawn runs Google Dorks across government domains, extracts PDFs, converts to typed JSON, caches in SQLite. CrewAI agent reads from cache.

Trigger

Daily cron at 4 AM

Workflow Steps

1. Cron fires at 4 AM — run the pipeline before business hours.

2. Google Dorks across target domains — e.g. site:gov.br filetype:pdf 2026, site:europa.eu filetype:pdf.

3. Filter for fresh PDFs — a date filter or LLM screening keeps only recent documents.

4. Scavio extract per PDF — PDF-aware extraction returns markdown.

5. LLM converts markdown to typed JSON — a strict-schema prompt enforces the output shape.

6. Cache in SQLite — repeat lookups return in under 50 ms.

7. CrewAI agent queries the cache — fresh bids are available without crawl latency.
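Step 5's typed-JSON conversion can be sketched as follows. The `Bid` fields and the `parse_bid` helper are assumptions for illustration — the source only says a strict-schema prompt is used, so treat the schema below as a placeholder for whatever fields your prompt enforces:

```python
import json
from dataclasses import dataclass

# Hypothetical target schema for a bid record; a real strict-schema
# prompt would instruct the LLM to emit exactly these fields.
@dataclass
class Bid:
    title: str
    agency: str
    deadline: str  # ISO date string, e.g. "2026-03-15"
    url: str

def parse_bid(llm_output: str) -> Bid:
    """Validate the LLM's JSON against the expected fields; raise on drift."""
    data = json.loads(llm_output)
    missing = {"title", "agency", "deadline", "url"} - data.keys()
    if missing:
        raise ValueError(f"schema violation, missing fields: {missing}")
    return Bid(title=data["title"], agency=data["agency"],
               deadline=data["deadline"], url=data["url"])

# A well-formed LLM response parses cleanly into a typed record.
bid = parse_bid('{"title": "Road works", "agency": "DNIT", '
                '"deadline": "2026-03-15", "url": "https://gov.br/x.pdf"}')
print(bid.agency)  # → DNIT
```

Failing loudly on a schema violation (rather than caching a partial record) keeps the SQLite cache trustworthy for the downstream agent.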

Python Implementation

Python
import os, requests, sqlite3, json, time

API_KEY = os.environ['SCAVIO_API_KEY']
H = {'x-api-key': API_KEY}
DORKS = ['site:gov.br filetype:pdf 2026 contratos', 'site:europa.eu filetype:pdf AI act']

conn = sqlite3.connect('bids.db')
conn.execute('CREATE TABLE IF NOT EXISTS bids(url TEXT PRIMARY KEY, payload TEXT, ts REAL)')

def crawl():
    for q in DORKS:
        # Run each dork through the search endpoint.
        r = requests.post('https://api.scavio.dev/api/v1/search',
                          headers=H, json={'query': q}, timeout=30).json()
        for o in r.get('organic_results', []):
            link = o.get('link', '')
            if not link.endswith('.pdf'):
                continue
            # PDF-aware extraction returns markdown for each hit.
            e = requests.post('https://api.scavio.dev/api/v1/extract',
                              headers=H, json={'url': link, 'format': 'markdown'},
                              timeout=60).json()
            # Upsert keyed by URL so re-runs refresh the cached payload.
            conn.execute('INSERT OR REPLACE INTO bids VALUES (?, ?, ?)',
                         (link, json.dumps(e), time.time()))
    conn.commit()
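The sub-50 ms claim in step 6 comes from the agent reading the SQLite cache instead of re-crawling. A minimal lookup the CrewAI agent could call might look like this — the function name and the 24-hour freshness window are assumptions, not part of the source:

```python
import json
import sqlite3
import time

def fresh_bids(conn: sqlite3.Connection, max_age_hours: float = 24.0) -> list:
    """Return cached bid payloads newer than the freshness window."""
    cutoff = time.time() - max_age_hours * 3600
    rows = conn.execute(
        'SELECT url, payload FROM bids WHERE ts > ? ORDER BY ts DESC',
        (cutoff,)).fetchall()
    # Merge the stored extract payload with its source URL.
    return [{'url': u, **json.loads(p)} for u, p in rows]

# Example against an in-memory cache holding one fresh row.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE bids(url TEXT PRIMARY KEY, payload TEXT, ts REAL)')
conn.execute('INSERT INTO bids VALUES (?, ?, ?)',
             ('https://gov.br/x.pdf', json.dumps({'markdown': '# Bid'}), time.time()))
print(len(fresh_bids(conn)))  # → 1
```

Because the query is an indexed primary-key table scan over a small local file, repeat lookups comfortably stay within the stated latency budget.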

JavaScript Implementation

JavaScript
const H = { 'x-api-key': process.env.SCAVIO_API_KEY, 'Content-Type': 'application/json' };
const DORKS = ['site:gov.br filetype:pdf 2026 contratos'];
async function crawl() {
  for (const q of DORKS) {
    const r = await fetch('https://api.scavio.dev/api/v1/search', { method: 'POST', headers: H, body: JSON.stringify({ query: q }) }).then(r => r.json());
    // Per result, extract the PDF and write it to the cache.
    for (const o of r.organic_results ?? []) {
      if (!o.link?.endsWith('.pdf')) continue;
      const e = await fetch('https://api.scavio.dev/api/v1/extract', { method: 'POST', headers: H, body: JSON.stringify({ url: o.link, format: 'markdown' }) }).then(r => r.json());
      // Cache write (e.g. a better-sqlite3 upsert) goes here, mirroring the Python version.
    }
  }
}

Platforms Used

Google

Web search with knowledge graph, PAA, and AI overviews

Frequently Asked Questions

How does this workflow work?

Asynchronous bid-discovery pipeline. Cron at dawn runs Google Dorks across government domains, extracts PDFs, converts to typed JSON, caches in SQLite. A CrewAI agent reads from the cache.

When does this workflow run?

It runs on a daily cron schedule at 4 AM.

Which Scavio platforms does it use?

This workflow uses the Google platform. Each platform is called via the same unified API endpoint.

Can I try this workflow for free?

Yes. Scavio's free tier includes 500 credits per month with no credit card required. That is enough to test and validate this workflow before scaling it.