Gov Portal Search Fallback

Overview

This workflow runs once a day. For each government-document topic, it runs dorked searches through Scavio to find indexed pages, routes auth-gated targets to Playwright, and extracts structured records from the results.

Trigger

Daily cron job at 7:00 AM (cron expression: 0 7 * * *).

Schedule

Every day at 7:00 AM.

Workflow Steps

Load target list (domain + topic)

From a YAML config or DB table.

Per target: classify indexed vs auth-gated

Use a per-target flag set during onboarding.
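The first two steps can be sketched as follows. This is a minimal illustration, not the workflow's actual code: the `targets` table, its columns, and the `Target` dataclass are hypothetical, and SQLite stands in for whatever database or YAML file holds the list.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    domain: str
    topic: str
    auth_gated: bool  # flag set once, during onboarding


def load_targets(conn: sqlite3.Connection) -> list[Target]:
    """Read every row of the (hypothetical) targets table."""
    rows = conn.execute(
        "SELECT domain, topic, auth_gated FROM targets ORDER BY domain, topic"
    )
    return [Target(domain, topic, bool(flag)) for domain, topic, flag in rows]


def route(targets: list[Target]) -> tuple[list[Target], list[Target]]:
    """Split targets into (indexed, auth_gated) using the stored flag."""
    indexed = [t for t in targets if not t.auth_gated]
    gated = [t for t in targets if t.auth_gated]
    return indexed, gated
```

Reading the flag rather than probing each site at run time keeps the daily job deterministic: a target only changes lanes when someone edits its row.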

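Indexed: Scavio dorked search across templates

The templates combine four operators: site:, filetype:, intitle:, and inurl:. The Python implementation below defines three templates, each scoped to the target domain with site:.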
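One detail worth guarding against: an operator such as intitle: binds only to the term that directly follows it, so a multi-word topic needs quoting to be matched as a phrase. The sketch below shows one way to expand the templates. The `build_queries` helper and its quoting rule are illustrative assumptions, not part of the original workflow.

```python
DORK_TEMPLATES = [
    "site:{d} filetype:pdf {t}",
    "site:{d} intitle:{t}",
    "site:{d} inurl:reports {t}",
]


def build_queries(domain: str, topic: str) -> list[str]:
    """Expand every template for one (domain, topic) target."""
    topic = topic.strip()
    # Quote multi-word topics so they are matched as a phrase.
    if " " in topic and not topic.startswith('"'):
        topic = f'"{topic}"'
    return [tpl.format(d=domain, t=topic) for tpl in DORK_TEMPLATES]
```

For example, `build_queries("example.gov", "annual report")` yields `site:example.gov intitle:"annual report"` for the second template instead of applying intitle: to "annual" alone.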

Dedupe URLs across templates

A URL returned by several dork templates counts as a single source.
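Exact string comparison misses near-duplicates such as a trailing slash or a #fragment. A minimal sketch of a slightly more forgiving dedupe follows. The normalization rules are an assumption chosen for illustration, and some servers do treat a trailing slash as significant.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Canonical form used only as a dedupe key, never for fetching."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    # Lowercase scheme and host, drop the fragment, keep the query string.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def dedupe(urls: list[str]) -> list[str]:
    """Keep the first occurrence of each URL, preserving result order."""
    seen: set[str] = set()
    unique = []
    for url in urls:
        key = normalize(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique
```

Preserving order matters here because the next step takes the top-N URLs; a plain `set()` would discard the search ranking.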

Scavio /extract for top-N URLs

Each page comes back as Markdown, ready for LLM extraction.
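The step above might look like the sketch below. Everything about the request shape is an assumption: the full endpoint URL is modelled on the search URL used elsewhere in this page, and the `{"url": ...}` request body and `markdown` response key are guesses that should be checked against the Scavio documentation. The HTTP call is injectable so the loop can be exercised offline.

```python
import os
from typing import Callable

# ASSUMPTION: full path guessed from the search endpoint; verify before use.
EXTRACT_URL = "https://api.scavio.dev/api/v1/extract"


def _http_post(url: str, payload: dict) -> dict:
    import requests  # imported lazily so extract_top_n can be tested offline

    headers = {"x-api-key": os.environ["SCAVIO_API_KEY"]}
    resp = requests.post(url, headers=headers, json=payload, timeout=60)
    resp.raise_for_status()
    return resp.json()


def extract_top_n(
    urls: list[str],
    n: int = 5,
    post: Callable[[str, dict], dict] = _http_post,
) -> dict[str, str]:
    """Return {url: markdown} for the first n URLs, skipping empty responses."""
    pages = {}
    for url in urls[:n]:
        # ASSUMPTION: request body {"url": ...} and response key "markdown".
        body = post(EXTRACT_URL, {"url": url})
        markdown = body.get("markdown")
        if markdown:
            pages[url] = markdown
    return pages
```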

Auth-gated: Playwright/Stagehand fetch

Only the small subset that requires login.
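A minimal sketch of the auth-gated path using Playwright's synchronous Python API (Stagehand is not shown). It assumes a login session was captured earlier into a storage-state file; the `fetch_auth_gated` name and the file path argument are hypothetical.

```python
def fetch_auth_gated(url: str, storage_state_path: str) -> str:
    """Return the rendered HTML of a page that requires a logged-in session."""
    # Imported here so the rest of the pipeline runs without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            # Reuse cookies and local storage saved after a previous login.
            context = browser.new_context(storage_state=storage_state_path)
            page = context.new_page()
            page.goto(url)
            return page.content()
        finally:
            browser.close()
```

The storage-state file can be produced once, after a manual login, with `context.storage_state(path=...)`. The returned HTML still needs converting to Markdown before it joins the indexed pages in the extraction step.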

LLM structured extraction

For each Markdown document, the LLM returns a JSON object with the fields title, date, summary, and entities.
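LLM output is not guaranteed to be valid JSON or to contain every field, so it is worth validating before anything is written to the database. The sketch below assumes the model is reachable through a plain `complete(prompt) -> str` callable; that callable, the prompt wording, and the `Record` dataclass are all hypothetical.

```python
import json
from dataclasses import dataclass, field
from typing import Callable

PROMPT = (
    "Extract a JSON object with keys title, date, summary, entities "
    "(a list of strings) from the document below. Reply with JSON only.\n\n"
)
REQUIRED = {"title", "date", "summary", "entities"}


@dataclass
class Record:
    title: str
    date: str
    summary: str
    entities: list[str] = field(default_factory=list)


def extract_record(markdown: str, complete: Callable[[str], str]) -> Record:
    """Ask the model for JSON and validate it before it reaches the database."""
    raw = complete(PROMPT + markdown)
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed output
    if not isinstance(data, dict):
        raise ValueError("LLM response is not a JSON object")
    missing = REQUIRED - data.keys()
    if missing:
        raise ValueError(f"LLM response is missing keys: {sorted(missing)}")
    return Record(
        title=str(data["title"]),
        date=str(data["date"]),
        summary=str(data["summary"]),
        entities=[str(e) for e in data["entities"]],
    )
```

Because `json.JSONDecodeError` is a subclass of `ValueError`, a caller can catch `ValueError` once to skip or retry any bad response.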

Append to records DB

Postgres, Sheets, or a similar store.
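Since the job runs daily and the same documents will keep appearing in search results, the append should be idempotent. A minimal sketch, using SQLite as a stand-in for the real store; the `records` table layout and helper names are assumptions.

```python
import json
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) records table, keyed by source URL."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records ("
        "url TEXT PRIMARY KEY, title TEXT, date TEXT, summary TEXT, entities TEXT)"
    )


def append_record(conn: sqlite3.Connection, url: str, record: dict) -> bool:
    """Insert one record; return False if the URL was already stored."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO records (url, title, date, summary, entities) "
        "VALUES (?, ?, ?, ?, ?)",
        (
            url,
            record["title"],
            record["date"],
            record["summary"],
            json.dumps(record["entities"]),  # list stored as a JSON string
        ),
    )
    conn.commit()
    return cur.rowcount == 1
```

On Postgres the same effect comes from `INSERT ... ON CONFLICT (url) DO NOTHING`.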

Python Implementation

Python

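import os

import requests

# Scavio authenticates with an API key sent in the x-api-key header.
H = {'x-api-key': os.environ['SCAVIO_API_KEY']}
# Dork templates: {d} is the target domain, {t} is the topic.
DORKS = ['site:{d} filetype:pdf {t}', 'site:{d} intitle:{t}', 'site:{d} inurl:reports {t}']

def search_first(domain, topic):
    """Run every dork template for one target and return the unique result URLs."""
    urls = []
    for tpl in DORKS:
        q = tpl.format(d=domain, t=topic)
        resp = requests.post('https://api.scavio.dev/api/v1/search', headers=H, json={'query': q}, timeout=30)
        resp.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error body
        # Keep the top 5 organic results per template.
        urls.extend(o['link'] for o in resp.json().get('organic_results', [])[:5])
    # dict.fromkeys dedupes while preserving result order (set() would not).
    return list(dict.fromkeys(urls))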

JavaScript Implementation

JavaScript

// Same logic as the Python implementation above; it ports directly to JavaScript or TypeScript.

Platforms Used

Google

Web search with knowledge graph, People Also Ask (PAA), and AI Overviews

Frequently Asked Questions

What does the Gov Portal Search Fallback workflow do?

It runs daily. For each gov-doc topic, it dork-searches indexed pages via Scavio, routes auth-gated targets to Playwright, and extracts structured records.

How is this workflow triggered?

It runs on a daily cron schedule at 7:00 AM.

Which Scavio platforms does this workflow use?

This workflow uses one Scavio platform: Google. Each platform is called via the same unified API endpoint.

Can I run this workflow on the free tier?

Yes. Scavio's free tier includes 50 credits on signup with no credit card required. That is enough to test and validate this workflow before scaling it.
