Overview
A daily 6am cron pulls the latest job postings for a target-employer list, parses them with an LLM, dedupes against history, and ranks the results against each end user's saved filters.
Trigger
Daily cron 6am
Schedule
Daily 6am
Workflow Steps
Iterate target-employer list
From a Postgres table.
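A minimal sketch of the iteration step. The table name `target_employers` and its `active` column are assumptions; sqlite3 is used here only so the example runs self-contained, and a Postgres deployment would pass a psycopg2 connection instead (the DB-API surface is the same).

```python
import sqlite3

def iter_target_employers(conn):
    """Yield employer domains from the target_employers table (name assumed)."""
    cur = conn.execute("SELECT domain FROM target_employers WHERE active = 1")
    for (domain,) in cur:
        yield domain

# Demo fixture; production would open a psycopg2 connection to Postgres.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target_employers (domain TEXT, active INTEGER)")
conn.executemany("INSERT INTO target_employers VALUES (?, ?)",
                 [("acme.com", 1), ("globex.com", 1), ("stale.com", 0)])
domains = list(iter_target_employers(conn))
```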
Per employer: Scavio dorked discovery
site:{d}/careers + site:jobs.lever.co/{d} + site:boards.greenhouse.io/{d}.
Per career-page URL: Scavio /extract for markdown
Clean markdown for LLM input.
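A hedged sketch of the extract-and-clean step. The `/extract` payload shape (`{'url': ...}`) and the `markdown` response key are assumptions about the Scavio API, and `clean_markdown` is an illustrative helper for trimming noise before LLM input.

```python
import os
import re

import requests

H = {'x-api-key': os.environ.get('SCAVIO_API_KEY', '')}

def extract_markdown(url):
    """Fetch a career page as markdown via Scavio's /extract endpoint (payload shape assumed)."""
    r = requests.post('https://api.scavio.dev/api/v1/extract',
                      headers=H, json={'url': url})
    r.raise_for_status()
    return clean_markdown(r.json().get('markdown', ''))

def clean_markdown(md):
    """Drop link targets and collapse blank-line runs to shrink the LLM input."""
    md = re.sub(r'\[([^\]]*)\]\([^)]*\)', r'\1', md)  # keep link text, drop URL
    md = re.sub(r'\n{3,}', '\n\n', md)                # collapse blank-line runs
    return md.strip()
```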
LLM structured parse
Returns JSON list of {title, location, salary_min, salary_max, summary, apply_url}.
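The parse step can be sketched LLM-agnostically as prompt construction plus validation of the returned JSON against the expected keys. The prompt wording and the `parse_llm_response` helper are illustrative, not a specific provider's API.

```python
import json

POSTING_KEYS = {"title", "location", "salary_min", "salary_max", "summary", "apply_url"}

def build_parse_prompt(markdown):
    """Prompt asking the LLM to emit only a JSON array of posting objects."""
    return (
        "Extract every job posting from the page below. Respond with only a JSON "
        "array of objects with keys: title, location, salary_min, salary_max, "
        "summary, apply_url. Use null for unknown salaries.\n\n" + markdown
    )

def parse_llm_response(raw):
    """Parse the LLM's JSON and drop any object missing a required key."""
    postings = json.loads(raw)
    return [p for p in postings if POSTING_KEYS <= p.keys()]
```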
Dedupe against history (employer, title, location)
Same role on multiple aggregators = one record.
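One way to implement the dedupe, assuming history is held as a set of normalized `(employer, title, location)` keys so the same role scraped from multiple aggregators collapses to one record:

```python
def dedupe_key(posting):
    """Normalize (employer, title, location) so aggregator copies collide."""
    return tuple(posting[k].strip().lower() for k in ("employer", "title", "location"))

def dedupe_against_history(postings, seen_keys):
    """Return postings whose key is new; record new keys in seen_keys."""
    fresh = []
    for p in postings:
        k = dedupe_key(p)
        if k not in seen_keys:
            seen_keys.add(k)
            fresh.append(p)
    return fresh
```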
Rank by user-skill match + salary + recency
Per user filter, return top-N.
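A possible scoring function for the ranking step. The weights, the word-overlap skill match, and the `posted_on` field are illustrative assumptions, not tuned values.

```python
from datetime import date

def score(posting, user_skills, today=None):
    """Weighted score: skill overlap + salary midpoint + recency decay (weights illustrative)."""
    today = today or date.today()
    skills = {s.lower() for s in user_skills}
    words = set(posting["summary"].lower().split())
    skill_hits = len(skills & words)
    salary_mid = ((posting.get("salary_min") or 0) + (posting.get("salary_max") or 0)) / 2
    age_days = (today - posting["posted_on"]).days
    return skill_hits * 10 + salary_mid / 50_000 - age_days * 0.5

def top_n(postings, user_skills, n=5):
    """Return the user's top-N postings by score."""
    return sorted(postings, key=lambda p: score(p, user_skills), reverse=True)[:n]
```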
Push to user notifier (email / Slack / push)
Per user's saved filters.
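A sketch of the notifier dispatch. The channel-registry pattern is one possible design, and `send_email` is a placeholder that returns the rendered message instead of sending it; real integrations would call SMTP, the Slack API, or a push provider.

```python
SENDERS = {}

def sender(channel):
    """Register a handler for one notification channel (email / slack / push)."""
    def wrap(fn):
        SENDERS[channel] = fn
        return fn
    return wrap

@sender("email")
def send_email(user, body):
    # Placeholder: return the message; real code would send via SMTP or an email API.
    return f"email to {user['email']}:\n{body}"

def format_digest(postings):
    """One line per posting for the notification body."""
    return "\n".join(f"{p['title']} ({p['location']}): {p['apply_url']}" for p in postings)

def notify(user, postings):
    """Send the user's top-N digest over their saved channel."""
    send = SENDERS.get(user.get("channel", "email"))
    if send is None:
        raise ValueError(f"no sender for channel {user.get('channel')!r}")
    return send(user, format_digest(postings))
```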
Python Implementation
import requests, os

H = {'x-api-key': os.environ['SCAVIO_API_KEY']}

def daily_employer_pull(domain):
    """Discover career-page URLs for one employer via dorked Scavio searches."""
    slug = domain.split('.')[0]  # lever/greenhouse boards key off the company slug
    dorks = [
        f'site:{domain}/careers',
        f'site:jobs.lever.co/{slug}',
        f'site:boards.greenhouse.io/{slug}',
    ]
    urls = []
    for q in dorks:
        r = requests.post('https://api.scavio.dev/api/v1/search',
                          headers=H, json={'query': q}).json()
        urls.extend(o['link'] for o in r.get('organic_results', [])[:10])
    return list(set(urls))
JavaScript Implementation
// Same approach in TypeScript.
Platforms Used
Web search with knowledge graph, People Also Ask (PAA), and AI overviews