An r/hiringcafe thread shared the AI Job Search Agent pattern: pull from real employer career pages, AI-summarize each role, surface salary upfront. This post walks through building a HiringCafe-style aggregator with Scavio plus an LLM.
Prerequisites
- Scavio API key
- An LLM API key
- A list of target employers (or a way to discover them)
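Before wiring up the pipeline, it's worth failing fast on missing keys. A small sketch — the `OPENAI_API_KEY` name is an assumption; substitute whatever your LLM provider uses:

```python
import os

def check_keys(required=('SCAVIO_API_KEY', 'OPENAI_API_KEY')):
    """Return the names of any missing API keys so the pipeline can fail fast."""
    return [v for v in required if not os.environ.get(v)]
```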
Walkthrough
Step 1: Discover career-page URLs via dorked search
site:company.com careers + jobs.lever.co + boards.greenhouse.io patterns.
```python
import requests, os

H = {'x-api-key': os.environ['SCAVIO_API_KEY']}

# Dork the employer's own domain plus the common ATS hosts.
DORKS = [
    'site:{domain}/careers',
    'site:{domain}/jobs',
    'site:jobs.lever.co/{slug}',
    'site:boards.greenhouse.io/{slug}',
]

def find_career_urls(domain):
    # Lever/Greenhouse boards use a bare company slug, not the full domain.
    slug = domain.split('.')[0]
    out = []
    for d in DORKS:
        q = d.format(domain=domain, slug=slug)
        r = requests.post('https://api.scavio.dev/api/v1/search',
                          headers=H, json={'query': q}).json()
        out.extend(o['link'] for o in r.get('organic_results', [])[:5])
    return list(set(out))
```
Step 2: Extract listing pages as markdown
Scavio /extract turns the careers page into clean markdown.
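Careers pages are often slow or rate-limited, so a retry wrapper around the extract call can help. A sketch, not part of the original flow — the `post` parameter is injectable purely so the logic is testable without the network:

```python
import time
import requests

def extract_with_retry(url, headers, retries=3, backoff=2.0, post=requests.post):
    """Call Scavio /extract with simple exponential backoff on failure."""
    for attempt in range(retries):
        try:
            r = post('https://api.scavio.dev/api/v1/extract',
                     headers=headers, json={'url': url, 'format': 'markdown'})
            r.raise_for_status()
            return r.json().get('markdown', '')
        except requests.RequestException:
            if attempt == retries - 1:
                raise
            time.sleep(backoff * (2 ** attempt))
```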
```python
def extract(url):
    r = requests.post('https://api.scavio.dev/api/v1/extract',
                      headers=H, json={'url': url, 'format': 'markdown'})
    return r.json().get('markdown', '')
```
Step 3: Parse roles with an LLM
Structured extraction: title, location, salary if shown, summary.
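Long careers pages can blow past the model's context window. A crude guard — the 40k-character ceiling is an assumption (roughly 10k tokens); tune it for your model:

```python
def truncate_markdown(md, max_chars=40_000):
    """Truncate extracted markdown, cutting at a line boundary
    so a job entry isn't split mid-field."""
    if len(md) <= max_chars:
        return md
    cut = md.rfind('\n', 0, max_chars)
    return md[:cut if cut > 0 else max_chars]
```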
```python
PROMPT = '''Extract job postings from this careers page. For each, return JSON with:
- title, team, location, remote (bool), salary_min, salary_max (null if not shown), apply_url, summary (2 sentences).
Return a JSON list.
Page:
{md}'''

# `llm` stands in for whichever client you use; `markdown` is the
# output of extract() above.
result = llm.complete(PROMPT.format(md=markdown))
```
Step 4: Dedupe by (employer, title, location)
Same role on multiple aggregators = one record.
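The LLM reply arrives as text, so it has to be parsed into dicts first. Note also that the dedupe key needs an `employer` field the prompt never asks for, so it's tagged on here. A defensive sketch — pulling the outermost JSON list via regex (to skip any ```json fences or prose the model wraps around it) is my assumption about typical model output:

```python
import json
import re

def parse_roles(llm_text, employer):
    """Parse the LLM's JSON list and tag each role with its employer."""
    # Grab the outermost [...] so surrounding fences/prose are ignored.
    m = re.search(r'\[.*\]', llm_text, re.DOTALL)
    if not m:
        return []
    try:
        roles = json.loads(m.group(0))
    except json.JSONDecodeError:
        return []
    for r in roles:
        r['employer'] = employer
    return roles
```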
```python
def dedupe(roles):
    seen = set()
    out = []
    for r in roles:
        key = (r['employer'], r['title'], r['location'])
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out
```
Step 5: Rank by salary + recency + match score
User-supplied skills and filters drive what surfaces first.
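Hard requirements like a salary floor or remote-only can be applied before scoring. A sketch — letting roles with no posted salary pass the salary filter is a judgment call, not something the original specifies:

```python
def apply_filters(roles, min_salary=None, remote_only=False):
    """Drop roles that miss hard requirements before ranking."""
    out = []
    for r in roles:
        if remote_only and not r.get('remote'):
            continue
        if min_salary and r.get('salary_max') and r['salary_max'] < min_salary:
            continue
        out.append(r)
    return out
```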
```python
def rank(roles, user_skills):
    for r in roles:
        # Naive substring match of the user's skills against title + summary.
        match = sum(1 for s in user_skills
                    if s.lower() in (r['summary'] + r['title']).lower())
        # Recency is omitted here; add a posted_date term if the page exposes one.
        r['score'] = (r.get('salary_max') or 0) * 0.3 + match * 100
    return sorted(roles, key=lambda x: -x['score'])
```
Python Example
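A minimal orchestration sketch tying the five steps together. The callables correspond to the functions defined above (the `parse` step wraps the LLM call); injecting them as parameters is my choice so the glue is testable without hitting Scavio or a model, not how the original post structures it:

```python
def run_pipeline(domains, find_career_urls, extract, parse, dedupe, rank, user_skills):
    """Discover -> extract -> parse -> dedupe -> rank, per employer domain."""
    all_roles = []
    for domain in domains:
        for url in find_career_urls(domain):
            md = extract(url)
            if md:
                # parse(md, domain) covers the LLM call plus JSON parsing.
                all_roles.extend(parse(md, domain))
    return rank(dedupe(all_roles), user_skills)
```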
```python
# Per-employer cost: ~3 dorked searches + 1 extract + 1 LLM call = ~$0.02-0.05
```
JavaScript Example
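A sketch of the pure parts (dedupe + rank) in TypeScript, mirroring the Python above; the Scavio calls would use `fetch` with the same request bodies. The `Role` shape is inferred from the prompt's fields, not a published schema:

```typescript
interface Role {
  employer: string;
  title: string;
  location: string;
  summary: string;
  salary_max: number | null;
  score?: number;
}

// Keep the first occurrence of each (employer, title, location) key.
function dedupe(roles: Role[]): Role[] {
  const seen = new Set<string>();
  return roles.filter((r) => {
    const key = `${r.employer}|${r.title}|${r.location}`;
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// Same scoring as the Python rank(): salary weight plus naive skill matches.
function rank(roles: Role[], userSkills: string[]): Role[] {
  for (const r of roles) {
    const haystack = (r.summary + r.title).toLowerCase();
    const match = userSkills.filter((s) => haystack.includes(s.toLowerCase())).length;
    r.score = (r.salary_max ?? 0) * 0.3 + match * 100;
  }
  return [...roles].sort((a, b) => (b.score ?? 0) - (a.score ?? 0));
}
```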
```typescript
// Same flow in TS.
```
Expected Output
JSON list of jobs with title, salary, summary, apply_url. Dedupes across aggregators. Ranks by user skills + salary. The hard part remains the relevance ranking; the data layer is the easy part.