Build a Job Search Agent That Actually Works
Most job search agents just wrap LinkedIn. Build one that aggregates Google Jobs, Reddit hiring threads, and company career pages via a search API.
Most "AI job search agents" in 2026 are just LinkedIn wrappers with a ChatGPT layer on top. They search one platform, apply generic filters, and call it intelligent. A job search agent that actually works needs to aggregate multiple sources: Google Jobs for aggregated listings, Reddit hiring threads for unlisted positions, and company career pages for direct applications. Here is how to build one.
Why single-platform agents fail
LinkedIn has roughly 40% of job listings. The other 60% are on company career pages, niche job boards, and community forums. An agent that only searches LinkedIn misses more than half the market. Worse, LinkedIn actively blocks automated access, so these agents break constantly. The fix is to search where the data is open and structured.
The three-source approach
- Google Jobs -- Aggregates listings from Indeed, Glassdoor, LinkedIn, ZipRecruiter, and company career pages into one searchable index.
- Reddit -- r/forhire, r/remotework, r/cscareerquestions monthly hiring threads, and company-specific subreddits.
- Google Web -- Direct career page searches like "site:careers.stripe.com engineer" find positions not yet on job boards.
Be honest: no API replaces networking
Before we get to the code: the highest-signal job leads still come from personal connections, referrals, and direct outreach. An agent can surface opportunities faster, but it cannot replace a warm introduction from a former colleague. Use this tool to cast a wide net, not as your only strategy.
The core search pipeline
import os
import requests

H = {'x-api-key': os.environ['SCAVIO_API_KEY']}
URL = 'https://api.scavio.dev/api/v1/search'

def search_jobs(role: str, location: str) -> dict:
    """Search multiple sources for job listings."""
    results = {}
    # Google Jobs: aggregated listings from Indeed, Glassdoor, LinkedIn, etc.
    google_q = f"{role} jobs {location} 2026"
    resp = requests.post(URL, headers=H, timeout=15,
                         json={'platform': 'google_jobs', 'query': google_q})
    resp.raise_for_status()
    results['google_jobs'] = resp.json().get('jobs_results', [])[:10]
    # Reddit: hiring threads and unlisted positions
    reddit_q = f"{role} hiring {location}"
    resp = requests.post(URL, headers=H, timeout=15,
                         json={'platform': 'reddit', 'query': reddit_q})
    resp.raise_for_status()
    results['reddit'] = resp.json().get('organic_results', [])[:10]
    # Direct career page search. Google's site: operator does not support
    # wildcards, so match career-page URLs with inurl: instead.
    career_q = f"(inurl:careers OR inurl:jobs) {role} {location}"
    resp = requests.post(URL, headers=H, timeout=15,
                         json={'platform': 'google', 'query': career_q})
    resp.raise_for_status()
    results['career_pages'] = resp.json().get('organic_results', [])[:10]
    return results

jobs = search_jobs('senior backend engineer', 'remote')
print(f"Google Jobs: {len(jobs['google_jobs'])} listings")
print(f"Reddit: {len(jobs['reddit'])} threads")
print(f"Career pages: {len(jobs['career_pages'])} direct listings")
Deduplication and scoring
The same job appears on multiple platforms. Deduplicate by company name and role title, then score by recency, source quality, and match relevance.
from difflib import SequenceMatcher

def deduplicate_jobs(all_jobs: list[dict]) -> list[dict]:
    """Remove duplicate listings by fuzzy matching title + company."""
    seen = []
    unique = []
    for job in all_jobs:
        key = f"{job.get('company', '')} {job.get('title', '')}".lower()
        # Treat anything >80% similar to an already-kept listing as a duplicate.
        is_dup = any(SequenceMatcher(None, key, s).ratio() > 0.8 for s in seen)
        if not is_dup:
            seen.append(key)
            unique.append(job)
    return unique

def score_job(job: dict, target_role: str) -> float:
    """Score a job listing by relevance to the target role."""
    title = job.get('title', '').lower()
    title_match = SequenceMatcher(None, title, target_role.lower()).ratio()
    recency_bonus = 0.2 if '2026' in job.get('date', '') else 0.0
    source_bonus = 0.1 if job.get('source') == 'career_page' else 0.0
    return title_match + recency_bonus + source_bonus
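To tie the two together, one minimal ranking step is to flatten the per-source results from search_jobs, tag each listing with its source, then deduplicate, score, and sort. The flatten_and_rank helper below is an illustrative sketch, not part of the API; it assumes the jobs dict returned by the pipeline above.

def flatten_and_rank(results: dict, target_role: str) -> list[dict]:
    """Flatten per-source results, dedupe, and sort by score (illustrative helper)."""
    all_jobs = []
    for source, listings in results.items():
        for job in listings:
            # Normalize the source tag so score_job's career-page bonus applies.
            tag = 'career_page' if source == 'career_pages' else source
            all_jobs.append({**job, 'source': tag})
    unique = deduplicate_jobs(all_jobs)
    return sorted(unique, key=lambda j: score_job(j, target_role), reverse=True)

ranked = flatten_and_rank(jobs, 'senior backend engineer')
for job in ranked[:5]:
    print(f"{score_job(job, 'senior backend engineer'):.2f}  {job.get('title')}  ({job['source']})")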
Daily monitoring
Run this pipeline daily via cron or a simple scheduler. Track new listings that appear since the last run, and send yourself a digest with only new, high-scoring matches.
import hashlib
import json
import os
import requests

H = {'x-api-key': os.environ['SCAVIO_API_KEY']}
URL = 'https://api.scavio.dev/api/v1/search'
SEEN_FILE = 'seen_jobs.json'

def load_seen() -> set:
    """Load the set of job hashes recorded on previous runs."""
    try:
        with open(SEEN_FILE) as f:
            return set(json.load(f))
    except FileNotFoundError:
        return set()

def job_hash(job: dict) -> str:
    """Stable fingerprint for a listing: title + company + link."""
    raw = f"{job.get('title','')}{job.get('company','')}{job.get('link','')}"
    return hashlib.md5(raw.encode()).hexdigest()

def find_new_jobs(role: str, location: str) -> list[dict]:
    seen = load_seen()
    resp = requests.post(URL, headers=H, timeout=15,
                         json={'platform': 'google_jobs', 'query': f'{role} {location}'})
    resp.raise_for_status()
    jobs = resp.json().get('jobs_results', [])
    new_jobs = [j for j in jobs if job_hash(j) not in seen]
    # Persist every hash so the next run surfaces only genuinely new listings.
    seen.update(job_hash(j) for j in jobs)
    with open(SEEN_FILE, 'w') as f:
        json.dump(list(seen), f)
    return new_jobs
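A digest step closes the loop: score each new listing, drop the weak matches, and report the rest. The sketch below is illustrative; send_digest and the 0.6 cutoff are hypothetical names and defaults, it assumes score_job from the scoring section, and the print can be swapped for an email or Slack webhook.

# Run daily via cron, e.g.: 0 9 * * * python job_agent.py
def send_digest(role: str, location: str, min_score: float = 0.6) -> None:
    """Report new listings that clear the score cutoff (illustrative sketch)."""
    scored = [(score_job(j, role), j) for j in find_new_jobs(role, location)]
    strong = sorted([p for p in scored if p[0] >= min_score],
                    key=lambda p: p[0], reverse=True)
    if not strong:
        return  # nothing new worth reporting today
    print(f"{len(strong)} new matches for '{role}':")
    for score, job in strong:
        print(f"  {score:.2f}  {job.get('title')} at {job.get('company')}  {job.get('link')}")

send_digest('senior backend engineer', 'remote')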
What this costs
Three searches per run (Google Jobs, Reddit, Google Web) times 30 days is 90 API calls per month, well within Scavio's 500 free credits. Even searching multiple roles and locations daily keeps you under the free tier: five role-location combinations at three searches each is 450 calls per month. If you need higher volume, $30/mo for 7,000 credits covers aggressive multi-role monitoring.
Getting started
- Define your target roles and locations
- Set up the three-source search pipeline
- Run daily, deduplicate, score, and surface new matches
- Apply directly through career pages when possible
- Use the agent for discovery, use your network for referrals