jobs · agents · automation

Build a Job Search Agent That Actually Works

Most job search agents just wrap LinkedIn. Build one that aggregates Google Jobs, Reddit hiring threads, and career pages via search API.

6 min read

Most "AI job search agents" in 2026 are just LinkedIn wrappers with a ChatGPT layer on top. They search one platform, apply generic filters, and call it intelligent. A job search agent that actually works needs to aggregate multiple sources: Google Jobs for aggregated listings, Reddit hiring threads for unlisted positions, and company career pages for direct applications. Here is how to build one.

Why single-platform agents fail

LinkedIn has roughly 40% of job listings. The other 60% are on company career pages, niche job boards, and community forums. An agent that only searches LinkedIn misses more than half the market. Worse, LinkedIn actively blocks automated access, so these agents break constantly. The fix is to search where the data is open and structured.

The three-source approach

  • Google Jobs -- Aggregates listings from Indeed, Glassdoor, LinkedIn, ZipRecruiter, and company career pages into one searchable index.
  • Reddit -- r/forhire, r/remotework, r/cscareerquestions monthly hiring threads, and company-specific subreddits.
  • Google Web -- Direct career page searches like "site:careers.stripe.com engineer" find positions not yet on job boards.

Be honest: no API replaces networking

Before we get to the code: the highest-signal job leads still come from personal connections, referrals, and direct outreach. An agent can surface opportunities faster, but it cannot replace a warm introduction from a former colleague. Use this tool to cast a wide net, not as your only strategy.

The core search pipeline

Python
import requests, os

H = {'x-api-key': os.environ['SCAVIO_API_KEY']}
URL = 'https://api.scavio.dev/api/v1/search'

def search_jobs(role: str, location: str) -> dict:
    """Search multiple sources for job listings."""
    results = {}

    # Google Jobs: aggregated listings
    google_q = f"{role} jobs {location} 2026"
    resp = requests.post(URL, headers=H,
        json={'platform': 'google_jobs', 'query': google_q}, timeout=15)
    results['google_jobs'] = resp.json().get('jobs_results', [])[:10]

    # Reddit: hiring threads and unlisted positions
    reddit_q = f"{role} hiring {location}"
    resp = requests.post(URL, headers=H,
        json={'platform': 'reddit', 'query': reddit_q}, timeout=15)
    results['reddit'] = resp.json().get('organic_results', [])[:10]

    # Direct career page search -- Google's site: operator does not
    # accept wildcards, so match career-page URLs with inurl: instead
    career_q = f"(inurl:careers OR inurl:jobs) {role} {location}"
    resp = requests.post(URL, headers=H,
        json={'platform': 'google', 'query': career_q}, timeout=15)
    results['career_pages'] = resp.json().get('organic_results', [])[:10]

    return results

jobs = search_jobs('senior backend engineer', 'remote')
print(f"Google Jobs: {len(jobs['google_jobs'])} listings")
print(f"Reddit: {len(jobs['reddit'])} threads")
print(f"Career pages: {len(jobs['career_pages'])} direct listings")

Deduplication and scoring

The same job appears on multiple platforms. Deduplicate by company name and role title, then score by recency, source quality, and match relevance.

Python
from difflib import SequenceMatcher

def deduplicate_jobs(all_jobs: list[dict]) -> list[dict]:
    """Remove duplicate listings by fuzzy matching title + company."""
    seen = []
    unique = []
    for job in all_jobs:
        key = f"{job.get('company', '')} {job.get('title', '')}".lower()
        is_dup = False
        for s in seen:
            if SequenceMatcher(None, key, s).ratio() > 0.8:
                is_dup = True
                break
        if not is_dup:
            seen.append(key)
            unique.append(job)
    return unique

def score_job(job: dict, target_role: str) -> float:
    """Score a job listing by relevance to target role."""
    title = job.get('title', '').lower()
    role_lower = target_role.lower()
    title_match = SequenceMatcher(None, title, role_lower).ratio()
    recency_bonus = 0.2 if '2026' in job.get('date', '') else 0.0
    source_bonus = 0.1 if job.get('source') == 'career_page' else 0.0
    return title_match + recency_bonus + source_bonus
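
The search pipeline returns per-source lists, and score_job's career-page bonus expects a 'source' field that the raw results do not carry. A small glue step flattens and tags everything before deduplication -- a minimal sketch, assuming the field names shown earlier:

```python
def flatten_results(results: dict) -> list[dict]:
    """Merge the per-source lists into one list, tagging each job
    with its source so score_job can apply the career-page bonus."""
    flat = []
    for source, jobs in results.items():
        # Normalize 'career_pages' to the tag score_job checks for
        tag = 'career_page' if source == 'career_pages' else source
        flat.extend({**job, 'source': tag} for job in jobs)
    return flat
```

Feed the flattened list into deduplicate_jobs, then sort by score_job descending to get a ranked shortlist.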

Daily monitoring

Run this pipeline daily via cron or a simple scheduler. Track new listings that appear since the last run. Send yourself a digest with only new, high-scoring matches.

Python
import requests, os, json, hashlib

H = {'x-api-key': os.environ['SCAVIO_API_KEY']}
URL = 'https://api.scavio.dev/api/v1/search'
SEEN_FILE = 'seen_jobs.json'

def load_seen() -> set:
    try:
        with open(SEEN_FILE) as f:
            return set(json.load(f))
    except FileNotFoundError:
        return set()

def job_hash(job: dict) -> str:
    raw = f"{job.get('title','')}{job.get('company','')}{job.get('link','')}"
    return hashlib.md5(raw.encode()).hexdigest()

def find_new_jobs(role: str, location: str) -> list[dict]:
    seen = load_seen()
    resp = requests.post(URL, headers=H,
        json={'platform': 'google_jobs', 'query': f'{role} {location}'},
        timeout=15)
    jobs = resp.json().get('jobs_results', [])
    new_jobs = [j for j in jobs if job_hash(j) not in seen]
    seen.update(job_hash(j) for j in jobs)
    with open(SEEN_FILE, 'w') as f:
        json.dump(list(seen), f)
    return new_jobs
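
The digest itself can be a plain-text summary of new listings above a score threshold. A minimal formatter sketch (delivery via email or Slack is left to you; the field names and 0.6 cutoff are illustrative):

```python
def format_digest(scored_jobs: list[tuple[float, dict]],
                  min_score: float = 0.6) -> str:
    """Render new, high-scoring listings as a plain-text digest."""
    keep = [(s, j) for s, j in scored_jobs if s >= min_score]
    lines = [f"Job digest: {len(keep)} new matches"]
    # Highest-scoring listings first
    for score, job in sorted(keep, key=lambda p: p[0], reverse=True):
        lines.append(f"  {score:.2f}  {job.get('title', '?')} at "
                     f"{job.get('company', '?')}  {job.get('link', '')}")
    return "\n".join(lines)
```

Pair it with find_new_jobs and score_job in the daily run, and skip sending entirely when no listing clears the threshold.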

What this costs

Three searches per run (Google Jobs, Reddit, Google Web) times 30 days is 90 API calls per month, well within Scavio's 500 free credits. Even searching multiple roles and locations daily keeps you under the free tier; if you need higher volume, $30/mo for 7,000 credits covers aggressive multi-role monitoring.
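
The arithmetic generalizes to any mix of roles and locations -- a back-of-envelope sketch, with the credit counts taken from the pricing above:

```python
SOURCES = 3          # google_jobs, reddit, google web
DAYS = 30
FREE_CREDITS = 500   # Scavio free tier

def monthly_calls(roles: int, locations: int) -> int:
    """API calls per month for daily runs across every role/location pair."""
    return SOURCES * roles * locations * DAYS

print(monthly_calls(1, 1))                  # single-role baseline: 90
print(monthly_calls(2, 2) <= FREE_CREDITS)  # 2 roles x 2 locations still fits
```

Four role/location pairs is 360 calls a month, so the free tier breaks only around five or more daily pairs.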

  1. Define your target roles and locations
  2. Set up the three-source search pipeline
  3. Run daily, deduplicate, score, and surface new matches
  4. Apply directly through career pages when possible
  5. Use the agent for discovery, use your network for referrals