
AI Job Search Agent with Live Listings

Build a durable AI job search agent using Google SERP with site operators. Why direct scrapers break and the indirect pattern that lasts.

5 min read

An r/ItaliaCareerAdvice thread showed a Python script that scrapes job listings and filters them with an LLM. It works, but it is fragile: sites break, patterns change, and the LLM burns its context window on noisy HTML. This post is the durable version of that pattern, using Scavio for the listing discovery layer.

Why DIY Job Scrapers Break

Indeed, LinkedIn, and Glassdoor all fight scrapers aggressively in 2026. A script that works today breaks in two weeks. The usual failure modes: IP blocks, CAPTCHA walls, DOM structure changes, and rate limits. Maintenance eats more time than the agent saves.

The Indirect Pattern

Go through Google SERP with site operators instead of scraping the boards directly. Indeed posts are indexed on Google. LinkedIn job pages are indexed. Glassdoor listings are indexed. A SERP query returns the public part of each listing with title, company, location, and snippet, which is almost all the filtering signal you need.

Python
import os
import requests

API_KEY = os.environ['SCAVIO_API_KEY']
SEARCH_URL = 'https://api.scavio.dev/api/v1/search'

def search_jobs(role: str, location: str) -> list[dict]:
    # One site-scoped query per board; Google has already indexed the listings.
    sources = [
        f'site:indeed.com {role} {location}',
        f'site:linkedin.com/jobs {role} {location}',
        f'site:glassdoor.com {role} {location}',
    ]
    results = []
    for query in sources:
        r = requests.post(SEARCH_URL,
            headers={'x-api-key': API_KEY},
            json={'query': query, 'num_results': 20},
            timeout=30)
        r.raise_for_status()
        for x in r.json().get('organic_results', []):
            results.append({
                # 'site:indeed.com' -> 'indeed.com'
                'source': query.split()[0].removeprefix('site:'),
                'title': x.get('title', ''),
                'url': x.get('link', ''),
                'snippet': x.get('snippet', ''),
            })
    return results
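The same role often surfaces on two or three boards at once, and every duplicate costs an extra LLM call downstream. A dedup pass on the normalized title is cheap insurance (a sketch; the helper name is mine, not part of the original script):

```python
def dedupe_jobs(listings: list[dict]) -> list[dict]:
    """Keep only the first listing for each normalized title."""
    seen: set[str] = set()
    unique = []
    for job in listings:
        # Lowercase and collapse whitespace so 'Data  Engineer' == 'data engineer'.
        key = ' '.join(job['title'].lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(job)
    return unique
```

A title-only key will occasionally merge two distinct openings with identical titles; keying on title plus source is the stricter variant.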

The LLM Filter Step

Once listings are collected, an LLM classifies each against the user's preferences. Unlike the noisy HTML approach, the LLM now has clean structured input and can focus on matching.

Python
import anthropic

client = anthropic.Anthropic()

def filter_jobs(listings: list[dict], preferences: str) -> list[dict]:
    relevant = []
    for job in listings:
        # Clean structured input: title + snippet, no HTML noise.
        prompt = f'''Preferences: {preferences}

Job: {job['title']}
Snippet: {job['snippet']}

Does this job match the preferences? Respond YES or NO followed by
a one-sentence reason.'''

        msg = client.messages.create(
            model='claude-haiku-4-5-20251001',
            max_tokens=100,
            messages=[{'role': 'user', 'content': prompt}])

        answer = msg.content[0].text.strip()
        if answer.startswith('YES'):
            # Keep the model's reason, minus the leading verdict.
            job['why'] = answer.removeprefix('YES').lstrip(' :,.\n')
            relevant.append(job)
    return relevant
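The startswith('YES') check is strict about casing, and models sometimes answer "Yes," or "yes." A slightly more forgiving parser costs nothing (a sketch; parse_verdict is my name, not part of the post's pipeline):

```python
def parse_verdict(answer: str) -> tuple[bool, str]:
    """Split a 'YES/NO + reason' reply into (matched, reason)."""
    parts = answer.strip().split(None, 1)  # split on the first whitespace run
    if not parts:
        return False, ''
    verdict = parts[0].strip(':,.').upper()
    reason = parts[1].strip() if len(parts) > 1 else ''
    return verdict == 'YES', reason
```

Swapping this in for the startswith check keeps borderline-formatted YES replies from being silently dropped.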

The Company Research Step

For each matched job, the agent runs a second pass: Reddit mentions, recent news, Glassdoor reviews via SERP. This is the signal the candidate actually wants. "Is this company a dumpster fire?" is the question a cold job listing cannot answer.

Python
def research_company(domain: str) -> dict:
    # Reuses API_KEY and requests from the search step above.
    def serp(payload: dict) -> dict:
        r = requests.post('https://api.scavio.dev/api/v1/search',
            headers={'x-api-key': API_KEY},
            json=payload, timeout=30)
        r.raise_for_status()
        return r.json()

    reddit = serp({'query': domain, 'platform': 'reddit'})
    news = serp({'query': f'{domain} layoffs OR funding OR CEO',
                 'time_range': 'month'})
    reviews = serp({'query': f'site:glassdoor.com {domain} reviews'})

    return {
        'reddit_threads': reddit.get('posts', [])[:5],
        'recent_news': news.get('organic_results', [])[:5],
        'review_snippets': reviews.get('organic_results', [])[:3],
    }

The Daily Loop

Schedule the full flow to run daily at 7 AM. New listings land in the candidate's inbox with the company research attached. Two weeks of running this turns a passive job search into an informed pipeline.
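The loop itself can stay dependency-free: one script that chains the three steps, fired by cron (or any scheduler) at 7 AM. A minimal sketch, with the steps and the delivery function injected as callables so the wiring is testable; the domain extraction via urlparse is my assumption about how a listing URL maps to a company:

```python
from urllib.parse import urlparse

def run_daily(search, classify, research, deliver,
              role: str, location: str, preferences: str) -> int:
    """Chain search -> classify -> research -> deliver; return the match count."""
    listings = search(role, location)
    matches = classify(listings, preferences)
    for job in matches:
        # Derive a domain from the listing URL for the research pass.
        job['research'] = research(urlparse(job['url']).netloc)
    deliver(matches)
    return len(matches)
```

Invoked from cron as `0 7 * * * python agent.py`, with the real search_jobs, filter_jobs, research_company, and a mailer plugged in.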

Why This Beats Hand-Crafted Scrapers

Three wins. One, resilience: Google SERP does not break every week. Two, coverage: Scavio's multi-source SERP + Reddit + news in one API beats a stack of separate scrapers. Three, maintenance: a single schedule runs the entire pipeline, no per-site fixes required.

Where the Pattern Fails

Two places. One, jobs that never get indexed on Google, typically small-company roles on ATS platforms. Weigh that gap against the candidate's targets: for FAANG-style companies, Google coverage is fine; for niche early-stage roles, a separate Wellfound/Hacker News scraper helps. Two, same-day postings: Google indexes with a lag, and new listings sometimes take 24 hours to appear.
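For the non-indexed gap, Hacker News "Who is hiring" threads are one cheap supplement: the Algolia-powered HN Search API at hn.algolia.com is public and keyless. A sketch under that assumption; the helper names are mine:

```python
import requests

HN_SEARCH = 'https://hn.algolia.com/api/v1/search'

def parse_hn_hits(payload: dict) -> list[dict]:
    """Reduce an Algolia HN response to title/id pairs."""
    return [{'title': h.get('title', ''), 'id': h.get('objectID', '')}
            for h in payload.get('hits', [])]

def hn_hiring_threads(month: str) -> list[dict]:
    """Find 'Ask HN: Who is hiring?' threads, e.g. month='February 2026'."""
    r = requests.get(HN_SEARCH, params={
        'query': f'Ask HN: Who is hiring? ({month})',
        'tags': 'story',
    }, timeout=30)
    r.raise_for_status()
    return parse_hn_hits(r.json())
```

The thread's top-level comments are the actual listings; fetching them is a second call against the same API, keyed by the story id.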

Operational Cost

A daily run of 30 role/location queries across 3 sources = 90 SERP queries, plus 20 company enrichments = roughly 200 credits per day. At $30/mo for 7,000 credits, that is well within the plan with room for weekend deep-dives. Haiku classification runs a few hundred tokens per job, which is negligible at this volume. Total pipeline cost: under $35/mo all-in.
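Spelling the arithmetic out, assuming one credit per search and three lookups per company enrichment (the post's ~200/day figure leaves headroom for retries on top of this base):

```python
SERP_QUERIES = 30 * 3             # 30 role/location queries x 3 job boards
ENRICHMENTS = 20 * 3              # 20 companies x 3 lookups each
DAILY = SERP_QUERIES + ENRICHMENTS  # 150 credits/day before retries
MONTHLY = DAILY * 30                # 4,500 credits/month, under the 7,000 plan
```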

The use case page is at ai-career-agent-data-api and the solution architecture at ai-job-search-agent.