Tutorial

How to Build a Job Listing Aggregator Without Scraping

Build a job search aggregator using search APIs instead of fragile scrapers. Find listings from Google-indexed job boards and Reddit hiring threads.

Job boards (Indeed, LinkedIn, Greenhouse) actively fight scrapers. But their listings are indexed by Google. Instead of maintaining scrapers that break monthly, search Google for indexed job listings and Reddit for who-is-hiring threads. This tutorial builds an aggregator that survives anti-bot updates.

Prerequisites

  • Python 3.8+
  • A Scavio API key
  • Basic knowledge of job listing data

Walkthrough

Step 1: Search Google for job listings

Use site: operators to find listings on specific job boards indexed by Google.

Python
import requests, os

H = {'x-api-key': os.environ['SCAVIO_API_KEY'], 'Content-Type': 'application/json'}

def find_jobs(role: str, location: str = '') -> list:
    # Parentheses group the site: filters so the OR chain applies to the domains, not the role terms
    query = f'{role} {location} (site:greenhouse.io OR site:lever.co OR site:jobs.ashbyhq.com)'
    resp = requests.post('https://api.scavio.dev/api/v1/search', headers=H,
        json={'platform': 'google', 'query': query}, timeout=10)
    resp.raise_for_status()
    return [{'title': r.get('title',''), 'url': r.get('link',''), 'snippet': r.get('snippet','')}
            for r in resp.json().get('organic', [])]
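
A quick sanity check (assumes SCAVIO_API_KEY is set in your environment):

Python
jobs = find_jobs('senior python developer', 'remote')
print(f"Found {len(jobs)} Google-indexed listings")
for job in jobs[:3]:
    print(f"  - {job['title']} ({job['url']})")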

Step 2: Add Reddit hiring thread search

Find who-is-hiring posts on relevant subreddits.

Python
def find_reddit_jobs(role: str) -> list:
    resp = requests.post('https://api.scavio.dev/api/v1/search', headers=H,
        json={'platform': 'reddit', 'query': f'{role} hiring'}, timeout=10)
    resp.raise_for_status()
    threads = resp.json().get('organic', [])
    # Keep only threads whose titles look like hiring or job posts
    return [{'title': t.get('title',''), 'url': t.get('link',''), 'subreddit': t.get('subreddit',''),
             'score': t.get('score', 0)}
            for t in threads if 'hiring' in t.get('title','').lower() or 'job' in t.get('title','').lower()]
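
To spot-check the Reddit side, print the first few threads (results depend on what is currently live):

Python
threads = find_reddit_jobs('python developer')
for t in threads[:3]:
    print(f"  r/{t['subreddit']} ({t['score']} points): {t['title']}")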

Step 3: Deduplicate and rank

Merge both sources, drop duplicate URLs, and rank Reddit threads by score so higher-engagement posts surface first; Google results keep their search-relevance order.

Python
def aggregate_jobs(role: str, location: str = '') -> dict:
    google_jobs = find_jobs(role, location)
    reddit_jobs = find_reddit_jobs(role)
    # Rank Reddit threads by score so higher-engagement posts come first
    reddit_jobs.sort(key=lambda j: j.get('score', 0), reverse=True)

    seen_urls = set()
    unique_jobs = []
    for job in google_jobs + reddit_jobs:
        url = job.get('url', '')
        if url and url not in seen_urls:
            seen_urls.add(url)
            unique_jobs.append(job)

    return {
        'role': role,
        'location': location,
        'total_found': len(unique_jobs),
        'from_google': len(google_jobs),
        'from_reddit': len(reddit_jobs),
        'listings': unique_jobs,
    }

results = aggregate_jobs('senior python developer', 'remote')
print(f"Found {results['total_found']} unique listings")
for job in results['listings'][:5]:
    print(f"  - {job['title']}")

Step 4: Schedule daily updates

Run the aggregator on a schedule and track new listings.

Python
import json
from pathlib import Path
from datetime import date

def daily_job_check(roles: list, location: str = '') -> dict:
    today = date.today().isoformat()
    all_listings = []
    
    for role in roles:
        results = aggregate_jobs(role, location)
        all_listings.extend(results['listings'])
    
    # Load previously seen URLs so we can flag new listings
    history_file = Path('job_history.json')
    seen = set()
    if history_file.exists():
        seen = set(json.loads(history_file.read_text()).get('urls', []))

    new_listings = [j for j in all_listings if j.get('url') and j['url'] not in seen]

    # Update history, skipping any listings without a URL
    seen.update(j['url'] for j in all_listings if j.get('url'))
    history_file.write_text(json.dumps({'urls': sorted(seen), 'last_run': today}))
    
    return {'date': today, 'new_listings': len(new_listings), 'total_tracked': len(seen), 'new': new_listings}
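
A minimal way to run this daily is a cron entry pointing at a small driver script; the filename, schedule, and roles below are illustrative:

Python
# job_check.py -- run from cron, e.g.: 0 9 * * * /usr/bin/python3 job_check.py
report = daily_job_check(['senior python developer', 'data engineer'], 'remote')
print(f"{report['date']}: {report['new_listings']} new listings "
      f"({report['total_tracked']} URLs tracked)")
for job in report['new']:
    print(f"  - {job['title']}")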

Python Example

Python
import requests, os
H = {'x-api-key': os.environ['SCAVIO_API_KEY'], 'Content-Type': 'application/json'}

def find_jobs(role, location=''):
    q = f'{role} {location} (site:greenhouse.io OR site:lever.co)'
    r = requests.post('https://api.scavio.dev/api/v1/search', headers=H,
        json={'platform': 'google', 'query': q}, timeout=10).json()
    return [{'title': x.get('title',''), 'url': x.get('link','')} for x in r.get('organic',[])]

JavaScript Example

JavaScript
async function findJobs(role, location = '') {
  const q = `${role} ${location} (site:greenhouse.io OR site:lever.co)`;
  const r = await fetch('https://api.scavio.dev/api/v1/search', {
    method: 'POST', headers: {'x-api-key': process.env.SCAVIO_API_KEY, 'Content-Type': 'application/json'},
    body: JSON.stringify({platform: 'google', query: q})
  });
  if (!r.ok) throw new Error(`Search request failed: ${r.status}`);
  return (await r.json()).organic?.map(x => ({title: x.title ?? '', url: x.link ?? ''})) ?? [];
}

Expected Output

You end up with a job listing aggregator that finds positions from Google-indexed job boards and Reddit hiring threads without maintaining any scrapers.
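
aggregate_jobs returns a dict shaped like the following; the values here are illustrative placeholders, not real listings:

JSON
{
  "role": "senior python developer",
  "location": "remote",
  "total_found": 2,
  "from_google": 1,
  "from_reddit": 1,
  "listings": [
    {"title": "Senior Python Developer - Example Co", "url": "https://boards.greenhouse.io/...", "snippet": "..."},
    {"title": "[Hiring] Senior Python Developer (Remote)", "url": "https://www.reddit.com/r/...", "subreddit": "forhire", "score": 42}
  ]
}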

Frequently Asked Questions

How long does this tutorial take?

Most developers complete this tutorial in 15 to 30 minutes. You will need a Scavio API key (free tier works) and a working Python or JavaScript environment.

What are the prerequisites?

Python 3.8+, a Scavio API key, and basic familiarity with job listing data. The free tier includes 500 credits per month.

Can I complete this tutorial on the free tier?

Yes. The free tier includes 500 credits per month, which is more than enough to complete this tutorial and prototype a working solution.

Does Scavio integrate with frameworks like LangChain?

Scavio has a native LangChain package (langchain-scavio), an MCP server, and a plain REST API that works with any HTTP client. This tutorial uses the raw REST API, but you can adapt it to your framework of choice.
