Tutorial

How to Build a HiringCafe-Style Job Aggregator

An r/hiringcafe thread surfaced the pattern: pull from career pages, AI-summarize, surface salary. Walk-through with Scavio + LLM.

An r/hiringcafe thread shared the AI Job Search Agent pattern: pull from real employer career pages, AI-summarize each role, and surface salary upfront. This tutorial walks through building a HiringCafe-style aggregator with Scavio and an LLM.

Prerequisites

  • Scavio API key
  • An LLM API key
  • A list of target employers (or a way to discover them)

Walkthrough

Step 1: Discover career-page URLs via dorked search

Combine dorked queries: site:{domain}/careers on the company's own site, plus the hosted-ATS patterns jobs.lever.co/{slug} and boards.greenhouse.io/{slug}.

Python
import os
import requests

H = {'x-api-key': os.environ['SCAVIO_API_KEY']}
# {domain} is the full domain ('acme.com'); {slug} is the bare company
# name ('acme') that Lever and Greenhouse board URLs use
DORKS = [
    'site:{domain}/careers',
    'site:{domain}/jobs',
    'site:jobs.lever.co/{slug}',
    'site:boards.greenhouse.io/{slug}',
]

def find_career_urls(domain):
    slug = domain.rsplit('.', 1)[0]  # works for .com, .io, .dev, ...
    out = []
    for d in DORKS:
        q = d.format(domain=domain, slug=slug)
        r = requests.post('https://api.scavio.dev/api/v1/search',
                          headers=H, json={'query': q}).json()
        out.extend(o['link'] for o in r.get('organic_results', [])[:5])
    return list(dict.fromkeys(out))  # dedupe while keeping order
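The four dork patterns often return the same board twice, differing only in tracking parameters or a trailing slash. A small URL normalizer (an assumption layered on top, not part of the Scavio API) cuts down redundant extract calls:

```python
from urllib.parse import urlsplit, urlunsplit

def canonicalize(url):
    # lowercase the host, drop query string and fragment, strip a trailing slash
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc.lower(),
                       parts.path.rstrip('/'), '', ''))
```

Run search results through canonicalize() before deduping, so `https://Jobs.Lever.co/acme/?utm_source=reddit` and `https://jobs.lever.co/acme` collapse into one URL.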

Step 2: Extract listing pages as markdown

Scavio /extract turns the careers page into clean markdown.

Python
def extract(url):
    r = requests.post('https://api.scavio.dev/api/v1/extract',
                      headers=H, json={'url': url, 'format': 'markdown'})
    return r.json().get('markdown', '')
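Large careers pages can blow past an LLM context window. A crude guard before the Step 3 call, trimming at the last line break under a character budget (the 20,000-character default is an arbitrary assumption, not a Scavio or model limit):

```python
def clip_markdown(md, max_chars=20000):
    # cut at the last line break under the budget so we don't split mid-line
    if len(md) <= max_chars:
        return md
    cut = md.rfind('\n', 0, max_chars)
    return md[:cut] if cut > 0 else md[:max_chars]
```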

Step 3: Parse roles with an LLM

Structured extraction: title, location, salary if shown, summary.

Python
import json

PROMPT = '''Extract job postings from this careers page. For each, return JSON with:
- title, team, location, remote (bool), salary_min, salary_max (null if not shown), apply_url, summary (2 sentences).
Return a JSON list only, with no surrounding prose or code fences.
Page:
{md}'''

def parse_roles(llm, markdown):
    # `llm.complete` stands in for whichever LLM client you use
    raw = llm.complete(PROMPT.format(md=markdown))
    return json.loads(raw)
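Even with the schema, models sometimes return salary as a string ("$120k - $150k") instead of numbers. A hedged normalizer for common US formats (an assumption about what the LLM emits, not a guarantee) keeps salary_min/salary_max numeric:

```python
import re

def parse_salary(text):
    # "$120k - $150k" -> (120000, 150000); "$95,000" -> (95000, 95000)
    # returns (None, None) when no figures are found
    nums = []
    for m in re.finditer(r'\$?(\d{1,3}(?:,\d{3})+|\d+(?:\.\d+)?)\s*([kK])?', text or ''):
        n = float(m.group(1).replace(',', ''))
        if m.group(2):
            n *= 1000
        nums.append(int(n))
    if not nums:
        return (None, None)
    return (min(nums), max(nums))
```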

Step 4: Dedupe by (employer, title, location)

Same role on multiple aggregators = one record.

Python
def dedupe(roles):
    # assumes each role carries an 'employer' key, attached when results
    # are collected per domain (the Step 3 schema does not include it)
    seen = set()
    out = []
    for r in roles:
        key = (r['employer'], r['title'].lower(), r['location'].lower())
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out
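Exact string keys still miss near-duplicates like "Sr. Engineer" vs "Senior Engineer". A light title normalizer for the dedupe key, built on a small synonym map (the map below is a starter assumption, extend it for your corpus):

```python
import re

# hypothetical starter map; grow it as you see variants in real postings
SYNONYMS = {'sr': 'senior', 'sr.': 'senior', 'jr': 'junior', 'jr.': 'junior',
            'eng': 'engineer', 'swe': 'software engineer'}

def norm_title(title):
    # lowercase, strip punctuation (except dots), collapse whitespace,
    # expand common abbreviations
    words = re.sub(r'[^\w.\s]', ' ', title.lower()).split()
    return ' '.join(SYNONYMS.get(w, w) for w in words)
```

Use norm_title(r['title']) in the dedupe key instead of the raw title.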

Step 5: Rank by salary + recency + match score

User-supplied skills drive the match score; salary and recency round out the ranking.

Python
def rank(roles, user_skills):
    for r in roles:
        text = (r.get('summary', '') + ' ' + r['title']).lower()
        match = sum(1 for s in user_skills if s.lower() in text)
        # weights are a starting point; tune salary vs. skill match for your users
        r['score'] = (r.get('salary_max') or 0) * 0.3 + match * 100
    return sorted(roles, key=lambda x: -x['score'])
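The step title also promises recency, which the snippet above leaves out. A sketch of a recency term, assuming each role carries a hypothetical posted_at ISO-8601 date (not part of the Step 3 schema; you would need to ask the LLM to extract it):

```python
from datetime import datetime, timezone

def recency_bonus(posted_at, now=None, horizon_days=30, weight=10):
    # Linear decay: a posting from today earns horizon_days * weight,
    # anything older than horizon_days earns 0. posted_at is assumed
    # to be an ISO-8601 date string like '2024-05-01'.
    now = now or datetime.now(timezone.utc)
    posted = datetime.fromisoformat(posted_at)
    if posted.tzinfo is None:
        posted = posted.replace(tzinfo=timezone.utc)
    age = (now - posted).days
    return max(0, horizon_days - age) * weight
```

Add the bonus into r['score'] inside rank() when the field is present.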

Python Example

Python
# Per-employer cost: ~4 dorked searches + 1 extract per career URL + 1 LLM call = ~$0.02-0.05
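The per-employer loop can be sketched end to end. The search, extract, and parse steps are passed in as plain functions (the names here are illustrative, not part of any API), which keeps the pipeline testable with stubs instead of live HTTP calls:

```python
def build_feed(domains, find_urls, extract, parse, user_skills):
    # find_urls(domain) -> [url]; extract(url) -> markdown;
    # parse(markdown) -> [role dict]. Injected so tests can stub them.
    roles = []
    for domain in domains:
        for url in find_urls(domain):
            for role in parse(extract(url)):
                role['employer'] = domain
                roles.append(role)
    seen, unique = set(), []
    for r in roles:  # dedupe on (employer, title, location), as in Step 4
        key = (r['employer'], r['title'], r['location'])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    for r in unique:  # same scoring as Step 5
        text = (r.get('summary', '') + r['title']).lower()
        match = sum(1 for s in user_skills if s.lower() in text)
        r['score'] = (r.get('salary_max') or 0) * 0.3 + match * 100
    return sorted(unique, key=lambda x: -x['score'])
```

In production you would pass find_career_urls, extract, and a parse wrapper around the LLM call; in tests, lambdas returning canned data.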

JavaScript Example

JavaScript
// Same flow in JavaScript/TypeScript: fetch() the same Scavio endpoints with identical headers and JSON bodies.

Expected Output

A JSON list of jobs with title, salary, summary, and apply_url, deduped across aggregators and ranked by user skills plus salary. The hard part remains the relevance ranking; the data layer is the easy part.
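Illustrative only: the values below are invented, but this is the shape the Step 3 schema produces for each role.

```json
[
  {
    "title": "Senior Backend Engineer",
    "team": "Platform",
    "location": "New York, NY",
    "remote": true,
    "salary_min": 150000,
    "salary_max": 190000,
    "apply_url": "https://jobs.lever.co/acme/1234",
    "summary": "Owns the job-ingestion pipeline. Works across Python services."
  }
]
```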

Frequently Asked Questions

How long does this tutorial take?

Most developers complete this tutorial in 15 to 30 minutes. You will need a Scavio API key (free tier works) and a working Python or JavaScript environment.

What are the prerequisites?

A Scavio API key, an LLM API key, and a list of target employers (or a way to discover them). A Scavio API key gives you 500 free credits per month.

Can I complete this on the free tier?

Yes. The free tier includes 500 credits per month, which is more than enough to complete this tutorial and prototype a working solution.

Does Scavio integrate with agent frameworks?

Scavio has a native LangChain package (langchain-scavio), an MCP server, and a plain REST API that works with any HTTP client. This tutorial uses the raw REST API, but you can adapt it to your framework of choice.

Start Building

Grab a Scavio API key, point the pipeline at a few employer domains, and you can have a working prototype in 15 to 30 minutes.