agents, recruitment, architecture

People Search Agent Domain Harness

General agent frameworks fall short for people and recruitment search. A domain-specific harness handles deduplication, hard versus soft requirements, and clarifying questions.

9 min

General-purpose agent frameworks fail at people/recruitment search because they do not understand deduplication across sources, the difference between hard requirements (must have Python experience) and soft preferences (ideally in the Bay Area), or when to ask clarifying questions instead of returning bad results. A domain-specific harness solves these problems with search-aware architecture patterns.

Why general frameworks fail here

  • No dedup logic: the same person appears on LinkedIn, GitHub, and personal sites. General agents treat each appearance as a separate result.
  • No requirement parsing: "5 years of Python, ideally with ML experience" contains a hard requirement and a soft preference. General agents treat both as equally important.
  • No clarification loop: when a search returns 500 results, the right move is to ask "do you need them in a specific city?" rather than dump all 500.
  • No source prioritization: LinkedIn profiles are more reliable for job titles than personal blogs, but general agents weigh every source equally.
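
The source-prioritization point can be made concrete with a small lookup table. This is only a sketch: the `SOURCE_PRIORITY` ranking and the `pick_field` helper are hypothetical names introduced for illustration, and the ordering itself is an assumption that should be tuned per field.

```python
from typing import Optional

# Hypothetical reliability ranking per field; lower number = more trusted.
SOURCE_PRIORITY = {
    "job_title": {"linkedin": 0, "github": 1, "general": 2},
    "repos": {"github": 0, "general": 1, "linkedin": 2},
}

def pick_field(field: str, candidates: dict) -> Optional[str]:
    """Return the value for `field` from the most trusted source that has one.

    `candidates` maps a source name to the value that source reported.
    """
    ranking = SOURCE_PRIORITY.get(field, {})
    # Unranked sources sort last so they never override a ranked source.
    ordered = sorted(candidates, key=lambda s: ranking.get(s, len(ranking)))
    for source in ordered:
        if candidates[source]:  # skip empty or missing values
            return candidates[source]
    return None

title = pick_field("job_title", {
    "general": "Python wizard",
    "linkedin": "Senior Software Engineer",
})
print(title)
```

When LinkedIn and a personal blog disagree on a job title, the LinkedIn value wins; if LinkedIn reported nothing, the next source in the ranking is used.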

Domain harness architecture

Python
import os
import requests
from dataclasses import dataclass

H = {"x-api-key": os.environ["SCAVIO_API_KEY"]}

@dataclass
class SearchRequirement:
    field: str          # "skill", "location", "experience"
    value: str          # "Python", "San Francisco", "5 years"
    hard: bool = True   # hard requirement vs soft preference

def parse_requirements(query: str) -> list:
    """Parse search query into hard and soft requirements."""
    # In production, use an LLM to parse natural language.
    # This is a simplified pattern: the keyword checks and the
    # hard-coded values below are placeholders, not real parsing.
    q = query.lower()  # match "Must" and "must" alike
    requirements = []
    if "must" in q or "required" in q:
        requirements.append(SearchRequirement("skill", "Python", hard=True))
    if "ideally" in q or "prefer" in q:
        requirements.append(SearchRequirement("location", "Bay Area", hard=False))
    return requirements

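def multi_source_people_search(name: str) -> dict:
    """Search for a person across multiple platforms."""
    # Every query goes through Google; the site: operator targets each platform.
    sources = {
        "linkedin": f"{name} site:linkedin.com",
        "github": f"{name} site:github.com",
        "general": f"{name} software engineer",
    }
    results = {}
    for source, query in sources.items():
        resp = requests.post("https://api.scavio.dev/api/v1/search",
            headers=H, json={"query": query, "platform": "google"},
            timeout=30)  # avoid hanging forever on a stalled request
        resp.raise_for_status()  # fail loudly on HTTP errors
        # Keep only the top 3 organic results per source
        results[source] = resp.json().get("organic_results", [])[:3]
    return results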
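
Once requirements are parsed, the hard/soft distinction has to change how results are handled. The following is a minimal sketch, assuming candidates are plain dicts keyed by the same field names as `SearchRequirement.field`; the `matches` and `filter_and_rank` helpers are hypothetical, and the substring match is a stand-in for real matching logic.

```python
from dataclasses import dataclass

@dataclass
class SearchRequirement:  # same shape as above, repeated so this runs standalone
    field: str
    value: str
    hard: bool = True

def matches(candidate: dict, req: SearchRequirement) -> bool:
    """Naive case-insensitive substring match (illustrative only)."""
    return req.value.lower() in str(candidate.get(req.field, "")).lower()

def filter_and_rank(candidates: list, requirements: list) -> list:
    """Drop candidates that miss any hard requirement, then sort the rest
    by how many soft preferences they satisfy (most first)."""
    hard = [r for r in requirements if r.hard]
    soft = [r for r in requirements if not r.hard]
    kept = [c for c in candidates if all(matches(c, r) for r in hard)]
    return sorted(kept, key=lambda c: sum(matches(c, r) for r in soft), reverse=True)

reqs = [
    SearchRequirement("skill", "Python", hard=True),
    SearchRequirement("location", "Bay Area", hard=False),
]
people = [
    {"name": "A", "skill": "Python, Go", "location": "Berlin"},
    {"name": "B", "skill": "Java", "location": "Bay Area"},
    {"name": "C", "skill": "python, ML", "location": "SF Bay Area"},
]
ranked = filter_and_rank(people, reqs)
print([p["name"] for p in ranked])  # ['C', 'A']
```

Candidate B is removed for missing the hard requirement even though the location matches, while A stays in the list but ranks below C because a soft preference only affects ordering.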

Deduplication across sources

Python
from urllib.parse import urlparse

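def deduplicate_results(multi_source_results: dict) -> list:
    """Merge results from multiple sources into unique person profiles."""
    linkedin_profiles = {}   # LinkedIn username -> merged profile
    seen_urls = set()        # non-LinkedIn URLs already kept
    supplementary = []

    for source, results in multi_source_results.items():
        for r in results:
            url = r.get("link", "")
            if not url:
                continue
            parsed = urlparse(url)

            # Group by person: linkedin URLs are unique per person
            if parsed.netloc.endswith("linkedin.com") and parsed.path.startswith("/in/"):
                # Use only the username segment so query strings, trailing
                # slashes, and sub-pages map to the same person
                username = parsed.path[len("/in/"):].split("/")[0].lower()
                if username not in linkedin_profiles:
                    linkedin_profiles[username] = {
                        "name": (r.get("title") or "").split(" - ")[0],
                        "linkedin": url,
                        "sources": [source],
                    }
                elif source not in linkedin_profiles[username]["sources"]:
                    linkedin_profiles[username]["sources"].append(source)
            # Add non-LinkedIn results as supplementary data, once per URL
            elif url.rstrip("/") not in seen_urls:
                seen_urls.add(url.rstrip("/"))
                supplementary.append({
                    "title": r.get("title"),
                    "url": url,
                    "source": source,
                })

    return list(linkedin_profiles.values()) + supplementary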

When to ask clarifying questions

The harness should trigger a clarification request when: search returns more than 50 results (query too broad), no results match hard requirements (requirements may be too strict), or results span multiple unrelated domains (ambiguous query). General agent frameworks return whatever they find. A domain harness knows when results are not useful and asks for refinement.
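
The three triggers can be expressed as a single check that either returns a question or lets the results through. This is a sketch under stated assumptions: the `clarification_question` function and its parameters are hypothetical, the 50-result threshold comes from the text, and how `topics` is derived (for example, by classifying job titles) is left out of scope.

```python
from typing import Optional

MAX_RESULTS = 50   # "too broad" threshold from the text
MAX_TOPICS = 1     # assumption: more than one unrelated topic means ambiguity

def clarification_question(total_results: int, hard_matches: int,
                           topics: set) -> Optional[str]:
    """Return a question to ask the user, or None if the results are usable."""
    if total_results > MAX_RESULTS:
        return (f"Found {total_results} results. "
                "Can you narrow by location, seniority, or skill?")
    if hard_matches == 0:
        return ("No one matches every hard requirement. "
                "Which requirement can be relaxed?")
    if len(topics) > MAX_TOPICS:
        listed = ", ".join(sorted(topics))
        return f"Results span several areas ({listed}). Which one did you mean?"
    return None

print(clarification_question(500, 40, {"software"}))
print(clarification_question(12, 5, {"software"}))
```

The checks run in order of cost to the user: an overly broad result set is flagged first, since narrowing it often resolves the other two conditions as well.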

Cost structure

  • Per-person search: 3 queries (LinkedIn + GitHub + general) at $0.005 per query = $0.015
  • Batch of 20 candidates: 60 queries = $0.30
  • Daily sourcing pipeline (100 candidates): 300 queries = $1.50/day, or $45/mo over 30 days
  • Compare: LinkedIn Recruiter Lite starts at $170/mo with limited InMail credits
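
The figures above follow from one per-query price. As a quick check, the sketch below reproduces them; the $0.005 price is inferred from "3 queries = $0.015" rather than taken from a price list, and `sourcing_cost` is a hypothetical helper. Amounts are tracked in tenths of a cent to avoid floating-point rounding surprises.

```python
PRICE_PER_QUERY_MILLS = 5   # $0.005, inferred from "3 queries = $0.015"
QUERIES_PER_PERSON = 3      # LinkedIn + GitHub + general

def sourcing_cost(candidates_per_day: int, days: int = 1) -> float:
    """Estimated spend in dollars for a pipeline running `days` days."""
    mills = candidates_per_day * QUERIES_PER_PERSON * PRICE_PER_QUERY_MILLS * days
    return mills / 1000

print(sourcing_cost(1))         # 0.015
print(sourcing_cost(20))        # 0.3
print(sourcing_cost(100))       # 1.5
print(sourcing_cost(100, 30))   # 45.0
```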