Building a Search Layer for B2B Prospect Discovery
A 60% qualification rate from a search layer is solid. The fastest improvement comes from a stronger negative filter that removes false-positive ICP matches.
The search layer for B2B prospect discovery is a programmatic loop that queries Google (or multiple platforms) with your ICP keywords, filters results against negative signals, and outputs a list of qualified company domains. Negative filtering — removing companies that obviously do not fit — matters more than positive keyword matching for reducing noise.
The problem with positive-only search
A thread on r/AI_Agents about "finding the right target companies" described the standard approach: search for "[industry] + [company type] + [location]" and scrape the results. This produces a list that is 70% noise — enterprise companies when you sell to SMBs, companies in the wrong vertical, agencies instead of end customers, and job boards ranking for your keywords.
The fix is layering negative filters on top of positive queries. Instead of only searching for what you want, you also actively exclude what you do not want. This sounds obvious but almost nobody implements it programmatically.
Architecture of the search layer
The system has three stages:
- Stage 1: Broad search — Query Google with ICP keywords across multiple query variations. Collect 100–500 unique domains.
- Stage 2: Negative filter — Remove domains matching exclusion patterns (job boards, directories, news sites, government, education, known enterprise companies above your size threshold).
- Stage 3: Enrichment check — For remaining domains, run a second search to verify company characteristics (team size signals, tech stack mentions, funding indicators).
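Wired together, the stages reduce to a short pipeline. The sketch below is a minimal skeleton of that flow, not a finished implementation: collect_domains is a hypothetical stand-in for the Stage 1 query loop, while passes_negative_filter and enrich_domain are the functions built out in the stage sections that follow.

def discover_prospects(queries: list[str]) -> list[dict]:
    # Stage 1: broad search across the query matrix
    # (collect_domains is a placeholder for the query loop shown in Stage 1)
    all_domains = collect_domains(queries)
    # Stage 2: drop anything matching the exclusion patterns
    filtered = {d for d in all_domains if passes_negative_filter(d)}
    # Stage 3: run a verification search per surviving domain
    prospects = [enrich_domain(d) for d in filtered]
    return [
        p for p in prospects
        if p["has_saas_signal"] and (p["has_funding_signal"] or p["has_team_signal"])
    ]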
Stage 1: Building the query matrix
Do not run one query. Run a matrix of queries combining your target terms. If you sell developer tools to mid-market SaaS companies, your matrix might be:
import requests
from itertools import product

verticals = ["saas", "devtools", "api platform", "developer tools"]
signals = ["series a", "hiring engineers", "launched 2025", "growing team"]
locations = ["san francisco", "new york", "remote"]

# Every combination of vertical, signal, and location becomes one query.
queries = [
    f"{v} company {s} {l}"
    for v, s, l in product(verticals, signals, locations)
]

all_domains = set()
for query in queries:
    resp = requests.post(
        "https://api.scavio.dev/api/v1/search",
        headers={"x-api-key": "YOUR_KEY"},
        json={
            "platform": "google",
            "query": query,
            "country": "us",
            "num_results": 20
        }
    )
    for item in resp.json().get("organic_results", []):
        # Keep only the host portion of each result URL.
        link = item.get("link", "")
        domain = link.split("/")[2] if "://" in link else ""
        if domain:
            all_domains.add(domain)

print(f"Collected {len(all_domains)} unique domains from {len(queries)} queries")

Stage 2: Negative filtering
This is the step most prospect discovery pipelines skip, and it is the most impactful. Your negative filter list should include:
- Job boards and directories: linkedin.com, glassdoor.com, indeed.com, crunchbase.com, g2.com, capterra.com, yelp.com
- News and content sites: techcrunch.com, medium.com, substack.com, reddit.com, news.ycombinator.com
- Government and education: any .gov, .edu domain
- Known too-large companies: google.com, microsoft.com, amazon.com — customize to your ICP ceiling
- Your own domain and competitors: you are not prospecting yourself
EXCLUDE_DOMAINS = {
    "linkedin.com", "glassdoor.com", "indeed.com", "crunchbase.com",
    "g2.com", "capterra.com", "yelp.com", "techcrunch.com",
    "medium.com", "substack.com", "reddit.com", "news.ycombinator.com",
    "github.com", "stackoverflow.com", "wikipedia.org",
    "google.com", "microsoft.com", "amazon.com", "apple.com",
    # Add your own domain and known competitors here as well.
}
EXCLUDE_TLDS = {".gov", ".edu", ".mil"}

def passes_negative_filter(domain: str) -> bool:
    # Check exact domain matches
    base = domain.replace("www.", "")
    if base in EXCLUDE_DOMAINS:
        return False
    # Check TLD exclusions
    for tld in EXCLUDE_TLDS:
        if domain.endswith(tld):
            return False
    # Check for directory/list patterns in domain
    directory_patterns = ["directory", "listing", "yellowpages", "whitepages"]
    if any(p in domain for p in directory_patterns):
        return False
    return True

filtered = {d for d in all_domains if passes_negative_filter(d)}
print(f"After negative filter: {len(filtered)} domains (removed {len(all_domains) - len(filtered)})")

Stage 3: Enrichment verification
For each surviving domain, run a targeted search to verify it matches your ICP. Search for the company name plus signals that indicate fit: team size, funding stage, tech stack, or product category.
def enrich_domain(domain: str) -> dict:
    """Search for company signals to verify ICP fit."""
    company_name = domain.replace(".com", "").replace(".io", "").replace("www.", "")
    resp = requests.post(
        "https://api.scavio.dev/api/v1/search",
        headers={"x-api-key": "YOUR_KEY"},
        json={
            "platform": "google",
            "query": f"{company_name} company team size funding",
            "country": "us",
            "num_results": 5
        }
    )
    snippets = " ".join(
        item.get("snippet", "")
        for item in resp.json().get("organic_results", [])
    ).lower()
    return {
        "domain": domain,
        "has_funding_signal": any(w in snippets for w in ["series a", "series b", "raised", "funded"]),
        "has_team_signal": any(w in snippets for w in ["employees", "team of", "hiring"]),
        "has_saas_signal": any(w in snippets for w in ["saas", "platform", "api", "software"]),
        "snippet_preview": snippets[:200]
    }

prospects = [enrich_domain(d) for d in list(filtered)[:50]]
qualified = [p for p in prospects if p["has_saas_signal"] and (p["has_funding_signal"] or p["has_team_signal"])]
print(f"Qualified prospects: {len(qualified)} out of {len(prospects)} checked")
Credit budget
For a run of 48 query variations (4 verticals x 4 signals x 3 locations) at 20 results each, plus 50 enrichment queries: 98 total API calls. At Scavio's $0.005/credit, that is about $0.49 per discovery run. The 500 free credits/month cover about 5 full runs. For weekly prospecting, the $30/month plan with 7,000 credits gives you roughly 70 runs.
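The budget is easy to sanity-check in code. A quick sketch, assuming one credit per search call and the pricing quoted above:

# Back-of-envelope credit budget, assuming 1 credit per search call.
search_queries = 4 * 4 * 3            # verticals x signals x locations = 48
enrichment_queries = 50
calls_per_run = search_queries + enrichment_queries    # 98
cost_per_credit = 0.005                # dollars, the rate quoted above
print(f"Cost per run: ${calls_per_run * cost_per_credit:.2f}")   # ~$0.49
print(f"Runs on 500 free credits: {500 // calls_per_run}")        # 5
print(f"Runs on 7,000 credits: {7000 // calls_per_run}")          # 71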
Why this beats manual prospecting
A human doing this manually would open Google, search each query, scan results, copy domains into a spreadsheet, and check each one. At 2 minutes per query plus 3 minutes per domain check, the same 48-query, 50-domain run takes around 4 hours. The programmatic version runs in under 3 minutes and produces structured, filterable output.
Bottom line
The search layer for B2B prospect discovery is not about finding more results; it is about removing bad ones faster. Build the negative filter first, make it aggressive, and let the search API handle volume. A $0.49 API run that produces 30 qualified prospects beats a 4-hour manual session that produces the same list with more errors.