Building a Search Layer for B2B Prospect Discovery
A 60% qualification rate from a search layer is solid. The fastest improvement comes from a stronger negative filter that removes false-positive ICP matches.
The search layer for B2B prospect discovery is a programmatic loop that queries Google (or multiple platforms) with your ICP keywords, filters results against negative signals, and outputs a list of qualified company domains. Negative filtering — removing companies that obviously do not fit — matters more than positive keyword matching for reducing noise.
The problem with positive-only search
A thread on r/AI_Agents about "finding the right target companies" described the standard approach: search for "[industry] + [company type] + [location]" and scrape the results. This produces a list that is 70% noise — enterprise companies when you sell to SMBs, companies in the wrong vertical, agencies instead of end customers, and job boards ranking for your keywords.
The fix is layering negative filters on top of positive queries. Instead of only searching for what you want, you also actively exclude what you do not want. This sounds obvious but almost nobody implements it programmatically.
Architecture of the search layer
The system has three stages:
- Stage 1: Broad search — Query Google with ICP keywords across multiple query variations. Collect 100–500 unique domains.
- Stage 2: Negative filter — Remove domains matching exclusion patterns (job boards, directories, news sites, government, education, known enterprise companies above your size threshold).
- Stage 3: Enrichment check — For remaining domains, run a second search to verify company characteristics (team size signals, tech stack mentions, funding indicators).
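Wired together, the stages reduce to a short pipeline. The sketch below is a minimal skeleton of that flow, not a finished implementation: collect_domains is a hypothetical stand-in for the Stage 1 query loop, while passes_negative_filter and enrich_domain are the functions built out in the stage sections that follow.

def discover_prospects(queries: list[str]) -> list[dict]:
    # Stage 1: broad search across the query matrix
    # (collect_domains is a placeholder for the query loop shown in Stage 1)
    all_domains = collect_domains(queries)
    # Stage 2: drop anything matching the exclusion patterns
    filtered = {d for d in all_domains if passes_negative_filter(d)}
    # Stage 3: run a verification search per surviving domain
    prospects = [enrich_domain(d) for d in filtered]
    return [
        p for p in prospects
        if p["has_saas_signal"] and (p["has_funding_signal"] or p["has_team_signal"])
    ]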
Stage 1: Building the query matrix
Do not run one query. Run a matrix of queries combining your target terms. If you sell developer tools to mid-market SaaS companies, your matrix might be:
import requests
from itertools import product

verticals = ["saas", "devtools", "api platform", "developer tools"]
signals = ["series a", "hiring engineers", "launched 2025", "growing team"]
locations = ["san francisco", "new york", "remote"]

# Every combination of vertical, signal, and location becomes one query.
queries = [
    f"{v} company {s} {l}"
    for v, s, l in product(verticals, signals, locations)
]

all_domains = set()
for query in queries:
    resp = requests.post(
        "https://api.scavio.dev/api/v1/search",
        headers={"x-api-key": "YOUR_KEY"},
        json={
            "platform": "google",
            "query": query,
            "country": "us",
            "num_results": 20
        }
    )
    for item in resp.json().get("organic_results", []):
        # Keep only the host portion of each result URL.
        link = item.get("link", "")
        domain = link.split("/")[2] if "://" in link else ""
        if domain:
            all_domains.add(domain)

print(f"Collected {len(all_domains)} unique domains from {len(queries)} queries")

Stage 2: Negative filtering
This is the step most prospect discovery pipelines skip, and it is the most impactful. Your negative filter list should include:
- Job boards and directories: linkedin.com, glassdoor.com, indeed.com, crunchbase.com, g2.com, capterra.com, yelp.com
- News and content sites: techcrunch.com, medium.com, substack.com, reddit.com, news.ycombinator.com
- Government and education: any .gov, .edu domain
- Known too-large companies: google.com, microsoft.com, amazon.com — customize to your ICP ceiling
- Your own domain and competitors: you are not prospecting yourself
EXCLUDE_DOMAINS = {
    "linkedin.com", "glassdoor.com", "indeed.com", "crunchbase.com",
    "g2.com", "capterra.com", "yelp.com", "techcrunch.com",
    "medium.com", "substack.com", "reddit.com", "news.ycombinator.com",
    "github.com", "stackoverflow.com", "wikipedia.org",
    "google.com", "microsoft.com", "amazon.com", "apple.com",
    # Add your own domain and known competitors here as well.
}
EXCLUDE_TLDS = {".gov", ".edu", ".mil"}

def passes_negative_filter(domain: str) -> bool:
    # Check exact domain matches
    base = domain.replace("www.", "")
    if base in EXCLUDE_DOMAINS:
        return False
    # Check TLD exclusions
    for tld in EXCLUDE_TLDS:
        if domain.endswith(tld):
            return False
    # Check for directory/list patterns in domain
    directory_patterns = ["directory", "listing", "yellowpages", "whitepages"]
    if any(p in domain for p in directory_patterns):
        return False
    return True

filtered = {d for d in all_domains if passes_negative_filter(d)}
print(f"After negative filter: {len(filtered)} domains (removed {len(all_domains) - len(filtered)})")

Stage 3: Enrichment verification
For each surviving domain, run a targeted search to verify it matches your ICP. Search for the company name plus signals that indicate fit: team size, funding stage, tech stack, or product category.
def enrich_domain(domain: str) -> dict:
    """Search for company signals to verify ICP fit."""
    company_name = domain.replace(".com", "").replace(".io", "").replace("www.", "")
    resp = requests.post(
        "https://api.scavio.dev/api/v1/search",
        headers={"x-api-key": "YOUR_KEY"},
        json={
            "platform": "google",
            "query": f"{company_name} company team size funding",
            "country": "us",
            "num_results": 5
        }
    )
    snippets = " ".join(
        item.get("snippet", "")
        for item in resp.json().get("organic_results", [])
    ).lower()
    return {
        "domain": domain,
        "has_funding_signal": any(w in snippets for w in ["series a", "series b", "raised", "funded"]),
        "has_team_signal": any(w in snippets for w in ["employees", "team of", "hiring"]),
        "has_saas_signal": any(w in snippets for w in ["saas", "platform", "api", "software"]),
        "snippet_preview": snippets[:200]
    }

prospects = [enrich_domain(d) for d in list(filtered)[:50]]
qualified = [p for p in prospects if p["has_saas_signal"] and (p["has_funding_signal"] or p["has_team_signal"])]
print(f"Qualified prospects: {len(qualified)} out of {len(prospects)} checked")
Credit budget
For a run of 48 query variations (4 verticals x 4 signals x 3 locations) at 20 results each, plus 50 enrichment queries: 98 total API calls. At Scavio's $0.005/credit, that is about $0.49 per discovery run. The 500 free credits/month cover about 5 full runs. For weekly prospecting, the $30/month plan with 7,000 credits gives you roughly 70 runs.
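The budget is easy to sanity-check in code. A quick sketch, assuming one credit per search call and the pricing quoted above:

# Back-of-envelope credit budget, assuming 1 credit per search call.
search_queries = 4 * 4 * 3            # verticals x signals x locations = 48
enrichment_queries = 50
calls_per_run = search_queries + enrichment_queries    # 98
cost_per_credit = 0.005                # dollars, the rate quoted above
print(f"Cost per run: ${calls_per_run * cost_per_credit:.2f}")   # ~$0.49
print(f"Runs on 500 free credits: {500 // calls_per_run}")        # 5
print(f"Runs on 7,000 credits: {7000 // calls_per_run}")          # 71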
Why this beats manual prospecting
A human doing this manually would open Google, search each query, scan results, copy domains into a spreadsheet, and check each one. At 2 minutes per query plus 3 minutes per domain check, the same 48-query, 50-domain run takes around 4 hours. The programmatic version runs in under 3 minutes and produces structured, filterable output.
Bottom line
The search layer for B2B prospect discovery is not about finding more results; it is about removing bad ones faster. Build the negative filter first, make it aggressive, and let the search API handle volume. A $0.49 API run that produces 30 qualified prospects beats a 4-hour manual session that produces the same list with more errors.