Tutorial

How to Add Negative Filters to a B2B Search Pipeline

Filter out irrelevant leads from your B2B search pipeline using negative keyword matching and domain exclusion. Python tutorial with Scavio API.

Add negative filters to a B2B search pipeline by defining exclusion rules that remove irrelevant results before they reach your CRM or outreach tool. Without negative filtering, a B2B search pipeline can return 30-50% noise, as job boards, news aggregators, directories, and competitors clog your lead list. This tutorial builds a post-processing layer that filters search results by domain blocklist, keyword exclusion, and URL pattern, so that far fewer unqualified results pass through.

Prerequisites

  • Python 3.8+ installed
  • requests library installed
  • A Scavio API key from scavio.dev
  • An existing B2B search pipeline or lead list

Walkthrough

Step 1: Define your negative filter rules

Set up blocklists for domains, keywords, and URL patterns that indicate non-lead results.

Python
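import os

import requests

# Read the key from the environment rather than hard-coding it.
API_KEY = os.environ['SCAVIO_API_KEY']

# Sites that rarely represent a lead themselves: job boards, directories, social and review sites.
BLOCKED_DOMAINS = {
    'linkedin.com', 'indeed.com', 'glassdoor.com', 'crunchbase.com',
    'wikipedia.org', 'reddit.com', 'youtube.com', 'medium.com',
    'g2.com', 'capterra.com',
}

# Matched against the lowercased title and snippet, so keep entries lowercase.
NEGATIVE_KEYWORDS = [
    'job posting', 'careers', 'hiring', 'salary',
    'review site', 'comparison chart', 'free template',
]

# Matched against the lowercased URL path.
BLOCKED_URL_PATTERNS = ['/careers', '/jobs', '/hiring', '/press-release']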
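
If the rule lists are likely to change often, one option is to keep them in a small JSON file instead of hard-coding them. The sketch below assumes a hypothetical file layout with the keys domains, keywords, and url_patterns; the load_rules helper and that layout are illustrative and not part of the Scavio API.

```python
import json
from pathlib import Path


def load_rules(path):
    """Load filter rules from a JSON file (hypothetical layout, see above)."""
    raw = json.loads(Path(path).read_text(encoding='utf-8'))

    def clean(values):
        # Lowercase and trim so later comparisons are case-insensitive, and
        # drop empty entries: an empty keyword would match every result.
        return [v.strip().lower() for v in values if v and v.strip()]

    return {
        'domains': set(clean(raw.get('domains', []))),
        'keywords': clean(raw.get('keywords', [])),
        'url_patterns': clean(raw.get('url_patterns', [])),
    }
```

The returned dictionary can then be assigned to the rule variables, for example BLOCKED_DOMAINS = rules['domains'], so that rule changes do not require a code change.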

Step 2: Build the filter functions

Create filter functions that check each result against domain, keyword, and URL pattern rules.

Python
from urllib.parse import urlparse

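def is_blocked_domain(url: str) -> bool:
    # Match the hostname exactly or as a subdomain. A plain substring test
    # would let 'g2.com' also block an unrelated host such as 'blog2.com'.
    host = (urlparse(url).hostname or '').lower()
    return any(host == d or host.endswith('.' + d) for d in BLOCKED_DOMAINS)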

def has_negative_keyword(title: str, snippet: str) -> bool:
    text = f'{title} {snippet}'.lower()
    return any(neg in text for neg in NEGATIVE_KEYWORDS)

def has_blocked_url_pattern(url: str) -> bool:
    path = urlparse(url).path.lower()
    return any(pattern in path for pattern in BLOCKED_URL_PATTERNS)

def is_valid_lead(result: dict) -> bool:
    url = result.get('link', '')
    title = result.get('title', '')
    snippet = result.get('snippet', '')
    if is_blocked_domain(url): return False
    if has_negative_keyword(title, snippet): return False
    if has_blocked_url_pattern(url): return False
    return True
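
Before spending API credits, it can help to run the rules against a few hand-written results. The sketch below is self-contained: it uses small stand-in rule sets and a compact copy of the checks that matches hostnames exactly or as a subdomain. The sample results are made up and only mimic the link, title, and snippet fields the tutorial reads.

```python
from urllib.parse import urlparse

# Small stand-in rule sets so the check runs without the tutorial's globals.
BLOCKED_DOMAINS = {'linkedin.com', 'g2.com'}
NEGATIVE_KEYWORDS = ['job posting', 'hiring']
BLOCKED_URL_PATTERNS = ['/careers', '/jobs']


def is_valid_lead(result):
    url = result.get('link', '')
    host = (urlparse(url).hostname or '').lower()
    path = urlparse(url).path.lower()
    text = f"{result.get('title', '')} {result.get('snippet', '')}".lower()
    if any(host == d or host.endswith('.' + d) for d in BLOCKED_DOMAINS):
        return False
    if any(k in text for k in NEGATIVE_KEYWORDS):
        return False
    if any(p in path for p in BLOCKED_URL_PATTERNS):
        return False
    return True


# Each pair is (made-up result, whether it should pass the filters).
SAMPLES = [
    ({'link': 'https://www.linkedin.com/company/acme', 'title': 'Acme'}, False),
    ({'link': 'https://acme.example/careers/open-roles', 'title': 'Acme'}, False),
    ({'link': 'https://acme.example/', 'title': 'We are hiring'}, False),
    ({'link': 'https://blog2.com/martech', 'title': 'Martech platform'}, True),
    ({'link': 'https://acme.example/product', 'title': 'Acme analytics'}, True),
]

failures = [r for r, expected in SAMPLES if is_valid_lead(r) != expected]
print(f'{len(SAMPLES) - len(failures)}/{len(SAMPLES)} sample checks passed')
```

Adding a sample each time a rule misfires in production gives you a small regression set for later rule changes.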

Step 3: Apply filters to search results

Search for B2B leads and apply all negative filters, reporting how many results were filtered out.

Python
def filtered_search(query: str) -> dict:
    resp = requests.post('https://api.scavio.dev/api/v1/search',
        headers={'x-api-key': API_KEY},
        json={'platform': 'google', 'query': query}, timeout=15)
    resp.raise_for_status()  # fail on HTTP errors instead of filtering an error body
    results = resp.json().get('organic_results', [])
    valid = [r for r in results if is_valid_lead(r)]
    filtered_count = len(results) - len(valid)
    return {
        'query': query,
        'total': len(results),
        'valid': len(valid),
        'filtered': filtered_count,
        # Use .get() so a result with a missing field does not raise KeyError.
        'leads': [{'title': r.get('title', ''), 'url': r.get('link', ''), 'snippet': r.get('snippet', '')} for r in valid],
    }

result = filtered_search('martech companies series a 2026')
print(f"{result['valid']}/{result['total']} results passed filters")
for lead in result['leads'][:5]:
    print(f"  {lead['title']}")

Step 4: Log filtered results for rule tuning

Save filtered-out results separately so you can review them and adjust your rules over time.

Python
import json

def rejection_reason(r: dict) -> str:
    # Report the first rule that fires, in the same order is_valid_lead checks them.
    if is_blocked_domain(r.get('link', '')):
        return 'domain'
    if has_negative_keyword(r.get('title', ''), r.get('snippet', '')):
        return 'keyword'
    return 'url_pattern'

def search_with_logging(query: str, log_file: str = 'filter_log.jsonl') -> dict:
    resp = requests.post('https://api.scavio.dev/api/v1/search',
        headers={'x-api-key': API_KEY},
        json={'platform': 'google', 'query': query}, timeout=15)
    resp.raise_for_status()
    results = resp.json().get('organic_results', [])
    valid, rejected = [], []
    for r in results:
        if is_valid_lead(r):
            valid.append(r)
        else:
            rejected.append({'title': r.get('title', ''), 'url': r.get('link', ''), 'reason': rejection_reason(r)})
    # Append one JSON object per query so the file stays valid JSON Lines.
    with open(log_file, 'a', encoding='utf-8') as f:
        f.write(json.dumps({'query': query, 'rejected': rejected}) + '\n')
    print(f'{query}: {len(valid)} valid, {len(rejected)} rejected')
    return {'leads': valid, 'rejected': rejected}

search_with_logging('fintech startups hiring engineers 2026')
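
To review the log, a simple tally is often enough to show which rules do most of the work and which hosts keep getting rejected. The sketch below assumes the log format written in this step (one JSON object per line with a rejected list of title, url, and reason entries); the summarize_log helper is a hypothetical addition, not part of the tutorial's pipeline.

```python
import json
from collections import Counter
from urllib.parse import urlparse


def summarize_log(log_file='filter_log.jsonl'):
    """Count rejections per reason and per hostname in the JSON Lines log."""
    reasons, hosts = Counter(), Counter()
    with open(log_file, encoding='utf-8') as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines
            entry = json.loads(line)
            for item in entry.get('rejected', []):
                reasons[item.get('reason', 'unknown')] += 1
                hosts[urlparse(item.get('url', '')).hostname or 'unknown'] += 1
    return reasons, hosts
```

Calling reasons.most_common() and hosts.most_common(10) on the two returned counters gives a quick ranking. A host that appears often under the keyword reason may deserve a place in the domain blocklist, while a keyword that rejects results you would have wanted is a candidate for removal.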

Python Example

Python
import os
import requests
from urllib.parse import urlparse

H = {'x-api-key': os.environ['SCAVIO_API_KEY']}
BLOCKED = {'linkedin.com', 'indeed.com', 'glassdoor.com', 'wikipedia.org'}

def is_blocked(url):
    # Match the hostname exactly or as a subdomain of a blocked domain.
    host = (urlparse(url).hostname or '').lower()
    return any(host == b or host.endswith('.' + b) for b in BLOCKED)

def filtered_search(query):
    resp = requests.post('https://api.scavio.dev/api/v1/search', headers=H,
        json={'platform': 'google', 'query': query}, timeout=15)
    resp.raise_for_status()
    results = resp.json().get('organic_results', [])
    valid = [r for r in results if not is_blocked(r.get('link', ''))]
    print(f'{len(valid)}/{len(results)} passed filters')
    return valid

filtered_search('martech companies series a 2026')

JavaScript Example

JavaScript
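const H = {'x-api-key': process.env.SCAVIO_API_KEY, 'Content-Type': 'application/json'};
const BLOCKED = ['linkedin.com', 'indeed.com', 'glassdoor.com'];
// True when the link's hostname is a blocked domain or one of its subdomains.
function isBlocked(link) {
  try {
    const host = new URL(link).hostname;
    return BLOCKED.some(b => host === b || host.endsWith('.' + b));
  } catch {
    return false; // missing or malformed link: leave the result in
  }
}
async function filteredSearch(query) {
  const res = await fetch('https://api.scavio.dev/api/v1/search', {
    method: 'POST', headers: H, body: JSON.stringify({platform: 'google', query})
  });
  if (!res.ok) throw new Error(`Search request failed: ${res.status}`);
  const results = (await res.json()).organic_results || [];
  const valid = results.filter(item => !isBlocked(item.link));
  console.log(`${valid.length}/${results.length} passed filters`);
  return valid;
}
filteredSearch('martech companies series a 2026').catch(console.error);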

Expected Output

After completing the steps, you have a filtered B2B search pipeline that removes noise results by domain blocklist, negative keywords, and URL patterns, and logs rejected results for rule tuning.

Frequently Asked Questions

How long does this tutorial take?

Most developers complete this tutorial in 15 to 30 minutes. You will need a Scavio API key (free tier works) and a working Python or JavaScript environment.

What do I need before starting?

You need Python 3.8+ and the requests library installed, a Scavio API key from scavio.dev, and an existing B2B search pipeline or lead list. A Scavio API key gives you 500 free credits per month.

Can I complete this tutorial on the free tier?

Yes. The free tier includes 500 credits per month, which is more than enough to complete this tutorial and prototype a working solution.

Does Scavio work with my framework?

Scavio has a native LangChain package (langchain-scavio), an MCP server, and a plain REST API that works with any HTTP client. This tutorial uses the raw REST API, but you can adapt it to your framework of choice.
