B2B directories like Clutch, G2, and industry-specific listings are rich sources of company data for outbound sales. Instead of building fragile browser scrapers that break when directory layouts change, you can search for directory listings via a search API and extract structured data from the SERP snippets. This approach is faster, cheaper, and more maintainable. This tutorial builds an n8n pipeline that queries directories, extracts company information, and outputs a clean lead list. Each search costs $0.005 via Scavio.
Prerequisites
- n8n instance running (self-hosted or cloud)
- A Scavio API key from scavio.dev
- Target industry or niche to prospect
- Google Sheets or CRM for output
Walkthrough
Step 1: Define directory search queries
Craft search queries that target B2B directory listings. The pattern is site:directory.com plus a niche keyword (and optionally a location). The site: operator restricts results to directory pages, not random websites.
// n8n Code node to generate targeted queries:
const directories = [
  { name: 'clutch', query: 'site:clutch.co' },
  { name: 'g2', query: 'site:g2.com' },
  { name: 'goodfirms', query: 'site:goodfirms.co' }
];
const niches = ['marketing agency', 'web development company', 'IT consulting'];
const queries = [];
for (const dir of directories) {
  for (const niche of niches) {
    queries.push({
      json: {
        directory: dir.name,
        searchQuery: `${dir.query} ${niche}`,
        niche
      }
    });
  }
}
return queries; // 9 targeted directory queries (3 directories x 3 niches)
Step 2: Execute search queries via HTTP Request
For each query, call the Scavio API to get directory listing results. The organic results contain company names, descriptions, and ratings in the snippets.
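When many queries run back-to-back, transient failures (rate limits, 5xx responses) are possible. If you drive the calls from a Code node or a script instead of the HTTP Request node, a small retry wrapper helps; this is a sketch of the policy only, with the API call injectable so the backoff logic can be tested without the network:

```javascript
// Sketch of a retry-with-exponential-backoff policy (assumption: a transient
// failure throws). `doCall` is injected so the policy itself is testable.
async function withRetry(doCall, retries = 3, baseDelayMs = 500) {
  let lastErr;
  for (let attempt = 0; attempt < retries; attempt++) {
    try {
      return await doCall();
    } catch (err) {
      lastErr = err;
      if (attempt < retries - 1) {
        // Backoff: 500ms, 1s, 2s, ... before the next attempt
        await new Promise(resolve => setTimeout(resolve, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastErr;
}
```

Usage would wrap the fetch to the Scavio endpoint, e.g. `await withRetry(() => search(query))`, where `search` throws on a non-OK response.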
// n8n HTTP Request node:
// Method: POST
// URL: https://api.scavio.dev/api/v1/search
// Headers: x-api-key: {{ $env.SCAVIO_API_KEY }}
// Body:
{
  "query": "{{ $json.searchQuery }}",
  "country_code": "us"
}
Step 3: Parse company data from SERP results
Extract company names, URLs, and descriptions from the organic results. Directory pages have predictable title formats that can be parsed.
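The title-splitting rule is the fragile part, so it helps to isolate it as a helper and sanity-check it against both known formats (the company names here are hypothetical):

```javascript
// Mirror of the parsing rule used in the Code node: take everything before
// " - " (Clutch style) and before " Reviews" (G2 style), then trim.
function parseCompanyName(title) {
  return title.split(' - ')[0].split(' Reviews')[0].trim();
}

// parseCompanyName('Acme Digital - Reviews, Cost & More') → 'Acme Digital'
// parseCompanyName('Acme Digital Reviews 2026')           → 'Acme Digital'
```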
// n8n Code node to parse directory listings:
const data = $input.first().json;
const companies = (data.organic_results || []).map(r => {
  // Clutch titles: "Company Name - Reviews, Cost & More"
  // G2 titles: "Company Name Reviews 2026"
  const name = r.title.split(' - ')[0].split(' Reviews')[0].trim();
  return {
    name,
    url: r.link,
    description: r.snippet || '',
    directory: $('Code').first().json.directory,
    niche: $('Code').first().json.niche
  };
}).filter(c => c.name.length > 2 && c.name.length < 100);
return companies.map(c => ({ json: c }));
Step 4: Deduplicate and enrich with website search
Remove duplicate companies across directories and optionally enrich with a direct search for each company to find their actual website and contact info.
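The enrichment response still needs a pass to pick out the company's own domain from among directory and social profiles. A sketch of that filter (the domain list is illustrative, not exhaustive):

```javascript
// Pick the likely company website from enrichment organic results by
// skipping directory and social domains (extend this list as needed).
const SKIP_DOMAINS = ['clutch.co', 'g2.com', 'goodfirms.co', 'linkedin.com', 'facebook.com'];

function pickWebsite(organicResults) {
  for (const r of organicResults || []) {
    const host = new URL(r.link).hostname.replace(/^www\./, '');
    if (!SKIP_DOMAINS.some(d => host === d || host.endsWith('.' + d))) {
      return r.link; // first result on a non-directory domain
    }
  }
  return null; // nothing but directory/social results
}
```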
// Dedup in a Code node:
const seen = new Set();
const unique = [];
for (const item of $input.all()) {
  const key = item.json.name.toLowerCase();
  if (!seen.has(key)) {
    seen.add(key);
    unique.push(item);
  }
}
return unique;
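Case-folding alone misses near-duplicates like "Acme" versus "Acme Inc." across directories. A normalization pass before the Set check tightens the dedup; a sketch, with an illustrative (not exhaustive) suffix list:

```javascript
// Build a dedup key: lowercase, strip punctuation, and drop a trailing
// legal-entity suffix. The suffix list here is illustrative only.
function normalizeName(name) {
  return name
    .toLowerCase()
    .replace(/[.,]/g, '')
    .replace(/\s+(inc|llc|ltd|gmbh|co)$/, '')
    .trim();
}

// normalizeName('Acme Inc.')      → 'acme'
// normalizeName('SmartSites LLC') → 'smartsites'
```

Swap `item.json.name.toLowerCase()` for `normalizeName(item.json.name)` to use it as the dedup key.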
// Then enrich each with a second search:
// HTTP Request node:
// Body:
{
  "query": "{{ $json.name }} company website contact",
  "country_code": "us"
}
Step 5: Export to Google Sheets with cost tracking
Write the cleaned leads to a Google Sheet. Add a cost column so you know exactly what the extraction run cost.
// Final Code node before Google Sheets:
const leads = $input.all().map((item, i) => ({
  json: {
    ...item.json,
    extractedAt: new Date().toISOString(),
    cumulativeCost: ((i + 1) * 0.005).toFixed(3) // running total through this row
  }
}));
const totalCost = (leads.length * 0.005).toFixed(2);
console.log(`Extracted ${leads.length} leads, cost: $${totalCost}`);
return leads;
// Google Sheets node: Append to "B2B Leads" sheet
// Total cost: 9 directory queries + ~50 enrichment queries = ~$0.30
Python Example
import os, requests, time
API_KEY = os.environ['SCAVIO_API_KEY']
def search(query: str) -> dict:
    resp = requests.post(
        'https://api.scavio.dev/api/v1/search',
        headers={'x-api-key': API_KEY, 'Content-Type': 'application/json'},
        json={'query': query, 'country_code': 'us'})
    resp.raise_for_status()
    return resp.json()

def scrape_directory(directory_site: str, niche: str) -> list:
    data = search(f'site:{directory_site} {niche}')
    companies = []
    for r in data.get('organic_results', []):
        name = r['title'].split(' - ')[0].split(' Reviews')[0].strip()
        companies.append({'name': name, 'url': r['link'],
                          'snippet': r.get('snippet', ''), 'directory': directory_site})
    return companies

def main():
    directories = ['clutch.co', 'g2.com', 'goodfirms.co']
    all_leads = []
    seen = set()
    for d in directories:
        companies = scrape_directory(d, 'marketing agency')
        for c in companies:
            if c['name'].lower() not in seen:
                seen.add(c['name'].lower())
                all_leads.append(c)
        time.sleep(0.3)
    print(f'Found {len(all_leads)} unique companies')
    for lead in all_leads[:5]:
        print(f'  {lead["name"]} ({lead["directory"]})')

if __name__ == '__main__':
    main()
JavaScript Example
const API_KEY = process.env.SCAVIO_API_KEY;
async function search(query) {
  const resp = await fetch('https://api.scavio.dev/api/v1/search', {
    method: 'POST',
    headers: { 'x-api-key': API_KEY, 'Content-Type': 'application/json' },
    body: JSON.stringify({ query, country_code: 'us' })
  });
  if (!resp.ok) throw new Error(`Scavio request failed: HTTP ${resp.status}`);
  return resp.json();
}

async function main() {
  const directories = ['clutch.co', 'g2.com', 'goodfirms.co'];
  const seen = new Set();
  const leads = [];
  for (const dir of directories) {
    const data = await search(`site:${dir} marketing agency`);
    for (const r of data.organic_results || []) {
      const name = r.title.split(' - ')[0].split(' Reviews')[0].trim();
      if (!seen.has(name.toLowerCase())) {
        seen.add(name.toLowerCase());
        leads.push({ name, url: r.link, directory: dir });
      }
    }
  }
  console.log(`Found ${leads.length} unique companies`);
  leads.slice(0, 5).forEach(l => console.log(`  ${l.name} (${l.directory})`));
}

main().catch(console.error);
Expected Output
Found 27 unique companies across 3 directories
  WebFX (clutch.co)
  Ignite Digital (clutch.co)
  SmartSites (g2.com)
  Thrive Internet Marketing (goodfirms.co)
  Disruptive Advertising (g2.com)
Cost: 3 directory queries = $0.015
With enrichment: ~$0.15 total
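The cost arithmetic above generalizes to a tiny helper at the documented $0.005 per search; the query counts passed in are whatever a given run produces:

```javascript
// Estimate run cost: every Scavio search (directory or enrichment) is $0.005.
function estimateCost(directoryQueries, enrichmentQueries) {
  const PRICE_PER_SEARCH = 0.005;
  return Number(((directoryQueries + enrichmentQueries) * PRICE_PER_SEARCH).toFixed(3));
}

// estimateCost(3, 27) → 0.15  (the sample run above: 27 leads enriched)
// estimateCost(9, 50) → 0.295 (the full 3-directory x 3-niche pipeline)
```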