Tutorial

How to Scrape B2B Directories with n8n and a Search API

Build an n8n workflow that extracts company data from B2B directories via search API. No browser automation needed. Step-by-step tutorial.

B2B directories like Clutch, G2, and industry-specific listings are rich sources of company data for outbound sales. Instead of building fragile browser scrapers that break when directory layouts change, you can search for directory listings via a search API and extract structured data from the SERP titles and snippets. Each lookup is a single HTTP request, so there is no headless browser to run and no page selectors to maintain. This tutorial builds an n8n pipeline that queries directories, extracts company information, and outputs a clean lead list. Each search costs $0.005 via Scavio.

Prerequisites

  • n8n instance running (self-hosted or cloud)
  • A Scavio API key from scavio.dev
  • Target industry or niche to prospect
  • Google Sheets or CRM for output

Walkthrough

Step 1: Define directory search queries

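Craft search queries that target B2B directory listings. The pattern is: site:directory.com + niche + optional location. The site: operator restricts results to pages on that directory's domain, so random websites are excluded. The queries generated here use only the directory and the niche; append a city or region to a niche string if you want to narrow results geographically.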

JavaScript
// n8n Code node to generate targeted queries:
const directories = [
  { name: 'clutch', query: 'site:clutch.co' },
  { name: 'g2', query: 'site:g2.com' },
  { name: 'goodfirms', query: 'site:goodfirms.co' }
];
const niches = ['marketing agency', 'web development company', 'IT consulting'];

const queries = [];
for (const dir of directories) {
  for (const niche of niches) {
    queries.push({
      json: {
        directory: dir.name,
        searchQuery: `${dir.query} ${niche}`,
        niche
      }
    });
  }
}
return queries; // 9 targeted directory queries
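
The query pattern also allows a location, which the generated queries leave out. A minimal sketch of a builder that appends one when it is provided; buildQuery and its parameters are illustrative names, not part of n8n or the Scavio API.

```javascript
// Hypothetical helper: builds a site-restricted search query.
// `location` is optional; missing or blank parts are skipped.
function buildQuery(site, niche, location) {
  return [`site:${site}`, niche, location]
    .filter(part => typeof part === 'string' && part.trim() !== '')
    .map(part => part.trim())
    .join(' ');
}

console.log(buildQuery('clutch.co', 'marketing agency', 'Chicago'));
// site:clutch.co marketing agency Chicago
console.log(buildQuery('g2.com', 'IT consulting'));
// site:g2.com IT consulting
```

In the Code node, the template string that builds searchQuery could call a helper like this instead, with the location coming from a workflow parameter.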

Step 2: Execute search queries via HTTP Request

For each query, call the Scavio API to get directory listing results. The organic results contain company names, descriptions, and ratings in the snippets.

JSON
// n8n HTTP Request node:
// Method: POST
// URL: https://api.scavio.dev/api/v1/search
// Headers: x-api-key: {{ $env.SCAVIO_API_KEY }}
// Body:
{
  "query": "{{ $json.searchQuery }}",
  "country_code": "us"
}
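
Before parsing, it can help to keep only the results that carry the fields the next step reads. The sketch below assumes the response shape implied by the parsing code in this tutorial (an organic_results array whose entries have title, link, and snippet); usableResults is a hypothetical helper and the sample payload is made up for illustration.

```javascript
// Returns only the results that have a title and a link.
// Field names come from this tutorial's parsing code; treat the exact
// response shape as an assumption and check it against a real response.
function usableResults(response) {
  const results = Array.isArray(response?.organic_results)
    ? response.organic_results
    : [];
  return results
    .filter(r => r && typeof r.title === 'string' && typeof r.link === 'string')
    .map(r => ({ title: r.title, link: r.link, snippet: r.snippet ?? '' }));
}

// Made-up sample payload, not real API output.
const sample = {
  organic_results: [
    {
      title: 'Example Agency - Reviews, Cost & More',
      link: 'https://clutch.co/profile/example-agency',
      snippet: 'Example snippet.'
    },
    { link: 'https://clutch.co/profile/no-title' } // dropped: no title
  ]
};

console.log(usableResults(sample).length); // 1
console.log(usableResults({}).length);     // 0
```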

Step 3: Parse company data from SERP results

Extract company names, URLs, and descriptions from the organic results. Directory pages have predictable title formats that can be parsed.

JavaScript
// n8n Code node (Run Once for All Items) to parse directory listings:
const companies = [];
$input.all().forEach((item, i) => {
  // Pair each search response with the query item that produced it.
  // Assumes the query-generating node from Step 1 is named "Code".
  const source = $('Code').itemMatching(i).json;
  for (const r of item.json.organic_results || []) {
    // Clutch titles: "Company Name - Reviews, Cost & More"
    // G2 titles: "Company Name Reviews 2026"
    const name = (r.title || '').split(' - ')[0].split(' Reviews')[0].trim();
    // Skip empty or implausibly long names
    if (name.length > 2 && name.length < 100) {
      companies.push({
        json: {
          name,
          url: r.link,
          description: r.snippet || '',
          directory: source.directory,
          niche: source.niche
        }
      });
    }
  }
});

return companies;
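
The title parsing can be pulled into a small function and checked against the two title formats noted in the comments. This is a sketch only: parseCompanyName is a hypothetical name, and the sample titles are invented to match those formats.

```javascript
// Strips the directory suffix from a result title.
// Returns null when nothing plausible is left.
function parseCompanyName(title) {
  if (typeof title !== 'string') return null;
  const name = title.split(' - ')[0].split(' Reviews')[0].trim();
  return name.length > 2 && name.length < 100 ? name : null;
}

console.log(parseCompanyName('Acme Digital - Reviews, Cost & More')); // Acme Digital
console.log(parseCompanyName('Acme CRM Reviews 2026'));               // Acme CRM
console.log(parseCompanyName('G2'));                                  // null
```

Note that a category page titled something like "Top Marketing Agencies - 2026 Reviews" would parse to "Top Marketing Agencies", which is not a company, so spot-check the output before loading it into a CRM.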

Step 4: Deduplicate and enrich with website search

Remove duplicate companies across directories and optionally enrich with a direct search for each company to find their actual website and contact info.

JavaScript
// Dedup in a Code node:
const seen = new Set();
const unique = [];
for (const item of $input.all()) {
  const key = item.json.name.toLowerCase();
  if (!seen.has(key)) {
    seen.add(key);
    unique.push(item);
  }
}
return unique;

// Then enrich each with a second search:
// HTTP Request node:
// Body:
{
  "query": "{{ $json.name }} company website contact",
  "country_code": "us"
}
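
One way to turn the enrichment results into a website is to take the first result whose host is not one of the directories themselves. The sketch below assumes each result exposes a link field, as in Step 3; pickWebsite, hostOf, and DIRECTORY_HOSTS are illustrative names, and the first non-directory hit is only a guess at the company's own site.

```javascript
// Directories queried in Step 1; their pages are not the company's own site.
const DIRECTORY_HOSTS = ['clutch.co', 'g2.com', 'goodfirms.co'];

// Extracts the hostname without a leading "www.", or null if unparseable.
function hostOf(link) {
  try {
    return new URL(link).hostname.replace(/^www\./, '');
  } catch {
    return null;
  }
}

// Returns the first result link that is not hosted on a known directory.
function pickWebsite(organicResults) {
  for (const r of organicResults || []) {
    const host = hostOf(r && r.link);
    if (!host) continue;
    const isDirectory = DIRECTORY_HOSTS.some(
      d => host === d || host.endsWith('.' + d)
    );
    if (!isDirectory) return r.link;
  }
  return null;
}

// Made-up sample results for illustration.
console.log(pickWebsite([
  { link: 'https://clutch.co/profile/acme' },
  { link: 'https://www.acme-example.com/contact' }
])); // https://www.acme-example.com/contact
```

Social profiles and news articles can also rank first, so a production version would likely need a longer exclusion list or a name-to-domain similarity check.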

Step 5: Export to Google Sheets with cost tracking

Write the cleaned leads to a Google Sheet. Add a cost column so you know exactly what the extraction run cost.

JavaScript
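// Final Code node before Google Sheets:
const COST_PER_SEARCH = 0.005; // USD per search
const DIRECTORY_QUERIES = 9;   // 3 directories x 3 niches from Step 1

const leads = $input.all().map((item, i) => ({
  json: {
    ...item.json,
    extractedAt: new Date().toISOString(),
    // Running total: the directory searches plus one enrichment search
    // per lead so far (assumes every lead was enriched in Step 4)
    estimatedCost: ((DIRECTORY_QUERIES + i + 1) * COST_PER_SEARCH).toFixed(3)
  }
}));

const totalCost = ((DIRECTORY_QUERIES + leads.length) * COST_PER_SEARCH).toFixed(3);
console.log(`Extracted ${leads.length} leads, cost: $${totalCost}`);
return leads;

// Google Sheets node: Append to "B2B Leads" sheet
// Total cost: 9 directory queries + ~50 enrichment queries = ~$0.30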
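
The cost arithmetic can be checked with a small helper. This sketch uses the $0.005 per search price quoted in this tutorial and works in thousandths of a dollar to avoid floating-point drift; estimateRunCost is a hypothetical name.

```javascript
// $0.005 per search, expressed in thousandths of a dollar.
const PRICE_PER_SEARCH_MILLS = 5;

// Total run cost in USD for the directory searches plus enrichment searches.
function estimateRunCost(directoryQueries, enrichmentQueries) {
  const searches = directoryQueries + enrichmentQueries;
  return (searches * PRICE_PER_SEARCH_MILLS) / 1000;
}

console.log(estimateRunCost(9, 50)); // 0.295
console.log(estimateRunCost(3, 0));  // 0.015
```

So the 9 directory queries plus roughly 50 enrichment queries come to $0.295, which is the ~$0.30 figure quoted for the full workflow.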

Python Example

Python
import os, requests, time

API_KEY = os.environ['SCAVIO_API_KEY']

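def search(query: str) -> dict:
    resp = requests.post('https://api.scavio.dev/api/v1/search',
        headers={'x-api-key': API_KEY, 'Content-Type': 'application/json'},
        json={'query': query, 'country_code': 'us'},
        timeout=30)  # do not hang forever on a stalled connection
    resp.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error body
    return resp.json()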

def scrape_directory(directory_site: str, niche: str) -> list:
    data = search(f'site:{directory_site} {niche}')
    companies = []
    for r in data.get('organic_results', []):
        # Tolerate results that lack a title instead of raising KeyError
        name = r.get('title', '').split(' - ')[0].split(' Reviews')[0].strip()
        if not name:
            continue
        companies.append({'name': name, 'url': r.get('link', ''),
                         'snippet': r.get('snippet', ''), 'directory': directory_site})
    return companies

def main():
    directories = ['clutch.co', 'g2.com', 'goodfirms.co']
    all_leads = []
    seen = set()
    for d in directories:
        companies = scrape_directory(d, 'marketing agency')
        for c in companies:
            if c['name'].lower() not in seen:
                seen.add(c['name'].lower())
                all_leads.append(c)
        time.sleep(0.3)
    print(f'Found {len(all_leads)} unique companies')
    for lead in all_leads[:5]:
        print(f'  {lead["name"]} ({lead["directory"]})')

if __name__ == '__main__':
    main()

JavaScript Example

JavaScript
const API_KEY = process.env.SCAVIO_API_KEY;

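async function search(query) {
  const resp = await fetch('https://api.scavio.dev/api/v1/search', {
    method: 'POST',
    headers: { 'x-api-key': API_KEY, 'Content-Type': 'application/json' },
    body: JSON.stringify({ query, country_code: 'us' })
  });
  // fetch only rejects on network failure, so check the HTTP status explicitly
  if (!resp.ok) throw new Error(`Search failed with HTTP ${resp.status}`);
  return resp.json();
}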

async function main() {
  const directories = ['clutch.co', 'g2.com', 'goodfirms.co'];
  const seen = new Set();
  const leads = [];
  for (const dir of directories) {
    const data = await search(`site:${dir} marketing agency`);
    for (const r of data.organic_results || []) {
      const name = r.title.split(' - ')[0].split(' Reviews')[0].trim();
      if (!seen.has(name.toLowerCase())) {
        seen.add(name.toLowerCase());
        leads.push({ name, url: r.link, directory: dir });
      }
    }
  }
  console.log(`Found ${leads.length} unique companies`);
  leads.slice(0, 5).forEach(l => console.log(`  ${l.name} (${l.directory})`));
}

main().catch(console.error);

Expected Output

Text
Found 27 unique companies
  WebFX (clutch.co)
  Ignite Digital (clutch.co)
  SmartSites (g2.com)
  Disruptive Advertising (g2.com)
  Thrive Internet Marketing (goodfirms.co)

Cost: 3 directory queries = $0.015
With enrichment: ~$0.15 total

Frequently Asked Questions

Most developers complete this tutorial in 15 to 30 minutes. You will need a Scavio API key (free tier works) and a working Python or JavaScript environment.

You need a running n8n instance (self-hosted or cloud), a Scavio API key from scavio.dev, a target industry or niche to prospect, and Google Sheets or a CRM for output. A Scavio API key gives you 250 free credits per month.

Yes. The free tier includes 250 credits per month, which is more than enough to complete this tutorial and prototype a working solution.

Scavio has a native LangChain package (langchain-scavio), an MCP server, and a plain REST API that works with any HTTP client. This tutorial uses the raw REST API, but you can adapt to your framework of choice.
