Overview
Online directories like Clutch, G2, and Capterra have hundreds of pages of listings. Manually browsing is slow and incomplete. This n8n workflow automates paginated search queries to extract all listings in a category, deduplicates results, and builds a master prospect list. Each page of search results costs $0.005.
Trigger
Runs on a weekly cron schedule (Monday at 3 AM UTC), or on demand when a new category needs to be extracted.
Schedule
Weekly (Monday 3 AM UTC)
Workflow Steps
Configure Directory and Category Targets
Define which directories to search and which categories to extract. Each target includes the directory domain and category keywords.
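The target list can be a simple array of directory/category pairs. A minimal sketch, assuming a Python representation; the directories match the examples used later, but the category keywords are illustrative and should be swapped for your own verticals:

```python
# Directory/category pairs to extract. Each pair becomes one
# site-restricted search query in the pagination step.
TARGETS = [
    {"directory": "clutch.co", "category": "seo agencies"},
    {"directory": "g2.com", "category": "crm software"},
    {"directory": "capterra.com", "category": "project management software"},
]

for t in TARGETS:
    # Preview the query each target will generate.
    print(f"site:{t['directory']} {t['category']}")
```

Keeping targets as data rather than hard-coding them makes it easy to add a new category without touching the rest of the workflow.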
Execute Paginated Search Queries
For each directory-category pair, run multiple search queries with page offsets to capture all listings. Continue until results are empty or reach max pages.
Extract Company Data from Results
Parse company names, descriptions, and URLs from organic results. Extract additional signals from snippets (ratings, review counts, specialties).
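Snippet parsing can be done with a couple of regular expressions. This is a sketch, not a definitive parser: the patterns below assume ratings appear as a decimal out of 5 and review counts as a number followed by the word "reviews", which is common in directory snippets but not guaranteed:

```python
import re

def extract_signals(snippet: str) -> dict:
    """Pull a star rating and review count out of a result snippet, when present."""
    rating = re.search(r"\b([0-5]\.\d)\b", snippet)
    reviews = re.search(r"(\d[\d,]*)\s+reviews?", snippet, re.IGNORECASE)
    return {
        "rating": float(rating.group(1)) if rating else None,
        "review_count": int(reviews.group(1).replace(",", "")) if reviews else None,
    }
```

Missing fields come back as `None` rather than raising, so a snippet with no rating still produces a usable record.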
Deduplicate Against Master List
Compare new results against the existing master list. Add only new companies. Flag companies that appeared in previous runs but are now missing.
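A minimal sketch of this reconciliation step, assuming both lists use the `url` field (as in the implementations below) as the stable identifier; `reconcile` is a hypothetical helper name:

```python
def reconcile(master: list, new_results: list) -> dict:
    """Split a fresh crawl into newly added companies and companies
    that were on the master list but no longer appear."""
    master_urls = {m["url"] for m in master}
    new_urls = {r["url"] for r in new_results}
    added = [r for r in new_results if r["url"] not in master_urls]
    missing = [m for m in master if m["url"] not in new_urls]
    return {"added": added, "missing": missing}
```

Set lookups keep this linear in the size of both lists, so it stays cheap even as the master list grows.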
Export to Google Sheets or CRM
Append new companies to the master spreadsheet or create new CRM contacts. Tag with directory source, category, and extraction date.
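For the export step, a local CSV append is the simplest sketch (pushing to Google Sheets or a CRM would follow the same shape but needs that service's client and credentials). The column layout and the `append_to_master` name are assumptions for illustration:

```python
import csv
import datetime

def append_to_master(path: str, companies: list, directory: str, category: str) -> None:
    """Append newly found companies to a master CSV, tagged with
    directory source, category, and extraction date."""
    today = datetime.date.today().isoformat()
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        for c in companies:
            writer.writerow([c["title"], c["url"], directory, category, today])
```

Appending (mode `"a"`) means each weekly run adds only its new rows, and the date column records which run found each company.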
Python Implementation
import os
import requests

API_KEY = os.environ["SCAVIO_API_KEY"]

def paginated_directory_search(directory: str, category: str, max_pages: int = 5) -> list:
    """Search a directory with pagination and return deduplicated listings."""
    all_results = []
    for page in range(max_pages):
        resp = requests.post(
            "https://api.scavio.dev/api/v1/search",
            headers={"x-api-key": API_KEY, "Content-Type": "application/json"},
            json={"query": f"site:{directory} {category}", "country_code": "us", "start": page * 10},
            timeout=15,
        )
        resp.raise_for_status()
        data = resp.json()
        results = data.get("organic_results", [])
        if not results:
            break
        for r in results:
            all_results.append({
                "title": r.get("title", ""),
                "url": r.get("link", ""),
                "snippet": r.get("snippet", ""),
            })
    # Deduplicate by URL
    seen = set()
    unique = []
    for r in all_results:
        if r["url"] not in seen:
            seen.add(r["url"])
            unique.append(r)
    return unique

listings = paginated_directory_search("clutch.co", "seo agencies", max_pages=5)
print(f"Extracted {len(listings)} unique listings from Clutch")

JavaScript Implementation
const H = { 'x-api-key': process.env.SCAVIO_API_KEY, 'Content-Type': 'application/json' };

async function paginatedSearch(directory, category, maxPages = 5) {
  const all = [];
  for (let page = 0; page < maxPages; page++) {
    const resp = await fetch('https://api.scavio.dev/api/v1/search', {
      method: 'POST',
      headers: H,
      body: JSON.stringify({ query: `site:${directory} ${category}`, country_code: 'us', start: page * 10 }),
    });
    const data = await resp.json();
    const results = data.organic_results || [];
    if (!results.length) break;
    results.forEach(item => all.push({ title: item.title, url: item.link, snippet: item.snippet }));
  }
  // Deduplicate by URL
  const seen = new Set();
  return all.filter(r => {
    if (seen.has(r.url)) return false;
    seen.add(r.url);
    return true;
  });
}

const listings = await paginatedSearch('clutch.co', 'seo agencies', 5);
console.log(`${listings.length} unique listings extracted`);

Platforms Used
Web search with knowledge graph, People Also Ask (PAA), and AI overviews