n8n Directory Scraping with Structured API

Replace brittle n8n HTML scraping workflows with structured search API calls. No selectors, no proxies, no CAPTCHAs.

n8n workflows that scrape directory sites over raw HTTP break constantly because those sites keep changing their HTML structure. A structured search API returns parsed JSON with business names, addresses, phones, and ratings -- no CSS selectors to maintain, no anti-bot detection to evade, no proxy rotation to manage.

Why scraping directories fails in n8n

The HTML Extraction node works until the target site updates its layout. Yelp, Yellow Pages, and niche directories redesign regularly, and each redesign breaks your selectors. You also face CAPTCHAs, IP blocks that can start after 50-100 requests, and JavaScript-rendered content that the HTTP Request node never sees, because it fetches the raw HTML without running the page's scripts.
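Selector breakage is also hard to notice, because it usually fails silently. The sketch below is a hypothetical illustration: a regex stands in for a CSS selector such as `h2.biz-name`, and the class names and HTML snippets are made up for the example.

```javascript
// Hypothetical illustration: a regex stands in for a CSS selector like h2.biz-name.
// Class names and HTML are invented; real directory markup is far more complex.
function extractBusinessNames(html, className) {
  const pattern = new RegExp(`<h2 class="${className}">([^<]+)</h2>`, 'g');
  return [...html.matchAll(pattern)].map(match => match[1].trim());
}

const before = '<h2 class="biz-name">Bright Smile Dental</h2><h2 class="biz-name">Gulf Coast Plumbing</h2>';
// The same listings after a redesign renames the class.
const after = '<h2 class="listing-title">Bright Smile Dental</h2><h2 class="listing-title">Gulf Coast Plumbing</h2>';

console.log(extractBusinessNames(before, 'biz-name').length); // 2
console.log(extractBusinessNames(after, 'biz-name').length);  // 0, and no error is thrown
```

The extractor does not throw after the redesign; it just returns an empty list. A scheduled workflow built this way keeps reporting success while writing nothing, which is why these breakages often go unnoticed for days.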

The structured alternative

Instead of scraping directory HTML, search for the same data via Google Maps or Google Search. Directories rank for local queries, so SERP data includes the same businesses plus the details from each business's Google listing (hours, phone, reviews, website).

JSON
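{
  "nodes": [
    {
      "name": "Schedule",
      "type": "n8n-nodes-base.scheduleTrigger",
      "typeVersion": 1.2,
      "position": [0, 0],
      "parameters": { "rule": { "interval": [{ "field": "days", "daysInterval": 1 }] } }
    },
    {
      "name": "Generate Queries",
      "type": "n8n-nodes-base.code",
      "typeVersion": 2,
      "position": [220, 0],
      "parameters": {
        "jsCode": "const niches = ['dentist', 'plumber', 'lawyer'];\nconst cities = ['Miami FL', 'Tampa FL'];\nreturn niches.flatMap(n => cities.map(c => ({ json: { query: n + ' in ' + c } })));"
      }
    },
    {
      "name": "Search API",
      "type": "n8n-nodes-base.httpRequest",
      "typeVersion": 4.2,
      "position": [440, 0],
      "parameters": {
        "method": "POST",
        "url": "https://api.scavio.dev/api/v1/search",
        "sendHeaders": true,
        "headerParameters": {
          "parameters": [{ "name": "x-api-key", "value": "={{ $env.SCAVIO_API_KEY }}" }]
        },
        "sendBody": true,
        "specifyBody": "json",
        "jsonBody": "={{ JSON.stringify({ query: $json.query, search_type: 'maps', num_results: 20 }) }}"
      }
    }
  ],
  "connections": {
    "Schedule": { "main": [[{ "node": "Generate Queries", "type": "main", "index": 0 }]] },
    "Generate Queries": { "main": [[{ "node": "Search API", "type": "main", "index": 0 }]] }
  }
}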
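To see exactly what this workflow sends, the fan-out can be reproduced outside n8n. This is a sketch: `buildSearchRequests` is a hypothetical helper, the endpoint, `x-api-key` header, and body fields are copied from the Search API node above, and the `content-type` header is an assumption about what the endpoint expects.

```javascript
// Hypothetical helper (not part of n8n or any provider SDK): rebuilds the request the
// "Search API" node sends for each item emitted by "Generate Queries".
const SEARCH_URL = 'https://api.scavio.dev/api/v1/search';

function buildSearchRequests(niches, cities, apiKey) {
  // Cross product: one query per (niche, city) pair, same format as the workflow.
  const queries = niches.flatMap(n => cities.map(c => `${n} in ${c}`));
  return queries.map(query => ({
    url: SEARCH_URL,
    method: 'POST',
    // content-type is an assumption; n8n sets it automatically for JSON bodies.
    headers: { 'x-api-key': apiKey, 'content-type': 'application/json' },
    body: JSON.stringify({ query, search_type: 'maps', num_results: 20 }),
  }));
}

const requests = buildSearchRequests(['dentist', 'plumber', 'lawyer'], ['Miami FL', 'Tampa FL'], 'YOUR_KEY');
console.log(requests.length);  // 6 (3 niches x 2 cities)
console.log(requests[0].body); // {"query":"dentist in Miami FL","search_type":"maps","num_results":20}
```

In Node 18 or later, one of these objects can be passed to `fetch(req.url, req)` to try a single query by hand before wiring it into n8n. Six requests per daily run also works out to roughly 180 queries a month, the volume to keep in mind when comparing the costs later in this post.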

Parsing the structured response

JavaScript
// n8n Code node (Run Once for All Items): extract business data.
// The Search API node outputs one item per query, so collect local_results from every item,
// not just the first one.
const results = $input.all().flatMap(item => item.json.local_results || []);

return results.map(biz => ({
  json: {
    name: biz.title,
    address: biz.address,
    phone: biz.phone || "N/A",
    website: biz.website || "N/A",
    rating: biz.rating || 0,
    review_count: biz.reviews || 0,
    category: biz.type || "",
    hours: biz.hours || "",
    latitude: biz.gps_coordinates?.latitude,
    longitude: biz.gps_coordinates?.longitude,
  }
}));

Deduplication across runs

JavaScript
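// n8n Code node: drop leads already in the sheet or repeated within this run.
// "Read Sheet" is the node that loads the existing rows.
const normPhone = p => String(p ?? "").replace(/\D/g, "");  // "N/A" and "" both become ""
const normName = n => String(n ?? "").trim().toLowerCase();

const newLeads = $input.all().map(i => i.json);
const existing = $('Read Sheet').all().map(i => i.json);

const seenPhones = new Set(existing.map(e => normPhone(e.phone)).filter(Boolean));
const seenNames = new Set(existing.map(e => normName(e.name)).filter(Boolean));

const unique = [];
for (const lead of newLeads) {
  const phone = normPhone(lead.phone);
  const name = normName(lead.name);
  // A missing phone must not count as a match, or every phone-less lead collides.
  if ((phone && seenPhones.has(phone)) || (name && seenNames.has(name))) continue;
  if (phone) seenPhones.add(phone);
  if (name) seenNames.add(name);
  unique.push(lead);
}

return unique.map(lead => ({ json: lead }));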
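The node above treats a matching name on its own as a duplicate, so different branches of a chain (two "Gulf Coast Plumbing" locations, say) collapse into one lead. One alternative, sketched here with hypothetical helper names and sample rows, keys each lead on its normalized name plus address.

```javascript
// Hypothetical alternative: key each lead on normalized name + address,
// so separate branches of a chain are kept as separate leads.
function leadKey(lead) {
  // Lowercase, collapse punctuation and whitespace runs to single spaces.
  const norm = s => String(s ?? '').toLowerCase().replace(/[^a-z0-9]+/g, ' ').trim();
  return `${norm(lead.name)}|${norm(lead.address)}`;
}

function dedupeLeads(newLeads, existingRows) {
  const seen = new Set(existingRows.map(leadKey));
  const unique = [];
  for (const lead of newLeads) {
    const key = leadKey(lead);
    if (seen.has(key)) continue; // already in the sheet, or earlier in this batch
    seen.add(key);
    unique.push(lead);
  }
  return unique;
}

// Made-up sample rows for illustration.
const existing = [{ name: 'Gulf Coast Plumbing', address: '12 Bay St, Tampa, FL' }];
const incoming = [
  { name: 'Gulf Coast Plumbing', address: '12 Bay St., Tampa, FL' },   // same branch, punctuation differs
  { name: 'Gulf Coast Plumbing', address: '900 Ocean Dr, Miami, FL' }, // different branch
  { name: 'Bright Smile Dental', address: '100 Main St, Miami, FL' },
  { name: 'Bright Smile Dental', address: '100 Main St, Miami, FL' },  // repeat within the batch
];
console.log(dedupeLeads(incoming, existing).length); // 2
```

Inside n8n the same function could be called as `dedupeLeads($input.all().map(i => i.json), $('Read Sheet').all().map(i => i.json))`, with the result wrapped back into `{ json }` items. The trade-off: punctuation differences are absorbed, but "Street" versus "St" still produces two keys, so this can let through some repeats that a phone check would catch.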

Cost: scraping vs structured API

  • Scraping: free (if it works) + proxy costs ($20-50/mo for residential)
  • Firecrawl: 500 free pages, then 3,000 pages/mo on the Hobby plan
  • SerpAPI Maps: $75/mo for 5K queries
  • Scavio Maps: $0.005/query, 1K queries = $5
  • DataForSEO Maps: $0.002/query live, $50 minimum deposit
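Because the per-query options scale with volume while SerpAPI's plan is flat, the comparison depends on how many queries the workflow runs. A rough calculator, using only the prices listed above (snapshot figures; check current pricing with each provider):

```javascript
// Sketch: monthly cost at a given query volume, using the prices quoted in the list above.
function monthlyCost(queriesPerMonth) {
  return {
    scavio: queriesPerMonth * 0.005,             // $0.005 per query
    dataforseo: queriesPerMonth * 0.002,         // $0.002 per live query ($50 minimum deposit up front)
    serpapi: queriesPerMonth <= 5000 ? 75 : null // flat $75/mo covers up to 5K; null = needs a larger plan
  };
}

// Example workflow: 3 niches x 2 cities = 6 queries per daily run, about 30 runs a month.
const queriesPerMonth = 3 * 2 * 30;
for (const [provider, cost] of Object.entries(monthlyCost(queriesPerMonth))) {
  console.log(provider, cost === null ? 'over plan limit' : `$${cost.toFixed(2)}`);
}
```

For the example workflow (180 queries a month) this prints $0.90 for Scavio, $0.36 for DataForSEO, and $75.00 for SerpAPI. At the 5,000-query plan ceiling, the per-query options reach $25 and $10 respectively.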

When scraping still makes sense

If you need data from a specific directory that is not indexed in Google Maps (niche professional directories, government registries, specialized databases), you still need scraping. But for general business discovery -- restaurants, contractors, agencies, medical practices -- Google Maps data via API is more reliable, more complete, and cheaper than scraping individual directories.

Key takeaway

Replace brittle n8n scraping workflows with structured API calls. One HTTP Request node, no selectors to maintain, no proxies to rotate, no CAPTCHAs to solve. The data is the same or better, and the workflow runs without intervention.