
n8n Directory Scraping: Why HTTP Request Fails

n8n HTTP Request returns empty HTML on JS-rendered directories. The fix: use a search API node instead of scraping. Working n8n workflow included.


The n8n HTTP Request node returns empty or partial HTML when scraping JavaScript-rendered directories because it sends a plain HTTP GET request without executing JavaScript. Modern directories built with React, Next.js, or Vue load their content dynamically after the initial page load. The HTTP Request node only sees the skeleton HTML, not the rendered listings.

Why the HTTP Request node fails

When you visit a directory like Clutch, ProductHunt, or G2 in your browser, the page loads in two phases. First, the server sends a minimal HTML shell with JavaScript bundles. Second, the JavaScript executes and fetches the actual listing data from an API, then renders it into the DOM.

The n8n HTTP Request node only completes phase one. It receives the HTML shell, which typically contains empty div containers, loading spinners, and script tags. The directory listings do not exist in this initial HTML because they are loaded by JavaScript that the HTTP Request node never runs.

What the response looks like

<!-- What n8n HTTP Request gets: empty shell -->
<div id="root"></div>
<script src="/static/js/main.abc123.js"></script>

<!-- What a browser sees after JS execution: -->
<div id="root">
  <div class="listing-card">
    <h3>Company Name</h3>
    <p>Rating: 4.8/5</p>
    <p>Location: New York, NY</p>
  </div>
  <!-- ...hundreds more listings -->
</div>
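You can confirm this failure mode directly from the HTTP Request output before building anything downstream. A minimal Python heuristic (the `root`/`app`/`__next` container ids are common framework conventions, not guarantees):

```python
import re

def looks_js_rendered(html: str) -> bool:
    """Heuristic: an empty application root container suggests the
    listings are injected by JavaScript after the initial load."""
    return bool(re.search(r'<div id="(?:root|app|__next)">\s*</div>', html))

shell = '<div id="root"></div><script src="/static/js/main.abc123.js"></script>'
rendered = '<div id="root"><div class="listing-card"><h3>Company Name</h3></div></div>'
```

In n8n the same check fits naturally in a Code node right after the HTTP Request, so the workflow can branch early instead of passing an empty shell to downstream nodes.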

Common workarounds (and their problems)

Workaround 1: n8n Puppeteer/Playwright node

n8n has community nodes for headless browsers, but they require a Chromium installation on your n8n server, consume significant memory, and are slow (3-10 seconds per page). They also fail on Cloudflare-protected directories, which many now are.

Workaround 2: Finding the underlying API

Some directories fetch data from a public API that you can call directly. Open browser DevTools, check the Network tab, and look for XHR/fetch requests returning JSON. If you find one, you can call it from the HTTP Request node. However, these internal APIs are undocumented, change without notice, and often require authentication tokens that expire.
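If DevTools does reveal a JSON endpoint, replicating the call is straightforward. Everything below (the URL, parameter names, and bearer-token header) is hypothetical; each directory structures its internal API differently, and these details can break at any time:

```python
import requests

# Hypothetical internal endpoint discovered in the browser's Network tab.
# The URL, query parameters, and auth header are illustrative only.
INTERNAL_API = "https://example-directory.com/api/listings"

def fetch_listings_page(page: int, token: str) -> list[dict]:
    """Replicate the XHR request the directory's own frontend makes."""
    resp = requests.get(
        INTERNAL_API,
        params={"page": page, "per_page": 50},
        # Many internal APIs require a short-lived bearer token
        headers={"Authorization": f"Bearer {token}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json().get("listings", [])
```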

Workaround 3: Search API

Search engines have already crawled and indexed the directory listings. Instead of scraping the directory directly, query a search API for the listings you need. The results come back as structured JSON with titles, snippets, and links. No JavaScript rendering required.

Search API approach in n8n

Keep the HTTP Request node, but point it at a search API instead of the directory URL:

JSON
{
  "method": "POST",
  "url": "https://api.scavio.dev/api/v1/search",
  "headers": {
    "x-api-key": "={{ $env.SCAVIO_API_KEY }}",
    "Content-Type": "application/json"
  },
  "body": {
    "query": "top web development agencies site:clutch.co",
    "num_results": 20
  }
}

This returns structured data for every indexed listing on the directory, including titles, descriptions, ratings (in snippets), and direct URLs. No JavaScript execution needed.

Python equivalent for comparison

Python
import requests, os

def get_directory_listings(directory_domain, category, num_results=20):
    """Get directory listings via search API instead of scraping."""
    resp = requests.post(
        "https://api.scavio.dev/api/v1/search",
        headers={"x-api-key": os.environ["SCAVIO_API_KEY"]},
        json={
            "query": f"{category} site:{directory_domain}",
            "num_results": num_results,
        },
        timeout=30,
    )
    resp.raise_for_status()  # surface auth or quota errors early
    results = resp.json().get("organic_results", [])
    return [
        {
            "name": r.get("title", "").split(" - ")[0].strip(),
            "url": r.get("link", ""),
            "description": r.get("snippet", ""),
        }
        for r in results
    ]

# Examples
clutch_agencies = get_directory_listings("clutch.co", "web development agencies")
g2_tools = get_directory_listings("g2.com", "project management software")
ph_launches = get_directory_listings("producthunt.com", "AI writing tools 2026")

Limitations of the search API approach

This method is not a perfect replacement for direct scraping:

  • You get search snippets, not the full directory listing with all metadata fields
  • Results are limited to what Google has indexed (usually covers main listings but may miss deep pages)
  • Sorting and filtering are done via search queries, not the directory's own filters
  • Newly added listings may not appear until Google re-crawls the directory

For most use cases (lead generation, market research, competitive analysis), the search snippet data is sufficient. If you need every field from a directory listing (exact pricing tiers, all review scores, feature matrices), you may need to visit the individual listing URLs with a headless browser or use the directory's own API if available.

n8n workflow pattern

A practical n8n workflow for directory data:

  • Step 1: HTTP Request to search API with site:directory.com query
  • Step 2: Extract structured data from organic_results
  • Step 3: Filter results by relevance (remove non-listing pages like blog posts)
  • Step 4: Enrich each listing with a follow-up search for details if needed
  • Step 5: Write to Google Sheets or your CRM
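Step 3 usually ends up as a Code node. The same filtering logic in Python (the URL patterns are assumptions; tune them to the directory you target):

```python
# Assumed URL patterns that mark non-listing pages; adjust per directory
NON_LISTING_PATTERNS = ("/blog/", "/news/", "/resources/", "/category/")

def filter_listings(results: list[dict]) -> list[dict]:
    """Keep only results whose URL looks like an individual listing page."""
    return [
        r for r in results
        if r.get("url") and not any(p in r["url"] for p in NON_LISTING_PATTERNS)
    ]

sample = [
    {"name": "Acme Web Co", "url": "https://clutch.co/profile/acme-web"},
    {"name": "Top 10 Agencies", "url": "https://clutch.co/blog/top-10-agencies"},
]
```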

This workflow runs reliably without browser infrastructure, handles Cloudflare-protected directories, and costs $0.005 per search query rather than requiring a dedicated headless browser setup.