
Local Business Enrichment from Multiple Sources

Local lead generation needs data from Google Maps, Yelp, and business directories. This post builds a multi-source enrichment pipeline that cross-references those sources and deduplicates the results.


Google Maps shows only the businesses it ranks as most relevant for a given search, so it misses the long tail. Combining Google local pack data with Yelp and directory listings via SERP queries gives broader coverage than any single source. The multi-source approach yields 3-5x more unique businesses per metro area, at $0.005 per query.

Single-Source Limitations

A Google Maps search for "plumbers in Austin" returns 20-40 businesses depending on the viewport. Apify's Google Maps Scraper ($49/mo) can capture more, but it still only sees what Maps shows. The businesses that appear are Google's relevance-ranked subset, not the complete market. Newer businesses, those with few reviews, and those outside the default viewport get filtered out.

Yelp surfaces a different subset. Yellow Pages captures businesses that maintain directory listings but may not have Google Business profiles. Each source has its own bias. The union of all three is closer to the real market than any single source.
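
To make the "union of all three" argument concrete, the sketch below counts how many businesses each source contributes that no other source found. It is a minimal illustration: `coverage_report` is a hypothetical helper, the sample business names are made up, and it assumes names have already been normalized to a common form so that set comparison is meaningful.

```python
def coverage_report(sources):
    """Compare each source's coverage with the union of all sources.

    `sources` maps a source label to a set of already-normalized business names.
    """
    union = set().union(*sources.values())
    report = {"union": len(union)}
    for label, names in sources.items():
        # Everything the *other* sources found.
        others = set().union(*(s for other, s in sources.items() if other != label))
        report[label] = {
            "total": len(names),
            "only_here": len(names - others),  # found by this source alone
        }
    return report


# Hypothetical sample data, for illustration only.
sources = {
    "google_maps": {"joes plumbing", "austin drain pros", "capitol plumbing"},
    "yelp": {"joes plumbing", "eastside pipe co"},
    "yellow_pages": {"capitol plumbing", "mueller rooter"},
}
print(coverage_report(sources))
```

In this toy data every source has at least one business the others miss, so the union (5) is larger than any single source (at most 3).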

Multi-Source Query Strategy

Python
import os, requests

API_KEY = os.environ["SCAVIO_API_KEY"]
H = {"x-api-key": API_KEY, "Content-Type": "application/json"}

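def search_local(query, location):
    # One SERP query; the local pack and organic results are read from the same JSON body.
    res = requests.post("https://api.scavio.dev/api/v1/search",
        headers=H, json={"query": query, "country_code": "us",
                         "location": location},
        timeout=30)
    res.raise_for_status()  # fail loudly on auth/quota errors instead of parsing an error body
    return res.json()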

def multi_source_leads(category, city, neighborhoods):
    seen = set()
    leads = []
    query_count = 0

    for hood in neighborhoods:
        queries = [
            f"{category} {hood} {city}",
            f"{category} {hood} {city} yelp",
            f"best {category} near {hood} {city}",
        ]
        for q in queries:
            query_count += 1
            data = search_local(q, city)
            for r in data.get("local_results", []):
                name = r.get("title", "").strip().lower()
                if name and name not in seen:
                    seen.add(name)
                    leads.append({
                        "name": r.get("title"),
                        "rating": r.get("rating"),
                        "reviews": r.get("reviews"),
                        "address": r.get("address"),
                        "phone": r.get("phone"),
                        "source": "local_pack",
                    })
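            # Organic hits on Yelp / Yellow Pages catch businesses missing from the local pack.
            for r in data.get("organic_results", []):
                title = r.get("title", "").strip().lower()
                if any(d in r.get("link", "") for d in ["yelp.com", "yellowpages.com"]):
                    if title and title not in seen:  # skip results with no title
                        seen.add(title)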
                        leads.append({
                            "name": r.get("title"),
                            "source": "directory",
                            "url": r.get("link"),
                        })

    print(f"Queries: {query_count} | Cost: ${query_count * 0.005:.2f}")
    print(f"Unique leads: {len(leads)}")
    return leads

neighborhoods = ["Downtown", "South Congress", "East Austin",
                 "North Loop", "Mueller", "Domain"]
leads = multi_source_leads("plumber", "Austin TX", neighborhoods)
for lead in leads[:10]:
    rating = lead.get("rating") or "N/A"  # directory hits carry no rating
    print(f"  {lead.get('name') or 'N/A':40} | {rating} stars | {lead['source']}")
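
Because each neighborhood costs a fixed three queries, the spend for a run can be estimated before any API call is made. A minimal dry-run sketch, assuming the flat $0.005 per-query price quoted above; `plan_queries` is a hypothetical helper that mirrors the three query variants used in `multi_source_leads`.

```python
COST_PER_QUERY = 0.005  # per-query price quoted in this article (assumed flat)


def plan_queries(category, city, neighborhoods):
    """Build the same three query variants per neighborhood that
    multi_source_leads() sends, without calling the API."""
    queries = []
    for hood in neighborhoods:
        queries += [
            f"{category} {hood} {city}",
            f"{category} {hood} {city} yelp",
            f"best {category} near {hood} {city}",
        ]
    return queries


neighborhoods = ["Downtown", "South Congress", "East Austin",
                 "North Loop", "Mueller", "Domain"]
plan = plan_queries("plumber", "Austin TX", neighborhoods)
print(f"{len(plan)} queries, estimated cost ${len(plan) * COST_PER_QUERY:.2f}")
# prints "18 queries, estimated cost $0.09"
```

With the six Austin neighborhoods above, that is 6 × 3 = 18 queries, or about $0.09 per category per city.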

Deduplication Strategy

Business names appear differently across sources: "Joe's Plumbing LLC" on Google, "Joe's Plumbing" on Yelp, "Joes Plumbing LLC" on Yellow Pages. Simple exact-match dedup misses these. A practical approach: normalize names by lowercasing, removing common suffixes (LLC, Inc, Corp), and stripping punctuation before comparing. For higher accuracy, also match on phone number or address when available.
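
A minimal sketch of that normalization, with a phone-number fallback. The suffix set, the helper names (`normalize_name`, `normalize_phone`, `dedupe`), and the assumption that phone numbers are 10-digit US/Canada numbers are illustrative choices, not part of any library. Address matching is left out here because it needs its own normalization (St vs Street, suite numbers).

```python
import re

# Legal suffixes dropped before comparing names; extend as needed for your market.
SUFFIXES = {"llc", "inc", "corp"}


def normalize_name(name):
    """Lowercase, drop apostrophes, turn other punctuation into spaces,
    and strip trailing legal suffixes."""
    name = re.sub(r"['\u2019]", "", name.lower())  # "Joe's" / "Joe’s" -> "joes"
    name = re.sub(r"[^\w\s]+", " ", name)          # "Inc." -> "inc", "&" -> " "
    tokens = name.split()
    while tokens and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def normalize_phone(phone):
    """Keep the last 10 digits (assumes US/Canada numbers); None if too short."""
    digits = re.sub(r"\D", "", phone or "")
    return digits[-10:] if len(digits) >= 10 else None


def dedupe(leads):
    """Keep the first lead for each normalized name or normalized phone."""
    seen_names, seen_phones, unique = set(), set(), []
    for lead in leads:
        name_key = normalize_name(lead.get("name") or "")
        phone_key = normalize_phone(lead.get("phone"))
        if name_key in seen_names or (phone_key and phone_key in seen_phones):
            continue  # duplicate of an earlier lead
        if name_key:
            seen_names.add(name_key)
        if phone_key:
            seen_phones.add(phone_key)
        unique.append(lead)
    return unique


# Hypothetical leads, for illustration only.
leads = [
    {"name": "Joe's Plumbing LLC", "phone": "(512) 555-0100", "source": "local_pack"},
    {"name": "Joe's Plumbing", "source": "directory"},
    {"name": "Joes Plumbing LLC", "source": "directory"},
    {"name": "J.P. Services", "phone": "+1 512-555-0100", "source": "directory"},
    {"name": "Capitol Plumbing Inc.", "source": "directory"},
]
print([lead["name"] for lead in dedupe(leads)])
# keeps "Joe's Plumbing LLC" and "Capitol Plumbing Inc."
```

Since the first occurrence wins, processing local pack results before directory hits keeps the records that carry rating, address, and phone.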

When Apify or Outscraper Win

For an exhaustive market census (every plumber in Texas, not just Austin), dedicated scraping tools like Outscraper ($0.002/result for Google Maps) or Apify ($49/mo for 40k results) provide broader coverage because they iterate through Maps' scroll-based pagination. The SERP-based approach is better suited to targeted lead generation (the top businesses per neighborhood) than to complete enumeration.

LeadSwift and similar tools that combine Maps, Yelp, and Yellow Pages into one scraping pipeline solve the multi-source problem at a higher level of abstraction. They cost more ($97+/mo) but eliminate the build work. If your team lacks engineering capacity, these tools are the pragmatic choice.

The API approach wins when you need programmatic access inside an existing pipeline (n8n, an agent workflow), when you want to combine local data with other SERP features (People Also Ask questions, organic rankings), or when your lead volume is under 1,000 per month, where per-query pricing beats subscriptions.
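
The subscription comparison comes down to simple arithmetic: divide the monthly fee by the per-query price to get the query volume at which both cost the same. A minimal sketch using the prices quoted in this article ($0.005 per query, $49/mo for Apify, $97/mo as the LeadSwift entry price); `break_even_queries` is a hypothetical helper, and the comparison is per query, not per lead, since the number of leads each query yields varies.

```python
PRICE_PER_QUERY = 0.005  # SERP API price quoted in this article


def break_even_queries(monthly_fee, price_per_query=PRICE_PER_QUERY):
    """Queries per month at which per-query spend equals a flat subscription."""
    return monthly_fee / price_per_query


for tool, fee in [("Apify", 49), ("LeadSwift", 97)]:
    print(f"{tool} (${fee}/mo): break-even at {break_even_queries(fee):,.0f} queries/month")
```

Below those volumes (9,800 and 19,400 queries per month respectively), per-query pricing costs less in raw spend; the subscriptions still differ in what they cover, such as scroll pagination and built-in multi-source scraping.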