Definition
A data architecture pattern where enrichment requests cascade through multiple data sources in priority order -- primary source first, then secondary, then tertiary -- with validation at each step, maximizing match rates while controlling cost by attempting cheaper sources first.
In Depth
Single-source enrichment typically achieves 40-60% match rates. The waterfall pattern chains multiple sources to reach 80-95% by falling through to secondary sources when the primary returns no match.
Architecture: define an ordered list of data sources, with a cost and an expected match rate for each. For each record, query source #1. If it returns valid data, stop. If not, query source #2. Continue until a match is found or all sources are exhausted.
Example lead enrichment waterfall: (1) Google search for company domain ($0.005 via Scavio, ~70% match rate for company names), (2) Reddit search for brand mentions ($0.005 via Scavio, ~30% match rate but provides sentiment data), (3) LinkedIn profile search via scraping or API (~$0.02/query, ~60% match rate for people). Because most records stop at source 1, the average cost is roughly $0.01 per lead ($0.005 + 0.30 × $0.005 + 0.30 × 0.70 × $0.02 ≈ $0.011, if each rate holds for the records that reach that step), vs $0.03 if you queried all three sources for every record.
Validation layer: after each source returns data, validate that the result actually matches the input record. Company name fuzzy matching, domain verification, and location cross-checking prevent false matches that pollute your dataset. Validation can be rule-based (string similarity > 0.8) or LLM-based ($0.001/validation via Claude Haiku for nuanced matching).
Cost optimization: when sources have similar match rates, order them from cheapest to most expensive. If match rates differ significantly, order by (match_rate / cost), highest first, to maximize matches per dollar spent. Cache results aggressively: if you enriched 'Scavio' yesterday, the company data is unlikely to have changed today.
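The fall-through loop, rule-based validation, value-per-dollar ordering, and caching described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `Source` class, the `lookup` callables, the `company` result field, and the in-memory dict cache are all hypothetical names chosen for the example, and string similarity is approximated with the standard library's `difflib.SequenceMatcher`.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Optional


@dataclass
class Source:
    name: str
    cost: float        # dollars per query (assumed flat rate)
    match_rate: float  # expected fraction of records this source matches
    lookup: Callable[[str], Optional[dict]]  # hypothetical client call


def is_valid(query: str, result: dict, threshold: float = 0.8) -> bool:
    """Rule-based validation: the returned company name must resemble the input."""
    found = result.get("company", "")
    ratio = SequenceMatcher(None, query.lower(), found.lower()).ratio()
    return ratio > threshold


def order_sources(sources: list) -> list:
    """Order by expected matches per dollar, best value first."""
    return sorted(sources, key=lambda s: s.match_rate / s.cost, reverse=True)


def enrich(query: str, sources: list, cache: dict):
    """Return (enriched record or None, dollars spent on this call)."""
    if query in cache:
        return cache[query], 0.0  # cache hit: no queries, no cost
    spent = 0.0
    for source in sources:
        spent += source.cost  # assumes every attempted query is billed
        result = source.lookup(query)
        if result is not None and is_valid(query, result):
            enriched = {**result, "source": source.name}
            cache[query] = enriched
            return enriched, spent  # stop at the first validated match
    return None, spent  # all sources exhausted


if __name__ == "__main__":
    # Stub lookups stand in for real API calls.
    sources = order_sources([
        Source("linkedin", 0.02, 0.60, lambda q: {"company": q}),
        Source("google", 0.005, 0.70, lambda q: None),
        Source("reddit", 0.005, 0.30, lambda q: {"company": "Unrelated Inc"}),
    ])
    cache = {}
    print([s.name for s in sources])
    print(enrich("Acme Corp", sources, cache))
    print(enrich("Acme Corp", sources, cache))  # second call is served from cache
```

In this sketch a result that fails validation is treated the same as no result, so the record keeps falling through. A production cache would also need an expiry policy, which is omitted here.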
Example Usage
The enrichment pipeline runs leads through a 3-source waterfall: Scavio Google search ($0.005, 72% match), Scavio Reddit search ($0.005, 28% match for remaining), then direct website scrape ($0.02, 45% match for remaining). Combined match rate: 89%. Average query cost: about $0.010/lead ($0.005 + 0.28 × $0.005 + 0.28 × 0.72 × $0.02), before any savings from cached results.
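The combined figures can be reproduced from the per-source numbers. The following is a minimal sketch, assuming each match rate applies only to the records still unmatched at that step and that every attempted query is billed; the function name `waterfall_stats` is illustrative.

```python
def waterfall_stats(steps):
    """steps: list of (cost_per_query, match_rate_on_remaining), in query order."""
    remaining = 1.0      # fraction of records still unmatched
    expected_cost = 0.0  # expected dollars spent per record
    for cost, rate in steps:
        expected_cost += remaining * cost  # only unmatched records are queried
        remaining *= 1.0 - rate            # matched records leave the waterfall
    return 1.0 - remaining, expected_cost


# Google search, Reddit search, direct website scrape (figures from the example)
steps = [(0.005, 0.72), (0.005, 0.28), (0.02, 0.45)]
match_rate, cost = waterfall_stats(steps)
print(f"combined match rate: {match_rate:.1%}")       # 88.9%
print(f"expected query cost per lead: ${cost:.4f}")   # $0.0104
```

Under these assumptions the match rate rounds to 89% and the query cost comes to about $0.010 per lead. Cache hits, which skip queries entirely, and validation costs are not modeled here.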
Platforms
Enrichment Waterfall Pattern is relevant across the following platforms, all accessible through Scavio's unified API:
- Amazon
Related Terms
Multi-Platform Enrichment
The process of enriching a record (lead, product, keyword, or topic) with data from multiple search platforms (Google, A...
Lead Enrichment Schema
A predefined JSON structure that standardizes the output format of lead enrichment data regardless of the upstream data ...
n8n Schema Normalizer
A reusable n8n sub-workflow that sits between a third-party API call and downstream processing, transforming the provide...