Enrichment Waterfall Pattern

Definition

A data architecture pattern where enrichment requests cascade through multiple data sources in priority order -- primary source first, then secondary, then tertiary -- with validation at each step, maximizing match rates while controlling cost by attempting cheaper sources first.

In Depth

Single-source enrichment typically achieves 40-60% match rates. The waterfall pattern chains multiple sources to reach 80-95% by falling through to a secondary source whenever the primary returns no match.

Architecture: define an ordered list of data sources, each with a cost and an expected match rate. For each record, query source #1. If it returns valid data, stop. If not, query source #2. Continue until a match is found or all sources are exhausted.

Example lead enrichment waterfall:

  (1) Google search for company domain ($0.005 via Scavio, ~70% match rate for company names)
  (2) Reddit search for brand mentions ($0.005 via Scavio, ~30% match rate, but also provides sentiment data)
  (3) LinkedIn profile search via scraping or API (~$0.02/query, ~60% match rate for people)

Average cost per lead: roughly $0.011 if every query is billed and each source's match rate holds for the records that reach it (most records stop at source 1), versus $0.03 if you queried all three sources for every record.

Validation layer: after each source returns data, validate that the result actually matches the input record. Company name fuzzy matching, domain verification, and location cross-checking prevent false matches that pollute your dataset. Validation can be rule-based (string similarity > 0.8) or LLM-based ($0.001/validation via Claude Haiku for nuanced matching).

Cost optimization: order sources from cheapest to most expensive when their match rates are similar. If match rates differ significantly, order by (match_rate / cost), highest first, to maximize value per dollar spent.

Caching: cache results aggressively. If you enriched 'Scavio' yesterday, the company data is unlikely to have changed today.
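The control flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the Source class, the enrich and order_sources functions, and the three stub lookups are hypothetical names invented for this example, and the stubs stand in for real API calls. Validation uses the standard library's difflib.SequenceMatcher as one possible rule-based similarity check with the 0.8 threshold mentioned above.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Optional


@dataclass
class Source:
    name: str
    cost: float        # dollars per query (illustrative figures)
    match_rate: float  # expected fraction of records matched
    lookup: Callable[[str], Optional[dict]]  # returns a candidate or None


def is_valid(query: str, candidate: dict, threshold: float = 0.8) -> bool:
    """Rule-based validation: accept only if the names are similar enough."""
    ratio = SequenceMatcher(None, query.lower(), candidate["name"].lower()).ratio()
    return ratio > threshold


def order_sources(sources: list[Source]) -> list[Source]:
    """Order by expected matches per dollar, highest first."""
    return sorted(sources, key=lambda s: s.match_rate / s.cost, reverse=True)


def enrich(query: str, sources: list[Source], cache: dict) -> tuple[Optional[dict], float]:
    """Return (result, dollars spent), stopping at the first validated match."""
    if query in cache:  # cached results cost nothing
        return cache[query], 0.0
    spent = 0.0
    for source in sources:
        spent += source.cost  # assumes every query is billed, match or not
        candidate = source.lookup(query)
        if candidate is not None and is_valid(query, candidate):
            result = {**candidate, "source": source.name}
            cache[query] = result
            return result, spent
    return None, spent  # all sources exhausted


# Hypothetical stub lookups standing in for real API calls.
def search_stub(q: str) -> Optional[dict]:
    return {"name": "Acme Corp", "domain": "acme.example"} if q == "Acme Corp" else None


def social_stub(q: str) -> Optional[dict]:
    return {"name": "Totally Different Inc"}  # always a false match


def profile_stub(q: str) -> Optional[dict]:
    return {"name": "Globex LLC"} if q.startswith("Globex") else None


sources = order_sources([
    Source("profile", 0.02, 0.60, profile_stub),
    Source("search", 0.005, 0.70, search_stub),
    Source("social", 0.005, 0.30, social_stub),
])

cache: dict = {}
first = enrich("Acme Corp", sources, cache)    # matched by the first source
second = enrich("Acme Corp", sources, cache)   # served from cache
fallthrough = enrich("Globex LLC", sources, cache)  # reaches the last source
```

In this sketch the false match returned by social_stub is rejected by the validation step, so "Globex LLC" falls through to the third source and pays for all three queries. A production cache would likely also need an expiry policy, which is left out here.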

Real-World Example

The enrichment pipeline runs leads through a 3-source waterfall: Scavio Google search ($0.005, 72% match), Scavio Reddit search ($0.005, 28% match for the remaining leads), then direct website scrape ($0.02, 45% match for those still unmatched). Combined match rate: 89%. Average cost: about $0.010/lead, assuming every query is billed whether or not it matches.
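The combined figures for a waterfall like this one can be checked with a short calculation. The sketch below assumes that every lead is sent to the first source, that each later source is queried only for leads still unmatched, and that every query is billed whether or not it matches. The function name waterfall_stats is hypothetical.

```python
def waterfall_stats(stages):
    """stages: list of (cost_per_query, match_rate_among_leads_reaching_stage).

    Returns (combined_match_rate, expected_cost_per_input_lead).
    """
    remaining = 1.0      # fraction of leads still unmatched
    expected_cost = 0.0  # dollars per input lead
    for cost, rate in stages:
        expected_cost += remaining * cost  # only unmatched leads are queried
        remaining *= 1.0 - rate            # some of them match at this stage
    return 1.0 - remaining, expected_cost


# The three stages from the example: (cost, match rate for leads that reach it).
match_rate, cost = waterfall_stats([(0.005, 0.72), (0.005, 0.28), (0.02, 0.45)])
print(f"combined match rate: {match_rate:.1%}, expected cost: ${cost:.4f}/lead")
```

Under those assumptions the three stages give a combined match rate of about 88.9% and an expected cost of about $0.0104 per lead. A pipeline that is not billed for empty responses would come in lower.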

Platforms

Enrichment Waterfall Pattern is relevant across the following platforms, all accessible through Scavio's unified API:

  • Google
  • Reddit
  • Amazon
