Waterfall Enrichment Explained for Operators

Waterfall enrichment, also called cascading data enrichment, is a contact lookup pattern that sends each row through an ordered stack of data providers and stops on the first one that returns a verified email or phone, so you pay only for the layer that hit. Stacking providers this way raises coverage from the 40 to 60 percent a single source typically returns to 80 percent or higher.

Most teams do not run a real waterfall. They run Apollo first, sigh when half the rows come back empty, paste the leftovers into Hunter, and call it a strategy. The cost model is invisible, the hit rate is whatever it is, and the cascade was never designed. It accreted.

This article is the operator version. What the pattern actually is, the five provider layers worth knowing, the order that minimizes cost without giving up coverage, the geographies that flip the order, and the architecture that runs the whole thing from one config file instead of a node graph.

What waterfall enrichment actually is

Waterfall enrichment is a sequential lookup. You send a row (a name, a company, sometimes a LinkedIn URL) into a stack of data providers in a defined order. Provider one tries to return the email or phone. If it does, the cascade stops and you pay only that provider. If it does not, the row falls through to provider two, then three, until either a verified contact returns or the row exhausts the stack.
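The loop itself is small. Here is a minimal sketch in Python, assuming each provider is a plain function that takes a row and returns a verified contact or None. The provider names, stub lookups, and per-hit prices are hypothetical placeholders, not real vendor APIs.

```python
from typing import Callable, Optional

# A provider takes a row and returns a verified contact string, or None on a miss.
Provider = Callable[[dict], Optional[str]]


def run_cascade(row: dict, stack: list[tuple[str, float, Provider]]) -> dict:
    """Try providers in order; stop and pay only for the first one that hits."""
    tried = []
    for name, cost_per_hit, lookup in stack:
        tried.append(name)
        contact = lookup(row)
        if contact is not None:
            return {"contact": contact, "source": name, "cost": cost_per_hit, "tried": tried}
    # The row exhausted the stack: no contact, no charge.
    return {"contact": None, "source": None, "cost": 0.0, "tried": tried}


# Hypothetical stubs standing in for real provider calls.
def cheap_pattern_finder(row: dict) -> Optional[str]:
    return "ada@example.com" if row["company"] == "Example Inc" else None


def bulk_database(row: dict) -> Optional[str]:
    return "grace@sample.org" if row["name"] == "Grace Hopper" else None


STACK = [
    ("pattern_finder", 0.00, cheap_pattern_finder),
    ("bulk_database", 0.10, bulk_database),
]

hit_early = run_cascade({"name": "Ada Lovelace", "company": "Example Inc"}, STACK)
hit_late = run_cascade({"name": "Grace Hopper", "company": "Sample Org"}, STACK)
miss = run_cascade({"name": "Nobody", "company": "Nowhere"}, STACK)
```

The `tried` list is there on purpose. Logging which layers a row touched is what later lets you measure hit rate per layer and reorder.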

The pattern matters because no provider covers the whole market. Apollo is strong on US tech and thin on French manufacturing. ZoomInfo is strong on US enterprise and thin on bootstrapped European SaaS. Hunter is fast, cheap, and only deep on public domain patterns. Stack the right providers in the right order and you compound their strengths without paying for all of them on every row.

The single source alternative is one vendor for every row, accepting whatever coverage that vendor happens to have. FullEnrich's published numbers put single source work email match rates at 40 to 60 percent of a B2B list, with a tuned waterfall reaching 80 percent or higher and bounce rates under 1 percent with triple verification. That gap is the whole game.

If you are new to the broader category, the lead enrichment overview maps the field types (emails, phones, firmographics, intent) before this article narrows to the waterfall pattern specifically.

Why a cascade beats a single source

Three reasons the cascade wins, and they compound.

The first is coverage. A list that gets 55 percent coverage from a single vendor leaves 45 percent of your ICP unreachable. If you spent two weeks building that target list, you just threw away half of it. The waterfall recovers most of that 45 percent by routing to a different provider that happens to hold the row. Datablist models the downstream effect, estimating a 45 percent revenue lift from raising enrichment from 55 percent to 80 percent without touching product, pricing, or conversion rates.
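The 45 percent figure is consistent with simple proportional arithmetic. A quick check, assuming revenue scales linearly with the share of the list you can reach (our reading of the number, not necessarily Datablist's exact model):

```python
def relative_lift(coverage_before: float, coverage_after: float) -> float:
    """Relative gain in reachable contacts when coverage rises."""
    return coverage_after / coverage_before - 1


lift = relative_lift(0.55, 0.80)
print(f"{lift:.1%}")  # prints 45.5%
```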

The second is cost discipline. The cascade only charges you for the layer that succeeded. If your cheap first provider returns the email on 60 percent of rows, you never pay your expensive deep search provider for those 60 percent. Most operators assume a waterfall is more expensive because it touches more vendors. The math runs the other way as long as you order the cascade correctly, which is the operator judgment most generalists skip.

The third, which incumbent vendors rarely mention, is consensus. When two providers in the cascade return the same email, that agreement is itself a deliverability signal. When they disagree, you have a conflict to resolve, and you can route the row to verification as the tie breaker. A single source cannot give you that second opinion.

There is a real counterargument, and the cleanest statement of it comes from Cognism's pros and cons piece, which warns that data flowing through multiple third party sources makes GDPR and CCPA provenance harder to defend, and that a sloppy cascade can overwrite a verified phone number with a worse one from a less rigorous source. Both objections are valid. The fix is ordering and verification discipline, not abandoning the pattern, and the geography section below shows where the compliance objection bites hardest.

The five provider layers and what each is best at

You do not need fifteen providers. You need five layers, each with a job, and one provider per layer that fits your ICP.

Layer one, cheap pattern based

Hunter, Snov, Prospeo. These tools guess emails from a name plus a domain using public patterns (firstname.lastname, initials, dot variants) and verify the guess with an SMTP check. Cost per find is near zero. Hunter's free tier gives 50 credits a month and its Starter plan runs about 34 euros a month for 2,000 credits on annual billing. Hit rate is real on companies with predictable schemas and collapses on catch all addresses or rotating aliases.
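To make the guessing step concrete, here is a rough sketch of candidate generation from a name plus a domain. The pattern list is illustrative and is not any vendor's actual algorithm. The SMTP verification step is left out, so every candidate here is still only a guess.

```python
def candidate_emails(first: str, last: str, domain: str) -> list[str]:
    """Generate common work email patterns for one person at one domain."""
    first, last = first.strip().lower(), last.strip().lower()
    if not first or not last:
        return []  # nothing sensible to guess without both names
    local_parts = [
        f"{first}.{last}",     # firstname.lastname
        f"{first}{last}",      # firstnamelastname
        f"{first[0]}{last}",   # first initial plus lastname
        f"{first[0]}.{last}",  # first initial, dot, lastname
        first,                 # firstname only
        f"{first}{last[0]}",   # firstname plus last initial
    ]
    # dict.fromkeys drops duplicates while keeping the original order
    unique_parts = dict.fromkeys(local_parts)
    return [f"{part}@{domain.lower()}" for part in unique_parts]


candidates = candidate_emails("Ada", "Lovelace", "example.com")
```

On a catch all domain every one of these candidates can look deliverable to an SMTP check, which is why the hit rate collapses there.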

Layer two, bulk B2B database

Apollo, ZoomInfo, Lusha. These vendors maintain large proprietary contact databases. Unit cost per email is small on a seat plan, larger per credit. Coverage is strong on US mid market and varies on EU and APAC by vendor. This is usually the workhorse layer. If Lusha is your current layer two and its hit rate is capping the cascade, the Lusha alternatives ranking compares the stronger single source and waterfall replacements.

Layer three, specialist API

Crustdata, People Data Labs, Datagma. API first contact and firmographic providers, often with stronger international coverage than the bulk databases and built to plug into agent workflows rather than a UI. Unit cost is higher than layer two, and the data is often fresher.

Layer four, bundled waterfall

FullEnrich, BetterContact, Clay's waterfall. These are managed waterfalls that internally cascade through 10 to 20 providers and charge only for verified results. They are themselves a waterfall, which is exactly why they belong as one layer inside your own cascade, not as the whole pipeline. Running a managed waterfall as your entire enrichment is the most common way operators overpay.

Layer five, deep search

ContactOut, mobile phone specialists, hard to reach data vendors. The most expensive lookups, reserved for the 10 to 20 percent of rows nothing else found, or for direct dial mobiles, which are the costliest field by an order of magnitude. The decision rule is firm. Layer five never runs on every row. It runs only on the residual that survived four prior layers, otherwise it eats the budget.

Building the cascade in the right order

The order is cost first, accuracy second, deep search last. Pricing dictates the structure because the entire point is to stop on the cheapest provider that returns a verified result.

Start with layer one. Pattern based finders return results on something like 30 to 50 percent of rows at near zero marginal cost. If you skip this layer because the absolute hit rate looks mediocre, you end up paying layer two for rows layer one would have caught for free.

Run layer two next. Apollo or ZoomInfo for US heavy work, Lusha for direct dials in some EU regions. Most of your coverage lands here.

Layer three is specialist routing. If your target list is heavy on European or APAC contacts, swap layers two and three, because the bulk databases weaken and the API specialists strengthen. Crustdata is built for this kind of programmatic cascade and pairs naturally with signal based outbound when you want a hiring or funding trigger on top of the row.

Layer four runs only on rows that survived the first three. This is where you let a managed waterfall do its own waterfall, which sounds redundant and is the cheapest way to consume that service, since it bills only on success.

Layer five runs on the residual, only for fields where the cost is justified. A verified mobile for a target VP sitting on an active buying signal is worth a dollar. A mobile for a row your SDR will never dial is worth nothing.

One mistake worth flagging. Running every layer in parallel and taking the first reply trades cost discipline for speed and only makes sense when the row is time critical, like a live web visitor or a fresh job change. For batch enrichment, sequential is cheaper.
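A toy comparison shows the gap. It assumes every provider bills on success only and that, in parallel mode, every provider that finds the row gets paid. The prices (in cents) and the four rows are made up for illustration.

```python
# Hypothetical per-hit prices in cents, cheapest first (not vendor quotes).
ORDER = [("pattern", 0), ("bulk", 10), ("specialist", 20)]

# Toy data: for each row, the set of providers that would return a verified hit.
ROWS = [
    {"pattern", "bulk"},
    {"bulk", "specialist"},
    {"specialist"},
    set(),
]


def sequential_spend(rows, order) -> int:
    """Pay only the first provider in the order that holds the row."""
    total = 0
    for holders in rows:
        for name, price in order:
            if name in holders:
                total += price
                break  # the cascade stops on the first hit
    return total


def parallel_spend(rows, order) -> int:
    """Fire every provider at once and pay each one that returns a hit."""
    return sum(price for holders in rows for name, price in order if name in holders)


seq_cents = sequential_spend(ROWS, ORDER)
par_cents = parallel_spend(ROWS, ORDER)
```

With these toy numbers the sequential run costs 30 cents and the parallel run costs 60, for the same three contacts found.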

Cost per verified contact, with the math shown

Here is the model nobody publishes. Take 1,000 rows you want enriched for verified work emails. The hit rates below are illustrative planning assumptions, not vendor guarantees, and you should replace them with your own sample numbers (see the sample step later).

Layer	Rows in	Assumed hit rate	Verified found	Unit cost	Spend
1, pattern based	1,000	35%	350	~$0	$0
2, bulk database	650	50%	325	~$0.10	$32.50
3, specialist API	325	40%	130	~$0.20	$26.00
4, bundled waterfall	195	50%	97	~$0.15	$14.55
5, deep search	98	25%	24	~$0.50	$12.00
Total	1,000		926 (92.6%)		~$85

That is roughly $0.09 per verified email across the list, with coverage near 93 percent. The naive comparison is a single source seat at a flat fee returning 55 to 65 percent coverage for similar dollar spend, leaving you out of pocket on every row that returned nothing. The unit cost falls the moment you front load the cheap layers, which is why ordering, not vendor count, is the lever.
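If you want to rerun the table with your own sample numbers, a short script is enough. This sketch uses the same illustrative hit rates and unit costs as the table, keeps money in integer cents to avoid float drift, and rounds partial contacts down, which is how the table arrives at 97 and 24.

```python
# (layer, assumed hit rate in percent, unit cost in cents per verified hit)
LAYERS = [
    ("pattern based", 35, 0),
    ("bulk database", 50, 10),
    ("specialist API", 40, 20),
    ("bundled waterfall", 50, 15),
    ("deep search", 25, 50),
]


def model_cascade(rows: int, layers):
    """Walk the list through each layer; return per-layer lines and totals."""
    remaining, found_total, spend_cents, lines = rows, 0, 0, []
    for name, hit_pct, unit_cents in layers:
        found = remaining * hit_pct // 100  # round partial contacts down
        cost = found * unit_cents
        lines.append((name, remaining, found, cost))
        remaining -= found
        found_total += found
        spend_cents += cost
    return lines, found_total, spend_cents


lines, found_total, spend_cents = model_cascade(1000, LAYERS)
for name, rows_in, found, cost in lines:
    print(f"{name:18} rows in {rows_in:5}  found {found:4}  spend ${cost / 100:.2f}")
print(
    f"coverage {found_total / 1000:.1%}, spend ${spend_cents / 100:.2f}, "
    f"per verified ${spend_cents / 100 / found_total:.3f}"
)
```

Reorder the `LAYERS` list and the total moves, which is the quickest way to see that ordering is the lever.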

Build versus buy lives inside this math. Wiring five separate API contracts yourself costs roughly a week of engineering plus monthly maintenance, and the savings appear only above a few thousand rows a month. Below that, a managed waterfall is cheaper than the engineering time.

FullEnrich as the bundled orchestrator

FullEnrich is the cleanest example of layer four built right. It cascades through more than 15 underlying providers per request, charges only on verified return, and bundles work email, personal email, and mobile phone into one credit ledger so you do not maintain five vendor contracts.

The credit math, fetched from the FullEnrich pricing page on 2026-06-25: 1 credit per verified work email, 3 credits per personal email, 10 credits per mobile phone, and 0.25 credits for standalone person or company enrichment. The free tier is 50 credits, and the entry paid plan runs about 5 euros a month for 1,000 credits, with unused credits rolling over for three months on monthly billing and a year on annual.

In operator terms, that entry plan verifies up to 1,000 work emails for about 5 euros, which lands well under a cent per work email at list scale. Phones cost ten times more because mobile data is ten times harder to source. Personal emails sit in between because personal databases are smaller and refresh slower. Beyond one off list uploads, the same orchestrator drives ongoing plays like turning your LinkedIn network into verified contact data, where the waterfall runs continuously against new connections instead of a single file.
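The credit ledger is easy to budget in code. A small sketch, using the per-result credit prices quoted above. The field names and example volumes are our own placeholders.

```python
# Credits per verified result, as quoted above from the FullEnrich pricing page.
CREDITS_PER_RESULT = {
    "work_email": 1,
    "personal_email": 3,
    "mobile_phone": 10,
    "standalone_enrichment": 0.25,
}


def credits_needed(verified_counts: dict[str, int]) -> float:
    """Total credits consumed for a given mix of verified results."""
    return sum(CREDITS_PER_RESULT[field] * n for field, n in verified_counts.items())


# Hypothetical run: 600 work emails, 50 personal emails, 20 mobiles.
used = credits_needed({"work_email": 600, "personal_email": 50, "mobile_phone": 20})
```

That mix consumes 950 credits, and the 20 mobiles account for 200 of them, which is why mobile belongs only on the top tier of the list.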

Two cautions. Because FullEnrich is itself a waterfall, dropping it as layer four behind Hunter and Apollo means it does less work than its full price assumes. That is fine economically, since you only pay on success, but be explicit so you never run FullEnrich twice on the same row. Second, a bundled waterfall is opaque about which underlying provider returned a result, which matters little for outbound and a lot for the EU compliance reporting Cognism flagged above.

How geography and ICP rewrite the order

The cascade order is not fixed. ICP and geography rewrite it, and pretending one order fits every list is the fastest way to overspend.

US tech mid market

Apollo or ZoomInfo as layer two is almost always right. Hunter as layer one catches predictable domains. Crustdata or People Data Labs as layer three mops up what the bulk databases miss. Layers four and five as residual cleanup. Expect 85 to 95 percent coverage.

EU mid market

This is where ordering breaks. Apollo and ZoomInfo are weaker on European contacts, Lusha is strong in certain markets, and Cognism is the proprietary single source many GDPR conscious teams default to. A defensible cascade is Hunter, then Cognism or Lusha, then Crustdata, then FullEnrich. Skip the most aggressive deep search vendors for EU outbound where consent and provenance carry legal weight.

APAC mid market

Most US centric databases are thin here. The cascade often starts at layer three and reaches layer five faster than US work does. Coverage tops out lower, typically 65 to 80 percent, because the underlying data simply is not held at US density.

Late stage enterprise

ZoomInfo or Apollo lead, but the deep search layer matters more because senior decision makers use direct dials, not the office switchboard. Expect higher per row cost, and run the cascade only after a tight ICP filter so you are not enriching unqualified accounts at premium pricing.

Bootstrapped or seed stage targets

None of the bulk databases reliably hold these companies yet. Layer one and a web crawling layer three carry most of the load. Layer two is often wasted credits because the company is not in the bulk database.

The honest version is that you do not nail geography on the first run. You sample, measure hit rate per layer, and reorder. Operators who skip the sample pay for a misordered cascade across the whole list.
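One way to keep these orders editable is a plain mapping from segment to provider order. The sketch below transcribes the segments above loosely. The keys, the layer labels, and the fallback to the US order are our own illustrative choices, and late stage enterprise is left out because its order depends on how much deep search you can afford.

```python
# Provider order per segment. Labels are illustrative, not a vendor recommendation.
CASCADE_ORDER = {
    "us_tech_mid_market": [
        "hunter",
        "apollo_or_zoominfo",
        "crustdata_or_pdl",
        "bundled_waterfall",
        "deep_search",
    ],
    # EU: no aggressive deep search, since consent and provenance carry legal weight.
    "eu_mid_market": ["hunter", "cognism_or_lusha", "crustdata", "fullenrich"],
    # APAC: the cascade often starts at the specialist layer.
    "apac_mid_market": ["specialist_api", "bundled_waterfall", "deep_search"],
    # Seed stage: the bulk database layer is skipped as wasted credits.
    "seed_stage": ["pattern_finder", "web_crawling_specialist"],
}

DEFAULT_SEGMENT = "us_tech_mid_market"  # assumption: fall back to the US order


def order_for(segment: str) -> list[str]:
    """Return the provider order for a segment, or the default if unknown."""
    return CASCADE_ORDER.get(segment, CASCADE_ORDER[DEFAULT_SEGMENT])


eu_order = order_for("eu_mid_market")
```

Treat every list here as a starting guess that the sample run is allowed to overrule.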

Five mistakes that blow the cost model

In rough order of how often they happen.

Enriching before deduping

A 5,000 row Sales Navigator export typically carries 20 to 30 percent duplicates and another 15 to 20 percent out of ICP rows. Enriching those spends real money on data you will throw away. Dedupe by company and contact, run the ICP filter, then enrich.
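A rough sketch of that pre-enrichment pass, assuming rows are dicts with `name`, `company`, `linkedin_url`, and `industry` keys. Those field names and the one-line ICP filter are hypothetical, so adapt them to your export.

```python
def dedupe(rows: list[dict]) -> list[dict]:
    """Keep the first row per LinkedIn URL, falling back to company plus name."""
    seen, unique = set(), []
    for row in rows:
        url = (row.get("linkedin_url") or "").strip().lower().rstrip("/")
        if url:
            key = ("url", url)
        else:
            key = (
                "person",
                (row.get("company") or "").strip().lower(),
                (row.get("name") or "").strip().lower(),
            )
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


raw = [
    {"name": "Ada Lovelace", "company": "Example Inc",
     "linkedin_url": "https://linkedin.com/in/ada/", "industry": "software"},
    {"name": "Ada Lovelace", "company": "Example Inc",
     "linkedin_url": "https://LinkedIn.com/in/ada", "industry": "software"},
    {"name": "Grace Hopper", "company": "Sample Org",
     "linkedin_url": "", "industry": "retail"},
    {"name": "grace hopper ", "company": "sample org",
     "linkedin_url": None, "industry": "retail"},
]

unique = dedupe(raw)
# Toy ICP filter: keep only the industries you actually sell to.
to_enrich = [row for row in unique if row["industry"] in {"software"}]
```

Four raw rows become one row worth paying for. Note the limit of this sketch: a person who appears once with a URL and once without will not be merged.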

Enriching fields the message never uses

If your sequence does not reference job tenure, do not buy job tenure. The waterfall charges credits for every field you add. Deciding which fields actually drive the message is operator first mile work, and everything else is fat.

Running every provider in parallel for batch work

Parallel is for time critical, signal triggered enrichment. For batch work, sequential is cheaper because the cascade stops on the first hit instead of paying every provider that found the same row.

Treating verification as optional

A verified email is one an SMTP check confirmed deliverable, not just a plausible pattern. Skipping verification ships bounces into HubSpot and into your sending infrastructure. That matters more since the Google and Yahoo bulk sender rules took effect in February 2024, which cap the spam complaint rate at 0.3 percent and require one click unsubscribe (RFC 8058). Bounces and complaints from unverified sends erode the sender reputation those rules now police. Every layer in your cascade should verify before returning.
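One way to enforce that rule is to wrap each provider so an unverified result counts as a miss. In this sketch the verifier is a stub backed by a hardcoded set. A real one would run an SMTP deliverability check, which is not shown here.

```python
from typing import Callable, Optional

Lookup = Callable[[dict], Optional[str]]


def verified_only(lookup: Lookup, verify: Callable[[str], bool]) -> Lookup:
    """Wrap a provider so it returns a contact only when verification passes."""

    def wrapped(row: dict) -> Optional[str]:
        candidate = lookup(row)
        if candidate is not None and verify(candidate):
            return candidate
        return None  # unverified guesses count as a miss, so the row falls through

    return wrapped


# Placeholder verifier (hypothetical): stands in for a real deliverability check.
KNOWN_DELIVERABLE = {"ada@example.com"}


def stub_verify(email: str) -> bool:
    return email in KNOWN_DELIVERABLE


good_provider = verified_only(lambda row: "ada@example.com", stub_verify)
guessing_provider = verified_only(lambda row: "a.lovelace@example.com", stub_verify)
```

Because the wrapper returns None on a failed check, the cascade treats a plausible but unverified pattern exactly like an empty response and moves to the next layer.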

Skipping the human checkpoint on conflicts

When two providers return different emails for one person and verification likes both, surface the row for a glance. Three seconds of operator attention beats a wrong message in the wrong inbox.
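The resolution rules for conflicts, spelled out in the FAQ at the end of this article, fit in a few lines. A sketch with a stub verifier follows. The case where both providers agree is our own assumption: accept the email if it verifies, otherwise drop it.

```python
from typing import Callable, Optional


def resolve_conflict(
    email_a: str, email_b: str, verify: Callable[[str], bool]
) -> tuple[str, Optional[str]]:
    """Return a (decision, email) pair for two provider answers on one person."""
    if email_a == email_b:
        # Assumption: agreement still has to pass verification once.
        return ("accept", email_a) if verify(email_a) else ("drop", None)
    ok_a, ok_b = verify(email_a), verify(email_b)
    if ok_a and ok_b:
        return ("human_review", None)  # both verify: surface for a glance
    if ok_a:
        return ("accept", email_a)
    if ok_b:
        return ("accept", email_b)
    return ("drop", None)  # neither verifies


# Stub verifier for illustration only.
DELIVERABLE = {"a@example.com", "b@example.com"}


def stub_verify(email: str) -> bool:
    return email in DELIVERABLE


both = resolve_conflict("a@example.com", "b@example.com", stub_verify)
one = resolve_conflict("a@example.com", "c@example.com", stub_verify)
none = resolve_conflict("c@example.com", "d@example.com", stub_verify)
```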

Running the cascade as a skill, not a node graph

Most teams build the cascade as a Make, Zapier, or n8n graph. Provider one node, a conditional, provider two node, another conditional, a deduper, a verifier, a CRM writer. It works until somebody changes a provider API or another teammate needs to edit the flow. At 30 to 40 nodes the graph is unreadable and edits ripple in ways nobody predicts.

The alternative is to run the cascade as a markdown configured skill on top of an operator OS. Each layer is a function. The order is a config file. The CRM write is a function. The pipeline reads like a recipe and edits like one. When Apollo deprecates an endpoint, you change one function in one file. When a row pattern fails, you read the log instead of debugging a graph.
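To show the shape of "each layer is a function, the order is a config file", here is a minimal sketch. The markdown layout, the registry, and the layer names are hypothetical and are not Yalc's actual config format.

```python
from typing import Callable, Optional

# Hypothetical config: the cascade order is just a markdown list.
CONFIG_MD = """\
# cascade order
- pattern_finder
- bulk_database
- specialist_api
"""

REGISTRY: dict[str, Callable[[dict], Optional[str]]] = {}


def layer(name: str):
    """Decorator that registers a provider function under a config name."""

    def register(fn):
        REGISTRY[name] = fn
        return fn

    return register


@layer("pattern_finder")
def pattern_finder(row: dict) -> Optional[str]:
    return None  # stub: a real layer would call a provider here


@layer("bulk_database")
def bulk_database(row: dict) -> Optional[str]:
    return None  # stub


@layer("specialist_api")
def specialist_api(row: dict) -> Optional[str]:
    return None  # stub


def build_stack(markdown: str):
    """Read the order from markdown list items and resolve each to a function."""
    order = [line[2:].strip() for line in markdown.splitlines() if line.startswith("- ")]
    unknown = [name for name in order if name not in REGISTRY]
    if unknown:
        raise KeyError(f"no layer registered for: {unknown}")
    return [(name, REGISTRY[name]) for name in order]


stack = build_stack(CONFIG_MD)
```

Swapping layers two and three for a European list is then a two-line edit to the markdown, with no function touched.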

This is the architecture the operator playbook for B2B lead generation sits inside. Humans own first mile (which fields, which ICP slice) and last mile (the call, the reply, the deal). The waterfall is middle mile work, the mechanical orchestration that compounds when an operating system runs it instead of a SaaS UI hosting it.

Yalc is one example of the pattern. Markdown configured, locally installed, talking to Crustdata, FullEnrich, and HubSpot through real APIs, running the cascade from one Claude Code conversation. Every row enriched, every provider tried, every reply tagged sharpens the next run, and the cascade gets cheaper as the config file learns which providers to skip for which ICPs.

The operator template for a thousand contact run

Run this before you start the cascade.

1. Pull the raw list (Sales Navigator, a Crustdata search, a signal trigger export). Record the raw count.
2. Dedupe by LinkedIn URL or by company plus name. Expect to cut roughly 20 percent.
3. Apply the ICP filter (industry, size, geography, signal). Cut another 15 to 25 percent.
4. Decide which fields the message actually needs. Usually work email plus maybe a LinkedIn URL. Mobile only for the top tier.
5. Sample 200 rows through the cascade. Measure hit rate per layer.
6. Reorder if a layer is paying too much for too little. Geographic edits live here.
7. Run the rest. Verify every result before writing to the CRM.
8. Log final coverage and cost per verified contact, then use those numbers to refine the next run.
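The sample step above comes down to counting which layer each sampled row stopped on. A sketch, assuming your run logs one source label per row (or None for a miss). The layer names and the 200 row split are invented for illustration.

```python
from collections import Counter
from typing import Optional


def layer_hit_rates(
    sources: list[Optional[str]], order: list[str]
) -> dict[str, tuple[int, int, float]]:
    """Per layer: (rows in, verified found, hit rate on the rows that reached it)."""
    hits = Counter(source for source in sources if source is not None)
    remaining = len(sources)
    report = {}
    for name in order:
        found = hits[name]
        rate = found / remaining if remaining else 0.0
        report[name] = (remaining, found, rate)
        remaining -= found  # only the misses fall through to the next layer
    return report


# Hypothetical 200 row sample: where each row stopped in the cascade.
sample = ["pattern"] * 70 + ["bulk"] * 60 + ["specialist"] * 20 + [None] * 50
report = layer_hit_rates(sample, ["pattern", "bulk", "specialist"])

for name, (rows_in, found, rate) in report.items():
    print(f"{name:10} rows in {rows_in:3}  found {found:3}  hit rate {rate:.0%}")
```

Hit rate is measured against the rows that reached the layer, not the whole sample, because that is the number that tells you whether a layer is earning its position.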

The discipline is in the first four steps, not the cascade itself. The cascade is mechanical once the inputs are clean. Operators who skip steps one through four pay two to three times more per verified contact than operators who do them.

That is what a real waterfall looks like. Not five vendor logins and a Friday of CSV merging. One operator OS, one config file, one prompt, and a coverage number that climbs every time you iterate.

Frequently Asked Questions

What is waterfall enrichment?

Waterfall enrichment is a sequential lookup pattern that sends a contact row through multiple data providers in a defined order. The cascade stops on the first provider that returns a verified email or phone, so you pay only for the layer that succeeded. The pattern raises coverage from the 40 to 60 percent a single source typically returns to 80 percent or higher.

What is cascading data enrichment?

Cascading data enrichment is another name for waterfall enrichment. The two terms describe the same pattern, an ordered cascade of data providers where each row stops on the first provider that returns a verified result. Some teams say cascading and some say waterfall, but the mechanics, the cost model, and the coverage lift (from the 40 to 60 percent a single source returns to 80 percent or higher) are identical.

How does waterfall enrichment work?

You configure an ordered list of providers. Each row enters at provider one, and if that provider returns a verified result the cascade stops and you are charged for it alone. If not, the row falls through to provider two, then three, until it is enriched or the stack is exhausted. It is usually run in batches and is most cost efficient when the cheapest providers run first.

How many data providers should a waterfall include?

Five layers is the practical ceiling for most B2B operators. One pattern based finder like Hunter, one bulk database like Apollo or ZoomInfo, one API specialist like Crustdata or People Data Labs, one bundled waterfall like FullEnrich, and one deep search for mobiles or hard to reach data. Beyond five, coverage rarely improves and the cascade only slows down.

How much does waterfall enrichment cost?

At a thousand contact volume, a well ordered cascade lands verified work emails for roughly $0.06 to $0.18 each. Mobile numbers cost about ten times that per verified result because phone data is harder to source. A managed waterfall like FullEnrich starts around 5 euros a month for 1,000 credits, where one work email is one credit and one mobile phone is ten.

Should I build a waterfall myself or use a bundled tool?

If you run more than a few thousand enrichments a month and want compliance reporting, geographic routing, or signal triggered runs, building the cascade on an operator OS is cheaper and more flexible. Below that volume, a managed waterfall like FullEnrich is cheaper than the engineering time integration would cost.

What happens when providers return conflicting data?

When two providers return different emails for the same person, route the conflict to verification. If both verify, surface the row for a human review. If one verifies and the other does not, take the verified one. If neither verifies, drop the row. Conflicting data is a signal, not a failure, and the cascade should treat it that way.
