Quick summary
Summarize this blog with AI
Introduction
Funnel SQL goes wrong for a predictable reason: the raw event log is noisier than the metric sounds. Users can trigger the same step multiple times, later steps can happen before earlier steps in dirty data, and the date you group by is often not the date the business actually cares about.
That is why people struggle with prompts like "daily conversion rate from signup to purchase" even when they already know joins, window functions, and aggregates. The hard part is usually not the division. The hard part is defining the cohort and reducing the event stream to one meaningful row per user per step.
If you anchor the metric correctly, enforce step order, and keep the denominator tied to the first step, funnel queries become much more reliable. This guide shows the default pattern that holds up in both interviews and production analytics work.
Three Rules Before You Write SQL
Start with these rules before touching the keyboard:
- Define the cohort anchor. Is the metric grouped by signup date, install date, trial-start date, or purchase date?
- Reduce each step to one row per user. Raw events multiply rows and inflate counts.
- Enforce step order and time windows. A purchase should not count if it happened before signup or outside the allowed window.
If the prompt still feels vague after that, treat it like a grain-and-metrics problem first. The same framing discipline from defining grain, metrics, and assumptions applies here too.
What a Funnel Metric Is Really Measuring
| Metric shape | Correct anchor | Typical denominator | Typical numerator |
|---|---|---|---|
| 7-day signup-to-purchase conversion | Signup date | Users whose first signup happened that day | Those same users with a purchase within 7 days |
| Activation rate after account creation | Account-creation date | New accounts | New accounts that activated within the rule |
| Operational same-day purchase rate | Calendar day of activity | Users who started the step that day | Users who completed the later step that day |
The first two are cohort metrics. The third is an event-date metric. They are not interchangeable, even if the words sound similar.
Why Raw-Event Joins Inflate Counts
Suppose one user signs up once and purchases three times. A direct self-join on the events table turns that one user into three joined rows.
A naive query often looks like this:
SELECT
CAST(s.event_ts AS DATE) AS signup_date,
COUNT(DISTINCT s.user_id) AS signup_users,
COUNT(DISTINCT p.user_id) AS purchase_users
FROM events s
LEFT JOIN events p
ON p.user_id = s.user_id
AND p.event_name = 'purchase'
WHERE s.event_name = 'signup'
GROUP BY CAST(s.event_ts AS DATE);
This is already fragile:
- It does not guarantee the purchase happened after signup.
- It does not guarantee you are using the user's first signup.
- It still lets repeated signups and repeated purchases distort the logic before the distinct count even runs.
COUNT(DISTINCT user_id) can remove some duplication, but it does not repair a broken cohort definition.
Step 1: Build a One-Row-Per-User Step Table
The cleanest default is to collapse the raw event stream into one relevant timestamp per user per step.
WITH step_times AS (
SELECT
user_id,
MIN(CASE WHEN event_name = 'signup' THEN event_ts END) AS signup_ts,
MIN(CASE WHEN event_name = 'activated' THEN event_ts END) AS activated_ts,
MIN(CASE WHEN event_name = 'purchase' THEN event_ts END) AS purchase_ts
FROM events
WHERE event_name IN ('signup', 'activated', 'purchase')
GROUP BY user_id
)
SELECT *
FROM step_times;
This is usually the real foundation of the funnel. Once you have a user-step table, the final aggregation gets much easier to reason about.
If the business rule is not "first event wins," change the reduction logic accordingly. Sometimes you need first successful payment, first activation after signup, or any purchase within the window. The point is to make the choice explicit.
Pattern 1: Daily 7-Day Signup-to-Purchase Conversion
For a standard cohort metric, anchor the result on signup date and test whether purchase happened within the allowed window.
WITH signups AS (
SELECT
user_id,
MIN(event_ts) AS signup_ts
FROM events
WHERE event_name = 'signup'
GROUP BY user_id
), purchases AS (
SELECT
user_id,
MIN(event_ts) AS purchase_ts
FROM events
WHERE event_name = 'purchase'
GROUP BY user_id
)
SELECT
CAST(s.signup_ts AS DATE) AS signup_date,
COUNT(*) AS signup_users,
COUNT(CASE
WHEN p.purchase_ts >= s.signup_ts
AND p.purchase_ts < s.signup_ts + INTERVAL '7 days'
THEN 1 END) AS converted_users,
ROUND(
COUNT(CASE
WHEN p.purchase_ts >= s.signup_ts
AND p.purchase_ts < s.signup_ts + INTERVAL '7 days'
THEN 1 END
) * 100.0 / COUNT(*),
2
) AS conversion_rate_pct
FROM signups s
LEFT JOIN purchases p
ON p.user_id = s.user_id
GROUP BY CAST(s.signup_ts AS DATE)
ORDER BY signup_date;
This works because the denominator is stable. Each signup user appears once, and the numerator is a filtered subset of that same cohort.
Why the Same-Day Ratio Is Usually the Wrong Answer
People often write something like this instead:
SELECT
DATE(event_ts) AS dt,
COUNT(DISTINCT CASE WHEN event_name = 'purchase' THEN user_id END)
/ COUNT(DISTINCT CASE WHEN event_name = 'signup' THEN user_id END)
FROM events
GROUP BY DATE(event_ts);
That ratio compares same-day purchase activity with same-day signup activity. It does not answer "what share of users who signed up on a given day converted within seven days?" Buyers from older cohorts will leak into the numerator, and ordering between steps is not checked at all.
The result may be a useful operational metric, but it is not a signup-cohort funnel.
Pattern 2: Multi-Step Funnel with Ordered Steps
Now suppose the steps are signup, activated, and purchase. You can keep the same one-row-per-user foundation and layer conditions on top:
WITH step_times AS (
SELECT
user_id,
MIN(CASE WHEN event_name = 'signup' THEN event_ts END) AS signup_ts,
MIN(CASE WHEN event_name = 'activated' THEN event_ts END) AS activated_ts,
MIN(CASE WHEN event_name = 'purchase' THEN event_ts END) AS purchase_ts
FROM events
WHERE event_name IN ('signup', 'activated', 'purchase')
GROUP BY user_id
)
SELECT
CAST(signup_ts AS DATE) AS signup_date,
COUNT(*) AS signup_users,
COUNT(CASE
WHEN activated_ts >= signup_ts THEN 1 END) AS activated_users,
COUNT(CASE
WHEN activated_ts >= signup_ts
AND purchase_ts >= activated_ts
THEN 1 END) AS purchased_after_activation_users
FROM step_times
WHERE signup_ts IS NOT NULL
GROUP BY CAST(signup_ts AS DATE)
ORDER BY signup_date;
This is the key difference between funnel SQL and generic distinct counting: each later step is checked against the timestamps of the earlier steps, not counted in isolation.
When You Need Time Windows
Many funnels are really "ordered steps inside a deadline." For example:
- activation within 3 days of signup
- purchase within 14 days of activation
- second session within 7 days of install
Add those windows directly at the user-step level:
COUNT(CASE
WHEN activated_ts >= signup_ts
AND activated_ts < signup_ts + INTERVAL '3 days'
THEN 1 END) AS activated_in_3d
This keeps the cohort definition clean while making the business rule obvious in the query.
COUNT(DISTINCT) Is Helpful, but It Is Not the Fix
COUNT(DISTINCT user_id) is useful when the current input may still contain multiple rows per user after a necessary join. It is not a substitute for:
- anchoring the cohort on the right first step
- reducing repeated events intentionally
- checking that later steps happen after earlier steps
- grouping by the correct date
If those rules are wrong, distinct counting can make the numbers look cleaner while the metric remains wrong.
- Build the funnel on a stable identity key if the product has pre-login and post-login events.
- Use event time for cohort logic and ingestion time for data-quality checks.
- When dashboards refresh during an active backfill, expect recent cohorts to shift until the pipeline catches up.
Another production issue is late-arriving data. Funnel semantics should usually use the business event timestamp, not the ingestion timestamp. Keep ingestion time for freshness monitoring, but do not let delayed backfills silently move users into the wrong cohort date.
Real event data often uses more than one identifier. A user may browse anonymously, sign up with an anonymous_id, and later purchase under a logged-in user_id. If those identifiers are not stitched to a stable person key before funnel logic runs, the numerator and denominator can describe different people even when the SQL looks correct.
Identity Stitching and Late-Arriving Events
This is only safe if users is one row per user or if you have already defined which user snapshot should win. If channel or plan changes over time, you still need an explicit rule for which value belongs to the cohort.
WITH step_times AS (
SELECT
user_id,
MIN(CASE WHEN event_name = 'signup' THEN event_ts END) AS signup_ts,
MIN(CASE WHEN event_name = 'purchase' THEN event_ts END) AS purchase_ts
FROM events
WHERE event_name IN ('signup', 'purchase')
GROUP BY user_id
)
SELECT
CAST(s.signup_ts AS DATE) AS signup_date,
u.acquisition_channel,
COUNT(*) AS signup_users,
COUNT(CASE
WHEN s.purchase_ts >= s.signup_ts
AND s.purchase_ts < s.signup_ts + INTERVAL '7 days'
THEN 1 END) AS converted_users
FROM step_times s
JOIN users u
ON u.user_id = s.user_id
WHERE s.signup_ts IS NOT NULL
GROUP BY CAST(s.signup_ts AS DATE), u.acquisition_channel;
If you need a funnel by acquisition channel, country, plan, or account segment, attach those dimensions after you collapse events to one row per user. Joining extra event tables too early recreates the same row multiplication you were trying to avoid.
Join Dimensions After the Step Table, Not Before
Edge Cases Worth Calling Out Explicitly
- Multiple signups: decide whether to use the first signup ever, the first signup in the reporting window, or the latest reactivation.
- Failed or test events: exclude them before building the step table.
- Timezone rules: be explicit about whether the cohort date is UTC or business-local time.
- Current partial day or week: exclude incomplete cohorts when the business wants mature conversion windows.
- Same-timestamp events: if event order matters and timestamps tie, add a stable secondary ordering field.
These are the same kinds of edge cases that also break retention queries and cohort analysis. Funnel logic is just another version of the same discipline.
A Practical Debug Checklist
- Write down the denominator in plain English first.
- Make sure each user appears once in that denominator.
- Check whether the numerator is a subset of that same cohort.
- Verify step order with timestamps, not assumptions.
- Check that the grouped date matches the metric definition.
- Only then calculate the rate.
If the numbers still look suspicious, inspect a handful of user-level examples before changing the aggregation. Funnel bugs are usually easier to spot at the per-user level than in the final percentage.
FAQ
Should I group by signup date or purchase date?
Group by the date of the cohort-start event when the question is cohort conversion, such as signup-to-purchase within 7 days. Group by purchase date only when the business explicitly wants an event-date metric.
Is COUNT(DISTINCT user_id) enough to fix duplicated event joins?
No. It can reduce duplicate users in the final count, but it does not fix a wrong cohort anchor, wrong step order, or the choice of first versus repeated events. Distinct counting is a tool, not the funnel definition.
Should I use first purchase or any purchase within the window?
Use whichever matches the business rule, but state it explicitly. Many conversion metrics care about the first qualifying purchase within the window because the denominator is one user and the question is whether that user converted at all.