SQL Funnel Queries for Event Data: Daily Conversion

Quick summary

Summarize this blog with AI

Introduction

Funnel SQL goes wrong for a predictable reason: the raw event log is noisier than the metric sounds. Users can trigger the same step multiple times, later steps can happen before earlier steps in dirty data, and the date you group by is often not the date the business actually cares about.

That is why people struggle with prompts like "daily conversion rate from signup to purchase" even when they already know joins, window functions, and aggregates. The hard part is usually not the division. The hard part is defining the cohort and reducing the event stream to one meaningful row per user per step.

If you anchor the metric correctly, enforce step order, and keep the denominator tied to the first step, funnel queries become much more reliable. This guide shows the default pattern that holds up in both interviews and production analytics work.

Three Rules Before You Write SQL

Start with these rules before touching the keyboard:

Define the cohort anchor. Is the metric grouped by signup date, install date, trial-start date, or purchase date?
Reduce each step to one row per user. Raw events multiply rows and inflate counts.
Enforce step order and time windows. A purchase should not count if it happened before signup or outside the allowed window.

If the prompt still feels vague after that, treat it like a grain-and-metrics problem first. The same framing discipline from defining grain, metrics, and assumptions applies here too.

What a Funnel Metric Is Really Measuring

Metric shape	Correct anchor	Typical denominator	Typical numerator
7-day signup-to-purchase conversion	Signup date	Users whose first signup happened that day	Those same users with a purchase within 7 days
Activation rate after account creation	Account-creation date	New accounts	New accounts that activated within the rule
Operational same-day purchase rate	Calendar day of activity	Users who started the step that day	Users who completed the later step that day

The first two are cohort metrics. The third is an event-date metric. They are not interchangeable, even if the words sound similar.

Why Raw-Event Joins Inflate Counts

Suppose one user signs up once and purchases three times. A direct self-join on the events table turns that one user into three joined rows.

A naive query often looks like this:

SELECT
    CAST(s.event_ts AS DATE) AS signup_date,
    COUNT(DISTINCT s.user_id) AS signup_users,
    COUNT(DISTINCT p.user_id) AS purchase_users
FROM events s
LEFT JOIN events p
  ON p.user_id = s.user_id
 AND p.event_name = 'purchase'
WHERE s.event_name = 'signup'
GROUP BY CAST(s.event_ts AS DATE);

This is already fragile:

It does not guarantee the purchase happened after signup.
It does not guarantee you are using the user's first signup.
It still lets repeated signups and repeated purchases distort the logic before the distinct count even runs.

COUNT(DISTINCT user_id) can remove some duplication, but it does not repair a broken cohort definition.

Step 1: Build a One-Row-Per-User Step Table

The cleanest default is to collapse the raw event stream into one relevant timestamp per user per step.

WITH step_times AS (
    SELECT
        user_id,
        MIN(CASE WHEN event_name = 'signup' THEN event_ts END) AS signup_ts,
        MIN(CASE WHEN event_name = 'activated' THEN event_ts END) AS activated_ts,
        MIN(CASE WHEN event_name = 'purchase' THEN event_ts END) AS purchase_ts
    FROM events
    WHERE event_name IN ('signup', 'activated', 'purchase')
    GROUP BY user_id
)
SELECT *
FROM step_times;

This is usually the real foundation of the funnel. Once you have a user-step table, the final aggregation gets much easier to reason about.

If the business rule is not "first event wins," change the reduction logic accordingly. Sometimes you need first successful payment, first activation after signup, or any purchase within the window. The point is to make the choice explicit.

For a standard cohort metric, anchor the result on signup date and test whether purchase happened within the allowed window.

WITH signups AS (
    SELECT
        user_id,
        MIN(event_ts) AS signup_ts
    FROM events
    WHERE event_name = 'signup'
    GROUP BY user_id
), purchases AS (
    SELECT
        user_id,
        MIN(event_ts) AS purchase_ts
    FROM events
    WHERE event_name = 'purchase'
    GROUP BY user_id
)
SELECT
    CAST(s.signup_ts AS DATE) AS signup_date,
    COUNT(*) AS signup_users,
    COUNT(CASE
        WHEN p.purchase_ts >= s.signup_ts
         AND p.purchase_ts < s.signup_ts + INTERVAL '7 days'
        THEN 1 END) AS converted_users,
    ROUND(
        COUNT(CASE
            WHEN p.purchase_ts >= s.signup_ts
             AND p.purchase_ts < s.signup_ts + INTERVAL '7 days'
            THEN 1 END
        ) * 100.0 / COUNT(*),
        2
    ) AS conversion_rate_pct
FROM signups s
LEFT JOIN purchases p
  ON p.user_id = s.user_id
GROUP BY CAST(s.signup_ts AS DATE)
ORDER BY signup_date;

This works because the denominator is stable. Each signup user appears once, and the numerator is a filtered subset of that same cohort.

Why the Same-Day Ratio Is Usually the Wrong Answer

People often write something like this instead:

SELECT
    DATE(event_ts) AS dt,
    COUNT(DISTINCT CASE WHEN event_name = 'purchase' THEN user_id END)
    / COUNT(DISTINCT CASE WHEN event_name = 'signup' THEN user_id END)
FROM events
GROUP BY DATE(event_ts);

That ratio compares same-day purchase activity with same-day signup activity. It does not answer "what share of users who signed up on a given day converted within seven days?" Buyers from older cohorts will leak into the numerator, and ordering between steps is not checked at all.

The result may be a useful operational metric, but it is not a signup-cohort funnel.

Pattern 2: Multi-Step Funnel with Ordered Steps

Now suppose the steps are signup, activated, and purchase. You can keep the same one-row-per-user foundation and layer conditions on top:

WITH step_times AS (
    SELECT
        user_id,
        MIN(CASE WHEN event_name = 'signup' THEN event_ts END) AS signup_ts,
        MIN(CASE WHEN event_name = 'activated' THEN event_ts END) AS activated_ts,
        MIN(CASE WHEN event_name = 'purchase' THEN event_ts END) AS purchase_ts
    FROM events
    WHERE event_name IN ('signup', 'activated', 'purchase')
    GROUP BY user_id
)
SELECT
    CAST(signup_ts AS DATE) AS signup_date,
    COUNT(*) AS signup_users,
    COUNT(CASE
        WHEN activated_ts >= signup_ts THEN 1 END) AS activated_users,
    COUNT(CASE
        WHEN activated_ts >= signup_ts
         AND purchase_ts >= activated_ts
        THEN 1 END) AS purchased_after_activation_users
FROM step_times
WHERE signup_ts IS NOT NULL
GROUP BY CAST(signup_ts AS DATE)
ORDER BY signup_date;

This is the key difference between funnel SQL and generic distinct counting: each later step is checked against the timestamps of the earlier steps, not counted in isolation.

When You Need Time Windows

Many funnels are really "ordered steps inside a deadline." For example:

activation within 3 days of signup
purchase within 14 days of activation
second session within 7 days of install

Add those windows directly at the user-step level:

COUNT(CASE
    WHEN activated_ts >= signup_ts
     AND activated_ts < signup_ts + INTERVAL '3 days'
    THEN 1 END) AS activated_in_3d

This keeps the cohort definition clean while making the business rule obvious in the query.

`COUNT(DISTINCT)` Is Helpful, but It Is Not the Fix

COUNT(DISTINCT user_id) is useful when the current input may still contain multiple rows per user after a necessary join. It is not a substitute for:

anchoring the cohort on the right first step
reducing repeated events intentionally
checking that later steps happen after earlier steps
grouping by the correct date

If those rules are wrong, distinct counting can make the numbers look cleaner while the metric remains wrong.

Build the funnel on a stable identity key if the product has pre-login and post-login events.
Use event time for cohort logic and ingestion time for data-quality checks.
When dashboards refresh during an active backfill, expect recent cohorts to shift until the pipeline catches up.

Another production issue is late-arriving data. Funnel semantics should usually use the business event timestamp, not the ingestion timestamp. Keep ingestion time for freshness monitoring, but do not let delayed backfills silently move users into the wrong cohort date.

Real event data often uses more than one identifier. A user may browse anonymously, sign up with an anonymous_id, and later purchase under a logged-in user_id. If those identifiers are not stitched to a stable person key before funnel logic runs, the numerator and denominator can describe different people even when the SQL looks correct.

Identity Stitching and Late-Arriving Events

This is only safe if users is one row per user or if you have already defined which user snapshot should win. If channel or plan changes over time, you still need an explicit rule for which value belongs to the cohort.

WITH step_times AS (
    SELECT
        user_id,
        MIN(CASE WHEN event_name = 'signup' THEN event_ts END) AS signup_ts,
        MIN(CASE WHEN event_name = 'purchase' THEN event_ts END) AS purchase_ts
    FROM events
    WHERE event_name IN ('signup', 'purchase')
    GROUP BY user_id
)
SELECT
    CAST(s.signup_ts AS DATE) AS signup_date,
    u.acquisition_channel,
    COUNT(*) AS signup_users,
    COUNT(CASE
        WHEN s.purchase_ts >= s.signup_ts
         AND s.purchase_ts < s.signup_ts + INTERVAL '7 days'
        THEN 1 END) AS converted_users
FROM step_times s
JOIN users u
  ON u.user_id = s.user_id
WHERE s.signup_ts IS NOT NULL
GROUP BY CAST(s.signup_ts AS DATE), u.acquisition_channel;

If you need a funnel by acquisition channel, country, plan, or account segment, attach those dimensions after you collapse events to one row per user. Joining extra event tables too early recreates the same row multiplication you were trying to avoid.

Join Dimensions After the Step Table, Not Before

Edge Cases Worth Calling Out Explicitly

Multiple signups: decide whether to use the first signup ever, the first signup in the reporting window, or the latest reactivation.
Failed or test events: exclude them before building the step table.
Timezone rules: be explicit about whether the cohort date is UTC or business-local time.
Current partial day or week: exclude incomplete cohorts when the business wants mature conversion windows.
Same-timestamp events: if event order matters and timestamps tie, add a stable secondary ordering field.

These are the same kinds of edge cases that also break retention queries and cohort analysis. Funnel logic is just another version of the same discipline.

A Practical Debug Checklist

Write down the denominator in plain English first.
Make sure each user appears once in that denominator.
Check whether the numerator is a subset of that same cohort.
Verify step order with timestamps, not assumptions.
Check that the grouped date matches the metric definition.
Only then calculate the rate.

If the numbers still look suspicious, inspect a handful of user-level examples before changing the aggregation. Funnel bugs are usually easier to spot at the per-user level than in the final percentage.

FAQ

Should I group by signup date or purchase date?

Group by the date of the cohort-start event when the question is cohort conversion, such as signup-to-purchase within 7 days. Group by purchase date only when the business explicitly wants an event-date metric.

Is `COUNT(DISTINCT user_id)` enough to fix duplicated event joins?

No. It can reduce duplicate users in the final count, but it does not fix a wrong cohort anchor, wrong step order, or the choice of first versus repeated events. Distinct counting is a tool, not the funnel definition.

Should I use first purchase or any purchase within the window?

Use whichever matches the business rule, but state it explicitly. Many conversion metrics care about the first qualifying purchase within the window because the denominator is one user and the question is whether that user converted at all.

SQL Funnel Queries for Event Data: Daily Conversion Rates Without Inflated Counts

Summarize this blog with AI

Introduction

Three Rules Before You Write SQL

What a Funnel Metric Is Really Measuring

Why Raw-Event Joins Inflate Counts

Step 1: Build a One-Row-Per-User Step Table

Why the Same-Day Ratio Is Usually the Wrong Answer

Pattern 2: Multi-Step Funnel with Ordered Steps

When You Need Time Windows

`COUNT(DISTINCT)` Is Helpful, but It Is Not the Fix

Identity Stitching and Late-Arriving Events

Join Dimensions After the Step Table, Not Before

Edge Cases Worth Calling Out Explicitly

A Practical Debug Checklist

FAQ

Should I group by signup date or purchase date?

Is `COUNT(DISTINCT user_id)` enough to fix duplicated event joins?

Should I use first purchase or any purchase within the window?

Begin Your SQL, Python, and R Journey

How to Read SQL EXPLAIN Plans Without Guessing: Costs, Rows, EXPLAIN ANALYZE, and the Bottleneck That Actually Matters

SQL NULL Logic for Interviews: NOT IN, NOT EXISTS, and LEFT JOIN Filters Without Silent Bugs

SQL Date and Timestamp Interview Questions: Parsing, Bucketing, and Filtering Without Off-by-One Mistakes

Why SQL Joins Create Duplicate Rows and How to Fix Them Without Hiding the Problem

SQL Funnel Queries for Event Data: Daily Conversion Rates Without Inflated Counts

Summarize this blog with AI

Introduction

Three Rules Before You Write SQL

What a Funnel Metric Is Really Measuring

Why Raw-Event Joins Inflate Counts

Step 1: Build a One-Row-Per-User Step Table

Pattern 1: Daily 7-Day Signup-to-Purchase Conversion

Why the Same-Day Ratio Is Usually the Wrong Answer

Pattern 2: Multi-Step Funnel with Ordered Steps

When You Need Time Windows

COUNT(DISTINCT) Is Helpful, but It Is Not the Fix

Identity Stitching and Late-Arriving Events

Join Dimensions After the Step Table, Not Before

Edge Cases Worth Calling Out Explicitly

A Practical Debug Checklist

FAQ

Should I group by signup date or purchase date?

Is COUNT(DISTINCT user_id) enough to fix duplicated event joins?

Should I use first purchase or any purchase within the window?

Begin Your SQL, Python, and R Journey

How to Read SQL EXPLAIN Plans Without Guessing: Costs, Rows, EXPLAIN ANALYZE, and the Bottleneck That Actually Matters

SQL NULL Logic for Interviews: NOT IN, NOT EXISTS, and LEFT JOIN Filters Without Silent Bugs

SQL Date and Timestamp Interview Questions: Parsing, Bucketing, and Filtering Without Off-by-One Mistakes

Why SQL Joins Create Duplicate Rows and How to Fix Them Without Hiding the Problem

`COUNT(DISTINCT)` Is Helpful, but It Is Not the Fix

Is `COUNT(DISTINCT user_id)` enough to fix duplicated event joins?