SQL Latest Row Per Group: ROW_NUMBER, QUALIFY

Q: Why not select MAX(timestamp) , MAX(status) , and MAX(amount) together?

Because aggregates operate column by column. The largest timestamp, largest amount, and lexicographically largest status do not have to come from the same record. If the business wants one real row, rank rows and keep the winning row.

Q: When should I use RANK() instead of ROW_NUMBER() ?

Use RANK() when tied rows should remain in the result, such as "top 3 scores including ties." Use ROW_NUMBER() when the result must contain exactly one row or exactly N rows per group.

Q: Is DISTINCT ON better than ROW_NUMBER() in PostgreSQL?

It can be shorter and sometimes faster for one-row-per-group problems, but it is PostgreSQL-specific. If portability, readability across teams, or top-N logic matters, ROW_NUMBER() is usually the safer default.

Quick summary

Summarize this blog with AI

ChatGPT Claude Grok Perplexity

Introduction

The phrase latest row per group sounds simple, but it hides three separate decisions: what defines a group, what defines "latest," and what should happen when two rows tie.

That is why this problem keeps showing up in interviews, dashboards, and production models. People write MAX(timestamp), get a believable answer, and only later realize they did not actually select the full row they wanted. In other cases they join the max timestamp back to the base table and quietly create duplicates.

If you separate the problem into group key, sort rule, and tie rule, the SQL becomes much more mechanical. This guide shows the safest default pattern, when to switch to RANK(), when QUALIFY helps, and when PostgreSQL's DISTINCT ON is worth using.

What the Request Is Really Asking

Before you write any SQL, translate the prompt into three explicit rules:

Question	What you must decide	Example
What is one group?	The partition key	One customer, one account, one device, one product category
What makes a row latest or top-ranked?	The ordered sort keys	`event_ts DESC`, then `event_id DESC`
What should happen on ties?	One row only, or all tied rows	`ROW_NUMBER()` vs `RANK()`

Strong SQL answers make those rules visible. Weak answers leave them implicit and hope the database makes the same choice the business wanted.

Why `MAX(timestamp)` Is Not Enough

This query only answers one part of the problem:

SELECT customer_id, MAX(order_ts) AS latest_order_ts
FROM orders
GROUP BY customer_id;

It tells you the latest timestamp per customer, but it does not tell you which status, amount, channel, or order ID belonged to that row. Aggregates return values, not whole records.

The usual next attempt is to join the max timestamp back to the base table:

WITH latest_ts AS (
    SELECT customer_id, MAX(order_ts) AS latest_order_ts
    FROM orders
    GROUP BY customer_id
)
SELECT o.*
FROM orders o
JOIN latest_ts l
  ON o.customer_id = l.customer_id
 AND o.order_ts = l.latest_order_ts;

This works only if each customer has exactly one row at the latest timestamp. If two rows share that timestamp, you get two rows back. Sometimes that is correct. Often it is the exact bug you were trying to avoid.

If that join-back pattern keeps inflating your results, it is the same family of issue covered in why SQL joins create duplicate rows.

The Safest Default: `ROW_NUMBER()` with a Deterministic Sort

If the business wants exactly one row per group, start here:

WITH ranked_orders AS (
    SELECT
        o.*,
        ROW_NUMBER() OVER (
            PARTITION BY customer_id
            ORDER BY order_ts DESC, order_id DESC
        ) AS rn
    FROM orders o
)
SELECT
    customer_id,
    order_id,
    order_ts,
    status,
    total_amount
FROM ranked_orders
WHERE rn = 1;

This pattern does three important things:

It keeps the full row, not just the maximum timestamp.
It guarantees one row per customer.
It forces you to make the tie-break rule explicit.

The secondary sort key is not optional fluff. If two rows share the same order_ts, then order_id DESC makes the outcome deterministic. Without a stable tie-breaker, different executions may legally return different rows.

Deduplicate to the Right Business Grain First

Ranking works only if the input rows already match the business grain of the question. If your table contains repeated ingestions, multiple status events per second, or more than one row per logical event, rank after you reduce the data to the right grain.

Example: suppose an event stream contains multiple rows for the same order status because the pipeline retried inserts. If the business question is "latest status per order," first reduce duplicate status events, then rank the remaining rows.

WITH deduped_status AS (
    SELECT DISTINCT
        order_id,
        status,
        status_ts,
        event_id
    FROM order_status_events
), ranked_status AS (
    SELECT
        order_id,
        status,
        status_ts,
        ROW_NUMBER() OVER (
            PARTITION BY order_id
            ORDER BY status_ts DESC, event_id DESC
        ) AS rn
    FROM deduped_status
)
SELECT order_id, status, status_ts
FROM ranked_status
WHERE rn = 1;

If you skip the grain step, the ranking itself may be perfectly written and still answer the wrong question.

When to Use `RANK()` or `DENSE_RANK()` Instead

Use ROW_NUMBER() when the result must contain exactly one row per group or exactly N rows per group. Use RANK() when tied rows should all survive.

For example, "top 3 salaries per department" is ambiguous until you decide whether ties at third place should stay in the output.

WITH ranked_salaries AS (
    SELECT
        department,
        employee_id,
        salary,
        RANK() OVER (
            PARTITION BY department
            ORDER BY salary DESC
        ) AS rnk
    FROM payroll
)
SELECT department, employee_id, salary
FROM ranked_salaries
WHERE rnk <= 3;

This may return more than three rows for a department if multiple employees tie at the cutoff. That is not a bug. It is the point of using RANK().

If the request is "exactly the three most recent rows per user," switch back to ROW_NUMBER().

Top N Per Group Is the Same Pattern

Latest-row queries and top-N-per-group queries are the same family of problem. You still partition, sort, and filter on rank.

WITH ranked_events AS (
    SELECT
        user_id,
        event_id,
        event_ts,
        event_type,
        ROW_NUMBER() OVER (
            PARTITION BY user_id
            ORDER BY event_ts DESC, event_id DESC
        ) AS rn
    FROM user_events
)
SELECT user_id, event_id, event_ts, event_type
FROM ranked_events
WHERE rn <= 3
ORDER BY user_id, rn;

The only thing that changed is the filter from rn = 1 to rn <= 3.

`QUALIFY` Is a Shortcut, Not a Different Strategy

In BigQuery, Snowflake, and some other engines, you can filter a window-function result without wrapping it in a CTE:

SELECT
    customer_id,
    order_id,
    order_ts,
    total_amount
FROM orders
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY customer_id
    ORDER BY order_ts DESC, order_id DESC
) = 1;

The logic is the same as the CTE version. QUALIFY only removes a layer of nesting. If your dialect does not support it, keep the ranking logic and move the filter into an outer query.

PostgreSQL Shortcut: `DISTINCT ON`

PostgreSQL gives you a concise alternative for one-row-per-group selection:

SELECT DISTINCT ON (customer_id)
    customer_id,
    order_id,
    order_ts,
    status,
    total_amount
FROM orders
ORDER BY customer_id, order_ts DESC, order_id DESC;

This is compact and often fast, but it is PostgreSQL-specific. If you need portable SQL, interview-friendly SQL, or top-N output beyond one row, ROW_NUMBER() remains the better default.

Even if your engine does not support NULLS LAST, the principle stays the same: do not let single-row selection depend on accidental null ordering.

WITH ranked_orders AS (
    SELECT
        o.*,
        ROW_NUMBER() OVER (
            PARTITION BY customer_id
            ORDER BY order_ts DESC NULLS LAST, order_id DESC
        ) AS rn
    FROM orders o
    WHERE order_ts IS NOT NULL
)
SELECT *
FROM ranked_orders
WHERE rn = 1;

If null timestamps should be excluded, filter them out before ranking.
If they should remain but lose to real timestamps, use an explicit null-ordering rule where supported.
If null means something business-specific, encode that choice directly instead of letting the database default decide for you.

If your ordering column can be null, define the rule before you rank. A null timestamp might mean unknown, not latest.

NULL Timestamps and Missing Sort Keys

One PostgreSQL detail matters a lot: the DISTINCT ON expressions must match the leftmost ORDER BY expressions, and without a full ORDER BY the kept row is unpredictable.

Engine	Best default	Why
PostgreSQL	`DISTINCT ON` or `ROW_NUMBER()`	`DISTINCT ON` is concise for one-row-per-group; `ROW_NUMBER()` is more portable and clearer for top-N
BigQuery / Snowflake	`QUALIFY ROW_NUMBER() ... = 1`	Keeps the ranking filter in one readable query block
SQL Server / MySQL 8+ / most modern engines	`ROW_NUMBER()` in a CTE or subquery	Portable and explicit about the winning row
Older MySQL	Usually upgrade to window functions if possible	Pre-window-function workarounds are harder to reason about and easier to break on ties

Dialect Quick Reference

Common Anti-Patterns to Avoid

Using MAX() on every column. MAX(order_ts), MAX(status), and MAX(amount) do not guarantee those values came from the same row.
Ranking without a stable tie-breaker. If timestamps tie, add a second ordered key such as an ID or ingestion sequence.
Joining the max timestamp back and then hiding duplicates with DISTINCT. That usually masks a logic error instead of fixing it.
Ranking before fixing grain. If the input rows are duplicated or already denormalized, you may rank the wrong thing cleanly.
Confusing one-row selection with top-N-with-ties. Pick the ranking function that matches the business rule.

Performance Notes That Actually Matter

If this query runs often on a large table, support the partition and sort keys with an index aligned to the access pattern. For example, a composite index like (customer_id, order_ts DESC, order_id DESC) can help a lot for latest-order lookups.

Also push filters as early as possible. If you only care about completed orders in the last 90 days, filter that before ranking so the window function works on fewer rows.

For very hot operational paths, some teams materialize a current-state table instead of recomputing the latest row on demand. That is an architectural choice, not a SQL trick, but it is often the correct one when the same latest-state logic powers many dashboards and APIs.

A Reliable Decision Checklist

State the group key out loud.
State the exact ordered sort keys, including the tie-breaker.
Decide whether one row or all ties should survive.
Make sure the input is already at the right business grain.
Only then write the ranking query.

Most wrong answers happen before the SQL starts, because one of those rules was never defined clearly.

FAQ

Why not select `MAX(timestamp)`, `MAX(status)`, and `MAX(amount)` together?

Because aggregates operate column by column. The largest timestamp, largest amount, and lexicographically largest status do not have to come from the same record. If the business wants one real row, rank rows and keep the winning row.

When should I use `RANK()` instead of `ROW_NUMBER()`?

Use RANK() when tied rows should remain in the result, such as "top 3 scores including ties." Use ROW_NUMBER() when the result must contain exactly one row or exactly N rows per group.

Is `DISTINCT ON` better than `ROW_NUMBER()` in PostgreSQL?

It can be shorter and sometimes faster for one-row-per-group problems, but it is PostgreSQL-specific. If portability, readability across teams, or top-N logic matters, ROW_NUMBER() is usually the safer default.

SQL Latest Row Per Group: ROW_NUMBER, QUALIFY, and Tie-Breakers That Keep Results Deterministic

Summarize this blog with AI

Introduction

What the Request Is Really Asking

Why `MAX(timestamp)` Is Not Enough

The Safest Default: `ROW_NUMBER()` with a Deterministic Sort

Deduplicate to the Right Business Grain First

When to Use `RANK()` or `DENSE_RANK()` Instead

Top N Per Group Is the Same Pattern

`QUALIFY` Is a Shortcut, Not a Different Strategy

PostgreSQL Shortcut: `DISTINCT ON`

NULL Timestamps and Missing Sort Keys

Dialect Quick Reference

Common Anti-Patterns to Avoid

Performance Notes That Actually Matter

A Reliable Decision Checklist

FAQ

Why not select `MAX(timestamp)`, `MAX(status)`, and `MAX(amount)` together?

When should I use `RANK()` instead of `ROW_NUMBER()`?

Is `DISTINCT ON` better than `ROW_NUMBER()` in PostgreSQL?

Begin Your SQL, Python, and R Journey

Does SQL LIMIT Read All Rows? ORDER BY, Indexes, and Keyset Pagination

SQL Window Frames Explained: ROWS vs RANGE, Running Totals, and the Default That Changes Your Results

Mastering SQL Row Deletion for Interviews

4 SQL Window Function Patterns You Need for Interviews

SQL Latest Row Per Group: ROW_NUMBER, QUALIFY, and Tie-Breakers That Keep Results Deterministic

Summarize this blog with AI

Introduction

What the Request Is Really Asking

Why MAX(timestamp) Is Not Enough

The Safest Default: ROW_NUMBER() with a Deterministic Sort

Deduplicate to the Right Business Grain First

When to Use RANK() or DENSE_RANK() Instead

Top N Per Group Is the Same Pattern

QUALIFY Is a Shortcut, Not a Different Strategy

PostgreSQL Shortcut: DISTINCT ON

NULL Timestamps and Missing Sort Keys

Dialect Quick Reference

Common Anti-Patterns to Avoid

Performance Notes That Actually Matter

A Reliable Decision Checklist

FAQ

Why not select MAX(timestamp), MAX(status), and MAX(amount) together?

When should I use RANK() instead of ROW_NUMBER()?

Is DISTINCT ON better than ROW_NUMBER() in PostgreSQL?

Begin Your SQL, Python, and R Journey

Does SQL LIMIT Read All Rows? ORDER BY, Indexes, and Keyset Pagination

SQL Window Frames Explained: ROWS vs RANGE, Running Totals, and the Default That Changes Your Results

Mastering SQL Row Deletion for Interviews

4 SQL Window Function Patterns You Need for Interviews

Why `MAX(timestamp)` Is Not Enough

The Safest Default: `ROW_NUMBER()` with a Deterministic Sort

When to Use `RANK()` or `DENSE_RANK()` Instead

`QUALIFY` Is a Shortcut, Not a Different Strategy

PostgreSQL Shortcut: `DISTINCT ON`

Why not select `MAX(timestamp)`, `MAX(status)`, and `MAX(amount)` together?

When should I use `RANK()` instead of `ROW_NUMBER()`?

Is `DISTINCT ON` better than `ROW_NUMBER()` in PostgreSQL?