Quick summary
Summarize this blog with AI
Introduction
The phrase latest row per group sounds simple, but it hides three separate decisions: what defines a group, what defines "latest," and what should happen when two rows tie.
That is why this problem keeps showing up in interviews, dashboards, and production models. People write MAX(timestamp), get a believable answer, and only later realize they did not actually select the full row they wanted. In other cases they join the max timestamp back to the base table and quietly create duplicates.
If you separate the problem into group key, sort rule, and tie rule, the SQL becomes much more mechanical. This guide shows the safest default pattern, when to switch to RANK(), when QUALIFY helps, and when PostgreSQL's DISTINCT ON is worth using.
What the Request Is Really Asking
Before you write any SQL, translate the prompt into three explicit rules:
| Question | What you must decide | Example |
|---|---|---|
| What is one group? | The partition key | One customer, one account, one device, one product category |
| What makes a row latest or top-ranked? | The ordered sort keys | event_ts DESC, then event_id DESC |
| What should happen on ties? | One row only, or all tied rows | ROW_NUMBER() vs RANK() |
Strong SQL answers make those rules visible. Weak answers leave them implicit and hope the database makes the same choice the business wanted.
Why MAX(timestamp) Is Not Enough
This query only answers one part of the problem:
SELECT customer_id, MAX(order_ts) AS latest_order_ts
FROM orders
GROUP BY customer_id;
It tells you the latest timestamp per customer, but it does not tell you which status, amount, channel, or order ID belonged to that row. Aggregates return values, not whole records.
The usual next attempt is to join the max timestamp back to the base table:
WITH latest_ts AS (
SELECT customer_id, MAX(order_ts) AS latest_order_ts
FROM orders
GROUP BY customer_id
)
SELECT o.*
FROM orders o
JOIN latest_ts l
ON o.customer_id = l.customer_id
AND o.order_ts = l.latest_order_ts;
This works only if each customer has exactly one row at the latest timestamp. If two rows share that timestamp, you get two rows back. Sometimes that is correct. Often it is the exact bug you were trying to avoid.
If that join-back pattern keeps inflating your results, it is the same family of issue covered in why SQL joins create duplicate rows.
The Safest Default: ROW_NUMBER() with a Deterministic Sort
If the business wants exactly one row per group, start here:
WITH ranked_orders AS (
SELECT
o.*,
ROW_NUMBER() OVER (
PARTITION BY customer_id
ORDER BY order_ts DESC, order_id DESC
) AS rn
FROM orders o
)
SELECT
customer_id,
order_id,
order_ts,
status,
total_amount
FROM ranked_orders
WHERE rn = 1;
This pattern does three important things:
- It keeps the full row, not just the maximum timestamp.
- It guarantees one row per customer.
- It forces you to make the tie-break rule explicit.
The secondary sort key is not optional fluff. If two rows share the same order_ts, then order_id DESC makes the outcome deterministic. Without a stable tie-breaker, different executions may legally return different rows.
Deduplicate to the Right Business Grain First
Ranking works only if the input rows already match the business grain of the question. If your table contains repeated ingestions, multiple status events per second, or more than one row per logical event, rank after you reduce the data to the right grain.
Example: suppose an event stream contains multiple rows for the same order status because the pipeline retried inserts. If the business question is "latest status per order," first reduce duplicate status events, then rank the remaining rows.
WITH deduped_status AS (
SELECT DISTINCT
order_id,
status,
status_ts,
event_id
FROM order_status_events
), ranked_status AS (
SELECT
order_id,
status,
status_ts,
ROW_NUMBER() OVER (
PARTITION BY order_id
ORDER BY status_ts DESC, event_id DESC
) AS rn
FROM deduped_status
)
SELECT order_id, status, status_ts
FROM ranked_status
WHERE rn = 1;
If you skip the grain step, the ranking itself may be perfectly written and still answer the wrong question.
When to Use RANK() or DENSE_RANK() Instead
Use ROW_NUMBER() when the result must contain exactly one row per group or exactly N rows per group. Use RANK() when tied rows should all survive.
For example, "top 3 salaries per department" is ambiguous until you decide whether ties at third place should stay in the output.
WITH ranked_salaries AS (
SELECT
department,
employee_id,
salary,
RANK() OVER (
PARTITION BY department
ORDER BY salary DESC
) AS rnk
FROM payroll
)
SELECT department, employee_id, salary
FROM ranked_salaries
WHERE rnk <= 3;
This may return more than three rows for a department if multiple employees tie at the cutoff. That is not a bug. It is the point of using RANK().
If the request is "exactly the three most recent rows per user," switch back to ROW_NUMBER().
Top N Per Group Is the Same Pattern
Latest-row queries and top-N-per-group queries are the same family of problem. You still partition, sort, and filter on rank.
WITH ranked_events AS (
SELECT
user_id,
event_id,
event_ts,
event_type,
ROW_NUMBER() OVER (
PARTITION BY user_id
ORDER BY event_ts DESC, event_id DESC
) AS rn
FROM user_events
)
SELECT user_id, event_id, event_ts, event_type
FROM ranked_events
WHERE rn <= 3
ORDER BY user_id, rn;
The only thing that changed is the filter from rn = 1 to rn <= 3.
QUALIFY Is a Shortcut, Not a Different Strategy
In BigQuery, Snowflake, and some other engines, you can filter a window-function result without wrapping it in a CTE:
SELECT
customer_id,
order_id,
order_ts,
total_amount
FROM orders
QUALIFY ROW_NUMBER() OVER (
PARTITION BY customer_id
ORDER BY order_ts DESC, order_id DESC
) = 1;
The logic is the same as the CTE version. QUALIFY only removes a layer of nesting. If your dialect does not support it, keep the ranking logic and move the filter into an outer query.
PostgreSQL Shortcut: DISTINCT ON
PostgreSQL gives you a concise alternative for one-row-per-group selection:
SELECT DISTINCT ON (customer_id)
customer_id,
order_id,
order_ts,
status,
total_amount
FROM orders
ORDER BY customer_id, order_ts DESC, order_id DESC;
This is compact and often fast, but it is PostgreSQL-specific. If you need portable SQL, interview-friendly SQL, or top-N output beyond one row, ROW_NUMBER() remains the better default.
Even if your engine does not support NULLS LAST, the principle stays the same: do not let single-row selection depend on accidental null ordering.
WITH ranked_orders AS (
SELECT
o.*,
ROW_NUMBER() OVER (
PARTITION BY customer_id
ORDER BY order_ts DESC NULLS LAST, order_id DESC
) AS rn
FROM orders o
WHERE order_ts IS NOT NULL
)
SELECT *
FROM ranked_orders
WHERE rn = 1;
- If null timestamps should be excluded, filter them out before ranking.
- If they should remain but lose to real timestamps, use an explicit null-ordering rule where supported.
- If null means something business-specific, encode that choice directly instead of letting the database default decide for you.
If your ordering column can be null, define the rule before you rank. A null timestamp might mean unknown, not latest.
NULL Timestamps and Missing Sort Keys
One PostgreSQL detail matters a lot: the DISTINCT ON expressions must match the leftmost ORDER BY expressions, and without a full ORDER BY the kept row is unpredictable.
| Engine | Best default | Why |
|---|---|---|
| PostgreSQL | DISTINCT ON or ROW_NUMBER() |
DISTINCT ON is concise for one-row-per-group; ROW_NUMBER() is more portable and clearer for top-N |
| BigQuery / Snowflake | QUALIFY ROW_NUMBER() ... = 1 |
Keeps the ranking filter in one readable query block |
| SQL Server / MySQL 8+ / most modern engines | ROW_NUMBER() in a CTE or subquery |
Portable and explicit about the winning row |
| Older MySQL | Usually upgrade to window functions if possible | Pre-window-function workarounds are harder to reason about and easier to break on ties |
Dialect Quick Reference
Common Anti-Patterns to Avoid
- Using
MAX()on every column.MAX(order_ts),MAX(status), andMAX(amount)do not guarantee those values came from the same row. - Ranking without a stable tie-breaker. If timestamps tie, add a second ordered key such as an ID or ingestion sequence.
- Joining the max timestamp back and then hiding duplicates with
DISTINCT. That usually masks a logic error instead of fixing it. - Ranking before fixing grain. If the input rows are duplicated or already denormalized, you may rank the wrong thing cleanly.
- Confusing one-row selection with top-N-with-ties. Pick the ranking function that matches the business rule.
Performance Notes That Actually Matter
If this query runs often on a large table, support the partition and sort keys with an index aligned to the access pattern. For example, a composite index like (customer_id, order_ts DESC, order_id DESC) can help a lot for latest-order lookups.
Also push filters as early as possible. If you only care about completed orders in the last 90 days, filter that before ranking so the window function works on fewer rows.
For very hot operational paths, some teams materialize a current-state table instead of recomputing the latest row on demand. That is an architectural choice, not a SQL trick, but it is often the correct one when the same latest-state logic powers many dashboards and APIs.
A Reliable Decision Checklist
- State the group key out loud.
- State the exact ordered sort keys, including the tie-breaker.
- Decide whether one row or all ties should survive.
- Make sure the input is already at the right business grain.
- Only then write the ranking query.
Most wrong answers happen before the SQL starts, because one of those rules was never defined clearly.
FAQ
Why not select MAX(timestamp), MAX(status), and MAX(amount) together?
Because aggregates operate column by column. The largest timestamp, largest amount, and lexicographically largest status do not have to come from the same record. If the business wants one real row, rank rows and keep the winning row.
When should I use RANK() instead of ROW_NUMBER()?
Use RANK() when tied rows should remain in the result, such as "top 3 scores including ties." Use ROW_NUMBER() when the result must contain exactly one row or exactly N rows per group.
Is DISTINCT ON better than ROW_NUMBER() in PostgreSQL?
It can be shorter and sometimes faster for one-row-per-group problems, but it is PostgreSQL-specific. If portability, readability across teams, or top-N logic matters, ROW_NUMBER() is usually the safer default.