Quick summary
Summarize this blog with AI
Window functions get easier once you separate two jobs that look similar: grouping rows for the final answer and grouping rows for a calculation.
GROUP BY changes the grain of the result by collapsing many rows into one row per group. PARTITION BY defines the rows a window function can see while keeping the original rows in the output.
That distinction is the key to ranking, running totals, latest-row logic, deduplication, and percent-of-total questions.
The Short Version
| Feature | What it does | Output row count | Best for |
|---|---|---|---|
GROUP BY |
Collapses rows into one row per group | Usually fewer rows | Summaries, totals, counts by category |
PARTITION BY |
Groups rows for a window calculation without collapsing them | Usually the same number of rows as the input | Ranking, running totals, comparisons, percent of group, latest row |
If the final answer should have one row per customer, one row per department, or one row per month, reach for GROUP BY. If the answer should keep each order, employee, or event while adding group-level context, use a window function with PARTITION BY.
Start With One Small Table
Use this small order table:
orders
order_id | customer_id | order_date | amount
---------+-------------+------------+-------
1 | 10 | 2026-05-01 | 30
2 | 10 | 2026-05-04 | 45
3 | 20 | 2026-05-02 | 25
4 | 20 | 2026-05-06 | 80
5 | 20 | 2026-05-08 | 15
If the question is "What is each customer's total spend?", the answer should be one row per customer:
select
customer_id,
sum(amount) as customer_total
from orders
group by customer_id;
The result is collapsed:
customer_id | customer_total
------------+---------------
10 | 75
20 | 120
That is what GROUP BY is for. It changes the result from one row per order to one row per customer.
PARTITION BY Keeps the Order Rows
Now change the question: "Show every order, plus the customer's total spend."
A grouped query alone cannot answer that cleanly because it removes the order-level rows. A window aggregate keeps those rows and adds the group-level value:
select
order_id,
customer_id,
order_date,
amount,
sum(amount) over (
partition by customer_id
) as customer_total
from orders
order by customer_id, order_date;
The output still has one row per order:
order_id | customer_id | order_date | amount | customer_total
---------+-------------+------------+--------+---------------
1 | 10 | 2026-05-01 | 30 | 75
2 | 10 | 2026-05-04 | 45 | 75
3 | 20 | 2026-05-02 | 25 | 120
4 | 20 | 2026-05-06 | 80 | 120
5 | 20 | 2026-05-08 | 15 | 120
The partition is the set of rows with the same customer_id. The window function can see that set, but the query still returns the original order rows.
PARTITION BY Is Not the Same as GROUP BY
A useful shortcut is: PARTITION BY is GROUP BY without collapsing rows. That shortcut helps at first, but the more precise version is better.
GROUP BY defines the grain of the result. Once you group by customer, columns like order_id no longer make sense unless they are aggregated.
PARTITION BY defines the scope of a calculation. It does not decide the final grain by itself. The surrounding query still controls which rows are returned.
This is why interview answers should start with grain. If the prompt expects row-level detail and you aggregate too early, you lose information. If the prompt expects a summary table and you use a window function without filtering or grouping, you may return too many rows.
Add ORDER BY When Sequence Matters
PARTITION BY answers "which rows belong together?" The ORDER BY inside OVER (...) answers "in what sequence should the function read them?"
For a customer running total, both pieces matter:
select
order_id,
customer_id,
order_date,
amount,
sum(amount) over (
partition by customer_id
order by order_date, order_id
rows between unbounded preceding and current row
) as customer_running_total
from orders
order by customer_id, order_date, order_id;
The result changes row by row inside each customer partition:
order_id | customer_id | amount | customer_running_total
---------+-------------+--------+-----------------------
1 | 10 | 30 | 30
2 | 10 | 45 | 75
3 | 20 | 25 | 25
4 | 20 | 80 | 105
5 | 20 | 15 | 120
The explicit frame clause makes the running-total behavior clear. For more detail, read SQL window frames explained.
Ranking Is Where the Mental Model Clicks
Ranking functions are often the easiest place to see why PARTITION BY exists. They answer questions like:
- What is each customer's largest order?
- Who are the top three users in each region?
- Which event is first in each session?
- Which duplicate row should we keep?
This query ranks each customer's orders from largest to smallest:
select
order_id,
customer_id,
amount,
row_number() over (
partition by customer_id
order by amount desc, order_id
) as order_rank
from orders;
PARTITION BY customer_id restarts the ranking for each customer. ORDER BY amount desc, order_id defines the ranking order and gives ties a stable rule.
To keep only each customer's largest order, filter the ranked result:
with ranked_orders as (
select
order_id,
customer_id,
amount,
row_number() over (
partition by customer_id
order by amount desc, order_id
) as rn
from orders
)
select
order_id,
customer_id,
amount
from ranked_orders
where rn = 1;
This pattern is the foundation for latest-row, top-N, and deduplication questions. For tie behavior, read ROW_NUMBER vs RANK vs DENSE_RANK.
When You Need Both GROUP BY and PARTITION BY
Real queries often use both concepts in stages. First you aggregate to the grain you need. Then you use a window function over that aggregated result.
For example, calculate daily revenue and each day's share of monthly revenue:
with daily_revenue as (
select
date_trunc('day', order_date) as order_day,
date_trunc('month', order_date) as order_month,
sum(amount) as revenue
from orders
group by
date_trunc('day', order_date),
date_trunc('month', order_date)
)
select
order_day,
revenue,
revenue * 1.0 / sum(revenue) over (
partition by order_month
) as pct_of_month
from daily_revenue
order by order_day;
The GROUP BY creates one row per day. The window function compares each daily row to the monthly total without collapsing the daily rows.
Common Mistakes
Using GROUP BY too early. If you aggregate before preserving the row-level detail the question needs, you may not be able to recover the right answer.
Leaving ties nondeterministic. If two rows have the same sort value, add a stable tie-breaker such as an ID column. Otherwise ROW_NUMBER() can choose a row you did not expect.
Forgetting that ORDER BY changes aggregate windows. sum(amount) over (partition by customer_id) gives a whole-customer total. Add order by order_date and many databases produce a running total unless you specify a different frame.
Filtering window functions in the wrong place. In many databases, you cannot use a window function directly in WHERE. Put it in a CTE or subquery, then filter the outer result. Some dialects support QUALIFY, but not all do.
Interview Checklist
- State the result grain before writing SQL.
- Use
GROUP BYwhen the answer should be one row per group. - Use
PARTITION BYwhen each row needs group-level context. - Add window
ORDER BYfor ranking, running totals, first/last rows, and previous/next comparisons. - Add deterministic tie-breakers to ranking queries.
- Use a CTE or subquery when you need to filter on a window result.
FAQ
Can I use PARTITION BY without GROUP BY?
Yes. That is common. A window function can calculate group-aware values while preserving every row from the input.
Can I use GROUP BY and window functions in the same query?
Yes, but remember the order of operations. The window function operates on the rows available after grouping. When the logic gets hard to read, use a CTE to separate the grouped step from the window step.
Is PARTITION BY required for every window function?
No. If you omit PARTITION BY, the window is the entire result set. That is useful for overall ranks, grand totals, and percent-of-total calculations across all rows.
The Main Takeaway
Choose the syntax after you know the grain. If the answer should collapse rows, use GROUP BY. If the answer should preserve rows and add context, use a window function with PARTITION BY.
Once that distinction is clear, window functions stop feeling like a separate advanced topic. They become a practical way to keep row-level detail while answering group-level questions.