SQL Cookbook Lesson
How to select the first row in each group by group with SQL
Learn how to select the first row in each group by group with SQL with examples and explanations from SQLPad.
Problem
How can I select the first row of each group in a SQL query?
Sample data
Consider a sales table as below:
| id | product | sale_date |
|---|---|---|
| 1 | A | 2020-01-01 |
| 2 | A | 2020-01-02 |
| 3 | B | 2020-01-01 |
| 4 | B | 2020-01-03 |
| 5 | C | 2020-01-01 |
| 6 | C | 2020-01-04 |
Here, we have multiple sales records for each product.
MySQL Solution
SELECT s1.*
FROM sales s1
LEFT JOIN sales s2
ON s1.product = s2.product AND s1.sale_date > s2.sale_date
WHERE s2.id IS NULL;
Explanation (MySQL)
In MySQL, the problem of selecting the first row in each group can be solved using a self join. The query joins the sales table to itself where product matches and the first table's sale date is later than the second's. The WHERE clause excludes rows where there is no smaller sale date, which means it only keeps the earliest sale for each product.
PostgreSQL Solution
SELECT DISTINCT ON (product) *
FROM sales
ORDER BY product, sale_date;
Explanation (PostgreSQL)
PostgreSQL provides a DISTINCT ON statement that returns the first record for each group of rows, where the groups are defined by the columns listed inside the parentheses. In this case, it will return the first record for each group of rows that have the same product. The ORDER BY clause after the DISTINCT ON statement is mandatory and it will determine which row is considered the "first" in each group. Here, we are ordering by sale_date, so the first sale for each product will be returned.