Retail SQL Interview Case Study: Product and Category

Quick summary

Summarize this blog with AI

Retail SQL interviews often feel different from generic SQL practice. The interviewer may not ask for an abstract ranking query. They may ask how sales changed by category, whether promotions helped, which products are carrying revenue, or why a dashboard number looks wrong.

That is why retail cases are useful for analyst interview prep. They test SQL mechanics, but they also test metric judgment. You need to define revenue, handle cancelled orders, avoid duplicate joins, compare periods fairly, and explain what the business should do next.

This case study gives you a realistic practice scenario with schema, questions, SQL patterns, and interpretation notes.

The business scenario

You are supporting a retail merchandising team. The team wants to understand product and category performance for the last eight weeks. They care about revenue, units sold, average order value, margin, top products, and inventory risk.

The stakeholder asks:

Which categories are performing well, which are weakening, and where should we investigate pricing, promotion, or inventory issues?

Your job is not only to write queries. Your job is to produce numbers the team can trust.

Schema

Assume these tables:

orders
------
order_id
customer_id
order_ts
status              -- completed, cancelled, refunded
channel             -- web, store, marketplace

order_items
-----------
order_id
product_id
quantity
unit_price
unit_cost
discount_amount

products
--------
product_id
product_name
category
subcategory

inventory_snapshots
-------------------
snapshot_date
product_id
on_hand_units

The useful analytical grain is usually one row per order item. Revenue and margin live at item level, while customer and order counts require distinct counts.

Start with a clean fact table

Before answering business questions, create a clean base query. This makes the rest of the analysis safer.

with fact_order_items as (
    select
        o.order_id,
        o.customer_id,
        o.order_ts,
        date_trunc('week', o.order_ts)::date as week_start,
        o.channel,
        p.product_id,
        p.product_name,
        p.category,
        p.subcategory,
        oi.quantity,
        oi.unit_price,
        oi.unit_cost,
        oi.discount_amount,
        oi.quantity * oi.unit_price - coalesce(oi.discount_amount, 0) as net_revenue,
        oi.quantity * oi.unit_cost as cost,
        oi.quantity * oi.unit_price - coalesce(oi.discount_amount, 0)
            - oi.quantity * oi.unit_cost as gross_margin
    from orders o
    join order_items oi
        on o.order_id = oi.order_id
    join products p
        on oi.product_id = p.product_id
    where o.status = 'completed'
)
select *
from fact_order_items;

Notice the explicit status filter. If refunded orders should reduce revenue instead of being excluded, document that assumption and adjust the logic. In a real interview, asking this question is a strength.

Question 1: Category sales and margin

First, summarize category performance.

with fact_order_items as (
    select
        o.order_id,
        o.customer_id,
        o.order_ts,
        p.category,
        oi.quantity,
        oi.quantity * oi.unit_price - coalesce(oi.discount_amount, 0) as net_revenue,
        oi.quantity * oi.unit_cost as cost,
        oi.quantity * oi.unit_price - coalesce(oi.discount_amount, 0)
            - oi.quantity * oi.unit_cost as gross_margin
    from orders o
    join order_items oi on o.order_id = oi.order_id
    join products p on oi.product_id = p.product_id
    where o.status = 'completed'
      and o.order_ts >= current_date - interval '8 weeks'
)
select
    category,
    count(distinct order_id) as orders,
    count(distinct customer_id) as customers,
    sum(quantity) as units_sold,
    sum(net_revenue) as revenue,
    sum(gross_margin) as gross_margin,
    sum(gross_margin) / nullif(sum(net_revenue), 0) as gross_margin_rate
from fact_order_items
group by category
order by revenue desc;

Interpretation: high revenue with low or falling margin is a pricing or discounting question. Low revenue with high margin may be a distribution, awareness, or inventory question. Do not stop at "category A is highest revenue." Ask whether it is profitable and whether it is growing.

Question 2: Week-over-week category growth

Retail teams often care about trend, not just total volume.

with weekly_category as (
    select
        date_trunc('week', o.order_ts)::date as week_start,
        p.category,
        sum(oi.quantity * oi.unit_price - coalesce(oi.discount_amount, 0)) as revenue,
        sum(oi.quantity) as units_sold
    from orders o
    join order_items oi on o.order_id = oi.order_id
    join products p on oi.product_id = p.product_id
    where o.status = 'completed'
      and o.order_ts >= current_date - interval '8 weeks'
    group by 1, 2
),
with_growth as (
    select
        week_start,
        category,
        revenue,
        units_sold,
        lag(revenue) over (
            partition by category
            order by week_start
        ) as previous_week_revenue
    from weekly_category
)
select
    week_start,
    category,
    revenue,
    previous_week_revenue,
    revenue - previous_week_revenue as revenue_change,
    (revenue - previous_week_revenue)
        / nullif(previous_week_revenue, 0) as revenue_growth_rate
from with_growth
order by week_start, category;

Interpretation: a growth rate with a tiny base can mislead. A category growing from $100 to $300 has 200% growth, but may not matter as much as a large category declining by 8%. Show both absolute and percentage change.

Question 3: Top products within each category

This is a common interview pattern: top N per group. Use a window function.

with product_sales as (
    select
        p.category,
        p.product_id,
        p.product_name,
        sum(oi.quantity) as units_sold,
        sum(oi.quantity * oi.unit_price - coalesce(oi.discount_amount, 0)) as revenue
    from orders o
    join order_items oi on o.order_id = oi.order_id
    join products p on oi.product_id = p.product_id
    where o.status = 'completed'
      and o.order_ts >= current_date - interval '8 weeks'
    group by 1, 2, 3
),
ranked as (
    select
        *,
        row_number() over (
            partition by category
            order by revenue desc, product_id
        ) as product_rank
    from product_sales
)
select
    category,
    product_id,
    product_name,
    units_sold,
    revenue,
    product_rank
from ranked
where product_rank <= 5
order by category, product_rank;

Use a deterministic tie-breaker such as product_id. In interviews, this shows attention to repeatable results.

Question 4: Average order value by channel

Average order value should be calculated at order grain, not item grain. This is a classic denominator trap.

with order_revenue as (
    select
        o.order_id,
        o.channel,
        sum(oi.quantity * oi.unit_price - coalesce(oi.discount_amount, 0)) as order_revenue
    from orders o
    join order_items oi on o.order_id = oi.order_id
    where o.status = 'completed'
      and o.order_ts >= current_date - interval '8 weeks'
    group by o.order_id, o.channel
)
select
    channel,
    count(*) as orders,
    sum(order_revenue) as revenue,
    avg(order_revenue) as average_order_value
from order_revenue
group by channel
order by revenue desc;

If you calculate avg(quantity * unit_price) directly from order items, you are calculating average item revenue, not average order value. Interviewers often look for this distinction.

Question 5: Products with sales momentum and low inventory

Retail analysis becomes more useful when sales are connected to operational constraints. Here, compare recent units sold to latest inventory.

with recent_product_sales as (
    select
        p.product_id,
        p.product_name,
        p.category,
        sum(oi.quantity) as units_sold_14d
    from orders o
    join order_items oi on o.order_id = oi.order_id
    join products p on oi.product_id = p.product_id
    where o.status = 'completed'
      and o.order_ts >= current_date - interval '14 days'
    group by 1, 2, 3
),
latest_inventory as (
    select product_id, snapshot_date, on_hand_units
    from (
        select
            product_id,
            snapshot_date,
            on_hand_units,
            row_number() over (
                partition by product_id
                order by snapshot_date desc
            ) as rn
        from inventory_snapshots
    ) inv
    where rn = 1
)
select
    s.category,
    s.product_id,
    s.product_name,
    s.units_sold_14d,
    coalesce(i.on_hand_units, 0) as on_hand_units,
    coalesce(i.on_hand_units, 0)::numeric / nullif(s.units_sold_14d, 0) as inventory_to_sales_ratio
from recent_product_sales s
left join latest_inventory i
    on s.product_id = i.product_id
where s.units_sold_14d > 0
order by inventory_to_sales_ratio asc, units_sold_14d desc;

Interpretation: low inventory-to-sales ratio can indicate stockout risk. But it is not proof by itself. You would want replenishment dates, lead times, and forecast demand before making a final buying decision.

Question 6: Did discounts help or hurt?

Discount analysis is where candidates often overstate causation. Start with a descriptive comparison.

with item_metrics as (
    select
        p.category,
        case
            when coalesce(oi.discount_amount, 0) > 0 then 'discounted'
            else 'not_discounted'
        end as discount_status,
        oi.quantity,
        oi.quantity * oi.unit_price as gross_revenue,
        coalesce(oi.discount_amount, 0) as discount_amount,
        oi.quantity * oi.unit_price - coalesce(oi.discount_amount, 0) as net_revenue,
        oi.quantity * oi.unit_cost as cost
    from orders o
    join order_items oi on o.order_id = oi.order_id
    join products p on oi.product_id = p.product_id
    where o.status = 'completed'
      and o.order_ts >= current_date - interval '8 weeks'
)
select
    category,
    discount_status,
    sum(quantity) as units_sold,
    sum(net_revenue) as net_revenue,
    sum(discount_amount) as total_discount,
    sum(net_revenue - cost) as gross_margin,
    sum(net_revenue - cost) / nullif(sum(net_revenue), 0) as margin_rate
from item_metrics
group by category, discount_status
order by category, discount_status;

Interpretation: discounted items may sell more units and still reduce margin. They may also be discounted because demand was already weak. Treat this as a starting point, then ask for campaign timing, product seasonality, and a control group.

What to say to the interviewer

A strong answer is not just SQL. After showing the result, summarize the business interpretation:

Category revenue is highest in electronics, but margin rate is falling and the decline is concentrated in discounted items. Home goods has lower revenue but healthier margin and steady week-over-week growth. I would investigate whether electronics discounts are clearing excess inventory or simply training customers to wait for promotions. I would also check stockout risk for the top five home goods products because recent sales are high relative to inventory.

This kind of response connects metrics to decisions. That is what many retail analyst roles need.

Common mistakes in retail SQL cases

Counting item rows as orders. Use count(distinct order_id) when the table is at item grain.
Calculating AOV from item rows. Aggregate to order level first.
Ignoring cancelled or refunded orders. Ask how they should be handled.
Ranking without tie-breakers. Add deterministic ordering for repeatable results.
Comparing incomplete time periods. Make sure week or month comparisons are fair.
Claiming discounts caused growth. SQL can describe the pattern; causation needs stronger design.

How to practice this case

Practice in three passes:

Pass 1: write each query until it runs.
Pass 2: explain the grain, denominator, and filters out loud.
Pass 3: turn the output into a business recommendation.

If you can do all three, you are preparing for the real interview, not just memorizing SQL patterns.

FAQ

Which SQL concepts does this case test?

It tests joins, aggregation, distinct counts, window functions, date filtering, CTEs, NULL handling, and metric grain. It also tests whether you can explain revenue, margin, AOV, and trend correctly.

Should refunded orders be excluded?

It depends on the business definition. Some teams exclude refunded orders from completed sales. Others include negative revenue adjustments. In an interview, state your assumption and ask how the company defines net revenue.

What if the interviewer wants a dashboard?

Use the SQL queries to create metric tables first. Then build dashboard views for category trend, top products, channel AOV, margin rate, and inventory risk. The dashboard should tell a decision story, not just show every metric available.

How do I make this harder?

Add product category changes over time, partial refunds, multiple currencies, delayed inventory snapshots, or promotion campaign dates. Each addition forces you to clarify grain and business rules.

Retail SQL Interview Case Study: Product and Category Performance