Quick summary
Summarize this blog with AI
Introduction
SQL interview questions about missing data are dangerous because the wrong query usually still runs. You do not get a syntax error. You get a believable answer that is quietly wrong.
That is why people keep getting burned by NULL, NOT IN, LEFT JOIN, and aggregate behavior. An interviewer asks for users with no purchase, customers without a completed order, or rows with missing values. The candidate writes something that looks reasonable, but one NULL in the wrong place changes the result set.
The good news is that these are not random traps. There are a few repeatable rules. If you know how SQL evaluates NULL, how anti-joins work, and where join filters belong, you can solve these prompts cleanly and explain your reasoning without sounding memorized.
Why NULL Logic Breaks Good Candidates
Most languages train you to think in two states: TRUE and FALSE. SQL adds a third state: UNKNOWN.
That one difference explains most NULL confusion.
- NULL means unknown or missing, not zero and not an empty string.
- Comparisons against NULL usually evaluate to UNKNOWN.
- The WHERE clause keeps only rows where the predicate is TRUE.
The last line matters most. WHERE does not keep FALSE, and it does not keep UNKNOWN either. So once a predicate becomes UNKNOWN, the row disappears.
This is why col = NULL fails, why NOT IN can collapse when a subquery contains NULL, and why a LEFT JOIN can stop behaving like a left join if you filter the right table in WHERE.
Rule 1: Use IS NULL and IS NOT NULL Explicitly
If a column can contain NULL, do not compare it with equality operators.
-- Wrong
SELECT *
FROM orders
WHERE shipped_at = NULL;
That predicate never becomes TRUE. The correct versions are explicit:
SELECT *
FROM orders
WHERE shipped_at IS NULL;
SELECT *
FROM orders
WHERE shipped_at IS NOT NULL;
This sounds basic, but strong candidates still miss it when the prompt layers NULL logic inside a larger join or subquery.
COUNT(*) and COUNT(column) Are Not the Same Metric
Another common interview mistake is assuming all counts behave the same.
COUNT(*)counts rows.COUNT(column)counts non-null values in that column.
SELECT
COUNT(*) AS total_rows,
COUNT(shipped_at) AS rows_with_ship_time
FROM orders;
If shipped_at is null for unshipped orders, those rows still contribute to COUNT(*) but not to COUNT(shipped_at).
That matters anytime the question uses language like row count, completed-user count, or rows with a valid timestamp. If the business question is row count, use COUNT(*). If the business question is populated-value count, COUNT(column) may be right. Say the distinction out loud in the interview.
Why NOT IN Breaks When NULL Appears
Suppose the prompt is: find customers who have never placed an order.
SELECT c.customer_id
FROM customers c
WHERE c.customer_id NOT IN (
SELECT o.customer_id
FROM orders o
);
That works only if the subquery returns non-null customer IDs. If the subquery includes even one NULL, the logic becomes unsafe.
Conceptually, SQL has to evaluate something like this:
c.customer_id <> 101
AND c.customer_id <> 205
AND c.customer_id <> NULL
The last comparison is UNKNOWN, so the whole predicate can no longer cleanly evaluate to TRUE.
If you insist on NOT IN, remove nulls inside the subquery:
SELECT c.customer_id
FROM customers c
WHERE c.customer_id NOT IN (
SELECT o.customer_id
FROM orders o
WHERE o.customer_id IS NOT NULL
);
That version can be valid. But in interview settings, there is usually a more durable pattern.
Why NOT EXISTS Is Usually the Safer Anti-Join
If the real question is "give me rows from table A with no matching row in table B," think anti-join first.
NOT EXISTS is usually the safest and clearest expression of that logic.
SELECT c.customer_id
FROM customers c
WHERE NOT EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
);
This reads like the business question: start from customers, look for a matching order, and keep the customer only when no match exists.
It avoids the NULL list-membership problem because you are testing for the existence of a matching row, not comparing against a set that might contain NULL.
This is a strong default for prompts like users with no purchase, customers without a completed order, employees with no manager record, and accounts with no successful login in the last 30 days.
How LEFT JOIN Filters Accidentally Become Inner Joins
Another classic failure mode looks like this:
SELECT c.customer_id, o.order_id
FROM customers c
LEFT JOIN orders o
ON c.customer_id = o.customer_id
WHERE o.status = 'completed';
Candidates write this because they want all customers plus completed orders. But WHERE o.status = 'completed' removes rows where o.status is NULL, which are exactly the unmatched rows introduced by the left join.
So the query no longer preserves all customers. It behaves like an inner join over completed orders.
If your intention is "keep every customer, attach completed orders when they exist," keep the right-side filter in the ON clause:
SELECT c.customer_id, o.order_id
FROM customers c
LEFT JOIN orders o
ON c.customer_id = o.customer_id
AND o.status = 'completed';
Now unmatched customers remain, with null order columns.
This distinction matters because these prompts are often asking one of two different questions:
- Return all customers, plus a matching completed order if one exists.
- Return only customers who have at least one completed order.
Those are different business questions, so they need different SQL.
Dialect Notes for Null-Safe Comparisons
If the conversation shifts from interview logic to dialect specifics, these are useful details to know:
- PostgreSQL supports
IS DISTINCT FROMandIS NOT DISTINCT FROMfor null-safe comparison. - MySQL has the null-safe equality operator
<=>. - Many warehouses support
IS DISTINCT FROM, but function names and edge behavior still vary.
Example in PostgreSQL:
SELECT *
FROM prices
WHERE old_price IS DISTINCT FROM new_price;
That can be cleaner than manual NULL handling when the question is whether a value changed. But if the interviewer is testing core SQL reasoning, IS NULL, EXISTS, and careful join logic matter more than dialect trivia.
Worked Example: Customers Without a Completed Order
Suppose the prompt is: "Find customers who do not have any completed order."
The fragile version is:
SELECT c.customer_id
FROM customers c
WHERE c.customer_id NOT IN (
SELECT o.customer_id
FROM orders o
WHERE o.status = 'completed'
);
That can fail if orders.customer_id is nullable.
A safer answer is:
SELECT c.customer_id
FROM customers c
WHERE NOT EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
AND o.status = 'completed'
);
You can also express the anti-join with LEFT JOIN and IS NULL:
SELECT c.customer_id
FROM customers c
LEFT JOIN orders o
ON c.customer_id = o.customer_id
AND o.status = 'completed'
WHERE o.order_id IS NULL;
That is valid too, but it is easier to break if you move the status filter into WHERE. That is why NOT EXISTS is often the safer interview answer.
What to Say While You Solve It
A strong explanation is short and precise:
"This is an anti-join problem. I want rows from customers where no completed order exists, so I am using NOT EXISTS instead of NOT IN because nulls in the subquery can break NOT IN. If I used a left join version, I would keep the order-status filter in the ON clause so unmatched customers stay in the result."
That explanation sounds senior because it ties the SQL pattern to the business meaning.
Validation Checklist Before You Finalize the Query
- Can the subquery return NULL values?
- Am I counting rows or non-null values?
- Did a right-table filter accidentally turn a LEFT JOIN into an inner join?
- Am I using anti-join logic because the prompt is really "no matching row exists"?
- If I replaced nulls with COALESCE, did I preserve the real business meaning?
Those checks catch most silent bugs in this topic area.
Related Reading
- How to Answer Vague SQL Interview Questions by Defining Grain, Metrics, and Assumptions
- Why SQL Joins Create Duplicate Rows and How to Fix Them Without Hiding the Problem
- SQL Date and Timestamp Interview Questions: Parsing, Bucketing, and Filtering Without Off-by-One Mistakes
FAQ
Is NOT IN always wrong?
No. NOT IN is fine when the subquery cannot return NULL. The problem is that people often do not verify that assumption.
When is LEFT JOIN ... IS NULL a good answer?
It is good when you want an anti-join and you keep the right-table match conditions inside the ON clause. It becomes fragile when those filters drift into WHERE.
Should I use COALESCE everywhere to avoid null problems?
No. Use COALESCE only when the fallback value matches the business meaning. Replacing unknown with zero too early often hides the real logic problem instead of solving it.