Quick summary
Summarize this blog with AI
Introduction
A slow SQL query often triggers the same reflex: add an index. Sometimes that helps. Just as often, the index already exists and the query still scans far more rows than expected.
That mismatch confuses people because the problem sounds binary. Either the table has an index or it does not. In real systems the optimizer is making a cost-based choice from the query you wrote, the statistics it has, and the access paths available. If your predicate hides the indexed values, if the index columns are ordered for a different query shape, or if the filter is not selective enough to justify the extra work, the engine may choose a scan anyway.
This is why strong SQL work is not just about knowing that indexes matter. It is about writing predicates the optimizer can use, understanding what the index is sorted by, and checking the execution plan before you guess at the fix.
If You Only Remember Four Rules
- Keep the indexed column as untouched as possible in the predicate. Functions, casts, and arithmetic on the column side are common reasons a seek turns into a scan.
- Composite indexes help only when the query lines up with their leftmost columns and predicate shape.
- A scan is not automatically wrong. If the filter is broad or the table is small, a scan may be cheaper.
- The plan matters more than folklore. Before changing the schema, confirm whether the filter, estimates, join shape, or sort is the real bottleneck.
What Sargability Actually Means
A predicate is usually called sargable when the database can use an index to narrow the search space from the indexed values themselves instead of evaluating an expression row by row. You do not need the jargon to write better SQL, but the idea matters because it explains why two queries that look equivalent on paper can behave very differently.
If you compare the raw indexed column to a compatible constant or range, the engine has a clear starting point. If you wrap the column in DATE(), LOWER(), arithmetic, or an awkward cast, you often force the engine to compute a transformed value before it can decide whether the row qualifies.
The exact behavior varies by database and by index type, but the habit is stable: keep the indexed column as untouched as possible on the predicate side of the comparison.
A Worked Example: Why DATE(created_at) Often Defeats the Index
Suppose you have an orders table with an index on created_at. You want all rows from April 1, 2026.
SELECT *
FROM orders
WHERE DATE(created_at) = DATE '2026-04-01';
This version looks tidy, but it often prevents efficient index use because the function is applied to every created_at value before the comparison. A safer version is to filter on the raw timestamp range.
SELECT *
FROM orders
WHERE created_at >= TIMESTAMP '2026-04-01 00:00:00'
AND created_at < TIMESTAMP '2026-04-02 00:00:00';
The second query gives the optimizer a direct range on the indexed values. It is also more explicit about boundaries, which reduces off-by-one mistakes when timestamps include hours, minutes, and seconds.
This pattern is not limited to dates. The same idea applies when people write LOWER(email), amount + 10, CAST(user_id AS text), or any other expression that shifts work onto the column side.
A Second Worked Example: Why LOWER(email) Is Not the Same as email
Consider this query on a table that only has a normal index on email.
SELECT *
FROM users
WHERE LOWER(email) = '[email protected]';
The intent is fine, but the predicate is now asking the engine to compare on LOWER(email), not on the stored email values that the index is sorted by. Depending on the engine, the cleaner fix may be normalized stored data, a case-insensitive collation, or a matching expression index. The right answer depends on the workload. The principle does not: the index and the predicate have to line up.
Why an Index Can Exist and Still Not Help
An index is one possible access path, not a promise. The optimizer still has to estimate whether that path is cheaper than scanning the table, scanning another index, or building an intermediate structure later in the plan.
If your filter keeps most of the rows, a scan may be the right answer. If the table is small, a scan may be the right answer. If the predicate shape makes the index hard to use, a scan may again be the right answer from the optimizer's perspective.
That is why the practical question is not, "Do I have an index?" The better question is, "Did I write this query so the optimizer can use the index I have, and would that actually be cheaper?"
Composite Index Column Order Matters More Than People Expect
Multi-column indexes only help when the query matches how the index is organized. Suppose you have an index on (user_id, created_at).
SELECT *
FROM events
WHERE created_at >= DATE '2026-04-01'
AND created_at < DATE '2026-04-02';
That query does not constrain the leading column user_id, so the index may be far less helpful than people expect. The leftmost column matters because the index is sorted starting from that column. If the real workload starts by filtering on date, you usually need an index that starts with date.
People often remember this as the leftmost-prefix rule. The name matters less than the design habit: build composite indexes around the actual filter order, join keys, and sort requirements that dominate the workload.
Range Predicates Limit What Later Indexed Columns Can Do
Composite indexes also become less powerful once the predicate hits a range condition on an earlier indexed column. Imagine an index on (user_id, created_at, status).
WHERE user_id = 10
AND created_at = TIMESTAMP '2026-04-01 09:00:00'
AND status = 'active'
That pattern is index-friendly because the engine can navigate tightly through equality conditions. Now change the middle predicate to a range:
WHERE user_id = 10
AND created_at >= TIMESTAMP '2026-04-01 00:00:00'
AND status = 'active'
The engine may still use the index, but it often has to scan a broader slice of matching rows for that user and apply status later. The exact mechanics differ by engine, yet the practical lesson is stable: equality predicates on leading columns are usually more selective and more index-friendly than broad ranges.
Statistics and Data Distribution Can Change the Decision
Sometimes the query is perfectly sargable, the index order is reasonable, and the engine still makes a disappointing choice. The missing piece is often estimates. Query optimizers rely on statistics to predict how many rows each predicate will match and how expensive different paths will be.
If the statistics are stale, if the data distribution is highly skewed, or if a value is much rarer or much more common than the planner expects, the chosen plan can be worse than what you would pick by inspection. This is one reason to compare estimated rows and actual rows in the execution plan instead of stopping at the access path label.
If the estimates are far off, the fix may be better statistics, a different predicate shape, or a plan that isolates the selective condition earlier. Adding another index is not always the right first move.
A Wrong-vs-Better Rewrite Table
| Anti-pattern | Why it hurts | Safer pattern |
|---|---|---|
DATE(created_at) = '2026-04-01' | Transforms the indexed column before comparison | Use a raw timestamp range |
LOWER(email) = '[email protected]' | Index on email may not match LOWER(email) | Normalize data, use case-insensitive comparison support, or add a matching expression index |
amount + 10 > 100 | Pushes arithmetic onto the column side | Rewrite as amount > 90 |
email LIKE '%@gmail.com' | No useful starting prefix for a normal B-tree index | Store and index email domain separately if that filter matters |
Index on (user_id, created_at) for a date-only filter | Leading column does not match actual access pattern | Create an index that starts with the real selective column |
When a Full Scan Is Actually Correct
A full scan is not automatically a bug. If you filter on a low-cardinality column like status and half the table is active, the optimizer may correctly decide that reading the table once is cheaper than many index lookups plus extra heap access.
The same thing happens on small tables. Developers sometimes panic because they expected an index seek, but the table has only a few thousand rows and the scan is genuinely cheaper.
The goal is not to force index usage everywhere. The goal is to avoid accidental scans caused by query shape when you actually wanted selective access.
How to Read the Plan Without Overcomplicating It
You do not need to become a query-optimizer specialist to get value from execution plans. Focus on a few concrete questions.
- Was the engine doing a table scan, index scan, index seek, bitmap scan, or partition-pruned read?
- How many rows did it expect at each step, and how many did it actually process?
- Did the engine filter rows during index access, or did it pull a broad set first and filter later?
- Did a later sort, hash join, or repeated nested-loop lookup become the expensive step?
If the row estimates are wildly wrong, stale or weak statistics may be part of the problem. If the filter is cheap but a join explodes the row count later, the index was never the primary issue. That is why execution plans matter more than folklore.
A Practical Debugging Workflow
- Read the predicate and look for functions, casts, arithmetic, wildcard patterns, or mismatched data types.
- Check whether the available index starts with the columns your filter actually constrains.
- Ask whether the predicate is selective enough that an index should help at all.
- Open the plan and inspect the chosen access path, row estimates, actual rows, and where the filter is applied.
- Verify whether the real cost is the filter, or whether a later join, sort, or aggregation is dominating the runtime.
This workflow is much more reliable than reacting to the mere presence of the word "scan" in the plan.
Engine-Specific Notes
PostgreSQL: PostgreSQL can use expression indexes, partial indexes, bitmap index scans, and EXPLAIN ANALYZE with buffers for excellent visibility. If you repeatedly filter on DATE(created_at) or LOWER(email), a matching expression index may be appropriate, but only if that access pattern is stable and worth the write overhead.
MySQL: MySQL follows the leftmost-prefix idea closely for composite B-tree indexes. Generated columns can help when you need to index a transformed value. EXPLAIN is essential because predicates that look harmless often widen range scans more than expected.
SQL Server: SQL Server plans often reveal the problem clearly as a seek versus scan issue, and implicit conversions are a common culprit. Included columns can reduce extra lookups, but they do not fix a fundamentally non-sargable predicate.
BigQuery and Snowflake: These warehouses do not use user-managed B-tree indexes in the same way as row-store databases. The same thinking still applies, but the knobs are partition pruning, clustering, micropartition elimination, and query shape rather than classic secondary indexes. If you work across engines, do not cargo-cult row-store indexing advice into warehouse systems.
How to Explain This in an Interview
If someone asks why a query is not using an index, the strongest short answer is not "we need an index." A stronger answer is: first I would check whether the predicate is sargable, whether the available index order matches the filter, whether the filter is selective enough to justify indexed access, and what the execution plan says about estimates and bottlenecks. That answer shows you understand both the SQL and the diagnostic process.
Related Reading
If the real issue is duplicate rows after a join rather than index usage, read Why SQL Joins Create Duplicate Rows and How to Fix Them Without Hiding the Problem. If your filter logic involves dates and timestamps, read SQL Date and Timestamp Interview Questions: Parsing, Bucketing, and Filtering Without Off-by-One Mistakes. If you are using a top-N or latest-row pattern as part of the query, read SQL Latest Row Per Group: ROW_NUMBER, QUALIFY, and Tie-Breakers That Keep Results Deterministic.
FAQ
Should I always add a new index when a query is slow?
No. First check whether the query shape is blocking efficient use of an existing index, whether the filter is selective enough to justify an index, and whether the real bottleneck is later in the plan.
Is forcing an index a good fix?
Usually not as a first move. Hints can lock you into a plan that looks good for one data distribution and degrade later. Rewrite the predicate and read the plan before you reach for hints.
Do CTEs automatically stop indexes from working?
Not automatically. The real issue is how your database optimizes the final logical plan. Some engines inline CTEs aggressively, others materialize more often, and the only trustworthy answer is the plan in your engine.
Final Takeaway
Indexes get ignored when the query shape makes efficient lookup hard or when a scan is honestly cheaper. Keep predicates sargable, match composite index order to the real access pattern, and inspect the execution plan before you decide whether the fix is a rewrite, a different index, or something else entirely.