Database Indexes Explained · The Mental Model That Sticks

01The mental model

Think of a database table as a book without an index. To find every chapter about "javascript," you'd read every page. That's a sequential scan — slow, but the database does it because it has no other choice.

Now think of the index at the back of a textbook. It maps keywords to page numbers. Looking up "javascript" in the index takes maybe 20 comparisons (binary search through alphabetized terms), gives you a list of page numbers, and you go directly to those pages.

That's exactly what a database index does. It's a separate data structure stored alongside the table, containing only the indexed columns and pointers to the actual rows. The database checks the index first; if the index can answer the query, it uses the row pointers to fetch only the matching rows.

The trade-off: indexes take disk space and slow down writes. Every INSERT, UPDATE, or DELETE has to update every index on that table. So you don't index everything — you index the columns that actually appear in your WHERE clauses.

02B-tree indexes — the default

99% of indexes you'll create are B-tree (balanced tree) indexes. Postgres and MySQL default to B-tree when you write CREATE INDEX. They're great at:

Equality lookups: WHERE id = 42
Range scans: WHERE created_at BETWEEN '2026-01-01' AND '2026-02-01'
Ordering: ORDER BY name
Prefix matching: WHERE email LIKE 'foo%' (but NOT '%foo')

✓ basic B-tree index

-- Create the index
CREATE INDEX idx_users_email ON users(email);

-- Now queries on email are O(log n) instead of O(n)
SELECT * FROM users WHERE email = 'alice@example.com';

A B-tree on 10 million rows finds a value in about 23 comparisons. A sequential scan does 10 million. The difference is so large that running queries without indexes feels broken once you've seen the indexed version.

03Composite indexes — order matters

A composite index covers multiple columns. The order of columns in the index definition determines which queries it can serve.

✓ composite index

CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);

This index can serve:

WHERE user_id = ? — uses the index (leftmost column)
WHERE user_id = ? AND created_at = ? — uses the index (both columns)
WHERE user_id = ? AND created_at > ? — uses the index (range on the last column is fine)
ORDER BY user_id, created_at — uses the index for sorting

But it can't efficiently serve WHERE created_at = ? alone — the index is ordered by user_id first, so created_at values are scattered. For that query you'd need a separate index on (created_at) or (created_at, user_id).

The rule of thumb: put the most-equality-filtered column first, then range-filtered columns. So (status, created_at) is usually better than (created_at, status) because you filter status by equality more often than created_at.

04Covering indexes — skip the table

A covering index includes every column the query needs, so the database can answer the query using only the index without touching the table itself. This is a major optimization for read-heavy queries.

✓ Postgres covering index with INCLUDE

-- The query
SELECT user_id, total FROM orders WHERE status = 'paid';

-- The covering index
CREATE INDEX idx_orders_status_covering
  ON orders(status)
  INCLUDE (user_id, total);

Now the database walks the B-tree to find status = 'paid' entries, reads user_id and total directly from the index, and never touches the actual table. EXPLAIN will report this as an "Index Only Scan."

In MySQL, you achieve the same effect by including all needed columns in the index key: CREATE INDEX ... ON orders(status, user_id, total). Postgres's INCLUDE is cleaner because the trailing columns don't affect index ordering — they're just along for the ride.

05Partial indexes — index only what matters

A partial index covers only rows matching a WHERE clause. This is enormously useful when most of your queries care about a small subset of the data.

✓ partial index

-- 99% of queries care about active users
-- Soft-deleted users are rarely queried by anything other than ID

CREATE INDEX idx_users_email_active
  ON users(email)
  WHERE deleted_at IS NULL;

The index now contains only rows where deleted_at IS NULL. It's smaller (so it fits in memory better), faster to scan, and faster to update — but the query planner only uses it when your query also includes WHERE deleted_at IS NULL.

Common partial index use cases: indexing only "active" records, indexing only "unprocessed" rows in a job queue, indexing only rows from the last 90 days.

06Expression indexes — index computed values

Sometimes you query on a computed value, not the raw column. Example: case-insensitive email lookup.

✗ index is useless — function on column

CREATE INDEX idx_users_email ON users(email);

-- This query CANNOT use the above index:
SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';

✓ index the expression directly

CREATE INDEX idx_users_email_lower ON users(LOWER(email));

-- Now this query uses the index:
SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';

Any function in a WHERE clause prevents the index from being used unless you index the function itself. This catches developers constantly — they wrap a column in LOWER(), DATE(), or CAST() and wonder why the query is slow.

07Reading EXPLAIN — the parts that matter

EXPLAIN ANALYZE shows you exactly what the database does. Here's how to read it:

✓ Postgres EXPLAIN output

EXPLAIN ANALYZE SELECT * FROM orders WHERE status = 'paid';

Index Scan using idx_orders_status on orders
  (cost=0.42..8.44 rows=1 width=64)
  (actual time=0.020..0.021 rows=1 loops=1)
  Index Cond: (status = 'paid')
  Buffers: shared hit=4
Planning Time: 0.123 ms
Execution Time: 0.045 ms

Key things to look for:

"Seq Scan" — the database read every row. Bad on large tables; add an index.
"Index Scan" — used an index but still read rows from the table. Good for selective queries.
"Index Only Scan" — answered entirely from the index. Best case; means your index is "covering."
"Bitmap Heap Scan" — combined multiple indexes. Often a sign you should consider a composite index instead.
"rows= " in actual — actual rows returned. If "estimated rows" differs wildly from actual, run ANALYZE to update statistics.

08The five mistakes that destroy performance

1. Indexing every column "just in case." Every index slows writes and consumes RAM. Index columns that appear in WHERE, JOIN, or ORDER BY. Skip the rest.

2. Wrong column order in composite indexes. (name, status) can serve WHERE name = ? but not WHERE status = ?. Plan the order based on your most common queries.

3. Functions on indexed columns. Any function call prevents index use. Either rewrite the query or create an expression index.

4. Low-selectivity indexes. An index on a boolean column where 95% of rows are true is useless — the database might as well scan. Use partial indexes instead.

5. Forgetting to ANALYZE after bulk loads. The query planner uses statistics. After inserting a million rows, the statistics are stale and the planner makes bad decisions. Run ANALYZE table_name after major data changes.

09The other index types — briefly

Postgres has additional index types for specific cases. You'll rarely need them, but knowing they exist helps:

Hash: Faster than B-tree for equality-only queries on large tables. Can't handle ranges or sorting. Rarely worth it.
GIN (Generalized Inverted Index): For arrays, JSONB, full-text search. If you're searching JSONB columns or doing WHERE tags @> '{x,y}', this is what you want.
GiST: For geographic data, ranges, and similarity searches. PostGIS uses GiST heavily.
BRIN (Block Range Index): Tiny indexes for naturally ordered large tables. If you have a time-series table where rows are naturally clustered by timestamp, a BRIN index on the timestamp column is 1000x smaller than a B-tree and almost as effective.

∞The discipline

Index design is a craft. The teams that ship fast databases didn't add indexes randomly. They profiled queries with EXPLAIN, identified the hot ones, designed indexes specifically for those queries, and dropped indexes that weren't being used. pg_stat_user_indexes tells you exactly which indexes have been read since the database started. If an index has zero reads after a month in production, it's not earning its weight in disk space and write overhead.

Drop unused indexes. Add intentional ones. Profile before optimizing. The database will reward you for the discipline.