This is the pattern the series was named after, and the one I overused for years. So it's a little strange that I now think it's the simplest pattern to explain, once you refuse the usual framing. The usual framing is a grid you fill in some clever order, and it is, in my opinion, the single worst way the subject is taught.
DP is recursion with a cache. The cache is there because the recursion calls itself on the same subproblem more than once, and recomputing is wasteful. That's all. If you can write the recursion, you can write the DP — the upgrade is one line.
When you reach for DP at all
The series test, from the intro, is do the choices interact? If the best move now can constrain a future move, past decisions matter, and you have to remember where you've been — that's DP. If the local best is globally safe, you don't need to remember anything and one pass suffices — that's greedy, where the whole game is proving the choices don't compound. DP is the other branch: the choices do compound, so you keep state. Kadane is the degenerate case — DP with a single one-bit decision (carry the running sum or restart) and a state so small it collapses to two variables.
So DP is what's left when the decisions are entangled and you can't pass them off to a one-liner. Good. Now the actual work begins, and it is not the recursion.
The state is the whole game
Here is the claim I want to defend: in a DP problem, the recurrence is rarely the hard part. I'm a reasonably strong recursive thinker, and I got longest increasing subsequence, longest common subsequence, and edit distance clean on first contact — not because the recursion was clever, but because once I'd named the state correctly, the recurrence was forced. The difficulty was never how do smaller answers combine. It was what is a subproblem, exactly.
So I teach every DP example in two beats, in this order:
- What does
solve(...)MEAN? A precise English sentence describing the subproblem its arguments name. Not code — a sentence. - The recurrence. How
solveof a bigger state is built fromsolveof smaller ones.
If you can't write the sentence in beat one, you don't have a state, and writing code is premature. Most DP failures I've watched are people writing beat two before beat one exists.
Edit distance, state first
Let me walk one example all the way, the way I'd do it cold. Edit distance: given two strings, the minimum number of single-character edits — insert, delete, replace — to turn one into the other. I picked this one because it has three transitions and people find it intimidating, and I want to show that the intimidation is misplaced once the state is named.
Beat one. What's the subproblem? I'm comparing two strings, so I'm consuming both, one character at a time. The natural state is two positions: how much of each string is left to deal with. Define it:
solve(i, j) = the edit distance between the suffix of a starting at i and the suffix of b starting at j.
That sentence is the entire insight. Read it twice. i and j together name a subproblem, and solve(i, j) always means the same thing regardless of how I got there — which is exactly the property that lets me cache it.
Beat two falls out. Look at the first character of each remaining suffix, a[i] and b[j]:
- If they match, there's nothing to pay for this position. Advance both.
solve(i, j) = solve(i+1, j+1). This is the diagonal move — consume from both strings. - If they mismatch, I must spend one edit, and I pick the cheapest of three repairs:
- replace
a[i]withb[j], then both advance:1 + solve(i+1, j+1) - delete
a[i]:1 + solve(i+1, j) - insert
b[j]in front ofa[i]:1 + solve(i, j+1)
- replace
Take the min of those three. The base cases are when one string is exhausted: if i is past the end of a, the answer is whatever's left of b (all inserts), and symmetrically for j.
I want to name the geometry, because it's the thing that transfers to every 2D sequence DP. The diagonal means "consume from both." The edges mean "consume from one." Match buys you the diagonal for free. Mismatch forces a paid step and you choose your direction — diagonal (replace), down (delete), or right (insert). LCS is the same picture with the payment moved around: match takes the diagonal and gains one; mismatch takes the better of the two edges. Once you see the grid as "which direction do I consume," LCS and edit distance stop being two problems and become one problem with two scoring rules.
b →
"" b0 b1 b2
"" . → → → edge = consume only from b (insert)
a a0 ↓ ↘ diagonal = consume from both (match / replace)
↓ a1 ↓ ↘
a2 ↓ ↘ edge ↓ = consume only from a (delete)
Here's the recursion, before any cache:
func editDistance(_ a: [Character], _ b: [Character]) -> Int {
func solve(_ i: Int, _ j: Int) -> Int {
if i == a.count { return b.count - j } // a exhausted: insert the rest of b
if j == b.count { return a.count - i } // b exhausted: delete the rest of a
if a[i] == b[j] { return solve(i + 1, j + 1) }
let replace = solve(i + 1, j + 1)
let delete = solve(i + 1, j)
let insert = solve(i, j + 1)
return 1 + min(replace, min(delete, insert))
}
return solve(0, 0)
}
That is a correct, complete, honest description of edit distance. It is also exponential, because solve(i, j) gets recomputed through many different call paths. Which is the cue for the one-line upgrade.
Memoization is the one-line upgrade
The subproblems overlap — solve(i, j) is the same answer no matter which branch asked for it — so compute each once and remember it. That's the cache, and adding it is mechanical: a dictionary (or a 2D array) keyed by the state, checked on entry and written on exit.
func editDistance(_ a: [Character], _ b: [Character]) -> Int {
var memo = [[Int]:Int]() // key = the state (i, j)
func solve(_ i: Int, _ j: Int) -> Int {
if i == a.count { return b.count - j }
if j == b.count { return a.count - i }
if let hit = memo[[i, j]] { return hit } // already solved this exact subproblem
let result: Int
if a[i] == b[j] {
result = solve(i + 1, j + 1)
} else {
result = 1 + min(solve(i + 1, j + 1), min(solve(i + 1, j), solve(i, j + 1)))
}
memo[[i, j]] = result
return result
}
return solve(0, 0)
}
The logic did not change. I added a lookup and a store. That is the difference between exponential and O(m·n), and it is the entire difference between "recursion" and "dynamic programming." Note that the cache key is the state — which is the cleanest possible reason to get the state right: it's literally what you're indexing on.
On bottom-up tabulation
Now the opinion, stated as one. People will tell you the "real" DP is the bottom-up version: allocate a table, fill it in dependency order with nested loops, no recursion in sight. I find that form unintuitive and I don't teach it as the goal. Top-down recursion mirrors how you actually reason about the problem — you ask for the answer you want, and it asks for the smaller answers it needs. Bottom-up inverts that: you have to figure out, in advance, an order in which every subproblem is ready before anything depends on it. That ordering is a real puzzle you've imposed on yourself, and it buys you nothing the cache didn't already give you, except a constant-factor win on space and the elimination of call-stack overhead.
So I treat bottom-up as an implementation detail — a space optimization you reach for when a profiler tells you to, not the destination. When you only ever need the previous row of the table, you can drop from O(m·n) space to O(n), and that's a genuinely nice trick. But you reach for it after you have a correct top-down solution, as a deliberate optimization, not as the way you first think about the problem. I'm not apologizing for this preference. The top-down form is the honest one: it's the shape of the reasoning, not a clever re-encoding of it.
- form
- bottom-up table
- you must invent
- a fill order where deps come first
- mirrors your reasoning?
- no — it inverts it
- when
- as a space optimization, after correctness
- form
- top-down recursion + cache
- you must invent
- the state (you'd need this anyway)
- mirrors your reasoning?
- yes — ask for what you want
- when
- first, always
The Int.max sentinel bug
Top-down DP has a failure mode that's worth a scar, because it cost me one. Consider min-coins change: fewest coins summing to a target amount, or report it's impossible. The recurrence is solve(amount) = 1 + min over coins c of solve(amount - c), with solve(0) = 0 and unreachable amounts marked with a sentinel meaning "impossible."
The obvious sentinel is Int.max. It's wrong, and the bug is beautiful because it isn't in the logic. When you take an impossible sub-state and add a coin to it — 1 + Int.max — you overflow, and in Swift that traps. The recurrence is correct. The transitions are correct. The state is correct. The sentinel can't survive the increment that the recurrence performs on it.
let IMPOSSIBLE = amount + 1 // > any real answer (≤ amount coins), and IMPOSSIBLE + 1 still doesn't overflow
A real answer never exceeds amount coins (you'd use that many 1-coins at worst), so amount + 1 is safely larger than any valid result, and it tolerates the 1 + the recurrence applies to it. You compare against it at the end to decide reachability.
Int.max as a DP "infinity" overflows the moment your recurrence adds to it — and your recurrence's whole job is to add to it. Pick a sentinel that's larger than every real answer yet survives the increment: amount + 1, n + 1, problem-bound + 1.Top-down exposes a bug bottom-up hides
One more, and it's a clean argument for top-down beyond taste. Word break: can a string be split into a sequence of words all drawn from a given dictionary? State first: solve(i) = can the suffix starting at index i be fully segmented? Recurrence: for each dictionary word that matches starting at i, recurse on the index past that word; true if any branch is true; solve(length) is the base case (the empty suffix is trivially segmentable).
I wrote that, memoized it, and it was correct — every transition right, the cache keyed on i, the base case right. And it failed. The bug was the entry point: I called solve from the wrong starting index. The recurrence was perfect and the program was still wrong, because in top-down DP the initial call is a separate decision from the recurrence, and it's easy to get one right while botching the other.
That's the part I want to flag, because it's a property of the form. In bottom-up, the entry point is implicit — you read the answer out of a fixed cell — so this class of bug is hidden; you just never make the wrong call because there's no call. Top-down makes the entry point explicit, which means it's a thing you can get wrong. I count that as a feature. The recurrence and the question-you're-asking are genuinely two different things, and a form that forces you to write both down separately is a form that lets you debug them separately.
solve the question you actually want answered?). Bottom-up fuses these; top-down separates them, which is why a perfect recurrence can still return the wrong answer.solve(args) means, and the recurrence — match takes the diagonal, mismatch pays for one of three neighbors — follows. Write it top-down because that's the shape of your reasoning; reach for bottom-up only as a space optimization. And watch the two things the recurrence can't protect you from: a sentinel that overflows when you add to it, and an entry point that asks the wrong question.Problems to practice
- LC 300 Longest Increasing Subsequence — define the state first
- LC 1143 Longest Common Subsequence — match goes diagonal
- LC 72 Edit Distance — three transitions: replace / insert / delete
- LC 322 Coin Change — the Int.max sentinel overflow trap
- LC 139 Word Break — correct recurrence, watch the entry point
- LC 198 House Robber — choices interact, so you remember state
- LC 416 Partition Equal Subset Sum — 0/1 knapsack in disguise
- LC 62 Unique Paths — grid DP, state is the cell