Avismara Hugoppalu
← All musings

Data Structures & Algorithms

A heap is an array pretending to be a tree

Parent/child is pure index arithmetic; two repair operations derive everything else. When a heap beats sorting, and two bugs that looked helpful while quietly breaking the invariant.

May 16, 2026 · 9 min read

TL;DR
A heap is an array pretending to be a tree. There are no nodes and no pointers — parent and child are pure index arithmetic, and the entire structure is one flat array plus a single invariant (parent dominates its children). That invariant is maintained by exactly two local repairs, siftUp and siftDown, and everything else — insert, removeMin, top-K — is three lines wrapped around them. So you don't memorize a 40-line struct. You memorize two functions cold and derive the rest. Heaps earn their keep in one situation: you need the running min or max repeatedly while the data keeps changing. If the data were static you'd just sort once and be done.

Most heap explanations open by showing you the struct, and that's backwards — the struct is the least interesting part and the easiest to forget under time pressure. The load-bearing idea is smaller and it survives: a binary heap is a complete binary tree that has been flattened into an array, and the tree is a fiction maintained entirely by index math. Once you believe that, the two operations that actually matter fall out, and the rest is boilerplate. This is a sibling to the DP post — the pattern only sticks once you can derive it from one principle instead of recalling it whole.

The array is the data structure; the tree is a story we tell

Draw a complete binary tree — every level full except possibly the last, which fills left to right. Now read the nodes top to bottom, left to right, and drop them into an array in that order. That's a heap. There is no separate tree in memory. There are no node objects, no left/right pointers. There is one array, and the parent-child relationships are recovered by arithmetic on indices:

func parent(_ i: Int) -> Int { (i - 1) / 2 }
func left(_ i: Int)   -> Int { 2 * i + 1 }
func right(_ i: Int)  -> Int { 2 * i + 2 }

This only works because the tree is complete. A complete tree has no gaps, so the array has no gaps, so the position of every node is forced — and forced positions are exactly what makes the index formula exact. The completeness isn't a nice-to-have; it's the thing that lets the array impersonate a tree at all.

Index i maps to children 2i+1 and 2i+2. Same data, two readings.
array1437801234tree143

The invariant, and why two repairs cover everything

The only rule a heap enforces is local: every parent dominates its children. For a min-heap, "dominates" means "is less than or equal to." Note what this does not say — siblings are unordered, and a node deep on the left can be larger than a node high on the right. The heap is not sorted. It guarantees exactly one thing: the root is the global minimum, because domination is transitive up every path to the top.

That weak invariant is the whole trick. It's cheap to maintain because any single mutation can only break it in one place. Insert a value at the end, and the only relationship that could now be wrong is that new element versus its parent. Remove the root and backfill, and the only thing that could be wrong is the new root versus its children. Each of the two ways you can disturb a heap has exactly one local fault, and so there are exactly two repairs.

siftUp: bubble up until your parent deserves to be above you

You just appended at the last index. Everything below you is still fine; the one suspect relationship is you against your parent. So compare, and if the parent doesn't dominate you, swap up and repeat. Stop when the parent dominates — or when you reach the root.

mutating func siftUp(_ start: Int) {
  var i = start
  while i > 0 && heap[i] < heap[parent(i)] {  // child outranks parent: violation
    heap.swapAt(i, parent(i))
    i = parent(i)
  }
}

The mental model: bubble up until your parent deserves to be above you. One path, one comparison per level, O(log n).

siftDown: sink by always swapping with your dominant child

You just moved the last element to the root (this is how removeMin backfills). Now the suspect relationship is the root against its children. The instinct is "swap with a child that beats me" — but which child, and why it has to be a specific one, is the single non-obvious idea in the entire data structure.

You must swap with the dominant child — for a min-heap, the smaller of the two. Here is why, and it's worth slowing down for. When you push the offending value down into a child slot, that value becomes the new parent of the other child as well. If you swap with the larger child, you've just installed a value that is larger than the smaller child you left behind — and you've re-broken the invariant on a branch you weren't even repairing. Swapping with the smaller child is the only move that guarantees the promoted value dominates both of its new siblings. It's not an optimization; picking the wrong child produces a heap that is silently invalid.

mutating func siftDown(_ start: Int) {
  var i = start
  let n = heap.count
  while true {
    var best = i                                       // assume parent is fine
    let l = left(i), r = right(i)
    if l < n && heap[l] < heap[best] { best = l }
    if r < n && heap[r] < heap[best] { best = r }      // best = dominant of {self, children}
    if best == i { break }                             // parent already dominates: done
    heap.swapAt(i, best)
    i = best
  }
}
The "swap with the dominant child" rule is the one place a heap implementation goes subtly wrong rather than loudly wrong. If you swap with whichever child happens to be smaller-than-the-parent first, without comparing the two children to each other, you can promote a value that violates the sibling branch. The structure still looks like a heap, removeMin still returns a value, and your tests on tiny inputs pass. Compare both children to each other every time. The asymmetry between the two repairs is real: siftUp has one candidate to compare against (the parent), siftDown has two (the children), and that second comparison is the whole correctness story.

Derive the rest

With the two repairs memorized, the public operations are three lines each. There is nothing to recall here — you reconstruct them from the invariant.

mutating func insert(_ value: Int) {
  heap.append(value)        // new slot is the only possible violation
  siftUp(heap.count - 1)
}

mutating func removeMin() -> Int? {
  guard !heap.isEmpty else { return nil }
  heap.swapAt(0, heap.count - 1)  // move min to the end so we can pop it cheaply
  let min = heap.removeLast()
  siftDown(0)
  return min
}
Before
What you memorize
siftUp and siftDown, ~8 lines each, cold
What you derive
insert / removeMin / peek, ~3 lines each
What you skip
the generic struct, the comparator plumbing, the resizing ceremony
After
Why
they are the irreducible repairs; every fault reduces to one of them
Why
they are just a mutation plus the matching repair
Why
boilerplate you can write without thinking once the core is solid

How I broke it: two clean, repeatable bugs

These are the two failures I hit building a min-heap from scratch — Swift's standard library has no Heap, so you write it yourself, and you get to make these in person.

The first wasn't even heap-specific; it was a read-the-API-contract bug. I wrote a loop as for (num, i) in nums.enumerated() because that reads like "number, index." But enumerated() yields (offset, element) — index first. So num was holding the index and i was holding the value, and the code ran fine, did the wrong thing, and said nothing. The lesson is dull and permanent: when a sequence-producing API hands you a tuple, look up which slot is which instead of trusting how the names read.

The second was self-inflicted helpfulness. Inside insert, after appending the new value (correct), I added a swapAt(0, count - 1) before calling siftUp — some half-remembered muscle memory that the new element belongs near the root. It doesn't. That swap hurls an arbitrary old value to the end of the array and then sifts up the wrong element entirely, quietly corrupting the invariant. The fix is a rule with no exceptions: append, then siftUp from the last index, full stop. Any extra "helpful" line in a heap mutation is almost certainly breaking an invariant the two-repair model already handles.

Both bugs share a shape: the program kept running. Neither crashed, neither threw. A heap with a broken invariant is a heap that returns plausible-looking wrong answers. There's no exception to catch — your only defense is keeping each mutation to exactly one repair and never adding a line "to help."

When to reach for it: the recognition signals

A heap is the answer when you need the current extreme repeatedly while the dataset is in motion — elements arriving, leaving, or being replaced. Sorting gives you order once; a heap gives it to you continuously, paying O(log n) per change instead of O(n log n) per re-sort. The signals I've locked in:

Signal in the problemWhy a heap fits
"top-K" / "Kth largest"maintain a size-K heap; the root is the boundary element
"merge K sorted lists"a K-element heap always surfaces the next-smallest head
"median of a data stream"two heaps (a max-heap and a min-heap) split the stream at the middle
"streaming / online order statistics"the extreme must stay available as data flows in
"greedy with a changing priority"re-pick the best candidate after each step without re-sorting

The unifying tell: you need the running min or max again and again as elements come and go. If you only need order once, you don't have a heap problem — you have a sort.

Top-K, and why the "easy" guard is a correctness trap

Take "Kth largest element." There are three approaches, and the point isn't that one is right — it's knowing the crossover where each wins.

  1. Sort everything — O(n log n)
    Sort descending, index into position k - 1. Simplest to write and read. It's the right call when n is small or you also need the rest of the order anyway. You're paying to order the whole array when you only wanted a sliver of it.
  2. Size-K min-heap — O(n log K)
    Keep a min-heap holding only the K largest seen so far. Its root is the smallest of those K, i.e. the current Kth-largest. Each new element costs log K, not log n. This wins decisively when K is much smaller than n — you never hold more than K elements.
  3. Quickselect / bucket — O(n) average
    Partition around a pivot and recurse into only the side that contains rank K, never sorting the other side. Optimal asymptotically, but it mutates the array, has an O(n²) worst case without good pivoting, and is fiddlier to get right. Reach for it when n is large, K is not small, and the constant factor matters.

The size-K min-heap has a subtlety I flagged myself while writing it, because it's the kind of mistake that's wrong, not merely slow. Once the heap holds K elements, each new value is compared against the root. The replace condition has to be exactly right: a new value enters (evicting the current root) only if it strictly beats the root — i.e. it belongs in the top K. Get the comparison backwards or off by an equals sign and you don't get a slower correct answer; you evict the wrong element and the root stops being the Kth largest. With a heap, a sloppy guard isn't a performance bug. It's a correctness bug wearing a performance bug's clothes.

The takeaway

A heap is an array impersonating a complete binary tree, held together by one local invariant — parent dominates children — and that invariant is restored by exactly two repairs. siftUp bubbles a new element up against its parent; siftDown sinks a displaced root down, and it must swap with the dominant child, because that is the only swap that leaves both branches valid. Memorize those two functions cold and derive insert, removeMin, and top-K from them — don't memorize the struct. Reach for a heap precisely when you need the running min or max repeatedly while the data changes; if the data is static, sort once instead. And watch the guards: a heap with a broken invariant doesn't crash, it lies.

Problems to practice

Problems to practice