Make a malformed deep link impossible to crash on

Three hand-rolled entry points, inconsistent auth, crashes on attacker URLs. Separating resolution from execution — the engine returns a decision as data, never navigates — collapsed three bug classes into one testable gate.

TL;DR

Three hand-rolled deep-link entry points caused crashes on bad input, inconsistent auth, and lost work. Separating resolution (untrusted URL → typed decision) from execution (navigation) collapsed those bug classes into one pure, testable gate, and it shipped through a strangler-fig migration rather than a rewrite. (The deeper architecture behind it is in the companion piece on routing as architecture.)

Deep links arrive from three places — custom URL schemes, Universal Links, push notifications — and on this codebase each one entered through its own hand-rolled if/else branch. Three separate doors, accreted at three different times, each with its own vocabulary for the same destination (product in one, p in another, an embedded deeplink string in the third). That arrangement caused a pile of problems that looked unrelated until you saw the shape underneath:

Crashes on malformed input. A deep link URL is attacker-controllable. Some branches force-subscripted the URL and assumed well-formed input, so a crafted link could crash the app — a recurring crash class, supplied from outside the binary.
Inconsistent auth. The cart handler checked login before routing; the others didn't. Whether a link required a session depended on which door happened to handle it — so a user could reach a gated screen through the wrong entry point.
A velocity bottleneck. Marketing couldn't ship a campaign link without an app release, because the route table lived in code, in two files, edited by hand.
Silent data loss. A push deep link could land on a screen with a dirty form and blow away unsaved input, because nothing mediated the transition.
Dropped cold-start links. A link delivered on cold launch arrived before the UI existed and simply vanished.
No observability. An unknown route was a print or a no-op. Nobody knew which links failed, or how often.

The reason this stayed invisible is that nobody owned "deep linking" as a whole. It was scattered across entry points and features, so no single feature felt the full cost. The work wasn't a hard algorithm. It was naming the scattered failure as one coherent reliability + security + velocity problem, and then owning it end to end.

Resolution is data; navigation is a side effect

The move that fixes all of these at once is to separate resolution from execution.

Turning an untrusted URL into a decision is resolution. Performing the navigation is execution. Keep them apart and the engine becomes a pure function over untrusted input — testable, validatable, and the single place policy lives.

The pipeline — and the coordinator-first trap to avoid

The engine takes an untrusted URL and returns a RouteResolution — a typed decision as data. It does not navigate. It doesn't touch UIKit. It answers one question: given this URL, what should happen, and is it allowed?

enum RouteResolution {
    case proceed(Route)              // resolved, allowed
    case gated(Precondition, resume: Route)  // stash this, satisfy the gate, resume
    case notFound(RouteError)        // typed error — never a crash
}

// One gate. Every entry point — scheme, Universal Link, push — calls this.
let resolution = engine.resolve(url, context: context)

Inside resolve the steps are: match against a declared manifest → extract parameters → validate → evaluate precondition policy from injected context. Every one of those steps returns a value rather than performing an effect. So the malformed-input crash class disappears — a URL the engine can't parse returns .notFound(RouteError) instead of trapping. Auth stops being a per-branch accident, because the precondition policy lives in the engine and every entry point inherits the same rules. And because resolution is a pure function over its inputs, it's trivially unit-testable: feed it URLs and a context, assert on the decision, no navigation stack and no simulator required.
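The four steps can be sketched as one pure function. This is a minimal illustration, not the production engine: the cases of Route, Precondition, and RouteError, the RouteContext shape, and the hard-coded match table (standing in for the declared manifest) are all assumptions. It also assumes Universal Link-style URLs where the route lives in the path; a custom scheme would need to treat the host as the first segment.

```swift
import Foundation

// Hypothetical definitions: the article names these types but does not show them.
enum Route: Equatable {
    case product(id: Int)
    case cart
}

enum Precondition: Equatable { case login }

enum RouteError: Error, Equatable {
    case malformedURL
    case unknownRoute(String)
    case invalidParameter(String)
}

enum RouteResolution: Equatable {
    case proceed(Route)
    case gated(Precondition, resume: Route)
    case notFound(RouteError)
}

// Injected state that the policy step reads (assumed shape).
struct RouteContext { var isLoggedIn: Bool }

struct RouteEngine {
    func resolve(_ url: URL, context: RouteContext) -> RouteResolution {
        // 1. Match: parse defensively, with no force-unwraps or blind subscripts.
        guard let parts = URLComponents(url: url, resolvingAgainstBaseURL: false) else {
            return .notFound(.malformedURL)
        }
        let segments = parts.path.split(separator: "/").map(String.init)
        guard let name = segments.first else { return .notFound(.malformedURL) }

        // 2 and 3. Extract parameters and validate them.
        let route: Route
        switch name {
        case "product":
            guard segments.count == 2, let id = Int(segments[1]), id > 0 else {
                return .notFound(.invalidParameter("id"))
            }
            route = .product(id: id)
        case "cart":
            route = .cart
        default:
            return .notFound(.unknownRoute(name))
        }

        // 4. Evaluate precondition policy from the injected context.
        if route == .cart && !context.isLoggedIn {
            return .gated(.login, resume: route)
        }
        return .proceed(route)
    }
}
```

Because every branch returns a value, a unit test is just a URL and a context in, a decision out: a logged-out /cart link should come back as .gated(.login, resume: .cart), and a non-numeric product id as .notFound.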

Execution lives downstream. A coordinator takes the decision and performs it: it pushes, presents, or selects a tab. It imports no UIKit directly; it depends on protocols whose concrete implementations a factory builds, so the whole flow is testable with a mock factory and a spy navigator that records navigation operations as data.
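A rough sketch of what "navigation operations as data" can look like in a test. Every name here (NavigationOp, Navigating, SpyNavigator, the tab index) is hypothetical, the factory is omitted, and execute is synchronous for brevity even though the real one is described as async.

```swift
// Hypothetical route and operation types for illustration only.
enum Route: Equatable { case product(id: Int), cart }

enum NavigationOp: Equatable {
    case selectTab(Int)
    case push(Route)
    case present(Route)
}

// The seam the coordinator depends on instead of UIKit.
protocol Navigating: AnyObject {
    func perform(_ op: NavigationOp)
}

// Test double: records each operation instead of touching any UI.
final class SpyNavigator: Navigating {
    private(set) var ops: [NavigationOp] = []
    func perform(_ op: NavigationOp) { ops.append(op) }
}

struct Coordinator {
    let navigator: Navigating

    // The executor only translates an already-allowed route into operations.
    func execute(_ route: Route) {
        switch route {
        case .product:
            navigator.perform(.selectTab(0))
            navigator.perform(.push(route))
        case .cart:
            navigator.perform(.present(route))
        }
    }
}
```

A test then asserts on the recorded array, for example that executing .product(id: 7) yields a tab selection followed by a push, with no simulator involved.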

The tempting mistake is to build the coordinator first and let it parse the URL on the way to navigating — which is exactly the fused arrangement we started with, just relocated. Build the pure resolver first. The executor is the dumb part.

Three bugs, one missing idea: the gate

Auth, unsaved work, and cold start looked like three unrelated problems. They are the same thing. A precondition is a gate, and gates are symmetric:

Auth gates entering a destination.
Unsaved work gates leaving the current screen.
Readiness gates launching — dispatch waits until the app exists.

All three are one stash-and-resume loop, not three special cases. The engine emits .gated(precondition, resume:); something stashes the original intent, satisfies the precondition, then re-dispatches the same intent against fresh context. On the re-run the gate is now satisfied and resolution proceeds.
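The loop itself is small. The sketch below is an assumption-heavy stand-in: Dispatcher, its isLoggedIn flag (playing the role of fresh context), and the inlined resolve are invented for illustration, and a real dispatcher would call the engine and a coordinator instead of appending to arrays.

```swift
// Hypothetical, trimmed-down types so the sketch stands alone.
enum Route: Equatable { case cart }
enum Precondition: Equatable { case login }

enum RouteResolution: Equatable {
    case proceed(Route)
    case gated(Precondition, resume: Route)
}

final class Dispatcher {
    var isLoggedIn = false                          // stands in for injected context
    private(set) var stashed: Route?                // the intent waiting on a gate
    private(set) var executed: [Route] = []         // what would have been navigated
    private(set) var prompts: [Precondition] = []   // gates shown to the user

    // Stand-in for engine.resolve: reads the current context on every call.
    private func resolve(_ route: Route) -> RouteResolution {
        if route == .cart && !isLoggedIn { return .gated(.login, resume: route) }
        return .proceed(route)
    }

    func dispatch(_ route: Route) {
        switch resolve(route) {
        case .proceed(let allowed):
            executed.append(allowed)
        case .gated(let gate, resume: let intent):
            stashed = intent        // stash the original intent
            prompts.append(gate)    // go satisfy the gate
        }
    }

    // Called once the gate has been satisfied (for example, after login).
    func preconditionSatisfied() {
        guard let intent = stashed else { return }
        stashed = nil
        dispatch(intent)            // re-resolve against fresh context
    }
}
```

The important detail is the last line: resumption goes back through resolution rather than jumping straight to execution, so a gate that is still unsatisfied simply stashes again.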

Figure: The re-entrant loop. A satisfied gate re-dispatches the original intent against fresh context.

Here are the three bugs, mapped onto the loop.

Auth bypass (a mistake I made and caught). I first enforced preconditions at execution, inside the coordinator. The in-app navigation path didn't go through that check, so it bypassed the gate and users reached gated screens. I moved policy into the engine — which fixed the bypass and turned auth-gating into a pure, headless unit test. Owning that publicly is what made the rest of the design credible. This is the enter-gate.

A deep link destroying unsaved work. A push deep link tore down a screen behind an open form. The engine can't know a screen is dirty — that's view-model state — so the screen declares it, and the dispatcher inspects the plan-as-data before applying it: if the plan would tear down a dirty screen, it stashes the target and prompts Discard / Cancel / Save. This is the leave-gate — unsaved work as a precondition on leaving.
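Inspecting the plan before applying it might look like the following. Screen, NavigationPlan, and LeaveDecision are hypothetical names, and screens are identified by plain strings purely to keep the sketch short.

```swift
// Each screen declares its own dirtiness; the engine never sees this state.
struct Screen {
    let id: String
    var isDirty: Bool
}

// The plan is data: which screens it would tear down and where it ends up.
struct NavigationPlan {
    let tearsDown: [String]
    let target: String
}

enum LeaveDecision: Equatable {
    case apply
    case confirm(stashedTarget: String)   // prompt Discard / Cancel / Save first
}

func inspect(_ plan: NavigationPlan, stack: [Screen]) -> LeaveDecision {
    let dirtyIDs = Set(stack.filter(\.isDirty).map(\.id))
    let wouldLoseWork = plan.tearsDown.contains { dirtyIDs.contains($0) }
    return wouldLoseWork ? .confirm(stashedTarget: plan.target) : .apply
}
```

Nothing is torn down until the decision comes back, which is what makes the prompt possible at all.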

Cold-start links vanishing. On a cold launch the link is delivered before the root flow exists; immediate dispatch pushes onto nothing. A ready-gate in the dispatcher stashes the link, lets start() stand up the root, then drains the stash. This is the launch-gate.
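A ready-gate can be as little as a flag and a buffer. LaunchGate, receive, and markReady are made-up names for this sketch; in the real app the drain would be triggered once start() has stood up the root flow.

```swift
import Foundation

final class LaunchGate {
    private var isReady = false
    private var pending: [URL] = []
    private let handle: (URL) -> Void

    init(handle: @escaping (URL) -> Void) {
        self.handle = handle
    }

    // Entry points call this regardless of whether the UI exists yet.
    func receive(_ url: URL) {
        if isReady {
            handle(url)
        } else {
            pending.append(url)   // stash instead of pushing onto nothing
        }
    }

    // Call once the root flow is up; drains the stash in arrival order.
    func markReady() {
        isReady = true
        let queued = pending
        pending.removeAll()
        queued.forEach(handle)
    }
}
```

Links that arrive after markReady() pass straight through, so callers never need to know which phase the app is in.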

The takeaway

These are not three patches. Enter, leave, launch — the same stash-and-resume loop. Once you see the gate, a whole class of interruption bugs becomes structurally impossible instead of individually patched.

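A note on representation: a gate is a case of the resolution enum, .gated(precondition, resume:), not a thrown error. A gate isn't a failure: it has a destination (where to send the user to satisfy it) and a continuation (what to resume afterward). Modeling it as a throw would recast it as a failure and bury both in an error payload that every caller has to catch and unpack.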

Migrate without a rewrite

You don't get to stop the world and replace the routing layer. So this went in route by route using the strangler-fig pattern: the engine shipped behind a facade, handled the routes it knew, and anything it didn't recognize fell through to the legacy path. No big-bang cutover, no frozen feature work, two code paths coexisting only for as long as the migration ran.
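The facade's fallback rule fits in a few lines. This is a sketch under assumptions: DeepLinkFacade and FacadeOutcome are invented names, and the engine is reduced to a closure that returns nil for routes it does not know yet.

```swift
import Foundation

enum FacadeOutcome: Equatable {
    case handledByEngine
    case handledByLegacy
}

struct DeepLinkFacade {
    // Returns nil when the engine has no entry for this URL (assumed contract).
    let engineResolve: (URL) -> String?
    // The existing hand-rolled path, untouched during the migration.
    let legacyHandle: (URL) -> Void

    func open(_ url: URL) -> FacadeOutcome {
        if engineResolve(url) != nil {
            return .handledByEngine
        }
        legacyHandle(url)   // anything unrecognized falls through unchanged
        return .handledByLegacy
    }
}
```

Migrating a route then means teaching the engine about it; the legacy branch for that route simply stops being reached and can be deleted later.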

The thing that actually drove adoption wasn't a mandate — I had no authority to force every feature onto one contract. It was making the new way cheaper. Registering a route through the engine was less work than hand-rolling another branch, so people reached for it on their own. A migration that's easier than not-migrating finishes itself.

Before

Three hand-rolled entry points, each with its own parsing

Auth checked in some branches, skipped in others

Add a route = edit central code in two files + ship a release

Unknown route = print or no-op, invisible

Malformed URL = crash

After

One gate: every entry point calls engine.resolve

Precondition policy lives in the engine, applied uniformly

Add a route = declare data in the server manifest

Every resolution is one typed result you can measure

Malformed URL = typed .notFound, logged no-op

The server-driven manifest closed the velocity problem: routes became data shipped from the backend, so a campaign link no longer needed an app release. The cost is that the boundary has to stay tolerant — an unknown route from the server is handled, not trusted, and never allowed to crash the client.
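One way to keep that boundary tolerant is to decode leniently and skip what the client does not understand. The JSON shape, the field names, and the Destination cases below are assumptions made for the sketch; the article does not describe the real manifest format.

```swift
import Foundation

// Destinations this client build knows how to open (hypothetical).
enum Destination: String { case product, cart }

struct ManifestEntry {
    let pattern: String
    let destination: Destination
}

// Lenient wire type: every field is optional so a missing key cannot throw.
private struct RawEntry: Decodable {
    let pattern: String?
    let destination: String?
}

func decodeManifest(_ data: Data) -> (routes: [ManifestEntry], skipped: Int) {
    // A payload that is not a JSON array of objects yields an empty manifest.
    guard let raw = try? JSONDecoder().decode([RawEntry].self, from: data) else {
        return ([], 0)
    }
    var routes: [ManifestEntry] = []
    for entry in raw {
        // Keep only entries that are complete and name a known destination.
        if let pattern = entry.pattern, !pattern.isEmpty,
           let name = entry.destination,
           let destination = Destination(rawValue: name) {
            routes.append(ManifestEntry(pattern: pattern, destination: destination))
        }
    }
    return (routes, raw.count - routes.count)
}
```

The skipped count is worth reporting: a server that starts shipping routes an older client cannot open shows up as a number rather than as a crash.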

And because every resolution is a single function returning a typed result, every one of them is a place you can measure. For the first time we knew which links failed and why — the routing layer became observable instead of a black box of scattered conditionals.
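As a small illustration of why a single typed result is easy to measure, here is a hypothetical in-memory counter keyed by outcome. Outcome and ResolutionMetrics are invented for the sketch; a real app would forward the same values to whatever analytics pipeline it already has.

```swift
// Coarse outcome of one resolution; the reason string is an assumed detail.
enum Outcome: Hashable {
    case proceed
    case gated
    case notFound(reason: String)
}

struct ResolutionMetrics {
    private(set) var counts: [Outcome: Int] = [:]

    // Called once per resolve, at the single gate every entry point shares.
    mutating func record(_ outcome: Outcome) {
        counts[outcome, default: 0] += 1
    }
}
```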

Tradeoffs I'd defend

Decision	I chose	Over	Why / what it cost
Layering	Resolution / execution split	One object that parses and navigates	Testability + one policy site; cost is indirection and a downcast at the UIKit seam
Precondition policy	In the engine, from context	In the coordinator	Consistent gating + headless tests; cost is in-app nav also routing through resolve
Gate representation	`.gated(resume:)` variant	`throw` an error	A gate has a destination and a continuation; it isn't a failure
Migration	Strangler-fig + facade fallback	Big-bang rewrite	No team had to stop; cost is two code paths coexisting during migration
Route source	Server-driven manifest	Compiled-in routes	Ship routes without a release; cost is a tolerant boundary
Concurrency	Serialized dispatch (actor/queue)	Fire-and-forget	`execute` awaits on modals; concurrent dispatch would interleave — an ordering bug, not a coverage gap

The concurrency line is worth dwelling on. execute is async and awaits modal presentations, so two deep links arriving at once could reconcile against a stale snapshot or present while a transition is in flight. That's an ordering bug: you close it with a serialization invariant (dispatch through an actor or a serial queue), not with more test coverage. One caveat if you pick an actor: Swift actors are reentrant at suspension points, so the invariant has to be an explicit queue inside the actor, not actor isolation alone. A working part doesn't imply a working whole.
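A minimal sketch of that serialization invariant, assuming Swift concurrency is available. SerialDispatcher and enqueue are hypothetical names. Since an actor only protects its state between suspension points, the sketch chains each dispatch onto the previous one so a second link cannot start executing until the first has finished.

```swift
actor SerialDispatcher {
    // The most recently enqueued dispatch; new work waits for it.
    private var tail: Task<Void, Never>?

    @discardableResult
    func enqueue(_ work: @escaping @Sendable () async -> Void) -> Task<Void, Never> {
        let previous = tail
        let task = Task {
            _ = await previous?.value   // wait for the prior dispatch to complete
            await work()                // then run this one, awaits included
        }
        tail = task
        return task
    }
}
```

Each deep link becomes one enqueue call wrapping resolve-then-execute, so ordering is a property of the dispatcher rather than something each caller has to get right.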

This essay is the incident: three bugs collapsing into one gate. The coordinator-and-lifecycle layer underneath — how execution actually reconciles a navigation stack across tabs and containers — is its own thing, covered in the companion piece on routing coordination.

The general lesson

The reusable idea here has nothing to do with deep links specifically: when a piece of logic both decides and acts, split the deciding from the acting. The decision becomes pure, typed, testable, and consistent; the action becomes a thin, dumb executor. Three bug classes — crashes, inconsistent auth, lost form data — stopped being three bugs and became one gate you could test. That's usually what "fix the architecture" actually means: find the place where a decision and a side effect got fused, and pull them apart.