Deep links arrive from three places — custom URL schemes, Universal Links, push notifications — and on this codebase each one entered through its own hand-rolled if/else branch. Three separate doors, accreted at three different times, each with its own vocabulary for the same destination (product in one, p in another, an embedded deeplink string in the third). That arrangement caused a pile of problems that looked unrelated until you saw the shape underneath:
- Crashes on malformed input. A deep link URL is attacker-controllable. Some branches force-subscripted the URL and assumed well-formed input, so a crafted link could crash the app — a recurring crash class, supplied from outside the binary.
- Inconsistent auth. The cart handler checked login before routing; the others didn't. Whether a link required a session depended on which door happened to handle it — so a user could reach a gated screen through the wrong entry point.
- A velocity bottleneck. Marketing couldn't ship a campaign link without an app release, because the route table lived in code, in two files, edited by hand.
- Silent data loss. A push deep link could land on a screen with a dirty form and blow away unsaved input, because nothing mediated the transition.
- Dropped cold-start links. A link delivered on cold launch arrived before the UI existed and simply vanished.
- No observability. An unknown route was a print or a no-op. Nobody knew which links failed, or how often.
The reason this stayed invisible is that nobody owned "deep linking" as a whole. It was scattered across entry points and features, so no single feature felt the full cost. The work wasn't a hard algorithm. It was naming the scattered failure as one coherent reliability + security + velocity problem, and then owning it end to end.
Resolution is data; navigation is a side effect
The move that fixes all of these at once is to separate resolution from execution.
Turning an untrusted URL into a decision is resolution. Performing the navigation is execution. Keep them apart and the engine becomes a pure function over untrusted input — testable, validatable, and the single place policy lives.
The engine takes an untrusted URL and returns a RouteResolution — a typed decision as data. It does not navigate. It doesn't touch UIKit. It answers one question: given this URL, what should happen, and is it allowed?
```swift
enum RouteResolution {
    case proceed(Route)                      // resolved, allowed
    case gated(Precondition, resume: Route)  // stash this, satisfy the gate, resume
    case notFound(RouteError)                // typed error, never a crash
}

// One gate. Every entry point (scheme, Universal Link, push) calls this.
let resolution = engine.resolve(url, context: context)
```
Inside resolve the steps are: match against a declared manifest → extract parameters → validate → evaluate precondition policy from injected context. Every one of those steps returns a value rather than performing an effect. So the malformed-input crash class disappears — a URL the engine can't parse returns .notFound(RouteError) instead of trapping. Auth stops being a per-branch accident, because the precondition policy lives in the engine and every entry point inherits the same rules. And because resolution is a pure function over its inputs, it's trivially unit-testable: feed it URLs and a context, assert on the decision, no navigation stack and no simulator required.
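Those four steps can be sketched as one pure function. This is a minimal illustration, not the production engine: Route, Precondition, RouteError, RouteContext, ManifestEntry and the example.com URLs are hypothetical names, and the manifest here is an in-memory dictionary keyed by the first path segment.

```swift
import Foundation

// Hypothetical types; only the shape of RouteResolution comes from the text above.
enum Route: Equatable { case product(id: Int), cart }
enum Precondition: Hashable { case login }
enum RouteError: Error, Equatable { case unknownRoute, invalidParameters }

enum RouteResolution: Equatable {
    case proceed(Route)
    case gated(Precondition, resume: Route)
    case notFound(RouteError)
}

// Injected context: which preconditions are already satisfied.
struct RouteContext { var satisfied: Set<Precondition> }

// One declared route: its gate (if any) and a validating parameter builder.
struct ManifestEntry {
    let requires: Precondition?
    let build: ([String]) -> Route?   // nil means validation failed
}

struct RouteEngine {
    let manifest: [String: ManifestEntry]

    func resolve(_ url: URL, context: RouteContext) -> RouteResolution {
        // 1. Match: look the first path segment up in the manifest.
        let segments = url.pathComponents.filter { $0 != "/" }
        guard let name = segments.first, let entry = manifest[name] else {
            return .notFound(.unknownRoute)
        }
        // 2-3. Extract and validate: bad input yields nil, never a trap.
        guard let route = entry.build(Array(segments.dropFirst())) else {
            return .notFound(.invalidParameters)
        }
        // 4. Policy: evaluated from injected context, returned as data.
        if let gate = entry.requires, !context.satisfied.contains(gate) {
            return .gated(gate, resume: route)
        }
        return .proceed(route)
    }
}

let engine = RouteEngine(manifest: [
    "product": ManifestEntry(requires: nil, build: { args in
        guard args.count == 1, let id = Int(args[0]), id > 0 else { return nil }
        return Route.product(id: id)
    }),
    "cart": ManifestEntry(requires: .login, build: { args in
        args.isEmpty ? Route.cart : nil
    }),
])
```

With an empty context, a /cart URL resolves to .gated(.login, resume: .cart); with login satisfied, the same URL resolves to .proceed(.cart). Nothing in the function navigates, which is what makes it assertable without a simulator.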
Execution lives downstream. A coordinator takes the decision and performs it — pushes, presents, selects a tab. It imports no UIKit directly; it depends on protocols a factory builds, so the whole flow is testable with a mock factory and a spy navigator that records navigation operations as data.
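A rough sketch of the spy idea, under stated assumptions: NavigationOp, Navigating, SpyNavigator and this Coordinator are hypothetical names, the factory is left out, and execute is synchronous here for brevity even though the real one awaits.

```swift
// Hypothetical route type for the sketch.
enum Route: Equatable { case product(id: Int), cart }

// Navigation operations expressed as data, so a test can compare them.
enum NavigationOp: Equatable {
    case selectTab(Int)
    case push(Route)
    case present(Route)
}

// The coordinator depends on this protocol, not on UIKit.
protocol Navigating: AnyObject {
    func perform(_ op: NavigationOp)
}

// Test double: records every operation instead of touching a view hierarchy.
final class SpyNavigator: Navigating {
    private(set) var recorded: [NavigationOp] = []
    func perform(_ op: NavigationOp) { recorded.append(op) }
}

struct Coordinator {
    let navigator: Navigating

    // Takes an already-resolved route; it never parses a URL.
    func execute(_ route: Route) {
        switch route {
        case .product:
            navigator.perform(.selectTab(0))   // tab index is an arbitrary choice
            navigator.perform(.push(route))
        case .cart:
            navigator.perform(.present(route))
        }
    }
}
```

A test hands the coordinator a SpyNavigator, executes a route, and asserts on the recorded array.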
The tempting mistake is to build the coordinator first and let it parse the URL on the way to navigating — which is exactly the fused arrangement we started with, just relocated. Build the pure resolver first. The executor is the dumb part.
Three bugs, one missing idea: the gate
Auth, unsaved work, and cold start looked like three unrelated problems. They are the same thing. A precondition is a gate, and gates are symmetric:
- Auth gates entering a destination.
- Unsaved work gates leaving the current screen.
- Readiness gates launching — dispatch waits until the app exists.
All three are one stash-and-resume loop, not three special cases. The engine emits .gated(precondition, resume:); something stashes the original intent, satisfies the precondition, then re-dispatches the same intent against fresh context. On the re-run the gate is now satisfied and resolution proceeds.
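The loop itself is small. Here is a sketch with hypothetical names (Context, Event, Dispatcher, and a stand-in resolve function in place of the real engine); intents are plain strings to keep it short.

```swift
enum Route: Equatable { case home, cart }
enum Precondition: Equatable { case login }

enum RouteResolution: Equatable {
    case proceed(Route)
    case gated(Precondition, resume: Route)
    case notFound
}

struct Context { var isLoggedIn: Bool }

// Stand-in for engine.resolve: a pure function of intent and context.
func resolve(_ intent: String, context: Context) -> RouteResolution {
    switch intent {
    case "home": return .proceed(.home)
    case "cart": return context.isLoggedIn ? .proceed(.cart) : .gated(.login, resume: .cart)
    default: return .notFound
    }
}

// What the dispatcher did, recorded as data for the sake of the example.
enum Event: Equatable { case executed(Route), askedToSatisfy(Precondition), dropped }

final class Dispatcher {
    private(set) var context = Context(isLoggedIn: false)
    private(set) var stashedIntent: String?
    private(set) var events: [Event] = []

    func dispatch(_ intent: String) {
        switch resolve(intent, context: context) {
        case .proceed(let route):
            events.append(.executed(route))
        case .gated(let gate, _):
            stashedIntent = intent            // stash the original intent
            events.append(.askedToSatisfy(gate))
        case .notFound:
            events.append(.dropped)
        }
    }

    // Called once the gate has been satisfied (for example, after login).
    func gateSatisfied(freshContext: Context) {
        context = freshContext
        guard let intent = stashedIntent else { return }
        stashedIntent = nil
        dispatch(intent)                      // re-run the same intent against fresh context
    }
}
```

The resume step re-resolves rather than trusting the stashed route, so a gate that is still unsatisfied would simply stash again.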
Here are the three bugs, mapped onto the loop.
Auth bypass (a mistake I made and caught). I first enforced preconditions at execution, inside the coordinator. The in-app navigation path didn't go through that check, so it bypassed the gate and users reached gated screens. I moved policy into the engine — which fixed the bypass and turned auth-gating into a pure, headless unit test. Owning that publicly is what made the rest of the design credible. This is the enter-gate.
A deep link destroying unsaved work. A push deep link tore down a screen behind an open form. The engine can't know a screen is dirty — that's view-model state — so the screen declares it, and the dispatcher inspects the plan-as-data before applying it: if the plan would tear down a dirty screen, it stashes the target and prompts Discard / Cancel / Save. This is the leave-gate — unsaved work as a precondition on leaving.
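As a sketch of what inspecting the plan as data might look like (Screen, NavigationPlan, LeaveDecision and UserChoice are hypothetical names, and the plan is reduced to a list of screens it would tear down):

```swift
// A screen declares its own dirtiness; the engine never sees this state.
struct Screen {
    let name: String
    var isDirty = false
}

// The plan is data: what would be torn down, and where we would end up.
struct NavigationPlan {
    let tearDown: [Screen]
    let target: String
}

enum LeaveDecision: Equatable {
    case apply(target: String)
    case confirm(dirtyScreens: [String], stashedTarget: String)
}

// Inspect before applying: a dirty screen in the teardown set gates the transition.
func inspect(_ plan: NavigationPlan) -> LeaveDecision {
    let dirty = plan.tearDown.filter { $0.isDirty }.map { $0.name }
    return dirty.isEmpty
        ? .apply(target: plan.target)
        : .confirm(dirtyScreens: dirty, stashedTarget: plan.target)
}

enum UserChoice { case discard, cancel, save }

// Returns the target to resume, or nil if the user stays put.
// Assumes a save has already completed by the time .save reaches here.
func resume(_ stashedTarget: String, after choice: UserChoice) -> String? {
    switch choice {
    case .discard, .save: return stashedTarget
    case .cancel: return nil
    }
}
```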
Cold-start links vanishing. On a cold launch the link is delivered before the root flow exists; immediate dispatch pushes onto nothing. A ready-gate in the dispatcher stashes the link, lets start() stand up the root, then drains the stash. This is the launch-gate.
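A ready-gate can be very little code. In this sketch LaunchGate, markReady and the deliver closure are hypothetical names; markReady stands in for whatever signals that start() has stood up the root.

```swift
final class LaunchGate {
    private var isReady = false
    private var pending: [String] = []
    private let deliver: (String) -> Void

    init(deliver: @escaping (String) -> Void) {
        self.deliver = deliver
    }

    // Before the root exists, links are stashed instead of dispatched onto nothing.
    func dispatch(_ link: String) {
        guard isReady else {
            pending.append(link)
            return
        }
        deliver(link)
    }

    // Called once the root flow is up: drain the stash in arrival order.
    func markReady() {
        isReady = true
        let queued = pending
        pending.removeAll()
        queued.forEach(deliver)
    }
}
```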
These are not three patches. Enter, leave, launch — the same stash-and-resume loop. Once you see the gate, a whole class of interruption bugs becomes structurally impossible instead of individually patched.
A note on representation: a gate is a RouteResolution variant, .gated(resume:), not a thrown error. A gate isn't a failure: it has a destination (where to send the user to satisfy it) and a continuation (what to resume afterward). throw would lose both.
Migrate without a rewrite
You don't get to stop the world and replace the routing layer. So this went in route by route using the strangler-fig pattern: the engine shipped behind a facade, handled the routes it knew, and anything it didn't recognize fell through to the legacy path. No big-bang cutover, no frozen feature work, two code paths coexisting only for as long as the migration ran.
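The facade can be sketched in a few lines. The names are hypothetical (RoutingFacade, tryEngine, legacy), and the engine is reduced to a closure that reports whether it recognized the URL.

```swift
import Foundation

enum HandledBy: Equatable { case engine, legacy }

struct RoutingFacade {
    // Returns false when the new engine does not recognize the URL.
    let tryEngine: (URL) -> Bool
    // The existing hand-rolled path, untouched.
    let legacy: (URL) -> Void

    @discardableResult
    func open(_ url: URL) -> HandledBy {
        if tryEngine(url) { return .engine }
        legacy(url)                 // fall through for anything not yet migrated
        return .legacy
    }
}
```

Each migrated route shrinks what reaches the legacy closure; when nothing reaches it anymore, the facade and the legacy path can be deleted together.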
The thing that actually drove adoption wasn't a mandate — I had no authority to force every feature onto one contract. It was making the new way cheaper. Registering a route through the engine was less work than hand-rolling another branch, so people reached for it on their own. A migration that's easier than not-migrating finishes itself.
Before:
- Three hand-rolled entry points, each with its own parsing
- Auth checked in some branches, skipped in others
- Add a route = edit central code in two files + ship a release
- Unknown route = print or no-op, invisible
- Malformed URL = crash

After:
- One gate: every entry point calls engine.resolve
- Precondition policy lives in the engine, applied uniformly
- Add a route = declare data in the server manifest
- Every resolution is one typed result you can measure
- Malformed URL = typed .notFound, logged no-op
The server-driven manifest closed the velocity problem: routes became data shipped from the backend, so a campaign link no longer needed an app release. The cost is that the boundary has to stay tolerant — an unknown route from the server is handled, not trusted, and never allowed to crash the client.
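One way to keep that boundary tolerant is to decode each manifest entry independently and drop what the client cannot honor. This is a sketch under assumptions: the JSON shape, the field names (pattern, destination, requiresLogin), the Lossy wrapper and knownDestinations are all hypothetical, not the actual manifest format.

```swift
import Foundation

// Hypothetical manifest entry shape.
struct RouteSpec: Decodable, Equatable {
    let pattern: String
    let destination: String
    let requiresLogin: Bool
}

// Wrapper that turns a per-entry decoding failure into nil instead of failing the array.
struct Lossy<T: Decodable>: Decodable {
    let value: T?
    init(from decoder: Decoder) throws {
        value = try? T(from: decoder)
    }
}

// Destinations this build of the client can actually show.
let knownDestinations: Set<String> = ["product", "cart"]

func loadManifest(from data: Data) -> [RouteSpec] {
    guard let entries = try? JSONDecoder().decode([Lossy<RouteSpec>].self, from: data) else {
        return []   // unreadable manifest: no server routes, but no crash either
    }
    return entries
        .compactMap { $0.value }                                   // drop malformed entries
        .filter { knownDestinations.contains($0.destination) }     // drop unknown destinations
}
```

A malformed entry or a destination the client has never heard of costs one route, not the whole manifest.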
And because every resolution is a single function returning a typed result, every one of them is a place you can measure. For the first time we knew which links failed and why — the routing layer became observable instead of a black box of scattered conditionals.
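Measuring at that single point can be as plain as counting outcomes. In this sketch Outcome and ResolutionMetrics are hypothetical names, and the counters live in memory; a real setup would presumably forward them to whatever analytics pipeline is in place.

```swift
enum Outcome: Hashable { case proceeded, gated, notFound }

struct ResolutionMetrics {
    private(set) var counts: [Outcome: Int] = [:]
    private(set) var failuresByRoute: [String: Int] = [:]

    // Call once per resolution, at the one place resolutions happen.
    mutating func record(_ outcome: Outcome, routeName: String) {
        counts[outcome, default: 0] += 1
        if outcome == .notFound {
            failuresByRoute[routeName, default: 0] += 1
        }
    }

    // Share of resolutions that ended in notFound; 0 when nothing was recorded.
    var failureRate: Double {
        let total = counts.values.reduce(0, +)
        guard total > 0 else { return 0 }
        return Double(counts[.notFound, default: 0]) / Double(total)
    }
}
```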
Tradeoffs I'd defend
| Decision | I chose | Over | Why / what it cost |
|---|---|---|---|
| Layering | Resolution / execution split | One object that parses and navigates | Testability + one policy site; cost is indirection and a downcast at the UIKit seam |
| Precondition policy | In the engine, from context | In the coordinator | Consistent gating + headless tests; cost is in-app nav also routing through resolve |
| Gate representation | .gated(resume:) variant | throw an error | A gate has a destination and a continuation; it isn't a failure |
| Migration | Strangler-fig + facade fallback | Big-bang rewrite | No team had to stop; cost is two code paths coexisting during migration |
| Route source | Server-driven manifest | Compiled-in routes | Ship routes without a release; cost is a tolerant boundary |
| Concurrency | Serialized dispatch (actor/queue) | Fire-and-forget | execute awaits on modals; concurrent dispatch would interleave — an ordering bug, not a coverage gap |
The concurrency line is worth dwelling on. execute is async and awaits on modal presents, so two deep links arriving at once would reconcile against a stale snapshot or present while a transition is in flight. That's an ordering bug — you close it with a serialization invariant (dispatch through an actor or a serial queue), not with more test coverage. A working part doesn't imply a working whole.
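One possible shape for that invariant is a single consumer draining a stream, so each execute finishes before the next begins. This is a sketch, not the shipped dispatcher: SerialDispatcher is a hypothetical name, links are plain strings, and AsyncStream.makeStream assumes Swift 5.9 or later. A consumer loop is used here because Swift actors are reentrant at suspension points, so an actor method that awaits does not by itself keep a second call from starting.

```swift
final class SerialDispatcher {
    private let continuation: AsyncStream<String>.Continuation
    let worker: Task<Void, Never>

    init(execute: @escaping @Sendable (String) async -> Void) {
        let (stream, continuation) = AsyncStream.makeStream(of: String.self)
        self.continuation = continuation
        // Single consumer: the next link is not pulled until execute returns.
        self.worker = Task {
            for await link in stream {
                await execute(link)
            }
        }
    }

    // Safe to call at any time; links are buffered in arrival order.
    func dispatch(_ link: String) {
        continuation.yield(link)
    }

    // Ends the stream so the worker task can finish.
    func finish() {
        continuation.finish()
    }
}
```

Two links dispatched back to back run as start, end, start, end, never interleaved, even when execute suspends in the middle.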
This essay is the incident: three bugs collapsing into one gate. The coordinator-and-lifecycle layer underneath — how execution actually reconciles a navigation stack across tabs and containers — is its own thing, covered in the companion piece on routing coordination.
The general lesson
The reusable idea here has nothing to do with deep links specifically: when a piece of logic both decides and acts, split the deciding from the acting. The decision becomes pure, typed, testable, and consistent; the action becomes a thin, dumb executor. Three bug classes (auth bypass, lost form data, dropped cold-start links) stopped being three bugs and became one gate you could test. That's usually what "fix the architecture" actually means: find the place where a decision and a side effect got fused, and pull them apart.