One UI, two platforms: a shared Rust core for server-driven UI

iOS and Android were re-implementing the same parse-and-validate logic and drifting. A Rust core shared via UniFFI makes the rule set single-source, lets flows ship without app releases, and lets old clients degrade instead of crashing.

TL;DR

iOS and Android were hand-writing the same parse-and-validate logic and drifting. We moved it into one Rust core shared via UniFFI — the server decides what, the core decides parsing and rules, native decides how it looks. Flows ship from the server with no app release, the platforms stay in parity by construction, and old clients degrade to a safe no-op instead of crashing. The win wasn't Rust; it was deleting two copies of the same logic.

Two platforms, one feature, two implementations of the same logic: that's the quiet tax on every cross-platform SDK. iOS and Android each hand-built the chat UI parsing and the form validation. Same rule, written twice, drifting apart the moment one side shipped a fix the other didn't. "Passport country is required," "this field is min 6 characters": written in Swift, written again in Kotlin. A rule fixed on iOS didn't land on Android, and nobody could see the disagreement until a customer hit it. And because the parsing lived in the clients, any change to a flow meant an SDK release, gated on every customer's deploy schedule. Conversation design couldn't iterate faster than the slowest integrator.

The SDK was an embedded chat surface inside large enterprise apps — multi-step intake forms, quick-replies, structured cards — and it changed constantly, because conversation design is iterative. Every change was a native code change, then an SDK release, then adoption on each customer's schedule, often weeks to months, across dozens of customers. The drift had many authors too: multiple platform teams consumed the SDK, so "the same rule, twice" was really "the same rule, N times."

The usual answer is server-driven UI: ship the screen as data, let the client render it. That's right, but it has a trap. If each client interprets that data with its own parsing and its own validation, you've just moved the duplication, not removed it. The drift comes back wearing a different hat.

Resolve once, render natively

The split that actually works is resolution vs execution.

Resolution — turning the server payload into a typed, validated view model, and running the validation rules — happens once, in a shared core.
Execution — laying out and drawing native views, measuring text, honoring Dynamic Type and RTL — stays fully native, because that's the part platforms genuinely do differently and shouldn't share.

I built the shared core in Rust and exported it to both Swift and Kotlin through UniFFI. The core has exactly two jobs, and exposes exactly two functions:

// One core, shared across platform teams. UniFFI exports it to BOTH
// Swift (iOS) and Kotlin (Android) from this single Rust crate.
use std::collections::HashMap;

#[uniffi::export]
fn parse_screen(json: String) -> Result<ScreenVm, SduiError> {
    todo!("JSON -> typed view models")
}

#[uniffi::export]
fn validate(json: String, answers: HashMap<String, String>) -> ValidationOutcome {
    todo!("runs the backend's rules")
}

UniFFI generates idiomatic Swift and Kotlin bindings from the one crate. iOS links a staticlib packaged as an XCFramework; Android links a .so. One implementation, both platforms. Each platform maps the typed tree one-to-one onto native views. The native side never sees JSON and owns no rule.

The result: parse and validation logic is written once. A rule change synchronizes across iOS and Android automatically, because there's only one implementation. The platforms can't drift on a rule, because they don't each own a copy of it.

The runtime loop

It isn't request/response. It's an event loop over a native-owned WebSocket, driven by two triggers that enter at different points. The server can push a frame onto the socket (the agent "speaks"). The user taps a rendered view, which enters at the view, goes straight to the view model, and reaches the socket only as an outbound send from the VM's effect runtime. Both paths converge on one pure transform: parse_screen.

Figure: The runtime loop. Server push enters at the socket; a user tap enters at the view and goes directly to the VM; both converge on parse_screen, and effects loop back to the top.

Two arrows carry the whole design. Orange is interaction in — a view calls vm.sink directly, bypassing the view controller. Green is the effect loop — the runtime performs the typed action (a WebSocket send, an HTTP call) and the response re-enters at the top. The view controller sits only on the output side: it observes the VM to re-render, and to perform view-effects (navigate, present) the VM can't do without UIKit.
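The loop can be modeled in a few lines to make the convergence concrete. This is a minimal, single-threaded sketch, not the SDK's runtime: `Event`, `Effect`, `Vm`, `parse_screen_stub` and `run_loop` are hypothetical names, and the `transport` closure stands in for the native WebSocket plus the server. Both triggers become events on one queue; the VM turns taps into effects, and each effect's response re-enters the queue as a server frame.

```rust
use std::collections::VecDeque;

/// Hypothetical events: both triggers funnel into one queue.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    ServerFrame(String), // the server pushed a frame onto the socket
    Tap(String),         // a rendered view called vm.sink with an action id
}

/// Hypothetical effects the VM asks the native runtime to perform.
#[derive(Debug, Clone, PartialEq)]
enum Effect {
    Send(String), // outbound WebSocket send
}

#[derive(Default)]
struct Vm {
    rendered: Vec<String>, // stand-in for the render models parse_screen produces
}

impl Vm {
    /// Server frames are parsed and rendered; taps become effects, never direct sends.
    fn handle(&mut self, event: Event) -> Vec<Effect> {
        match event {
            Event::ServerFrame(json) => {
                self.rendered.push(parse_screen_stub(&json));
                Vec::new()
            }
            Event::Tap(action) => vec![Effect::Send(action)],
        }
    }
}

/// Stand-in for the core's parse_screen: here it only tags the payload.
fn parse_screen_stub(json: &str) -> String {
    format!("screen:{json}")
}

/// Drains the queue; `transport` plays socket + server and may return a reply frame.
fn run_loop(vm: &mut Vm, initial: Vec<Event>, mut transport: impl FnMut(&str) -> Option<String>) {
    let mut queue: VecDeque<Event> = initial.into();
    while let Some(event) = queue.pop_front() {
        for effect in vm.handle(event) {
            match effect {
                // The response re-enters at the top of the loop as a server frame.
                Effect::Send(payload) => {
                    if let Some(reply) = transport(payload.as_str()) {
                        queue.push_back(Event::ServerFrame(reply));
                    }
                }
            }
        }
    }
}
```

Running it with one pushed frame and one tap renders two screens: the pushed one, then the server's reply to the tap.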

Validation rules are data the backend ships

The core doesn't hard-code rules; it interprets them. The backend authors each rule and ships it inside the payload, and the core evaluates it on-device, the same way on every platform.

// The rule travels INSIDE the payload. The core interprets it on-device.
{ "type": "textField", "properties": {
    "fieldId": "passportCountry",
    "validation": { "required": true, "message": "Passport country is required." } } }

This is what kills the drift at the root. The "required" rule isn't compiled into Swift and Kotlin; it's a value the server sends and the one core evaluates. Client validation here is for instant UX and shared logic — no round trip, identical on both platforms. The server stays the authoritative trust boundary and re-validates on submit; client validation can always be bypassed, so it's fast feedback, not the gate.
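A rule interpreter of this kind can be quite small. The sketch below assumes the payload has already been decoded into `Rule` and `Field` values (no JSON library, to stay self-contained); `required` and `message` come from the payload above, while `min_length` is a hypothetical second rule kind modeled on the "min 6 characters" example. The names `Field`, `Rule` and this `validate` signature are illustrative, not the crate's API.

```rust
use std::collections::HashMap;

/// A rule as it arrives in the payload, already decoded.
#[derive(Debug, Clone)]
struct Rule {
    required: bool,
    min_length: Option<usize>, // hypothetical rule kind, for illustration
    message: String,
}

#[derive(Debug, Clone)]
struct Field {
    field_id: String,
    rule: Option<Rule>,
}

/// Field id -> error message; an empty map means the form is valid.
#[derive(Debug, Default, PartialEq)]
struct ValidationOutcome {
    errors: HashMap<String, String>,
}

impl ValidationOutcome {
    fn is_valid(&self) -> bool {
        self.errors.is_empty()
    }
}

/// Interprets server-shipped rules against the user's answers.
fn validate(fields: &[Field], answers: &HashMap<String, String>) -> ValidationOutcome {
    let mut outcome = ValidationOutcome::default();
    for field in fields {
        let rule = match &field.rule {
            Some(rule) => rule,
            None => continue, // no rule shipped for this field
        };
        let value = answers.get(&field.field_id).map(|s| s.trim()).unwrap_or("");
        let missing = rule.required && value.is_empty();
        // Length counts chars, not bytes; an empty optional field is not "too short".
        let too_short = !value.is_empty()
            && rule.min_length.map_or(false, |min| value.chars().count() < min);
        if missing || too_short {
            outcome.errors.insert(field.field_id.clone(), rule.message.clone());
        }
    }
    outcome
}
```

Because the rules are values, adding a field or changing a message touches only the payload; the interpreter, and therefore both platforms, stay unchanged.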

Actions are a typed contract; effects run at the edge

An interaction is data too, not a callback the native side invents. Each component carries a typed action the core owns; the native runtime is a bounded executor.

// The core owns the behaviour contract, not just the data shapes.
enum ButtonAction {
    Submit,
    OpenUrl { url: String },
    Subscribe { channel: String },     // native opens/joins the WS
    Network { request: RequestSpec },  // server composes new calls within RequestSpec's limits, no release
    Unknown,                           // forward-compat for server skew
}
enum HttpMethod { Get, Post, Put, Patch, Delete }  // closed set: nothing else decodes
struct RequestSpec {
    method: HttpMethod,   // closed set: GET/POST/PUT/PATCH/DELETE
    path: String,         // RELATIVE: joined to a client-pinned base URL
    include: Vec<String>, // which client state to attach
}

The Network variant carries a constrained request spec, so the server can compose new calls without an app release while the client stays an executor, not an open proxy. The discipline that keeps a server-named request safe: a relative path against a client-pinned base URL (no arbitrary hosts), a closed verb set, and client-injected auth (the server can't set Authorization). So a new interaction is just data — only a brand-new capability the runtime can't perform needs a build.
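The three safety rules translate into a handful of checks at the point where the client turns a spec into a request. This is a sketch under stated assumptions: `build_request`, `Request` and `SpecError` are hypothetical, the naive string join stands in for a proper URL library, the `Bearer` scheme and the example base URL are illustrative, and `include` keys are looked up in client-held state so the server names data but never supplies it.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum HttpMethod { Get, Post, Put, Patch, Delete } // closed set: nothing else decodes

#[derive(Debug, Clone)]
struct RequestSpec {
    method: HttpMethod,
    path: String,
    include: Vec<String>,
}

#[derive(Debug, PartialEq)]
struct Request {
    method: HttpMethod,
    url: String,
    headers: Vec<(String, String)>,
    body: BTreeMap<String, String>,
}

#[derive(Debug, PartialEq)]
enum SpecError { NotRelative, PathTraversal }

/// Turns a server-composed spec into a concrete request. The host comes only
/// from `pinned_base`, and auth comes only from the client's own token.
fn build_request(
    spec: &RequestSpec,
    pinned_base: &str,
    auth_token: &str,
    client_state: &BTreeMap<String, String>,
) -> Result<Request, SpecError> {
    let path = spec.path.as_str();
    // Anything that could name another host is rejected: schemes and "//host".
    if path.contains("://") || path.starts_with("//") {
        return Err(SpecError::NotRelative);
    }
    // ".." segments could climb out of the base path prefix.
    if path.split('/').any(|segment| segment == "..") {
        return Err(SpecError::PathTraversal);
    }
    let url = format!("{}/{}", pinned_base.trim_end_matches('/'), path.trim_start_matches('/'));
    // Only client state the spec names is attached; values never come from the server.
    let body: BTreeMap<String, String> = spec
        .include
        .iter()
        .filter_map(|key| client_state.get(key).map(|v| (key.clone(), v.clone())))
        .collect();
    Ok(Request {
        method: spec.method,
        url,
        // Injected by the client; RequestSpec has no field that could override it.
        headers: vec![("Authorization".to_string(), format!("Bearer {auth_token}"))],
        body,
    })
}
```

Note that the spec has no headers field at all: "the server can't set Authorization" holds by the shape of the type, not by a runtime filter.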

And because the action is a typed sum type, native dispatch is an exhaustive switch: add a kind in Rust and it won't compile until every platform handles it, while an unknown kind from a newer server decodes to Unknown and becomes a safe no-op. Two boundaries, opposite policies: strict core-to-native (lockstep, exhaustive) and tolerant server-to-core (skew-safe via Unknown).
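The two policies can sit side by side in a few lines. In this sketch the wire tags (`"submit"`, `"openUrl"`, `"subscribe"`) and the `decode_action` / `describe` functions are hypothetical; `Network` is omitted for brevity, and `describe` stands in for the native Swift/Kotlin switch so the example can run. The decode has a catch-all arm (tolerant); the dispatch deliberately has none (strict).

```rust
/// Mirrors the ButtonAction contract (Network omitted to keep the sketch short).
#[derive(Debug, Clone, PartialEq)]
enum ButtonAction {
    Submit,
    OpenUrl { url: String },
    Subscribe { channel: String },
    Unknown, // anything a newer server sends that this core predates
}

/// Tolerant server-to-core decode: an unrecognized tag, or a known tag missing
/// its argument, becomes Unknown instead of an error.
fn decode_action(tag: &str, arg: Option<&str>) -> ButtonAction {
    match (tag, arg) {
        ("submit", _) => ButtonAction::Submit,
        ("openUrl", Some(url)) => ButtonAction::OpenUrl { url: url.to_string() },
        ("subscribe", Some(channel)) => ButtonAction::Subscribe { channel: channel.to_string() },
        _ => ButtonAction::Unknown,
    }
}

/// Strict core-to-native dispatch: no wildcard arm, so adding a variant to
/// ButtonAction makes this match fail to compile until the new case is handled.
fn describe(action: &ButtonAction) -> String {
    match action {
        ButtonAction::Submit => "submit form".to_string(),
        ButtonAction::OpenUrl { url } => format!("open {url}"),
        ButtonAction::Subscribe { channel } => format!("join {channel}"),
        ButtonAction::Unknown => "no-op".to_string(),
    }
}
```

Whether a known tag with a missing argument should be Unknown or a hard decode error is a judgment call; mapping it to Unknown keeps a malformed action from taking down the whole screen.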

Forward compatibility is a design requirement, not a nice-to-have

The instant flows ship from the server, you have clients in the wild older than the flows they're being handed. A new component type will reach a core that predates it. If that crashes, server-driven UI is a liability, not a feature. With enterprise customers this is sharper: once a customer pins an SDK version into their app train, that version can stay frozen for a very long time. The floor is effectively permanent.

So the contract has an Unknown variant baked in: a core that meets a component it doesn't understand degrades gracefully instead of trapping. The native mapper renders it as a no-op view, never a crash. And the backend version-gates new components — it only sends a component to cores known to support it. The compiled core is the compile-time guard; Unknown is the runtime guard for the install base you can't upgrade.

// The native side never sees JSON and owns no rule.
switch vm.kind {
case let .textField(fieldId, label, _, value, _, error):
    return TextFieldView(fieldId, label, value, error)
case let .picker(fieldId, label, options, value, _, error):
    return PickerView(fieldId, label, options, value, error)
// ...remaining component cases elided
case let .unknown(type):
    return UnknownView(type)   // forward-compat: a no-op, never a crash
}
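On the backend side, version-gating is a filter over the screen before it is sent. The sketch below is hypothetical throughout: the `Component` set, the `(major, minor)` version scheme, and the "Carousel needs core 1.4" table are invented for illustration. It simply drops components newer than the client's core, matching "only sends a component to cores known to support it"; a real backend might substitute a fallback instead.

```rust
/// Hypothetical component set; Carousel pretends to have shipped in core 1.4.
#[derive(Debug, Clone, PartialEq)]
enum Component {
    Text(String),
    TextField { field_id: String },
    Carousel { items: Vec<String> },
}

/// (major, minor); tuples compare lexicographically, which is what we want here.
type CoreVersion = (u32, u32);

/// First core version that can decode and render each component type.
fn min_core_version(component: &Component) -> CoreVersion {
    match component {
        Component::Text(_) | Component::TextField { .. } => (1, 0),
        Component::Carousel { .. } => (1, 4),
    }
}

/// Server-side gate: keep only what the client's core is known to support.
fn gate_for_core(screen: Vec<Component>, client_core: CoreVersion) -> Vec<Component> {
    screen
        .into_iter()
        .filter(|c| min_core_version(c) <= client_core)
        .collect()
}
```

The gate and the Unknown variant cover different failures: the gate handles versions the backend knows about, and Unknown catches whatever slips past it, such as a misconfigured table or a client reporting the wrong version.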

The crate was owned by a platform team; feature teams pulled a pinned, versioned artifact (XCFramework or AAR) from an internal repo, not source. Additive changes were easy. A breaking change to the view-model vocabulary was a build failure on every consumer, which is exactly the property you want, because it means drift can't sneak in. The view-model vocabulary was the public contract and was held to API-review discipline; it is the decision that ossifies, so it is the one to get right early.

Render model vs view model, and the UIKit boundary

The trap is the word "view model." The Rust output is not an MVVM ViewModel. It's a render model: an immutable element tree, like a React element or a SwiftUI View value, and it legitimately carries layout, type, and children. The MVVM ViewModel is the one stateful object the view controller binds to. Keep the two separate and the confusion goes away: exactly one stateful object (the store), a tree of immutable render nodes (data), and a tree of dumb views. The mapper is a stateless translator; child nodes are data, never per-component view models.

The dependency direction does the enforcing: the VM imports Foundation, not UIKit, and only the view layer (VC plus mapper) touches UIKit. Put the store in its own module whose only dependencies are Foundation and the generated bindings, and the rule stops being a convention: a UIKit import there stands out in review, and building that module for a platform without UIKit (such as a macOS test host) fails to compile. The payoff is a clean test split (the VM gets fast logic tests, the mapper gets snapshot tests), and the same core plus store could drive a SwiftUI renderer unchanged, just as the same core already drives the Android renderer.

Figure: Layers, the two channels, and the UIKit boundary. Dependencies point inward; the VM/store imports Foundation, not UIKit; interactions flow down via vm.sink, renders flow up via observation.

How does a deeply nested interaction reach the VM? The mapper threads the same sink down through the recursion — every leaf, at any depth, captures it. You don't bubble the event up the view tree; you inject the destination down. A button six levels deep calls vm.sink directly; containers relay nothing.

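// Node comes from the generated bindings; Event, TapButton, InputField are illustrative.
func makeView(_ node: Node, sink: @escaping (Event) -> Void) -> UIView {
    switch node.kind {
    case let .container(_, _, children):
        let stack = UIStackView()
        children.forEach { stack.addArrangedSubview(makeView($0, sink: sink)) }  // same sink, down
        return stack
    case let .button(_, action):
        let button = TapButton()
        button.onTap = { sink(.action(action)) }   // the LEAF captures it
        return button
    case let .textField(fieldId, _, _, _, _, _):
        let field = InputField()
        field.onChange = { sink(.input(fieldId, $0)) }
        return field
    // remaining cases (.picker, .unknown, ...) follow the same shape
    }
}
// wired once in the VC: makeView(spec.root, sink: vm.sink); every leaf funnels into vm.handle.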
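The same injection pattern can be checked without UIKit. In this sketch (all names hypothetical) a built "view" is reduced to a list of tap handlers, `Node` is an immutable render tree, and `Store` is the one stateful object; `Rc<RefCell<...>>` is only there to let the test wire the sink to the store. The recursion passes the same sink down unchanged, and only leaves capture it.

```rust
use std::cell::RefCell;
use std::rc::Rc;

#[derive(Debug, Clone, PartialEq)]
enum Event {
    Action(String),
}

/// Immutable render nodes: containers hold children, buttons hold an action.
enum Node {
    Container(Vec<Node>),
    Button { action: String },
}

/// Stand-in for a built view: each tappable leaf becomes one handler.
type Handler = Box<dyn Fn()>;

/// Threads the same sink down the recursion; containers relay nothing.
fn make_view(node: &Node, sink: &Rc<dyn Fn(Event)>, out: &mut Vec<Handler>) {
    match node {
        Node::Container(children) => {
            for child in children {
                make_view(child, sink, out); // same sink, down
            }
        }
        Node::Button { action } => {
            // The leaf captures the sink directly.
            let sink = Rc::clone(sink);
            let action = action.clone();
            out.push(Box::new(move || (*sink)(Event::Action(action.clone()))));
        }
    }
}

/// The single stateful object: every leaf's event lands in `handle`.
#[derive(Default)]
struct Store {
    received: Vec<Event>,
}

impl Store {
    fn handle(&mut self, event: Event) {
        self.received.push(event);
    }
}
```

A button three containers deep reaches the store in one hop, with no intermediate view forwarding anything.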

Layout: declared in the core, computed natively

This is the part of the resolution/execution boundary that earns its keep. The core sends a semantic layout tree — containers (stack, grid, inset) with an axis, a spacing token, an alignment, and per-child sizing semantics (hug content, fill available, or weighted). No points, no frames, no coordinates. That half is easy.

The hard half is turning it into pixels, and it's native because every input is native: text measurement (font metrics, line breaking), the Dynamic Type scale, the available width, safe areas, RTL. The crux is width-dependent height — a chat bubble's height depends on the width it's given, because text wraps. So layout is a recursive measure-and-place pass, parameterized by available width — a constrained flexbox over the semantic tree:

func layout(node, availableWidth) -> (size, childFrames) {
  switch node.kind {
  case .text(s):     measureText(s, font, maxWidth: availableWidth)   // native text stack
  case .vStack(gap): var y = 0
                    for c in children { (cs,_) = layout(c, availableWidth)
                                    place c at (0, y); y += cs.height + gap }
                    size = (availableWidth, max(0, y - gap))   // no trailing gap; 0 if empty
  case .hStack(gap): split availableWidth by hug / fill / weight,
                    layout each child in its slice, height = max(childHeights)
  case .image(aspect): height = availableWidth / aspect
  }
}  // top-down for width, bottom-up for height — resolved once per node
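The pseudocode can be made concrete and runnable, with loud assumptions. In the app this pass is native; Rust is used here only so the sketch can be executed. `measure_text` is a hypothetical fixed-width stand-in for the platform text stack (every glyph 8pt wide, every line 20pt tall, wrapping by character), Fill is treated as weight 1, Hug children take their natural single-line width, and images size purely from the width they're given. Width flows down, height comes back up.

```rust
/// Per-child sizing semantics from the semantic layout tree.
#[derive(Debug, Clone, Copy)]
enum Sizing {
    Hug,         // take the child's natural width
    Fill,        // share leftover width; treated here as Weight(1.0)
    Weight(f64), // share leftover width in proportion to the weight
}

/// A semantic layout node: structure and tokens, no points or frames.
enum Node {
    Text(String),
    VStack { gap: f64, children: Vec<Node> },
    HStack { gap: f64, children: Vec<(Sizing, Node)> },
    Image { aspect: f64 }, // width / height
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct Size { w: f64, h: f64 }

#[derive(Debug, Clone, Copy, PartialEq)]
struct Frame { x: f64, y: f64, w: f64, h: f64 }

const GLYPH_W: f64 = 8.0; // hypothetical stand-in metrics
const LINE_H: f64 = 20.0;

/// Stand-in for native text measurement: height depends on the width offered.
fn measure_text(s: &str, max_width: f64) -> Size {
    let n = s.chars().count();
    if n == 0 {
        return Size { w: 0.0, h: 0.0 };
    }
    let per_line = ((max_width / GLYPH_W).floor() as usize).max(1);
    let lines = (n + per_line - 1) / per_line;
    Size { w: n.min(per_line) as f64 * GLYPH_W, h: lines as f64 * LINE_H }
}

fn weight(sizing: Sizing) -> f64 {
    match sizing { Sizing::Hug => 0.0, Sizing::Fill => 1.0, Sizing::Weight(w) => w }
}

/// Natural single-line width, used to size Hug children.
fn intrinsic_width(node: &Node) -> f64 {
    match node {
        Node::Text(s) => s.chars().count() as f64 * GLYPH_W,
        Node::Image { .. } => 0.0, // images take their width from their slice
        Node::VStack { children, .. } => children.iter().map(intrinsic_width).fold(0.0, f64::max),
        Node::HStack { gap, children } => {
            let sum: f64 = children.iter().map(|(_, c)| intrinsic_width(c)).sum();
            sum + gap * children.len().saturating_sub(1) as f64
        }
    }
}

/// Measure-and-place: returns this node's size and its direct children's frames.
fn layout(node: &Node, available: f64) -> (Size, Vec<Frame>) {
    match node {
        Node::Text(s) => (measure_text(s, available), Vec::new()),
        Node::Image { aspect } => (Size { w: available, h: available / aspect }, Vec::new()),
        Node::VStack { gap, children } => {
            let (mut frames, mut y) = (Vec::new(), 0.0);
            for child in children {
                let (cs, _) = layout(child, available);
                frames.push(Frame { x: 0.0, y, w: cs.w, h: cs.h });
                y += cs.h + gap;
            }
            let h = if children.is_empty() { 0.0 } else { y - gap }; // no trailing gap
            (Size { w: available, h }, frames)
        }
        Node::HStack { gap, children } => {
            let gaps = gap * children.len().saturating_sub(1) as f64;
            let hug_total: f64 = children.iter()
                .filter(|(s, _)| matches!(s, Sizing::Hug))
                .map(|(_, c)| intrinsic_width(c).min(available))
                .sum();
            let total_weight: f64 = children.iter().map(|(s, _)| weight(*s)).sum();
            let leftover = (available - gaps - hug_total).max(0.0); // clamped if hugs overflow
            let (mut x, mut height, mut frames) = (0.0, 0.0_f64, Vec::new());
            for (sizing, child) in children {
                let slice = match sizing {
                    Sizing::Hug => intrinsic_width(child).min(available),
                    _ if total_weight > 0.0 => leftover * weight(*sizing) / total_weight,
                    _ => 0.0,
                };
                let (cs, _) = layout(child, slice); // each child laid out in its slice
                frames.push(Frame { x, y: 0.0, w: slice, h: cs.h });
                height = height.max(cs.h);
                x += slice + gap;
            }
            (Size { w: available, h: height }, frames)
        }
    }
}
```

The same "hello world" text is one line tall at 200pt but three lines tall at 40pt, which is exactly the width-dependent height that forces the top-down/bottom-up pass.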

Measurement is native because only the platform can measure its own text, so the layout computation is native; the core owns the spec (the rules), not the math. A hybrid where the core runs the flexbox and calls native to measure each text run is possible, but that's an FFI hop per text node — far too chatty for a scrolling transcript — so the whole pass stayed native.

Keeping it inside the frame budget

A chat transcript is a fast-scrolling list of rich bubbles, so where this runs matters more than the algorithm. Self-sizing Auto Layout cells don't scale here: Auto Layout solves constraints on the main thread, per cell, and deep nested stacks make every pass expensive enough to drop frames. A 60fps scroll gives you roughly 16ms per frame to work with, and constraint solving in the scroll hot path eats it.

The fix is to compute the expensive part off the main thread and cache it. The measure-and-place pass runs on a background queue (TextKit measurement is safe off-main as long as each queue uses its own text objects) and produces a flat list of frames cached per (message id, width). By the time a cell is on screen, its layout is already computed; the cell just sets frames, with no constraint solving in the hot path. The cache invalidates on a width change (rotation, split view) and on a Dynamic Type change. Static, non-scrolling screens used plain UIStackView plus Auto Layout: the right tool per surface. That's the AsyncDisplayKit / LayoutKit pattern that high-volume chat and feed apps converge on: never measure on the main thread when you can pre-measure off it.
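The caching policy is small enough to sketch. Assumptions, all hypothetical: widths are rounded to whole points for the key, `text_scale` stands in for the Dynamic Type size category, `measure` is whatever measure-and-place function the platform provides, and a plain thread plus channel stands in for a background dispatch queue. A width change is just a cache miss; a text-size change clears the cache; and results that were measured at a stale text size are discarded on arrival instead of being cached.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Cached layout for one message: a flat list of (x, y, w, h) frames.
type Frames = Vec<(f64, f64, f64, f64)>;

/// One finished measurement, tagged with the inputs it was computed for.
struct Measured {
    message_id: u64,
    width: u32,
    text_scale: u32,
    frames: Frames,
}

/// Runs the measure-and-place pass on a worker thread and streams results back,
/// so the scrolling thread never measures.
fn measure_off_main(
    messages: Vec<(u64, String)>,
    width: u32,
    text_scale: u32,
    measure: fn(&str, u32, u32) -> Frames,
) -> Receiver<Measured> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        for (message_id, text) in messages {
            let frames = measure(text.as_str(), width, text_scale);
            if tx.send(Measured { message_id, width, text_scale, frames }).is_err() {
                break; // the receiver is gone; nobody wants these results
            }
        }
    });
    rx
}

/// Frames cached per (message id, width).
struct LayoutCache {
    text_scale: u32,
    entries: HashMap<(u64, u32), Frames>,
}

impl LayoutCache {
    fn new(text_scale: u32) -> Self {
        LayoutCache { text_scale, entries: HashMap::new() }
    }

    fn get(&self, message_id: u64, width: u32) -> Option<&Frames> {
        self.entries.get(&(message_id, width))
    }

    /// Text size changed: every cached measurement is stale.
    fn set_text_scale(&mut self, text_scale: u32) {
        if text_scale != self.text_scale {
            self.text_scale = text_scale;
            self.entries.clear();
        }
    }

    /// Installs finished results (e.g. `rx.try_iter()` once per frame);
    /// anything measured at an old text scale is dropped, not cached.
    fn install(&mut self, results: impl IntoIterator<Item = Measured>) {
        for m in results {
            if m.text_scale == self.text_scale {
                self.entries.insert((m.message_id, m.width), m.frames);
            }
        }
    }
}
```

Polling with `try_iter()` keeps the UI thread non-blocking; the stale-scale check matters because a user can change text size while a batch is still being measured.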

One more boundary this settles: the view models are immutable values copied across FFI, so where does the user's typed input live? The view model is a render snapshot. The native input view writes the user's value into a native FormStore, keyed by fieldId; on submit those answers go to validate() and the server. The core is stateless — in-flight input lives natively, and the server stays the authoritative source of truth. That statelessness is also why calling the core off-main is safe by construction: there's no global mutable state to race.
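The split between the immutable snapshot and in-flight input can be sketched as a tiny store. `FormStore` is named in the text; its methods here (`set`, `value`, `submit`) are hypothetical, and the `validate` closure stands in for the core's validate() across FFI. On success the answers are returned as the payload to send, which the server re-validates.

```rust
use std::collections::HashMap;

/// Native-side store for in-flight input, keyed by fieldId. The render model
/// copied across FFI is never mutated; typed values live here until submit.
#[derive(Default)]
struct FormStore {
    answers: HashMap<String, String>,
}

impl FormStore {
    /// Called by the input view on every edit.
    fn set(&mut self, field_id: &str, value: &str) {
        self.answers.insert(field_id.to_string(), value.to_string());
    }

    /// Echoes the current value into a freshly rendered view ("" if untouched).
    fn value(&self, field_id: &str) -> &str {
        self.answers.get(field_id).map(String::as_str).unwrap_or("")
    }

    /// Runs client validation for instant feedback; Ok carries the payload for
    /// the server, which stays authoritative and re-validates on receipt.
    fn submit(
        &self,
        validate: impl Fn(&HashMap<String, String>) -> Vec<String>,
    ) -> Result<HashMap<String, String>, Vec<String>> {
        let failing_fields = validate(&self.answers);
        if failing_fields.is_empty() {
            Ok(self.answers.clone())
        } else {
            Err(failing_fields)
        }
    }
}
```

Because the store is plain native state and the core is a pure function of its inputs, nothing here needs to be shared with the background layout work.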

Trade-offs I'd raise before being asked

| Before | After |
| --- | --- |
| Parse + validation built twice (iOS + Android) | Parse + validation: one Rust crate, shared |
| A rule change shipped twice, drifts | A rule change: one crate → both platforms |
| A UI change = SDK release × N customers | A UI change: server payload, no release |
| Old SDK + new component = crash risk | Old SDK + new component: Unknown → safe no-op |

A shared Rust core is not free. The honest costs:

Build complexity. Multi-toolchain: cargo plus per-Apple-target builds plus XCFramework packaging (plus NDK/AAR for Android). swift build alone no longer ships the app. Real operational cost; the payoff is deleting two native parser/validator stacks.
Binary size and launch. The Rust staticlib links into the host app. Mitigated with panic=abort, LTO, size-opt plus strip, and a dependency-light crate. It's off the cold-start path — the core only runs when a chat screen renders, not at launch.
Debuggability. Panics and symbolication cross the boundary. Mapping Result to Swift throws keeps malformed payloads catchable. And because panic=abort turns any Rust panic into a process abort rather than a catchable error, every expected failure has to be a Result, and a panic is always a bug. Uploading the Rust symbols makes mixed Swift/Rust stacks readable in the crash reporter. Getting Rust symbols into the iOS crash pipeline was the genuinely fiddly part, and the first thing I'd budget time for again.
Marshalling. UniFFI copies records (plain value types like the view models) across the C ABI by serializing them into a buffer, so there are no shared references into Rust memory. Fine for a screen-sized tree; you would not stream large data this way. The native side gets plain Swift and Kotlin values with no Rust lifetimes to manage.
Team and skill. The org now maintains Rust. Contained deliberately: Rust lives in one bounded core with a small API; feature teams consume generated Swift/Kotlin and never write Rust. A scoped bet, not "rewrite everything in Rust."
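The size mitigations above typically map onto a Cargo release profile like the one below. This is a sketch, not the project's actual configuration: the values are assumptions, codegen-units = 1 is a common extra not mentioned above, and final symbol stripping is assumed to happen in the host app's build, where debug symbols can be set aside for the crash reporter.

```toml
# Cargo.toml of the core crate: release profile tuned for size (illustrative values).
[profile.release]
panic = "abort"     # no unwinding machinery; a panic aborts the process
lto = true          # link-time optimization across the whole dependency graph
opt-level = "z"     # optimize for size rather than speed
codegen-units = 1   # extra: a single unit gives LLVM more room to optimize and shrink
```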

A reasonable challenge: why Rust and not just an IDL like protobuf or FlatBuffers with per-language codegen? An IDL gives you typed shapes in every language with no FFI — and if shapes were all we shared, that's the lighter, correct tool. But we shared behavior: the parse logic and the validation-rule interpreter. Sharing types points to an IDL; sharing brains points to a real shared implementation. We were sharing brains.

And why a client core at all — why not have the server send fully-rendered view models so the client never parses? Because you still need client-side behavior the server can't do: instant validation with no round trip, value echoing, offline. A thin per-platform "JSON to struct" mapper just re-introduces the drift we were killing. The core is where shared client behavior lives, written once.

What I'd take to the next one

Separate the data decision from the side effect. "What should this screen be" is pure and shareable; "draw it" is platform-specific and shouldn't be. Most cross-platform pain comes from sharing the wrong half. The server and the Rust core own the what and the rules; native owns how it looks and feels: gestures, accessibility, Dynamic Type, design-system fidelity. SDUI moves composition and logic to shared code, not capability; a genuinely new interaction still needs a native component.
Design the unknown case first. With server-driven anything, the old-client-meets-new-payload path is the one that decides whether the system is safe to operate. An Unknown variant plus backend version-gating turns "ship a flow" from a risky release into a routine one.
Single-source the rules, not just the rendering. Sharing the renderer but duplicating the validation is the subtle version of the same drift you were trying to kill.

The takeaway

The win wasn't Rust — it was deleting two implementations of the same logic. iOS and Android had each hand-built parsing and form validation, and they drifted. We wrote parse plus validation once, in a core both platforms bind via UniFFI, and mapped the typed result 1:1 to native views. Server decides what, the core decides parsing and rules, native decides how it looks — three layers, each owning exactly one decision.

The headline outcome is the boring one, which is the point: flows ship from the server without an app release, iOS and Android stay in parity by construction, and old clients never crash on something new. The win was deleting an entire class of cross-platform inconsistency, not posting a number.