Shipping a release without dropping a live call: the version-retention pattern
How we kept WebRTC video sessions alive across deploys on a real-time consultation platform during the COVID demand spike.
For two years I worked on a real-time video consultation platform — banking advisors, telemedicine, retail apparel demos, all running over WebRTC. The MVP version of the product worked at small scale, but COVID drove demand sharply, and our deployment cadence kept hitting the same wall.
The wall was deceptively simple: a real-time call is the worst place to lose a request mid-flight. Standard zero-downtime deploys (rolling restart, load balancer drain, etc.) all have a window where in-flight sessions either hang, retry, or drop. For a 2-minute support call, that's an inconvenience. For a 30-minute banking advisory session, it's a lost customer.
What didn't work
The first instinct was to make deploys instant — reduce the rollout window. We tried it. The window got shorter; dropped calls got rarer; but the failure mode stayed the same. Some user, somewhere, was going to land in the bad window.
The second instinct was sticky sessions: route a user to whichever version they connected against. That works in theory but breaks the moment your application's signaling state (database row, cache entry, lock) gets touched by a different version. Sticky routing doesn't make state coherent.
The pattern that worked
We stopped treating "previous version" as something to retire. Instead, every release leaves the previous build live for a retention window. Old sessions finish on the version they started on. New sessions get the new version. No call sees a mid-flight version swap.
In practice that meant:
- Two builds running in parallel during the retention window
- A small router upstream that maps existing sessions to their original version
- A connection-drain that waits for active sessions to end naturally, capped at a max retention duration
It's not zero-downtime in the literal sense — there are two versions running. It's something better: zero-disruption. The user-visible system has no version concept at all.
What I'd do differently
The retention window is a parameter you have to pick carefully. Too short and you're back to dropped calls. Too long and old versions accumulate.
We landed on a value that worked for our session-length distribution, but I'd build that into the deploy tooling next time — making the window adapt to recent P95 session length instead of being a hard constant. The right number isn't a constant; it's a function of how your users actually use the product, and that drifts.