Raised Multiplayer Game Capacity from ~80 to 400+ Users on GKE

A real-time multiplayer music-guessing game that crashed past 70–80 players, re-architected on GKE to hold 400+.

DJs host a jam, play a set of songs, and players race to guess them in a bingo-like pattern. The product worked. The infrastructure didn't: sessions fell over once 70–80 players joined. I came in as the DevOps engineer to make it hold real audiences. It looked like a plain capacity problem (add resources as players climb), but load did not rise smoothly with player count: it spiked at one specific gameplay event.

~80 → 400+: concurrent users on GKE, without crashes
6 min → 2 min: Next.js CI build time
100–150: players per pod, the load-tested sweet spot

Before: crashes at 70–80 players. API and Hocuspocus share one pod, so a collaboration-server failure takes the API down with it.

After: 400+ players stable. API and Hocuspocus isolated in their own pods, Next.js scaled by KEDA on an event-driven player-threshold metric.

From crash-at-80 to 400+ stable

Re-architecting for the real spike

The fix wasn't more pods. It was tying the scaling rule to the actual product event, splitting the coupled backend, and letting load tests decide pod size.

1. The shape of it

The scaled architecture

Three moving parts, cleanly separated. The frontend scales on a signal that reflects what players are actually doing; the backend services no longer share a failure domain; and every pod is scraped so scaling decisions run on real numbers, not guesses.

Figure: GKE topology after the split. Players reach Next.js frontend pods scaled by KEDA against a custom player-threshold metric API; the API service and the Hocuspocus collaboration server run in separate single-container pods; kube-prometheus-stack scrapes all three for the metrics that drive scaling.

Player count crosses the waiting-room threshold, the custom metric API reports it, and KEDA scales the Next.js pods for the transition — ahead of the load, where CPU-based HPA reacts only after the resource curve moves.

2. Why HPA didn't fit

The spike wasn't linear

Watching the crashes, the pattern was consistent: the frontend peaked the moment a room transitioned from the waiting room into gameplay. HPA on CPU/memory reacts only after the resource curve moves, which is already too late when a whole room moves at once. So I put a metric API in front of it that polls player count every few seconds, and let KEDA scale the Next.js pods as soon as a poll shows the waiting-room threshold crossed: ahead of the load, not behind it.
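A minimal sketch of the decision such a metric endpoint could make. Everything named here (`Room`, `WAITING_ROOM_THRESHOLD`, `PLAYERS_PER_POD`, the `desiredPods` field) is hypothetical, and the numbers are made up for illustration; the write-up does not give the real threshold or payload. The idea is that the reported value jumps as soon as a waiting room crosses the threshold, before any CPU or memory signal has moved.

```python
import json
import math
from dataclasses import dataclass
from typing import List

# Hypothetical values for illustration; the real ones are not in the write-up.
WAITING_ROOM_THRESHOLD = 50   # waiting players that signal an imminent transition
PLAYERS_PER_POD = 100         # assumed frontend capacity per pod


@dataclass
class Room:
    room_id: str
    waiting: int   # players currently in the waiting room
    playing: int   # players already in gameplay


def desired_frontend_pods(rooms: List[Room]) -> int:
    """Pods needed if every room past the threshold started its game now."""
    expected = 0
    for room in rooms:
        expected += room.playing
        # Waiting players count as load only once the room crosses the threshold.
        if room.waiting >= WAITING_ROOM_THRESHOLD:
            expected += room.waiting
    # Never report fewer than one pod.
    return max(1, math.ceil(expected / PLAYERS_PER_POD))


def metric_payload(rooms: List[Room]) -> str:
    """JSON body an autoscaler could poll; the field name is an assumption."""
    return json.dumps({"desiredPods": desired_frontend_pods(rooms)})
```

KEDA's metrics-api scaler can poll a JSON endpoint and read a numeric value from it, so a payload like this is one plausible way to wire the signal in. Whether the project used that scaler or a different trigger is not stated in the source.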

Splitting the backend came out of the same observability pass. Hocuspocus was sharing a pod with the API; when it fell over under socket load it took the API with it. One container per pod fixed that, and stability jumped immediately. kube-prometheus-stack and Headlamp are what made all of this visible in the first place.

Many small Hocuspocus pods, each broadcasting to every other pod, produce n*(n-1) inter-pod overhead; fewer, larger pods sized at 100–150 players keep it small.

3. The counterintuitive one

The n*(n-1) trap

My first instinct was to scale Hocuspocus out horizontally with a Redis extension. The load tests said no. Because each pod broadcasts updates to all the others, n pods mean n*(n-1) pod-to-pod interactions. Capacity grows linearly with n while that overhead grows quadratically, so scaling out actively hurt. The answer was fewer, bigger pods sized to the sweet spot the tests kept pointing at: 100–150 players each.
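The arithmetic behind that trade-off can be worked through directly. This is an illustrative sketch only: the 25- and 50-player pod sizes are hypothetical comparison points, not configurations reported in the load tests, and the link count ignores message size and frequency.

```python
import math


def pods_needed(players: int, players_per_pod: int) -> int:
    """Number of pods required to hold the given player count."""
    return math.ceil(players / players_per_pod)


def broadcast_links(pods: int) -> int:
    """Directed pod-to-pod links when every pod broadcasts to every other pod."""
    return pods * (pods - 1)


# Compare pod sizes for 400 players: small pods vs. the 100-150 band.
for size in (25, 50, 100, 150):
    n = pods_needed(400, size)
    print(f"{size:>3} players/pod -> {n:>2} pods, {broadcast_links(n):>3} links")
```

For 400 players, 25-player pods need 16 pods and 240 links, while 150-player pods need 3 pods and 6 links. Cutting the pod count by roughly a factor of five cuts the broadcast fan-out by a factor of forty.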

Engineering decision

Chose: KEDA with a custom player-threshold metric
Over: standard HPA on CPU/memory
Why: the resource spike was tied to a specific gameplay event (waiting room → gameplay), so CPU/memory only moved after the load had already arrived

Engineering decision

Chose: fewer, larger Hocuspocus pods (100–150 players each)
Over: many small pods scaled horizontally with a Redis extension
Why: each pod broadcast updates to every other pod, creating n*(n-1) inter-pod overhead that made scaling worse, not better

I didn't treat 400+ as a vanity number. The load tests reproduced the exact flow that broke production — players joining the waiting room, moving into gameplay together, then generating socket-heavy Hocuspocus traffic — and I watched three signals through each run: frontend behavior across the waiting-room-to-gameplay transition, Hocuspocus CPU and memory as socket activity climbed, and whether API stability held when collaboration traffic got noisy.

§Faster CI: 6 min → 2 min

Scaling the infrastructure was half the job. The Next.js build ran over 6 minutes, which throttled iteration. Four changes removed the avoidable work:

Turbo prune — each build pulled only the dependencies it needed instead of the whole monorepo.
Next.js standalone output + multi-stage Docker — runtime images carried only what they ran, without the extra layers.
GitHub Actions caching — unchanged layers and dependencies stopped rebuilding on every run.
Turborepo remote caching — repeated builds across branches and pipelines reused prior work.

The Docker context stopped copying the entire monorepo into every image, and the feedback loop went from wait-heavy to something the team could run and review often without blocking product work.

§What I'd watch in production

The setup holds under the tested flow, but a few things are worth keeping an eye on as real traffic arrives:

KEDA scale-up latency vs. join spikes — if a room fills faster than pods come up, the transition can still outrun the metric.
Hocuspocus memory ceiling per pod — 100–150 is the tested band; a longer or heavier session could push a pod past it.
Socket reconnect storms — a network blip that reconnects many players at once looks a lot like the join spike KEDA is tuned for.
Pod cold-start during the transition — a cold Next.js pod arriving mid-transition is the worst-case timing to design against.
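
The scale-up latency and cold-start items both reduce to a timing budget: the scaling threshold has to sit far enough below current capacity to absorb the players who arrive while the metric is polled and a new pod becomes ready. A rough sketch of that budget, using a hypothetical helper and made-up numbers rather than measured values from this system:

```python
def latest_safe_threshold(capacity: int, join_rate: float,
                          poll_interval_s: float, pod_ready_s: float) -> int:
    """Highest player count at which scale-up can trigger and still land in time.

    capacity:        players the current pods can hold
    join_rate:       players joining per second while a room fills
    poll_interval_s: worst-case delay before the metric is next read
    pod_ready_s:     time for a new pod to start serving traffic
    """
    reaction_time = poll_interval_s + pod_ready_s
    players_arriving = join_rate * reaction_time
    # A result of 0 means scale-up cannot land in time at this join rate.
    return max(0, int(capacity - players_arriving))
```

With an assumed capacity of 150 players, 2 joins per second, a 5-second poll, and 30 seconds to pod readiness, `latest_safe_threshold(150, 2, 5, 30)` gives 80: scaling has to start by the time 80 players are present. If the result drops to 0, pre-warmed pods or a shorter readiness time are needed rather than a lower threshold.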

§Key results

Raised capacity from ~80 to 400+ concurrent players on GKE, without crashes.
Event-driven autoscaling with KEDA on a custom player-threshold metric, tied to the real gameplay event.
API and Hocuspocus isolated into separate pods, ending the shared failure domain.
Load-tested pod sizing at 100–150 players, chosen against n*(n-1) broadcast overhead.
Next.js CI build time down from 6 min to 2 min.