Why A/B Testing in Games Is Harder Than You Think

Most game studios A/B test the wrong things, measure them wrong, and ship the wrong conclusions.

The standard playbook — split users 50/50, measure day-7 retention, ship the winner — misses so much that it’s often worse than intuition.

The interference problem

In social and multiplayer games, users affect each other. A/B test a matchmaking change and your control group experiences the treatment indirectly: pulling treatment players into different queues changes who control players get matched against. That violates the no-interference assumption (SUTVA) behind standard significance tests, so your p-value is wrong before you start.
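One standard mitigation is to randomize at the level of the social unit rather than the individual, so players who interact always share a variant. Here is a minimal sketch using deterministic hashing; the `guild_of` lookup and the salt are illustrative, not part of any specific framework:

```python
import hashlib

def assign_variant(unit_id: str, salt: str = "mm-test-v1") -> str:
    """Deterministically bucket a unit into control or treatment by hashing."""
    h = int(hashlib.sha256(f"{salt}:{unit_id}".encode()).hexdigest(), 16)
    return "treatment" if h % 2 == 0 else "control"

def assign_player(player_id: str, guild_of: dict) -> str:
    """Randomize at the cluster level (guild, server, matchmaking pool) so
    interacting players share a variant and treatment can't leak into control.
    `guild_of` is a hypothetical player-to-cluster lookup."""
    cluster = guild_of.get(player_id, player_id)  # solo players are their own cluster
    return assign_variant(str(cluster))
```

Cluster randomization costs statistical power (fewer independent units), but the estimate it gives you is at least an estimate of the right quantity.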

What to measure

Retention is a lagging indicator: by the time day-7 retention moves, the behavior that drove it happened days earlier, and the experiment has already run its course. Early session depth, tutorial completion variance, and monetization conversion rate are better leading indicators for most interventions.
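Comparing a leading indicator between variants is straightforward; the main thing is to report an interval, not just a point estimate. A minimal sketch using a percentile bootstrap on the difference in means (metric name and sample sizes are illustrative):

```python
import random

def bootstrap_diff_ci(control, treatment, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the difference in means
    of a leading indicator (e.g. events per first session) between variants."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        c = [rng.choice(control) for _ in control]      # resample control
        t = [rng.choice(treatment) for _ in treatment]  # resample treatment
        diffs.append(sum(t) / len(t) - sum(c) / len(c))
    diffs.sort()
    lo = diffs[int(alpha / 2 * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

If the interval on your leading indicator excludes zero after a few days, you have a decision-quality signal long before day-7 retention settles.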

The solution

Stratified assignment, holdout groups, and pre-registering your primary metric before you look at the data. Not rocket science, but rare in practice.
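The assignment part can be sketched in a few lines. This is a hash-based scheme that splits each stratum (e.g. platform × spend tier) 50/50 and carves out a universal holdout that sees no experiments at all; the stratum labels, salt, and holdout percentage are illustrative assumptions:

```python
import hashlib

def stratified_assign(user_id: str, stratum: str, holdout_pct: int = 10,
                      salt: str = "exp-042") -> str:
    """Assign within a stratum (e.g. 'ios:payer') so each stratum is split
    evenly between control and treatment, after reserving a universal
    holdout group. All names and percentages here are illustrative."""
    key = f"{salt}:{stratum}:{user_id}".encode()
    h = int(hashlib.sha256(key).hexdigest(), 16)
    bucket = h % 100
    if bucket < holdout_pct:
        return "holdout"  # never enters any experiment
    return "treatment" if bucket % 2 == 0 else "control"
```

Hashing the stratum into the key means each stratum gets its own even split rather than inheriting whatever imbalance the global split happens to produce, and the deterministic hash means a returning user always lands in the same bucket. The pre-registration step is not code at all: write the primary metric and the analysis plan down before launch, and hold yourself to it.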