
Boom Beach Data Science

A day in the life of a Boom Beach data scientist – talk given at Helsinki Data Science meetup
Boom Beach
© Supercell

Outline

  1. Boom Beach Background
  2. Data science in a game team
    • 90% of the daily work
  3. Task force operation difficulty predictor
    • 5% of the daily work

I Boom Beach

Come with a plan or leave in defeat!

  • Combat strategy game set in tropical archipelago
  • Attack hundreds of unique island bases
  • Explore a huge tropical archipelago
  • Raid other players’ bases
  • Defeat the evil Blackguard

II Data science in a game team

  • One data scientist in each (live) game team
  • Support the game development and maintenance work with data and stats
  • Very important: responsible for interpreting the statistics
  • Work is fast-paced, mostly short-term (same day) tasks

What’s a data scientist

Data scientist is an overloaded term

Game team data scientist: descriptive/exploratory work

  • What’s happening out there in the world among our players?
  • How are the players responding to the game product?

Support decision making

  • Fact checking before decisions
  • Learning after decisions

Practically “ad hoc” work

  • Write-only programming :)
  • Interactive querying and visualization
  • Writing reports
  • Relatively little A/B testing
  • (Almost) no data products

What’s a game team

A game team is a small (fewer than 20 members) team responsible for the game product

  • Our games are services that are continuously being improved
Game team

Game design principles

  • Making the games fun
  • Long lasting
  • Listening to players
  • Playing the game yourself

Game development is an artisanal craft

  • Everything comes from the experience and skill of the team
  • FALSE: “Supercell games are based on a data-driven formula”
  • Instead: The best people make the best games
  • Data scientist is only needed after a game goes to beta
    • A game has to exist, and have players, to be able to generate any data

Customers of analytics

  • Game teams
  • Finance
  • Marketing
  • Player support
  • Community
  • Leadership
  • Everybody in the company!

Example KPI’s

So LTV increased because D7–180 retentions have improved while ARPDAU has stayed flat, or what’s the reason?
– someone somewhere

Retention: how many players return to the game d days after installation

  • Day-1 retention, day-7 retention, …
  • 40%, 20%, 10% retention for D1, D7, D30 used to be considered “good enough”
  • Compare: Clash of Clans has 10% D720 retention

LTV: lifetime value, essentially ARPU at 180 or 360 days after installation

ARPU: average revenue per user, over d days after installation

  • Day-1 ARPU, day-7 ARPU, …

ARPDAU: average revenue per DAU

DAU: daily active users

Plus others, ad infinitum

  • MAU: monthly active users
  • Revenue: daily, weekly, monthly, …
  • Session count, length
  • New players: how many players installed the game
  • Concurrent players: how many players are logged in simultaneously
  • PLTV: predicted LTV, expected value of LTV in the future
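As a toy sketch of how a retention KPI can be computed from raw install and login events (the data and layout here are invented for illustration, not Supercell's actual pipeline):

```python
from datetime import date, timedelta

# Invented toy data: install date per player, and (player, login date) events.
installs = {"a": date(2015, 1, 1), "b": date(2015, 1, 1), "c": date(2015, 1, 2)}
logins = {("a", date(2015, 1, 2)), ("b", date(2015, 1, 8)), ("c", date(2015, 1, 3))}

def retention(d):
    """Share of players who logged in exactly d days after installing."""
    returned = sum(
        1 for p in installs if (p, installs[p] + timedelta(days=d)) in logins
    )
    return returned / len(installs)
```

In this toy cohort, day-1 retention is 2/3 and day-7 retention is 1/3; real pipelines compute the same idea over event tables, sliced by cohort.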

Sliced and diced

By country, state, platform, marketing network, language, …

SELECT SUM(x), AVG(y) FROM table GROUP BY z

Some warnings

  1. KPI’s are defined differently in different organizations
  2. KPI’s are computed differently in different organizations
  3. Data pipeline will influence how KPI’s will turn out

Example game analytics questions

  • What are our DAU and revenue going to be six months ahead?
  • Are the Dr. Terror levels and Task Force operations challenging enough for endgame players?
  • Was the $1 price point worth introducing?
  • What are the win rates by troops and troop combinations, and their recent trends?
  • How many players churn after game updates?
  • Is the PvP matchmaking working as intended?
  • How many bases do the players have on their maps?
  • How is the Arabic localization usage picking up?
  • Could you sample nice task force attacks to be replayed in the company lobby daily?
  • Could you generate a leaderboard of top Chinese players?
  • How important is player-vs-player (PvP) compared to player-vs-environment (PvE)?
  • Which troop combinations are being used the most and the least?
  • Are new troops replacing some older ones?
  • How many players are playing the in-game events (Dr. Terror, Gearheart, Hammerman attack)?
  • Are the Power Bases well-balanced?
  • Is the tutorial funnel working or is there a problem?
  • What’s the outcome of the recent TV campaigns?
  • How many riflemen were deployed during first year of Boom?
    • 118,000,000,000 (of which only 36% survived)
  • How much resources do players have, gain, and consume by HQ level?
    • One integer overflow bug was first found by staring at data
  • How many players logged in between 11:55–15:35 EET?
  • Are the push notifications valuable?

You probably got the point

…that the list is endless

Meta-questions

  1. Why is metric X changing?
    • Usually asked during the update cycle
  2. Why is metric X not changing?
    • Usually asked after a game update
  3. Why is metric X so good/bad in market Y?

III Task Force operation difficulty predictor

  • Currently the only in-game data product

Task Forces

  • Collaborative gameplay feature
  • Players can form Task Forces with up to 50 members in each
  • Task Forces organize collaborative attacks against the evil Blackguard
  • Each operation is run by one task force against one target

The problem

Operations map
© Supercell

The solution overview

Inputs:

  1. XP levels of all Task Force members: a list of up to 50 integers, each between 12 and 62
  2. Operation tier: an integer between 1 and 20

There are in total more than $10^{15}$ different task force combinations.

Output:

Success probability (win rate), as a label: too easy, easy, normal, hard, impossible

Algorithm:

Logistic regression
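The predicted win rate is surfaced as one of the five labels; the bucketing below is a sketch with invented thresholds (the actual cut-offs are a game design choice not given in the talk):

```python
def difficulty_label(win_rate):
    """Map a predicted win rate in [0, 1] to a difficulty label.

    Thresholds are hypothetical, for illustration only.
    """
    if win_rate > 0.95:
        return "too easy"
    if win_rate > 0.75:
        return "easy"
    if win_rate > 0.40:
        return "normal"
    if win_rate > 0.10:
        return "hard"
    return "impossible"
```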

The solution, with details

Inputs:

Encode operation tier $t \in \{1, \ldots, 20\}$ and player XP level $p \in \{12, \ldots, 62\}$ into a feature vector:

\mathbf{x} = [b,\ x_{\textrm{tier}=1},\ \ldots,\ x_{\textrm{tier}=20},\ x_{\textrm{xp}=12},\ \ldots,\ x_{\textrm{xp}=62},\ \ldots,\ x_{\textrm{tier}=t,\,\textrm{xp}=p},\ \ldots]

  • Both the operation tier and XP levels are encoded as one-hot vectors and concatenated together.
  • Also, we add 20 × 51 interaction features for all combinations of operation tiers and XP levels
  • Note also the regressor's bias term $b$.
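As a sketch of this encoding for a single (tier, XP) pair; the ordering of features inside the vector is an assumption, since the talk does not specify it:

```python
import numpy as np

N_TIERS = 20                # operation tiers 1..20
XP_MIN, XP_MAX = 12, 62     # player XP levels
N_XP = XP_MAX - XP_MIN + 1  # 51 levels

def encode(tier, xp):
    """Bias + one-hot tier + one-hot XP + tier-by-XP interaction indicator."""
    x = np.zeros(1 + N_TIERS + N_XP + N_TIERS * N_XP)
    x[0] = 1.0                             # bias term b
    x[1 + (tier - 1)] = 1.0                # one-hot operation tier
    x[1 + N_TIERS + (xp - XP_MIN)] = 1.0   # one-hot XP level
    x[1 + N_TIERS + N_XP
      + (tier - 1) * N_XP + (xp - XP_MIN)] = 1.0  # interaction feature
    return x
```

A task force contributes up to 50 such XP levels; how the member vectors are aggregated (e.g. summed) is not spelled out above, so this encodes one member only.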

Labels come from whether the task force operations were successful or not:

y \in \{-1, 1\}

Output:

The logistic regressor predicts the win rate of a given task force in each operation:

\hat{y} = \frac{1}{1 + \exp(-\mathbf{w}^T\mathbf{x})}

Algorithm:

Logistic regression, with loss function:

L(\mathbf{w}) = \sum_i \log (1 + \exp (-y_i \mathbf{w}^T\mathbf{x}_i)) + \frac{\lambda}{2} \mathbf{w}^T\mathbf{w}

  • Choose the weights $\mathbf{w}$ that minimize the loss $L$
  • Set the regularizer parameter $\lambda$ with held-out validation
  • Plug and play:

from sklearn.linear_model import LogisticRegression

# Note: sklearn's C is the inverse regularization strength, C = 1/lambda
model = LogisticRegression(C=1e3)
model.fit(X, y)
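Since scikit-learn's C is the inverse of the regularizer λ, the held-out validation amounts to scanning a grid of C values; a sketch on synthetic data (everything below is invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data: the label depends mostly on the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + 0.1 * rng.normal(size=1000) > 0).astype(int)

X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

best_C, best_acc = None, -1.0
for C in [1e-2, 1e-1, 1e0, 1e1, 1e2, 1e3]:  # C = 1 / lambda
    acc = LogisticRegression(C=C).fit(X_tr, y_tr).score(X_val, y_val)
    if acc > best_acc:
        best_C, best_acc = C, acc
```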

Done!?

Show it to the team, who are not happy

Instead, additional requirements emerge

The problem, take two

  1. The difficulty goes up with the operation tier (win rate: tier 1 > … > tier 20)
  2. The difficulty goes down with the XP levels (win rate: XP 1 < … < XP 62)
  3. The difficulty goes down with more players (win rate: 1 player < … < 50 players)

The solution, take two

Transform the solution into a constrained optimization problem.

Inputs

Design the feature vector encoding so that the weights can be constrained to nonnegative values:

\mathbf{x} = [b,\ x_{\textrm{tier}<2},\ \ldots,\ x_{\textrm{tier}<20},\ x_{\textrm{xp}>11},\ \ldots,\ x_{\textrm{xp}>61},\ \ldots,\ x_{\textrm{tier}<t,\,\textrm{xp}>p},\ \ldots]

Note that win rates should go up with decreasing tier and increasing XP.

For example,

  • Weight of the $x_{\textrm{tier}<2}$ feature represents how much easier tier 1 is compared to tier 2
  • Weight of the $x_{\textrm{xp}>61}$ feature represents how much better an XP 62 player is compared to an XP 61 player
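This cumulative ("thermometer") encoding can be sketched as follows; the interaction features are omitted for brevity, and the exact feature ordering is again an assumption:

```python
import numpy as np

N_TIERS, XP_MIN, XP_MAX = 20, 12, 62

def encode_monotone(tier, xp):
    """Bias + cumulative tier and XP indicators, so each weight is an increment."""
    x = [1.0]                                                         # bias term b
    x += [1.0 if tier < t else 0.0 for t in range(2, N_TIERS + 1)]    # x_{tier<t}
    x += [1.0 if xp > p else 0.0 for p in range(XP_MIN - 1, XP_MAX)]  # x_{xp>p}
    return np.array(x)
```

With nonnegative weights, lowering the tier or raising the XP can only switch features from 0 to 1, so the predicted win rate is monotone by construction.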

Outputs

Same training labels, same predictions (win rate)

Algorithm

Same logistic regression, but with constraints: $w_j \ge 0$

  • There is no such thing as constrained logistic regression in scikit-learn
  • The scipy.optimize module has high-quality constrained optimization routines; those need the gradient of the loss function:

\frac{\partial L}{\partial \mathbf{w}} = \sum_i \frac{-y_i\mathbf{x}_i \exp (-y_i \mathbf{w}^T\mathbf{x}_i)}{1 + \exp(-y_i \mathbf{w}^T\mathbf{x}_i)} + \lambda \mathbf{w}

Then, run L-BFGS-B with the above constraints and the logistic loss and gradient functions.
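A minimal sketch of that fit with scipy, on synthetic indicator features (the data, the λ value, and the omission of the bias term are all simplifications for illustration):

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic training data: binary indicator features, labels in {-1, +1}.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 4)).astype(float)
w_true = np.array([0.5, 1.0, 0.0, 2.0])
y = np.where(rng.random(500) < 1 / (1 + np.exp(-X @ w_true)), 1.0, -1.0)

lam = 1.0  # regularizer lambda, set by held-out validation in practice

def loss(w):
    m = -y * (X @ w)
    return np.sum(np.logaddexp(0.0, m)) + 0.5 * lam * w @ w

def grad(w):
    m = -y * (X @ w)
    s = 1.0 / (1.0 + np.exp(-m))   # sigmoid of the margin
    return -(y * s) @ X + lam * w

# Nonnegativity constraints w_j >= 0 expressed as L-BFGS-B box bounds.
res = minimize(loss, np.zeros(X.shape[1]), jac=grad,
               method="L-BFGS-B", bounds=[(0.0, None)] * X.shape[1])
w_hat = res.x
```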

Done!!

Operation difficulty labels
© Supercell

Practicalities

  • Training examples extracted from game events using Pig on EMR
  • Model fitting done on laptop
  • Weight vectors deployed as a hardcoded Java class, compiled into the game server
  • A simple web page/javascript implementation is good for development, testing, and also selling the result

Bonus question

  • What to do when the team wants to introduce two new operation tiers?
