
Boom Beach Data Science

A day in the life of a Boom Beach data scientist – talk given at Helsinki Data Science meetup
Boom Beach
© Supercell

Outline

  1. Boom Beach Background
  2. Data science in a game team
    • 90% of the daily work
  3. Task force operation difficulty predictor
    • 5% of the daily work

I Boom Beach

Come with a plan or leave in defeat!

  • Combat strategy game set in tropical archipelago
  • Attack hundreds of unique island bases
  • Explore a huge tropical archipelago
  • Raid other players’ bases
  • Defeat the evil Blackguard

II Data science in a game team

  • One data scientist in each (live) game team
  • Support the game development and maintenance work with data and stats
  • Very important: responsible for interpreting the statistics
  • Work is fast-paced, mostly short-term (same day) tasks

What’s a data scientist

Data scientist is an overloaded term

Game team data scientist: descriptive/exploratory work

  • What’s happening out there in the world among our players?
  • How are the players responding to the game product?

Support decision making

  • Fact checking before decisions
  • Learning after decisions

Practically “ad hoc” work

  • Write-only programming :)
  • Interactive querying and visualization
  • Writing reports
  • Relatively little A/B testing
  • (Almost) no data products

What’s a game team

A game team is a small (fewer than 20 members) team responsible for the game product

  • Our games are services that are continuously being improved
Game team

Game design principles

  • Making the games fun
  • Long lasting
  • Listening to players
  • Playing the game yourself

Game development is an artisanal craft

  • Everything comes from the experience and skill of the team
  • FALSE: “Supercell games are based on a data-driven formula”
  • Instead: The best people make the best games
  • Data scientist is only needed after a game goes to beta
    • A game has to exist, and have players, to be able to generate any data

Customers of analytics

  • Game teams
  • Finance
  • Marketing
  • Player support
  • Community
  • Leadership
  • Everybody in the company!

Example KPI’s

So LTV increased because D7–180 retentions have improved while ARPDAU has stayed flat, or what’s the reason?
– someone somewhere

Retention: how many players return to the game d days after installation

  • Day-1 retention, day-7 retention, …
  • 40%, 20%, 10% retention for D1, D7, D30 used to be considered “good enough”
  • Compare: Clash of Clans has 10% D720 retention

LTV: lifetime value, essentially ARPU at 180 or 360 days after installation

ARPU: average revenue per user, over d days after installation

  • Day-1 ARPU, day-7 ARPU, …

ARPDAU: average revenue per DAU

DAU: daily active users

Plus others, ad infinitum

  • MAU: monthly active users
  • Revenue: daily, weekly, monthly, …
  • Session count, length
  • New players: how many players installed the game
  • Concurrent players: how many players are logged in simultaneously
  • PLTV: predicted LTV, expected value of LTV in the future
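As a toy sketch of how a retention KPI can be computed from raw install and login events (the data and layout here are invented for illustration, not Supercell's actual pipeline):

```python
from datetime import date, timedelta

# Invented toy data: install date per player, and (player, login date) events.
installs = {"a": date(2015, 1, 1), "b": date(2015, 1, 1), "c": date(2015, 1, 2)}
logins = {("a", date(2015, 1, 2)), ("b", date(2015, 1, 8)), ("c", date(2015, 1, 3))}

def retention(d):
    """Share of players who logged in exactly d days after installing."""
    returned = sum(
        1 for p in installs if (p, installs[p] + timedelta(days=d)) in logins
    )
    return returned / len(installs)
```

In this toy cohort, day-1 retention is 2/3 and day-7 retention is 1/3; real pipelines compute the same idea over event tables, sliced by cohort.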

Sliced and diced

By country, state, platform, marketing network, language, …

SELECT SUM(x), AVG(y) FROM table GROUP BY z

Some warnings

  1. KPI’s are defined differently in different organizations
  2. KPI’s are computed differently in different organizations
  3. Data pipeline will influence how KPI’s will turn out

Example game analytics questions

  • What are our DAU and revenue going to be six months ahead?
  • Are the Dr. Terror levels and Task Force operations challenging enough for endgame players?
  • Was the $1 price point worth introducing?
  • What are the win rates by troops and troop combinations, and their recent trends?
  • How many players churn after game updates?
  • Is the PvP matchmaking working as intended?
  • How many bases do the players have on their maps?
  • How is the Arabic localization usage picking up?
  • Could you sample nice task force attacks to be replayed in the company lobby daily?
  • Could you generate a leaderboard of top Chinese players?
  • How important is player-vs-player (PvP) compared to player-vs-environment (PvE)?
  • Which troop combinations are being used the most and the least?
  • Are new troops replacing some older ones?
  • How many players are playing the in-game events (Dr. Terror, Gearheart, Hammerman attack)?
  • Are the Power Bases well-balanced?
  • Is the tutorial funnel working or is there a problem?
  • What’s the outcome of the recent TV campaigns?
  • How many riflemen were deployed during first year of Boom?
    • 118,000,000,000 (of which only 36% survived)
  • How much resources do players have, gain, and consume by HQ level?
    • One integer overflow bug was first found by staring at data
  • How many players logged in between 11:55–15:35 EET?
  • Are the push notifications valuable?

You probably got the point

…that the list is endless

Meta-questions

  1. Why is metric X changing?
    • Usually asked during the update cycle
  2. Why is metric X not changing?
    • Usually asked after a game update
  3. Why is metric X so good/bad in market Y?

III Task Force operation difficulty predictor

  • Currently the only in-game data product

Task Forces

  • Collaborative gameplay feature
  • Players can form Task Forces with up to 50 members in each
  • Task Forces organize collaborative attacks against the evil Blackguard
  • Each operation is run by one task force against one target

The problem

Operations map
© Supercell

The solution overview

Inputs:

  1. XP levels of all Task Force members: a list of up to 50 integers, each between 12 and 62
  2. Operation tier: an integer between 1 and 20

There are in total more than $10^{15}$ different task force combinations.

Output:

Success probability (win rate), as a label: too easy, easy, normal, hard, impossible

Algorithm:

Logistic regression
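The predicted win rate is surfaced as one of the five labels; the bucketing below is a sketch with invented thresholds (the actual cut-offs are a game design choice not given in the talk):

```python
def difficulty_label(win_rate):
    """Map a predicted win rate in [0, 1] to a difficulty label.

    Thresholds are hypothetical, for illustration only.
    """
    if win_rate > 0.95:
        return "too easy"
    if win_rate > 0.75:
        return "easy"
    if win_rate > 0.40:
        return "normal"
    if win_rate > 0.10:
        return "hard"
    return "impossible"
```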

The solution, with details

Inputs:

Encode operation tier $t \in \{1, \ldots, 20\}$ and player XP level $p \in \{12, \ldots, 62\}$ into a feature vector:

\mathbf{x} = [b,\ x_{\textrm{tier}=1},\ \ldots,\ x_{\textrm{tier}=20},\ x_{\textrm{xp}=12},\ \ldots,\ x_{\textrm{xp}=62},\ \ldots,\ x_{\textrm{tier}=t,\,\textrm{xp}=p},\ \ldots]

  • Both the operation tier and XP levels are encoded as one-hot vectors and concatenated together.
  • Also, we add 20 × 51 interaction features for all combinations of operation tiers and XP levels
  • Note also the regressor's bias term $b$.
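As a sketch of this encoding for a single (tier, XP) pair; the ordering of features inside the vector is an assumption, since the talk does not specify it:

```python
import numpy as np

N_TIERS = 20                # operation tiers 1..20
XP_MIN, XP_MAX = 12, 62     # player XP levels
N_XP = XP_MAX - XP_MIN + 1  # 51 levels

def encode(tier, xp):
    """Bias + one-hot tier + one-hot XP + tier-by-XP interaction indicator."""
    x = np.zeros(1 + N_TIERS + N_XP + N_TIERS * N_XP)
    x[0] = 1.0                             # bias term b
    x[1 + (tier - 1)] = 1.0                # one-hot operation tier
    x[1 + N_TIERS + (xp - XP_MIN)] = 1.0   # one-hot XP level
    x[1 + N_TIERS + N_XP
      + (tier - 1) * N_XP + (xp - XP_MIN)] = 1.0  # interaction feature
    return x
```

A task force contributes up to 50 such XP levels; how the member vectors are aggregated (e.g. summed) is not spelled out above, so this encodes one member only.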

Labels come from whether the task force operations were successful or not:

y \in \{-1, 1\}

Output:

The logistic regressor predicts the win rate of a given task force in each operation:

\hat{y} = \frac{1}{1 + \exp(-\mathbf{w}^T\mathbf{x})}

Algorithm:

Logistic regression, with loss function:

L(\mathbf{w}) = \sum_i \log (1 + \exp (-y_i \mathbf{w}^T\mathbf{x}_i)) + \frac{\lambda}{2} \mathbf{w}^T\mathbf{w}

  • Choose the weights $\mathbf{w}$ that minimize the loss $L$
  • Set the regularizer parameter $\lambda$ with held-out validation
  • Plug and play:

from sklearn.linear_model import LogisticRegression

# Note: sklearn's C is the inverse regularization strength, C = 1/lambda
model = LogisticRegression(C=1e3)
model.fit(X, y)
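Since scikit-learn's C is the inverse of the regularizer λ, the held-out validation amounts to scanning a grid of C values; a sketch on synthetic data (everything below is invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data: the label depends mostly on the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + 0.1 * rng.normal(size=1000) > 0).astype(int)

X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

best_C, best_acc = None, -1.0
for C in [1e-2, 1e-1, 1e0, 1e1, 1e2, 1e3]:  # C = 1 / lambda
    acc = LogisticRegression(C=C).fit(X_tr, y_tr).score(X_val, y_val)
    if acc > best_acc:
        best_C, best_acc = C, acc
```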

Done!?

Show it to the team, who are not happy

Instead, additional requirements emerge

The problem, take two

  1. The difficulty goes up with the operation tier (win rate: tier 1 > … > tier 20)
  2. The difficulty goes down with the XP levels (win rate: XP 1 < … < XP 62)
  3. The difficulty goes down with more players (win rate: 1 player < … < 50 players)

The solution, take two

Transform the solution into a constrained optimization problem.

Inputs

Design the feature vector encoding so that the weights can be constrained to nonnegative values:

\mathbf{x} = [b,\ x_{\textrm{tier}<2},\ \ldots,\ x_{\textrm{tier}<20},\ x_{\textrm{xp}>11},\ \ldots,\ x_{\textrm{xp}>61},\ \ldots,\ x_{\textrm{tier}<t,\,\textrm{xp}>p},\ \ldots]

Note that win rates should go up with decreasing tier and increasing XP.

For example,

  • Weight of the $x_{\textrm{tier}<2}$ feature represents how much easier tier 1 is compared to tier 2
  • Weight of the $x_{\textrm{xp}>61}$ feature represents how much better an XP 62 player is compared to an XP 61 player
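This cumulative ("thermometer") encoding can be sketched as follows; the interaction features are omitted for brevity, and the exact feature ordering is again an assumption:

```python
import numpy as np

N_TIERS, XP_MIN, XP_MAX = 20, 12, 62

def encode_monotone(tier, xp):
    """Bias + cumulative tier and XP indicators, so each weight is an increment."""
    x = [1.0]                                                         # bias term b
    x += [1.0 if tier < t else 0.0 for t in range(2, N_TIERS + 1)]    # x_{tier<t}
    x += [1.0 if xp > p else 0.0 for p in range(XP_MIN - 1, XP_MAX)]  # x_{xp>p}
    return np.array(x)
```

With nonnegative weights, lowering the tier or raising the XP can only switch features from 0 to 1, so the predicted win rate is monotone by construction.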

Outputs

Same training labels, same predictions (win rate)

Algorithm

Same logistic regression, but with constraints: $w_j \ge 0$

  • There is no such thing as constrained logistic regression in scikit-learn
  • The scipy.optimize module has high-quality constrained optimization routines; those need the gradient of the loss function:

\frac{\partial L}{\partial \mathbf{w}} = \sum_i \frac{-y_i\mathbf{x}_i \exp (-y_i \mathbf{w}^T\mathbf{x}_i)}{1 + \exp(-y_i \mathbf{w}^T\mathbf{x}_i)} + \lambda \mathbf{w}

Then, run L-BFGS-B with the above constraints and the logistic loss and gradient functions.
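A minimal sketch of that fit with scipy, on synthetic indicator features (the data, the λ value, and the omission of the bias term are all simplifications for illustration):

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic training data: binary indicator features, labels in {-1, +1}.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 4)).astype(float)
w_true = np.array([0.5, 1.0, 0.0, 2.0])
y = np.where(rng.random(500) < 1 / (1 + np.exp(-X @ w_true)), 1.0, -1.0)

lam = 1.0  # regularizer lambda, set by held-out validation in practice

def loss(w):
    m = -y * (X @ w)
    return np.sum(np.logaddexp(0.0, m)) + 0.5 * lam * w @ w

def grad(w):
    m = -y * (X @ w)
    s = 1.0 / (1.0 + np.exp(-m))   # sigmoid of the margin
    return -(y * s) @ X + lam * w

# Nonnegativity constraints w_j >= 0 expressed as L-BFGS-B box bounds.
res = minimize(loss, np.zeros(X.shape[1]), jac=grad,
               method="L-BFGS-B", bounds=[(0.0, None)] * X.shape[1])
w_hat = res.x
```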

Done!!

Operation difficulty labels
© Supercell

Practicalities

  • Training examples extracted from game events using Pig on EMR
  • Model fitting done on laptop
  • Weight vectors deployed as a hardcoded Java class, compiled into the game server
  • A simple web page/javascript implementation is good for development, testing, and also selling the result

Bonus question

  • What to do when the team wants to introduce two new operation tiers?
