Grokking SQL for Tech Interviews
Ask Author
Back to course home

0% completed

Vote For New Content
4. Game Play Analysis V
Table of Contents

Contents are not accessible

Contents are not accessible

Contents are not accessible

Contents are not accessible

Contents are not accessible

Problem

Table: Activity

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| player_id    | int     |
| device_id    | int     |
| event_date   | date    |
| games_played | int     |
+--------------+---------+
(player_id, event_date) is the primary key (combination of columns with unique values) of this table.
This table shows the activity of players of some games.
Each row is a record of a player who logged in and played a number of games (possibly 0) before logging out on someday using some device.

Problem Definition

The install date of a player is the first login day of that player.

We define day one retention of some date x to be the number of players whose install date is x and they logged back in on the day right after x, divided by the number of players whose install date is x, rounded to 2 decimal places.

Write a solution to report for each install date, the number of players that installed the game on that day, and the day one retention.

Example

Image

Output

Image

Try It Yourself

MYSQL
MYSQL

. . . .

Solution

To determine the day one retention for each install date in the Activity table, we need to analyze player login patterns. Specifically, we want to identify players who installed the game on a particular day (their first login day) and then check if they returned to log in the day immediately following their install date. We can efficiently accomplishes this by leveraging window functions and aggregation techniques. Below is a comprehensive breakdown of the approach, step-by-step execution, and detailed explanations for each line of the SQL query.

Approach Overview

  1. Identify Install Dates:

    • Determine the first login day (install_dt) for each player.
  2. Calculate Installations per Date:

    • Count the number of players who installed the game on each install_dt.
  3. Determine Day One Retention:

    • For each install_dt, calculate the proportion of players who logged in again on the day immediately after their install date (install_dt + 1 day).
  4. Present the Results:

    • Display each install_dt alongside the number of installations and the corresponding day one retention rate, rounded to two decimal places.

SQL Query

SELECT install_dt, COUNT(DISTINCT player_id) AS installs, ROUND( SUM(CASE WHEN event_date = install_dt + INTERVAL 1 DAY THEN 1 ELSE 0 END) / COUNT(DISTINCT player_id), 2 ) AS Day1_retention FROM ( SELECT *, MIN(event_date) OVER (PARTITION BY player_id) AS install_dt FROM Activity ) TEMP GROUP BY install_dt;

Step-by-Step Approach

Step 1: Identify Install Dates

Determine the install date (install_dt) for each player, which is defined as the first day the player logged in.

SQL Snippet:

SELECT *, MIN(event_date) OVER (PARTITION BY player_id) AS install_dt FROM Activity

Explanation:

  1. SELECT *,

    • Selects all columns from the Activity table for further processing.
  2. MIN(event_date) OVER (PARTITION BY player_id) AS install_dt

    • MIN(event_date) OVER (...): Utilizes a window function to calculate the minimum event_date for each player_id. This represents the player's first login day.
    • PARTITION BY player_id: Divides the data into partitions based on player_id, ensuring that the minimum date is calculated separately for each player.
    • AS install_dt: Aliases the resulting minimum date as install_dt.
  3. FROM Activity

    • Specifies the Activity table as the data source.

Intermediate Output After Step 1:

+-----------+-----------+------------+--------------+------------+ | player_id | device_id | event_date | games_played | install_dt | +-----------+-----------+------------+--------------+------------+ | 1 | 2 | 2016-03-01 | 5 | 2016-03-01 | | 1 | 2 | 2016-03-02 | 6 | 2016-03-01 | | 2 | 3 | 2017-06-25 | 1 | 2017-06-25 | | 3 | 1 | 2016-03-01 | 0 | 2016-03-01 | | 3 | 4 | 2016-07-03 | 5 | 2016-03-01 | +-----------+-----------+------------+--------------+------------+

Step 2: Calculate Installations per Date and Day One Retention

For each install_dt, count the number of players who installed the game on that day and calculate the day one retention, which is the proportion of those players who logged back in on the day immediately following their install date.

SQL Snippet:

SELECT install_dt, COUNT(DISTINCT player_id) AS installs, ROUND( SUM(CASE WHEN event_date = install_dt + INTERVAL 1 DAY THEN 1 ELSE 0 END) / COUNT(DISTINCT player_id), 2 ) AS Day1_retention FROM ( -- Subquery from Step 1 ) TEMP GROUP BY install_dt;

Explanation:

  1. SELECT install_dt,

    • Selects the install_dt to group the results by each install date.
  2. COUNT(DISTINCT player_id) AS installs,

    • Counts the number of unique players (player_id) who have install_dt equal to the current install_dt being processed.
    • COUNT(DISTINCT player_id) ensures that each player is only counted once per install date.
    • AS installs aliases the count as installs.
  3. ROUND( SUM(CASE WHEN event_date = install_dt + INTERVAL 1 DAY THEN 1 ELSE 0 END) / COUNT(DISTINCT player_id), 2 ) AS Day1_retention

    • SUM(CASE WHEN event_date = install_dt + INTERVAL 1 DAY THEN 1 ELSE 0 END):
      • For each row, checks if the event_date is exactly one day after the install_dt.
      • If true, adds 1 to the sum; otherwise, adds 0.
      • This effectively counts the number of players who returned on the day following their install date.
    • / COUNT(DISTINCT player_id):
      • Divides the number of players who returned on day one by the total number of installations on that date.
      • This calculates the retention rate as a decimal.
    • ROUND(..., 2):
      • Rounds the retention rate to two decimal places for readability.
    • AS Day1_retention:
      • Aliases the rounded value as Day1_retention.
  4. FROM ( SELECT ..., install_dt FROM Activity ) TEMP

    • References the subquery (TEMP) from Step 1, which contains all activity records along with each player's install_dt.
  5. GROUP BY install_dt;

    • Groups the results by each install_dt to perform aggregations (COUNT and SUM) within each group.

Final Output:

+------------------------+----------+----------------+ | install_dt | installs | Day1_retention | +------------------------+----------+----------------+ | 2016-03-01T00:00:00.000Z | 2 | 0.50 | | 2017-06-25T00:00:00.000Z | 1 | 0.00 | +------------------------+----------+----------------+

.....

.....

.....

Like the course? Get enrolled and start learning!

Table of Contents

Contents are not accessible

Contents are not accessible

Contents are not accessible

Contents are not accessible

Contents are not accessible