0% completed
Problem
Table: Activity
+--------------+---------+
| Column Name | Type |
+--------------+---------+
| player_id | int |
| device_id | int |
| event_date | date |
| games_played | int |
+--------------+---------+
(player_id, event_date) is the primary key (combination of columns with unique values) of this table.
This table shows the activity of players of some games.
Each row is a record of a player who logged in and played a number of games (possibly 0) before logging out on someday using some device.
Problem Definition
The install date of a player is the first login day of that player.
We define day one retention of some date x
to be the number of players whose install date is x
and they logged back in on the day right after x
, divided by the number of players whose install date is x
, rounded to 2
decimal places.
Write a solution to report for each install date, the number of players that installed the game on that day, and the day one retention.
Example
Output
Try It Yourself
Solution
To determine the day one retention for each install date in the Activity
table, we need to analyze player login patterns. Specifically, we want to identify players who installed the game on a particular day (their first login day) and then check if they returned to log in the day immediately following their install date. We can efficiently accomplishes this by leveraging window functions and aggregation techniques. Below is a comprehensive breakdown of the approach, step-by-step execution, and detailed explanations for each line of the SQL query.
Approach Overview
-
Identify Install Dates:
- Determine the first login day (
install_dt
) for each player.
- Determine the first login day (
-
Calculate Installations per Date:
- Count the number of players who installed the game on each
install_dt
.
- Count the number of players who installed the game on each
-
Determine Day One Retention:
- For each
install_dt
, calculate the proportion of players who logged in again on the day immediately after their install date (install_dt + 1 day
).
- For each
-
Present the Results:
- Display each
install_dt
alongside the number of installations and the corresponding day one retention rate, rounded to two decimal places.
- Display each
SQL Query
SELECT install_dt, COUNT(DISTINCT player_id) AS installs, ROUND( SUM(CASE WHEN event_date = install_dt + INTERVAL 1 DAY THEN 1 ELSE 0 END) / COUNT(DISTINCT player_id), 2 ) AS Day1_retention FROM ( SELECT *, MIN(event_date) OVER (PARTITION BY player_id) AS install_dt FROM Activity ) TEMP GROUP BY install_dt;
Step-by-Step Approach
Step 1: Identify Install Dates
Determine the install date (install_dt
) for each player, which is defined as the first day the player logged in.
SQL Snippet:
SELECT *, MIN(event_date) OVER (PARTITION BY player_id) AS install_dt FROM Activity
Explanation:
-
SELECT *,
- Selects all columns from the
Activity
table for further processing.
- Selects all columns from the
-
MIN(event_date) OVER (PARTITION BY player_id) AS install_dt
MIN(event_date) OVER (...)
: Utilizes a window function to calculate the minimumevent_date
for eachplayer_id
. This represents the player's first login day.PARTITION BY player_id
: Divides the data into partitions based onplayer_id
, ensuring that the minimum date is calculated separately for each player.AS install_dt
: Aliases the resulting minimum date asinstall_dt
.
-
FROM Activity
- Specifies the
Activity
table as the data source.
- Specifies the
Intermediate Output After Step 1:
+-----------+-----------+------------+--------------+------------+ | player_id | device_id | event_date | games_played | install_dt | +-----------+-----------+------------+--------------+------------+ | 1 | 2 | 2016-03-01 | 5 | 2016-03-01 | | 1 | 2 | 2016-03-02 | 6 | 2016-03-01 | | 2 | 3 | 2017-06-25 | 1 | 2017-06-25 | | 3 | 1 | 2016-03-01 | 0 | 2016-03-01 | | 3 | 4 | 2016-07-03 | 5 | 2016-03-01 | +-----------+-----------+------------+--------------+------------+
Step 2: Calculate Installations per Date and Day One Retention
For each install_dt
, count the number of players who installed the game on that day and calculate the day one retention, which is the proportion of those players who logged back in on the day immediately following their install date.
SQL Snippet:
SELECT install_dt, COUNT(DISTINCT player_id) AS installs, ROUND( SUM(CASE WHEN event_date = install_dt + INTERVAL 1 DAY THEN 1 ELSE 0 END) / COUNT(DISTINCT player_id), 2 ) AS Day1_retention FROM ( -- Subquery from Step 1 ) TEMP GROUP BY install_dt;
Explanation:
-
SELECT install_dt,
- Selects the
install_dt
to group the results by each install date.
- Selects the
-
COUNT(DISTINCT player_id) AS installs,
- Counts the number of unique players (
player_id
) who haveinstall_dt
equal to the currentinstall_dt
being processed. COUNT(DISTINCT player_id)
ensures that each player is only counted once per install date.AS installs
aliases the count asinstalls
.
- Counts the number of unique players (
-
ROUND( SUM(CASE WHEN event_date = install_dt + INTERVAL 1 DAY THEN 1 ELSE 0 END) / COUNT(DISTINCT player_id), 2 ) AS Day1_retention
SUM(CASE WHEN event_date = install_dt + INTERVAL 1 DAY THEN 1 ELSE 0 END)
:- For each row, checks if the
event_date
is exactly one day after theinstall_dt
. - If true, adds
1
to the sum; otherwise, adds0
. - This effectively counts the number of players who returned on the day following their install date.
- For each row, checks if the
/ COUNT(DISTINCT player_id)
:- Divides the number of players who returned on day one by the total number of installations on that date.
- This calculates the retention rate as a decimal.
ROUND(..., 2)
:- Rounds the retention rate to two decimal places for readability.
AS Day1_retention
:- Aliases the rounded value as
Day1_retention
.
- Aliases the rounded value as
-
FROM ( SELECT ..., install_dt FROM Activity ) TEMP
- References the subquery (
TEMP
) from Step 1, which contains all activity records along with each player'sinstall_dt
.
- References the subquery (
-
GROUP BY install_dt;
- Groups the results by each
install_dt
to perform aggregations (COUNT
andSUM
) within each group.
- Groups the results by each
Final Output:
+------------------------+----------+----------------+ | install_dt | installs | Day1_retention | +------------------------+----------+----------------+ | 2016-03-01T00:00:00.000Z | 2 | 0.50 | | 2017-06-25T00:00:00.000Z | 1 | 0.00 | +------------------------+----------+----------------+
.....
.....
.....
Table of Contents
Contents are not accessible
Contents are not accessible
Contents are not accessible
Contents are not accessible
Contents are not accessible