Grokking SQL for Tech Interviews
Ask Author
Back to course home

0% completed

Vote For New Content
13. Active Users
On this page

Problem

Problem Definition

Example

Output

Try It Yourself

Solution

Step 1: Identify consecutive logins within a 4-day window

Step 2: Group by user and login date, filter for users with consecutive logins on 4 different days

Step 3: Get unique active users

Problem

Table: Accounts

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| id            | int     |
| name          | varchar |
+---------------+---------+
id is the primary key (column with unique values) for this table.
This table contains the account id and the user name of each account.

Table: Logins

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| id            | int     |
| login_date    | date    |
+---------------+---------+
This table may contain duplicate rows.
This table contains the account id of the user who logged in and the login date. A user may log in multiple times in the day.

Problem Definition

Active users are those who logged in to their accounts for five or more consecutive days.

Write a solution to find the id and the name of active users.

Return the result table ordered by id.

Example

Image

Output

Image

Try It Yourself

MYSQL
MYSQL

. . . .

Solution

To identify active users who have logged in for five or more consecutive days, the SQL solution employs a self-join and aggregation. The SELECT clause retrieves distinct account IDs (l1.id) and uses a subquery to fetch the corresponding names from the "Accounts" table. The FROM clause specifies the "Logins" table with aliases (l1 and l2) for the self-join.

The self-join is established in the JOIN clause, connecting rows where the account IDs are the same (l1.id = l2.id) and the login dates have a difference between 1 and 4 days (DATEDIFF(l2.login_date, l1.login_date) BETWEEN 1 AND 4). This ensures that the login dates are consecutive.

The GROUP BY clause is used to group the results by account ID (l1.id) and login date (l1.login_date). The HAVING clause filters the groups to include only those where the count of distinct login dates (COUNT(DISTINCT l2.login_date)) is equal to 4, indicating five or more consecutive logins.

SELECT DISTINCT l1.id, (SELECT NAME FROM accounts WHERE id = l1.id) AS NAME FROM logins l1 JOIN logins l2 ON l1.id = l2.id AND Datediff(l2.login_date, l1.login_date) BETWEEN 1 AND 4 GROUP BY l1.id, l1.login_date HAVING Count(DISTINCT l2.login_date) = 4

Let's break down the query step by step:

Step 1: Identify consecutive logins within a 4-day window

We use a self-join on the Logins table (l1 and l2) to find consecutive logins for each user where the login dates are within a 4-day window.

SELECT l1.id, (SELECT NAME FROM Accounts WHERE id = l1.id) AS NAME FROM Logins l1 JOIN Logins l2 ON l1.id = l2.id AND Datediff(l2.login_date, l1.login_date) BETWEEN 1 AND 4;

Output After Step 1:

+----+----------+ | id | name | +----+----------+ | 7 | Jonathan | | 7 | Jonathan | | 7 | Jonathan | | 7 | Jonathan | | 7 | Jonathan | | 7 | Jonathan | | 7 | Jonathan | | 7 | Jonathan | | 7 | Jonathan | | 7 | Jonathan | | 7 | Jonathan | | 7 | Jonathan | | 7 | Jonathan | | 7 | Jonathan | +----+----------+

Step 2: Group by user and login date, filter for users with consecutive logins on 4 different days

We group the results from Step 1 by user and login date and filter out users who don't have logins on 4 different days.

SELECT l1.id, (SELECT NAME FROM Accounts WHERE id = l1.id) AS NAME, l1.login_date FROM Logins l1 JOIN Logins l2 ON l1.id = l2.id AND Datediff(l2.login_date, l1.login_date) BETWEEN 1 AND 4 GROUP BY l1.id, l1.login_date HAVING Count(DISTINCT l2.login_date) = 4;

Output After Step 2:

+----+----------+ | id | name | +----+----------+ | 7 | Jonathan | +----+----------+

Step 3: Get unique active users

We use DISTINCT to get unique active users from the filtered result.

SELECT DISTINCT id, NAME FROM ( SELECT l1.id, (SELECT NAME FROM Accounts WHERE id = l1.id) AS NAME, l1.login_date FROM Logins l1 JOIN Logins l2 ON l1.id = l2.id AND Datediff(l2.login_date, l1.login_date) BETWEEN 1 AND 4 GROUP BY l1.id, l1.login_date HAVING Count(DISTINCT l2.login_date) = 4 ) active_users;

Final Output:

+----+----------+ | id | name | +----+----------+ | 7 | Jonathan | +----+----------+

.....

.....

.....

Like the course? Get enrolled and start learning!

On this page

Problem

Problem Definition

Example

Output

Try It Yourself

Solution

Step 1: Identify consecutive logins within a 4-day window

Step 2: Group by user and login date, filter for users with consecutive logins on 4 different days

Step 3: Get unique active users