Grokking SQL for Tech Interviews
Ask Author
Back to course home

0% completed

Vote For New Content
2. Human Traffic of Stadium
Table of Contents

Contents are not accessible

Contents are not accessible

Contents are not accessible

Contents are not accessible

Contents are not accessible

Problem

Table: Stadium

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| id            | int     |
| visit_date    | date    |
| people        | int     |
+---------------+---------+
visit_date is the column with unique values for this table.
Each row of this table contains the visit date and visit id to the stadium with the number of people during the visit.
As the id increases, the date increases as well.

Problem Definition

Write a solution to display the records with three or more rows with consecutive id's, and the number of people is greater than or equal to 100 for each.

Return the result table ordered by visit_date in ascending order.

Example

Image

Output

Image

Try It Yourself

MYSQL
MYSQL

. . . .

Solution

To identify and display records from the Stadium table where three or more consecutive id's have a number of people greater than or equal to 100, we can utilize SQL's window functions and Common Table Expressions (CTEs). This approach allows us to group consecutive qualifying records efficiently and filter out those groups that meet the specified criteria. The final results will be ordered by visit_date in ascending order.

Approach Overview

  1. Filter Records with people ≥ 100:

    • Select only those records where the number of people is greater than or equal to 100. These are the records of interest for identifying consecutive groups.
  2. Assign Group Identifiers for Consecutive id's:

    • Use the difference between the id and a sequential row number to assign a unique group identifier (grp) for consecutive records. This technique ensures that consecutive id's share the same grp value.
  3. Identify Groups with Three or More Consecutive Records:

    • Group the filtered records by the calculated grp and count the number of records in each group. Retain only those groups that have three or more records.
  4. Retrieve the Desired Records:

    • Join the qualifying groups back to the filtered records to extract the complete details (id, visit_date, people) of each record within these groups.
  5. Order the Results:

    • Sort the final output by visit_date in ascending order to present the data chronologically.

SQL Query

WITH ConsecutiveGroups AS ( -- Step 1: Assign groups where `people` >= 100 and `id`'s are consecutive SELECT id, visit_date, people, id - ROW_NUMBER() OVER (ORDER BY id) AS grp FROM Stadium WHERE people >= 100 ), GroupedCounts AS ( -- Step 2: Count the number of rows in each group SELECT grp, COUNT(*) AS group_size FROM ConsecutiveGroups GROUP BY grp HAVING COUNT(*) >= 3 ) -- Step 3: Retrieve rows from groups with 3 or more consecutive rows SELECT c.id, c.visit_date, c.people FROM ConsecutiveGroups c JOIN GroupedCounts g ON c.grp = g.grp ORDER BY c.visit_date;

Step-by-Step Explanation

Step 1: Assigning Group Identifiers (ConsecutiveGroups CTE)

We begin by filtering the Stadium records to include only those where the people count is greater than or equal to 100. For these filtered records, we assign a group identifier (grp) based on the difference between the id and a sequential row number. This method ensures that consecutive id's share the same grp value, effectively grouping them together.

WITH ConsecutiveGroups AS ( SELECT id, visit_date, people, id - ROW_NUMBER() OVER (ORDER BY id) AS grp FROM Stadium WHERE people >= 100 )

Explanation of Each Line:

  1. WITH ConsecutiveGroups AS (

    • Initiates a Common Table Expression (CTE) named ConsecutiveGroups.
  2. SELECT id, visit_date, people,

    • Selects the id, visit_date, and people columns from the Stadium table.
  3. id - ROW_NUMBER() OVER (ORDER BY id) AS grp

    • Utilizes the ROW_NUMBER() window function to assign a sequential number to each row based on the ascending order of id.
    • Subtracts this row number from the id to compute a group identifier (grp). For consecutive id's, this difference remains constant, thereby grouping them together.
  4. FROM Stadium

    • Specifies the Stadium table as the source of data.
  5. WHERE people >= 100

    • Filters the records to include only those where the people count is greater than or equal to 100.

Intermediate Output After Step 1:

Based on the provided input data:

idvisit_datepeoplegrp
22017-01-021091
32017-01-031501
52017-01-051452
62017-01-0614552
72017-01-071992
82017-01-081882

Note:

  • Record with id = 4 is excluded because people = 99, which is less than 100.

Step 2: Counting Records in Each Group (GroupedCounts CTE)

Next, we group the filtered records by the grp identifier and count the number of records in each group. We retain only those groups that have three or more records, as they meet the criteria of consecutive id's with people ≥ 100.

GroupedCounts AS ( SELECT grp, COUNT(*) AS group_size FROM ConsecutiveGroups GROUP BY grp HAVING COUNT(*) >= 3 )

Explanation of Each Line:

  1. GroupedCounts AS (

    • Initiates another CTE named GroupedCounts.
  2. SELECT grp, COUNT(*) AS group_size

    • Selects the grp identifier and counts the number of records in each group, aliasing it as group_size.
  3. FROM ConsecutiveGroups

    • Specifies the ConsecutiveGroups CTE as the source of data.
  4. GROUP BY grp

    • Groups the records based on the grp identifier.
  5. HAVING COUNT(*) >= 3

    • Filters the groups to include only those with three or more records.

Intermediate Output After Step 2:

grpgroup_size
24

Step 3: Retrieving Qualified Records

Finally, we join the ConsecutiveGroups CTE with the GroupedCounts CTE on the grp identifier to extract all records that belong to groups with three or more consecutive id's. The results are then ordered by visit_date in ascending order.

SELECT c.id, c.visit_date, c.people FROM ConsecutiveGroups c JOIN GroupedCounts g ON c.grp = g.grp ORDER BY c.visit_date;

Explanation of Each Line:

  1. SELECT c.id, c.visit_date, c.people

    • Selects the id, visit_date, and people columns from the ConsecutiveGroups CTE.
  2. JOIN GroupedCounts g ON c.grp = g.grp

    • Performs an inner join with the GroupedCounts CTE (aliased as g) on the grp identifier. This ensures that only records from groups that have three or more consecutive id's are included.
  3. ORDER BY c.visit_date

    • Orders the final results by visit_date in ascending order to present the data chronologically.

Final Output:

Based on the provided input data, the final output will include records from groups that have three or more consecutive id's with people ≥ 100, ordered by visit_date in ascending order.

+----+----------------------------+--------+ | id | visit_date | people | +----+----------------------------+--------+ | 5 | 2017-01-05T00:00:00.000Z | 145 | | 6 | 2017-01-06T00:00:00.000Z | 1455 | | 7 | 2017-01-07T00:00:00.000Z | 199 | | 8 | 2017-01-08T00:00:00.000Z | 188 | +----+----------------------------+--------+

.....

.....

.....

Like the course? Get enrolled and start learning!

Table of Contents

Contents are not accessible

Contents are not accessible

Contents are not accessible

Contents are not accessible

Contents are not accessible