Grokking SQL for Tech Interviews
Ask Author
Back to course home

0% completed

Vote For New Content
29. High School Attendance
Table of Contents

Contents are not accessible

Contents are not accessible

Contents are not accessible

Contents are not accessible

Contents are not accessible

Problem

Table: Students

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| student_id    | int     |
| student_name  | varchar |
| grade         | int     |
+---------------+---------+
student_id is the primary key for this table.
This table contains the information of students in a high school.

Table: Attendance

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| student_id    | int     |
| attendance_date | date  |
| status        | enum    |
+---------------+---------+
(student_id, attendance_date) is the primary key for this table.
status is an ENUM of type ('Present', 'Absent').

Problem Definition

Write a solution to find the students who were absent for three or more consecutive days.

Return the result table ordered by student_id and start_date, where start_date is the first date of the consecutive absences.

Example

Input: 
Students table:
+------------+--------------+-------+
| student_id | student_name | grade |
+------------+--------------+-------+
| 1          | Alice        | 10    |
| 2          | Bob          | 11    |
| 3          | Charlie      | 10    |
+------------+--------------+-------+
Attendance table:
+------------+-----------------+--------+
| student_id | attendance_date | status |
+------------+-----------------+--------+
| 1          | 2020-11-01      | Absent |
| 1          | 2020-11-02      | Absent |
| 1          | 2020-11-03      | Absent |
| 2          | 2020-11-01      | Present|
| 2          | 2020-11-02      | Absent |
| 2          | 2020-11-03      | Absent |
| 2          | 2020-11-04      | Absent |
| 3          | 2020-11-01      | Present|
| 3          | 2020-11-02      | Present|
+------------+-----------------+--------+
Output: 
+------------+-------------+
| student_id | start_date  |
+------------+-------------+
| 1          | 2020-11-01  |
| 2          | 2020-11-02  |
+------------+-------------+

Try It Yourself

MYSQL
MYSQL

. . . .

Solution

To identify students who were absent for three or more consecutive days, we can utilize SQL's window functions and Common Table Expressions (CTEs). This approach allows us to efficiently analyze attendance records, detect consecutive absences, and extract the required information for each qualifying student.

  • Filter Absent Records: Isolate records where the student's status is 'Absent' to focus solely on absences.
  • Assign Sequential Ranks: Use ROW_NUMBER() to assign a sequential number to each absence per student based on the attendance_date.
  • Identify Consecutive Absences: Calculate a grouping identifier by subtracting the row number from the sequential rank. This helps in grouping consecutive absences.
  • Aggregate Consecutive Absence Groups: Group the records based on the student and the calculated grouping identifier to identify sequences of consecutive absences.
  • Filter Groups with Three or More Consecutive Absences: Select only those groups where the count of consecutive absences is three or more.
  • Determine the Start Date of Absence Streaks: For each qualifying group, identify the earliest attendance_date as the start_date of the consecutive absence streak.
  • Order the Results: Present the final list ordered by student_id and start_date for clarity.

SQL Query

WITH RankedAttendance AS ( SELECT student_id, attendance_date, status, ROW_NUMBER() OVER (PARTITION BY student_id ORDER BY attendance_date) AS s_rank FROM Attendance WHERE status = 'Absent' ), ConsecutiveAttendance AS ( SELECT student_id, MIN(attendance_date) AS start_date, COUNT(*) AS consecutive_days FROM ( SELECT student_id, attendance_date, s_rank - ROW_NUMBER() OVER (PARTITION BY student_id ORDER BY attendance_date) AS grouping_col FROM RankedAttendance ) AS grouped_absences GROUP BY student_id, grouping_col HAVING COUNT(*) >= 3 ) SELECT student_id, start_date FROM ConsecutiveAttendance ORDER BY student_id, start_date;

Step-by-Step Approach

Step 1: Filter Absent Records and Assign Sequential Ranks

Extract all absence records and assign a sequential rank to each absence per student based on the attendance_date. This ranking facilitates the detection of consecutive absences.

SQL Query:

WITH RankedAttendance AS ( SELECT student_id, attendance_date, status, ROW_NUMBER() OVER (PARTITION BY student_id ORDER BY attendance_date) AS s_rank FROM Attendance WHERE status = 'Absent' )

Explanation:

  • WITH RankedAttendance AS (...):
    • Defines a Common Table Expression (CTE) named RankedAttendance to store intermediate results.
  • SELECT student_id, attendance_date, status, ROW_NUMBER() OVER (...) AS s_rank:
    • student_id & attendance_date & status:
      • Selects relevant columns from the Attendance table.
    • ROW_NUMBER() OVER (PARTITION BY student_id ORDER BY attendance_date) AS s_rank:
      • Assigns a sequential number (s_rank) to each absence per student_id ordered by attendance_date. This helps in identifying the sequence of absences.
  • FROM Attendance:
    • Specifies the Attendance table as the data source.
  • WHERE status = 'Absent':
    • Filters the records to include only those where the student's status is 'Absent'.

Output After Step 1:

Assuming the example input provided, the RankedAttendance CTE would produce:

+------------+-----------------+--------+--------+ | student_id | attendance_date | status | s_rank | +------------+-----------------+--------+--------+ | 1 | 2020-11-01 | Absent | 1 | | 1 | 2020-11-02 | Absent | 2 | | 1 | 2020-11-03 | Absent | 3 | | 2 | 2020-11-02 | Absent | 1 | | 2 | 2020-11-03 | Absent | 2 | | 2 | 2020-11-04 | Absent | 3 | +------------+-----------------+--------+--------+

Step 2: Calculate Grouping Identifier to Detect Consecutive Absences

Compute a grouping identifier (grouping_col) to help group consecutive absences. By subtracting the row number from the sequential rank, consecutive dates will have the same grouping_col.

SQL Query:

ConsecutiveAttendance AS ( SELECT student_id, MIN(attendance_date) AS start_date, COUNT(*) AS consecutive_days FROM ( SELECT student_id, attendance_date, s_rank - ROW_NUMBER() OVER (PARTITION BY student_id ORDER BY attendance_date) AS grouping_col FROM RankedAttendance ) AS grouped_absences GROUP BY student_id, grouping_col HAVING COUNT(*) >= 3 )

Explanation:

  • SELECT student_id, MIN(attendance_date) AS start_date, COUNT(*) AS consecutive_days:
    • student_id:
      • Identifies the student.
    • MIN(attendance_date) AS start_date:
      • Determines the earliest date in the consecutive absence streak.
    • COUNT(*) AS consecutive_days:
      • Counts the number of consecutive absent days in the group.
  • FROM (...) AS grouped_absences:
    • Utilizes a subquery to compute the grouping_col.
  • SELECT student_id, attendance_date, s_rank - ROW_NUMBER() OVER (...) AS grouping_col:
    • s_rank - ROW_NUMBER() OVER (...) AS grouping_col:
      • Calculates the grouping_col by subtracting the row number from the sequential rank. Consecutive dates will result in the same grouping_col, effectively grouping them together.
  • FROM RankedAttendance:
    • References the RankedAttendance CTE from Step 1.
  • GROUP BY student_id, grouping_col:
    • Groups the records by student_id and grouping_col to identify distinct consecutive absence streaks.
  • HAVING COUNT(*) >= 3:
    • Filters the groups to include only those where the count of consecutive absences is three or more.

Output After Step 2:

Based on the example input, the ConsecutiveAttendance CTE would produce:

+------------+-------------+-----------------+ | student_id | start_date | consecutive_days| +------------+-------------+-----------------+ | 1 | 2020-11-01 | 3 | | 2 | 2020-11-02 | 3 | +------------+-------------+-----------------+

Step 3: Select and Order the Final Results

Retrieve the student_id and start_date of each qualifying student who was absent for three or more consecutive days. Order the results by student_id and start_date.

SQL Query:

SELECT student_id, start_date FROM ConsecutiveAttendance ORDER BY student_id, start_date;

Explanation:

  • SELECT student_id, start_date:
    • Selects the student_id and the corresponding start_date of the consecutive absence streak.
  • FROM ConsecutiveAttendance:
    • Utilizes the ConsecutiveAttendance CTE from Step 2, which contains the grouped consecutive absences.
  • ORDER BY student_id, start_date:
    • Orders the final output first by student_id in ascending order and then by start_date to organize the results clearly.

Final Output:

+------------+-------------+ | student_id | start_date | +------------+-------------+ | 1 | 2020-11-01 | | 2 | 2020-11-02 | +------------+-------------+

.....

.....

.....

Like the course? Get enrolled and start learning!

Table of Contents

Contents are not accessible

Contents are not accessible

Contents are not accessible

Contents are not accessible

Contents are not accessible