0% completed
Problem
Table: Students
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| student_id | int |
| student_name | varchar |
| grade | int |
+---------------+---------+
student_id is the primary key for this table.
This table contains the information of students in a high school.
Table: Attendance
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| student_id | int |
| attendance_date | date |
| status | enum |
+---------------+---------+
(student_id, attendance_date) is the primary key for this table.
status is an ENUM of type ('Present', 'Absent').
Problem Definition
Write a solution to find the students who were absent for three or more consecutive days.
Return the result table ordered by student_id
and start_date
, where start_date
is the first date of the consecutive absences.
Example
Input:
Students table:
+------------+--------------+-------+
| student_id | student_name | grade |
+------------+--------------+-------+
| 1 | Alice | 10 |
| 2 | Bob | 11 |
| 3 | Charlie | 10 |
+------------+--------------+-------+
Attendance table:
+------------+-----------------+--------+
| student_id | attendance_date | status |
+------------+-----------------+--------+
| 1 | 2020-11-01 | Absent |
| 1 | 2020-11-02 | Absent |
| 1 | 2020-11-03 | Absent |
| 2 | 2020-11-01 | Present|
| 2 | 2020-11-02 | Absent |
| 2 | 2020-11-03 | Absent |
| 2 | 2020-11-04 | Absent |
| 3 | 2020-11-01 | Present|
| 3 | 2020-11-02 | Present|
+------------+-----------------+--------+
Output:
+------------+-------------+
| student_id | start_date |
+------------+-------------+
| 1 | 2020-11-01 |
| 2 | 2020-11-02 |
+------------+-------------+
Try It Yourself
Solution
To identify students who were absent for three or more consecutive days, we can utilize SQL's window functions and Common Table Expressions (CTEs). This approach allows us to efficiently analyze attendance records, detect consecutive absences, and extract the required information for each qualifying student.
- Filter Absent Records: Isolate records where the student's status is 'Absent' to focus solely on absences.
- Assign Sequential Ranks: Use
ROW_NUMBER()
to assign a sequential number to each absence per student based on theattendance_date
. - Identify Consecutive Absences: Calculate a grouping identifier by subtracting the row number from the sequential rank. This helps in grouping consecutive absences.
- Aggregate Consecutive Absence Groups: Group the records based on the student and the calculated grouping identifier to identify sequences of consecutive absences.
- Filter Groups with Three or More Consecutive Absences: Select only those groups where the count of consecutive absences is three or more.
- Determine the Start Date of Absence Streaks: For each qualifying group, identify the earliest
attendance_date
as thestart_date
of the consecutive absence streak. - Order the Results: Present the final list ordered by
student_id
andstart_date
for clarity.
SQL Query
WITH RankedAttendance AS ( SELECT student_id, attendance_date, status, ROW_NUMBER() OVER (PARTITION BY student_id ORDER BY attendance_date) AS s_rank FROM Attendance WHERE status = 'Absent' ), ConsecutiveAttendance AS ( SELECT student_id, MIN(attendance_date) AS start_date, COUNT(*) AS consecutive_days FROM ( SELECT student_id, attendance_date, s_rank - ROW_NUMBER() OVER (PARTITION BY student_id ORDER BY attendance_date) AS grouping_col FROM RankedAttendance ) AS grouped_absences GROUP BY student_id, grouping_col HAVING COUNT(*) >= 3 ) SELECT student_id, start_date FROM ConsecutiveAttendance ORDER BY student_id, start_date;
Step-by-Step Approach
Step 1: Filter Absent Records and Assign Sequential Ranks
Extract all absence records and assign a sequential rank to each absence per student based on the attendance_date
. This ranking facilitates the detection of consecutive absences.
SQL Query:
WITH RankedAttendance AS ( SELECT student_id, attendance_date, status, ROW_NUMBER() OVER (PARTITION BY student_id ORDER BY attendance_date) AS s_rank FROM Attendance WHERE status = 'Absent' )
Explanation:
WITH RankedAttendance AS (...)
:- Defines a Common Table Expression (CTE) named
RankedAttendance
to store intermediate results.
- Defines a Common Table Expression (CTE) named
SELECT student_id, attendance_date, status, ROW_NUMBER() OVER (...) AS s_rank
:student_id
&attendance_date
&status
:- Selects relevant columns from the
Attendance
table.
- Selects relevant columns from the
ROW_NUMBER() OVER (PARTITION BY student_id ORDER BY attendance_date) AS s_rank
:- Assigns a sequential number (
s_rank
) to each absence perstudent_id
ordered byattendance_date
. This helps in identifying the sequence of absences.
- Assigns a sequential number (
FROM Attendance
:- Specifies the
Attendance
table as the data source.
- Specifies the
WHERE status = 'Absent'
:- Filters the records to include only those where the student's status is 'Absent'.
Output After Step 1:
Assuming the example input provided, the RankedAttendance
CTE would produce:
+------------+-----------------+--------+--------+ | student_id | attendance_date | status | s_rank | +------------+-----------------+--------+--------+ | 1 | 2020-11-01 | Absent | 1 | | 1 | 2020-11-02 | Absent | 2 | | 1 | 2020-11-03 | Absent | 3 | | 2 | 2020-11-02 | Absent | 1 | | 2 | 2020-11-03 | Absent | 2 | | 2 | 2020-11-04 | Absent | 3 | +------------+-----------------+--------+--------+
Step 2: Calculate Grouping Identifier to Detect Consecutive Absences
Compute a grouping identifier (grouping_col
) to help group consecutive absences. By subtracting the row number from the sequential rank, consecutive dates will have the same grouping_col
.
SQL Query:
ConsecutiveAttendance AS ( SELECT student_id, MIN(attendance_date) AS start_date, COUNT(*) AS consecutive_days FROM ( SELECT student_id, attendance_date, s_rank - ROW_NUMBER() OVER (PARTITION BY student_id ORDER BY attendance_date) AS grouping_col FROM RankedAttendance ) AS grouped_absences GROUP BY student_id, grouping_col HAVING COUNT(*) >= 3 )
Explanation:
SELECT student_id, MIN(attendance_date) AS start_date, COUNT(*) AS consecutive_days
:student_id
:- Identifies the student.
MIN(attendance_date) AS start_date
:- Determines the earliest date in the consecutive absence streak.
COUNT(*) AS consecutive_days
:- Counts the number of consecutive absent days in the group.
FROM (...) AS grouped_absences
:- Utilizes a subquery to compute the
grouping_col
.
- Utilizes a subquery to compute the
SELECT student_id, attendance_date, s_rank - ROW_NUMBER() OVER (...) AS grouping_col
:s_rank - ROW_NUMBER() OVER (...) AS grouping_col
:- Calculates the
grouping_col
by subtracting the row number from the sequential rank. Consecutive dates will result in the samegrouping_col
, effectively grouping them together.
- Calculates the
FROM RankedAttendance
:- References the
RankedAttendance
CTE from Step 1.
- References the
GROUP BY student_id, grouping_col
:- Groups the records by
student_id
andgrouping_col
to identify distinct consecutive absence streaks.
- Groups the records by
HAVING COUNT(*) >= 3
:- Filters the groups to include only those where the count of consecutive absences is three or more.
Output After Step 2:
Based on the example input, the ConsecutiveAttendance
CTE would produce:
+------------+-------------+-----------------+ | student_id | start_date | consecutive_days| +------------+-------------+-----------------+ | 1 | 2020-11-01 | 3 | | 2 | 2020-11-02 | 3 | +------------+-------------+-----------------+
Step 3: Select and Order the Final Results
Retrieve the student_id
and start_date
of each qualifying student who was absent for three or more consecutive days. Order the results by student_id
and start_date
.
SQL Query:
SELECT student_id, start_date FROM ConsecutiveAttendance ORDER BY student_id, start_date;
Explanation:
SELECT student_id, start_date
:- Selects the
student_id
and the correspondingstart_date
of the consecutive absence streak.
- Selects the
FROM ConsecutiveAttendance
:- Utilizes the
ConsecutiveAttendance
CTE from Step 2, which contains the grouped consecutive absences.
- Utilizes the
ORDER BY student_id, start_date
:- Orders the final output first by
student_id
in ascending order and then bystart_date
to organize the results clearly.
- Orders the final output first by
Final Output:
+------------+-------------+ | student_id | start_date | +------------+-------------+ | 1 | 2020-11-01 | | 2 | 2020-11-02 | +------------+-------------+
.....
.....
.....
Table of Contents
Contents are not accessible
Contents are not accessible
Contents are not accessible
Contents are not accessible
Contents are not accessible