Grokking SQL for Tech Interviews
Ask Author
Back to course home

0% completed

Vote For New Content
1. Trips and Users
Table of Contents

Contents are not accessible

Contents are not accessible

Contents are not accessible

Contents are not accessible

Contents are not accessible

Problem

Table: Trips

+-------------+----------+
| Column Name | Type     |
+-------------+----------+
| id          | int      |
| client_id   | int      |
| driver_id   | int      |
| city_id     | int      |
| status      | enum     |
| request_at  | date     |     
+-------------+----------+
id is the primary key (column with unique values) for this table.
The table holds all taxi trips. Each trip has a unique id, while client_id and driver_id are foreign keys to the users_id at the Users table.
Status is an ENUM (category) type of ('completed', 'cancelled_by_driver', 'cancelled_by_client'). 

Table: Users

+-------------+----------+
| Column Name | Type     |
+-------------+----------+
| users_id    | int      |
| banned      | enum     |
| role        | enum     |
+-------------+----------+
users_id is the primary key (column with unique values) for this table.
The table holds all users. Each user has a unique users_id, and role is an ENUM type of ('client', 'driver', 'partner').
banned is an ENUM (category) type of ('Yes', 'No').

Problem Definition

The cancellation rate is computed by dividing the number of canceled (by client or driver) requests with unbanned users by the total number of requests with unbanned users on that day.

Write a solution to find the cancellation rate of requests with unbanned users (both client and driver must not be banned) each day between "2013-10-01" and "2013-10-03". Round Cancellation Rate to two decimal points.

Example

Image

Output

Image

Try It Yourself

MYSQL
MYSQL

. . . .

Solution

To calculate the cancellation rate of taxi trip requests with unbanned users between "2013-10-01" and "2013-10-03", we need to analyze the Trips and Users tables. The cancellation rate is defined as the ratio of the number of canceled trips (either canceled by the client or the driver) to the total number of trip requests, considering only those trips where both the client and the driver are not banned.

Approach Overview

  1. Filter Unbanned Users:

    • Clients: Exclude trips where the client_id corresponds to a banned user.
    • Drivers: Exclude trips where the driver_id corresponds to a banned user.
  2. Filter by Date Range:

    • Consider only trips with request_at dates between "2013-10-01" and "2013-10-03".
  3. Calculate Cancellation Rate:

    • Numerator: Count of trips with status 'cancelled_by_driver' or 'cancelled_by_client'.
    • Denominator: Total number of trips (including both completed and canceled) within the specified date range and with unbanned users.
    • Cancellation Rate: (Number of Canceled Trips) / (Total Number of Trips) rounded to two decimal places.
  4. Present the Results:

    • Display the Day and the corresponding Cancellation Rate in the specified format.

SQL Query

SELECT request_at AS Day, Round(SUM(IF(status = 'completed', 0, 1)) / Count(status), 2) AS 'Cancellation Rate' FROM Trips WHERE client_id NOT IN (SELECT users_id FROM Users WHERE banned = 'Yes') AND driver_id NOT IN (SELECT users_id FROM Users WHERE banned = 'Yes') AND request_at BETWEEN '2013-10-01' AND '2013-10-03' GROUP BY request_at;

Step-by-Step Approach

We'll break down the SQL query into smaller steps to understand how the Cancellation Rate is calculated. For illustration purposes, we'll use the provided sample data.

Step 1: Identify Unbanned Users

Filter out trips where either the client or the driver is banned.

SQL:

SELECT * FROM Trips WHERE client_id NOT IN ( SELECT users_id FROM Users WHERE banned = 'Yes' ) AND driver_id NOT IN ( SELECT users_id FROM Users WHERE banned = 'Yes' );

Explanation:

  • client_id NOT IN (...) and driver_id NOT IN (...):
    • Ensures that both the client and the driver associated with a trip are not banned.

Intermediate Output After Step 1:

+----+-----------+-----------+---------+-----------------------+------------+ | id | client_id | driver_id | city_id | status | request_at | +----+-----------+-----------+---------+-----------------------+------------+ | 1 | 1 | 10 | 1 | completed | 2013-10-01 | | 3 | 3 | 12 | 6 | completed | 2013-10-01 | | 4 | 4 | 13 | 6 | cancelled_by_driver | 2013-10-01 | | 5 | 1 | 10 | 1 | completed | 2013-10-02 | | 6 | 2 | 11 | 6 | completed | 2013-10-02 | | 7 | 3 | 12 | 6 | completed | 2013-10-02 | | 8 | 2 | 12 | 12 | completed | 2013-10-03 | | 9 | 3 | 10 | 12 | completed | 2013-10-03 | | 10 | 4 | 13 | 12 | cancelled_by_driver | 2013-10-03 | +----+-----------+-----------+---------+-----------------------+------------+

Note:

  • Trips with client_id = 2 are excluded because user 2 is banned (banned = 'Yes').

Step 2: Filter Trips Within the Date Range

Consider only trips that occurred between "2013-10-01" and "2013-10-03".

SQL:

SELECT * FROM Trips WHERE client_id NOT IN ( SELECT users_id FROM Users WHERE banned = 'Yes' ) AND driver_id NOT IN ( SELECT users_id FROM Users WHERE banned = 'Yes' ) AND request_at BETWEEN '2013-10-01' AND '2013-10-03';

Explanation:

  • request_at BETWEEN '2013-10-01' AND '2013-10-03':
    • Limits the analysis to trips within the specified date range.

Intermediate Output After Step 2:

+----+-----------+-----------+---------+-----------------------+------------+ | id | client_id | driver_id | city_id | status | request_at | +----+-----------+-----------+---------+-----------------------+------------+ | 1 | 1 | 10 | 1 | completed | 2013-10-01 | | 3 | 3 | 12 | 6 | completed | 2013-10-01 | | 4 | 4 | 13 | 6 | cancelled_by_driver | 2013-10-01 | | 5 | 1 | 10 | 1 | completed | 2013-10-02 | | 6 | 2 | 11 | 6 | completed | 2013-10-02 | | 7 | 3 | 12 | 6 | completed | 2013-10-02 | | 8 | 2 | 12 | 12 | completed | 2013-10-03 | | 9 | 3 | 10 | 12 | completed | 2013-10-03 | | 10 | 4 | 13 | 12 | cancelled_by_driver | 2013-10-03 | +----+-----------+-----------+---------+-----------------------+------------+

Step 3: Calculate Cancellation Rate Per Day

Compute the Cancellation Rate for each day by dividing the number of canceled trips by the total number of trips for that day. Only consider trips where both the client and driver are not banned.

SQL:

SELECT request_at AS Day, Round(SUM(IF(status = 'completed', 0, 1)) / Count(status), 2) AS 'Cancellation Rate' FROM Trips WHERE client_id NOT IN ( SELECT users_id FROM Users WHERE banned = 'Yes' ) AND driver_id NOT IN ( SELECT users_id FROM Users WHERE banned = 'Yes' ) AND request_at BETWEEN '2013-10-01' AND '2013-10-03' GROUP BY request_at;

Explanation:

  • SUM(IF(status = 'completed', 0, 1)):
    • Counts the number of canceled trips (status not equal to 'completed').
  • Count(status):
    • Counts the total number of trips for that day.
  • Round(..., 2):
    • Rounds the cancellation rate to two decimal places.
  • GROUP BY request_at:
    • Calculates the cancellation rate for each day separately.

Final Output:

+--------------------------+-------------------+ | Day | Cancellation Rate | +--------------------------+-------------------+ | 2013-10-01T00:00:00.000Z | 0.33 | | 2013-10-02T00:00:00.000Z | 0.00 | | 2013-10-03T00:00:00.000Z | 0.50 | +--------------------------+-------------------+

Explanation of Output:

  • 2013-10-01:
    • Cancellation Rate: 0.33 (1 canceled trip out of 3)
  • 2013-10-02:
    • Cancellation Rate: 0.00 (0 canceled trips out of 3)
  • 2013-10-03:
    • Cancellation Rate: 0.50 (2 canceled trips out of 4)

.....

.....

.....

Like the course? Get enrolled and start learning!

Table of Contents

Contents are not accessible

Contents are not accessible

Contents are not accessible

Contents are not accessible

Contents are not accessible