0% completed
Step 3: Back-of-the-Envelope Capacity Estimation
Let’s estimate the scale to ensure our design can handle it:
- Concurrent Users: Assume a peak of ~5 million concurrent users on the site during the sale. (Flash sales of popular brands can attract millions globally in a short window.) Not every user generates a request every second, but many will repeatedly refresh or navigate. If even half of them perform an action in a given second, that’s ~2.5 million operations/second across the system. We should prepare for a ballpark peak of hundreds of thousands to a few million requests per second in bursts.
- Peak QPS (Queries Per Second): Most of these operations are reads (viewing pages, checking item status). We might estimate a 90% read vs 10% write ratio. For example, out of 1M requests per second, 900k could be read (GET requests for product info, inventory status) and 100k could be purchase attempts (writes that create orders or update inventory). The read volume is huge because many users browse or refresh even if only a small fraction successfully checkout. Write volume will spike initially (everyone trying to buy) but the total successful orders are limited by inventory (e.g., if only 100k items total, at most 100k successful order writes will happen, though there may be many more attempts that get rejected due to stock out).
- Database Load: For reads, we cannot hit the database for every page view at these numbers – we’ll need caching. If we don't use cache, 900k reads/sec could overwhelm a single database cluster. Instead, product info and even inventory status should be served from in-memory caches or CDNs. Write load (100k/sec in worst case) is also extremely high for a single DB, so we must distribute writes or use high-throughput stores. The order placement transactions (which involve inventory decrement + order insert) are the most critical. They must be handled in a strongly consistent way, but as quickly as possible.
- Bandwidth: Serving millions of users means high network bandwidth. If each page or API response is ~5 KB of JSON (for example, product data without images), 1 million requests/sec = 5 GB/sec of outgoing data from servers. That’s 40 Gbps – feasible only with distributed servers and CDNs. We will offload large content (images, CSS/JS files, etc.) to a Content Delivery Network (CDN), so our application servers mostly send small JSON or HTML responses. The CDN edges near users handle the heavy lifting of images/videos. Internally, between services and databases, network links must handle bursts of write operations and replication.
- Storage: The flash sale itself might not consume massive storage (the number of products is limited and orders are relatively few compared to reads). For instance, if 100k orders are placed, and each order record is a few KB, that’s only a few hundred MB of new data.
- Peak API Requests Patterns: We expect a traffic spike at the sale start (e.g., everyone refreshing at 12:00 PM when the sale opens). The pattern might be: heavy GET requests to view the item page, followed by a wave of POST requests to attempt purchases. After inventory sells out, read traffic may remain high (people checking if more stock appears or browsing other items) but write traffic will drop. Our system should handle the initial spike which is the most intense period. In summary, design for worst-case QPS in the millions (if including all reads) and order events in the tens of thousands per second. These numbers align with large-scale events – e.g., Alibaba reported 583k orders/sec at peak of Singles’ Day, and Amazon’s Prime Day backend handled even higher internal request rates (hundreds of millions of ops/sec) by scaling out globally.
Step 5: Database Schema
Below is a detailed relational schema for the key tables — Users, Products, Inventory, Reservation, Orders, and Payments — followed by indexing, partitioning, and constraint recommendations to optimize performance for high-volume events.
Users Table
This table stores user accounts and authentication details.
Field Name | Data Type | Description |
---|---|---|
user_id | BIGINT (PK) | Unique user identifier (primary key, auto-increment). |
email | VARCHAR(255) UNIQUE | User's email address (used for login; must be unique). |
password_hash | VARCHAR(255) | Hashed password for authentication. |
salt | VARCHAR(255) | Salt used for hashing the password (if applicable). |
name | VARCHAR(100) | User’s full name. |
created_at | DATETIME | Timestamp when the account was created. |
updated_at | DATETIME | Timestamp of the last update to the account information. |
status | VARCHAR(50) | Account status (e.g., 'active', 'disabled', 'banned'). |
Notes:
- The primary key is
user_id
. A unique index onemail
ensures fast lookups during login and enforces uniqueness. - Passwords are stored as hashes with a unique salt per user for security.
Products Table
Manages items available for the flash sale.
Field Name | Data Type | Description |
---|---|---|
product_id | BIGINT (PK) | Unique product identifier (primary key, auto-increment). |
name | VARCHAR(200) | Product name. |
description | TEXT | Detailed description of the product. |
price | DECIMAL(10,2) | Regular price of the product. |
category | VARCHAR(100) | Category or type of product (for organization/filtering). |
status | VARCHAR(50) | Product status (e.g., 'active', 'out_of_stock', 'discontinued'). |
created_at | DATETIME | Timestamp when the product was added to the catalog. |
updated_at | DATETIME | Timestamp of the last update to product info or price. |
Notes:
- Primary key is
product_id
. Indexingname
orcategory
can help with product searches or category listings if needed. - The
status
field can quickly indicate if a product is available for sale.
Inventory Table
Tracks stock levels for each product to prevent overselling during the flash sale.
Field Name | Data Type | Description |
---|---|---|
product_id | BIGINT (PK, FK to Products) | Product identifier (primary key, references Products). |
quantity | INT | Available stock for the product. This value is decremented when orders or reservations occur. |
last_updated | DATETIME | Timestamp of the last inventory update. |
Notes:
- There is a one-to-one relationship between Products and Inventory (
product_id
is both the primary key and a foreign key to Products). - Concurrent Stock Deduction: Updates to
quantity
should occur within a transaction to avoid race conditions. One approach is using an atomic UPDATE (e.g.,UPDATE Inventory SET quantity = quantity - 1 WHERE product_id = X AND quantity >= 1
) or aSELECT ... FOR UPDATE
to lock the row during a purchase attempt. This prevents multiple transactions from overselling the same item by queuing concurrent row access. - A check constraint can ensure
quantity >= 0
at all times (no negative stock).
Reservation Table
Temporarily holds stock for a user who is in the process of checking out, to prevent others from buying that stock until the user completes payment or the hold expires.
Field Name | Data Type | Description |
---|---|---|
reservation_id | UUID (PK) | Unique reservation identifier (primary key). |
user_id | BIGINT (FK to Users) | User who reserved the product (foreign key to Users). |
product_id | BIGINT (FK to Products) | Product reserved (foreign key to Products). |
quantity | INT | Quantity reserved for this user. |
reserved_at | DATETIME | Timestamp when the reservation was created. |
expires_at | DATETIME | Timestamp when the reservation will expire if not completed. |
status | VARCHAR(50) | Reservation status ('PROCESSING', 'RESERVED', 'FAILED', 'SUCCEEDED', etc). |
Notes:
- When a user initiates checkout, a reservation record is created and the stock is temporarily allocated to that user. This prevents others from purchasing it during the reservation window. The Inventory
quantity
will be decremented at this point. - The
expires_at
field is crucial. A background job or service should remove or mark reservations as expired when this timestamp passes, releasing the held stock back to inventory. This helps avoid underselling (stock being tied up in stale reservations). - A unique constraint on (
user_id
,product_id
) can ensure a user has only one active reservation per product at a time (preventing the same user from holding the same item multiple times). - Indexes on
product_id
help quickly calculate total reserved stock for a product, and an index onexpires_at
speeds up queries that find and clean up expired reservations.
Orders Table
Records purchases made by users during the flash sale. Each order is a purchase of a product by a user.
Field Name | Data Type | Description |
---|---|---|
order_id | BIGINT (PK) | Unique order identifier (primary key, often auto-increment). |
user_id | BIGINT (FK to Users) | User who placed the order (foreign key to Users). |
reservation_id | UUID (FK to Reservation) | The rservertion against this order. |
product_id | BIGINT (FK to Products) | Product that was purchased (foreign key to Products). |
quantity | INT | Quantity of the product purchased in this order. |
price | DECIMAL(10,2) | Purchase price per unit at the time of order (captures flash sale price or discount if any). |
status | VARCHAR(50) | Order status (e.g., 'pending', 'completed', 'cancelled', 'refunded'). |
created_at | DATETIME | Timestamp when the order was created. |
updated_at | DATETIME | Timestamp of the last update to the order status or details. |
Notes:
- Foreign keys:
user_id
references Users,reservation_id
references Reservation andproduct_id
references Products. This maintains referential integrity (a valid user, reservation, and product must exist for each order). - In a simple flash sale scenario, each order might represent a single product purchase (to simplify checkout during high traffic). For multi-item carts, an additional OrderItems table would be introduced, but it’s omitted here for brevity.
- The
status
field, combined with timestamps, helps track the order lifecycle (creation, completion, cancellation). - An index on
user_id
can speed up querying a user's order history, and an index onproduct_id
helps in analyzing total sales per product or verifying stock consumption.
Payments Table
Logs payment transactions (successful or failed) for orders.
Field Name | Data Type | Description |
---|---|---|
payment_id | BIGINT (PK) | Unique payment transaction identifier (primary key). |
order_id | BIGINT (FK to Orders) | Associated order that this payment is for (foreign key to Orders). |
user_id | BIGINT (FK to Users) | User who made the payment (foreign key to Users; redundant to order’s user for quick access). |
amount | DECIMAL(10,2) | Payment amount (should match order total for successful payments). |
method | VARCHAR(50) | Payment method (e.g., 'credit_card', 'paypal', 'wallet'). |
status | VARCHAR(50) | Payment status ('success', 'failed', 'pending'). |
transaction_ref | VARCHAR(100) | Reference ID from the payment gateway (e.g., transaction ID). |
timestamp | DATETIME | Timestamp of the payment attempt. |
failure_reason | VARCHAR(255) | Reason for failure if the payment did not succeed (nullable). |
Notes:
- Foreign keys:
order_id
links to Orders, ensuring a payment is tied to a valid order.user_id
links to Users` (this could be derived through the order, but is stored for convenience and redundancy). - Typically, an order will have one successful payment. Multiple entries for the same order could exist if retries or different payment methods were attempted (one will be
success
, othersfailed
). A unique constraint onorder_id
forstatus='success'
could enforce only one successful payment per order (if desired). - Index on
order_id
is important for quickly retrieving payments by order (especially during reconciliation). An index onstatus
can help isolate failed transactions for analysis or retry.
Indexing and Partitioning Strategies
To support high concurrency and fast performance, proper indexing and data partitioning are essential:
-
Indexes:
- Each primary key (e.g.,
user_id
,product_id
,order_id
) is automatically indexed, ensuring fast lookups by ID. - Users: Unique index on
email
for quick authentication lookups. - Products: Index on
category
(if queries by category are common) and possibly a full-text index onname/description
if keyword searches are needed. - Inventory: The primary key on
product_id
suffices for direct stock lookups. This table is small (one row per product), but proper locking on updates is crucial for concurrency control rather than additional indexes. - Orders: Index on
user_id
to retrieve a user's orders quickly. Index onproduct_id
to analyze orders per product or to assist in stock reconciliation. If querying recent orders often, an index oncreated_at
(or a composite index like(product_id, created_at)
) can help, especially if partitioned by date. - Payments: Index on
order_id
(to quickly find the payment for a given order). Index onstatus
(to find all failed payments, for example) or onuser_id
if analyzing user payment history. - Reservations: Index on
product_id
to efficiently calculate how many units of a product are reserved at a given moment. Index onexpires_at
to find expired reservations for cleanup. A composite index on (user_id
,product_id
) paired with the unique constraint helps enforce one reservation per user per product and also speeds up checking an existing reservation for a user.
- Each primary key (e.g.,
-
Database Sharding: To handle high write volumes and large data, we implement sharding on the primary databases. Choosing the right shard key is important for load distribution. One effective strategy is to shard by User ID or Order ID rather than by Product. The rationale is that a hot product in a flash sale would concentrate load on a single shard if we sharded by product, whereas sharding by user spreads the load more evenly. For example, we can apply a hash function to the user ID to determine which shard an order or reservation goes to, achieving an even distribution and avoiding hotspots. This way, even if 100k users click “Buy” at once, their requests are being recorded across, say, 10 shards based on user ID hash, instead of all hitting one database. Similarly, user data (addresses, etc.) can be sharded by user ID. Product catalog data might be smaller and not require sharding, but order and reservation data will be huge during a flash sale and benefit from this partitioning.
-
Scalability Considerations:
For extreme loads, consider read-write separation (primary for writes, replicas for reads) to distribute the traffic. The schema design supports this since reads (e.g., browsing products, checking inventory) can be done on replicas, while writes (placing orders, payments) go to the primary.
.....
.....
.....
Table of Contents
Contents are not accessible
Contents are not accessible
Contents are not accessible
Contents are not accessible
Contents are not accessible