Grokking the Advanced System Design Interview
14. Checksum

Let's learn about checksums and how they are used.

Background

In a distributed system, data moved between components can arrive corrupted. A node may return damaged data because of faults in a storage device, the network, software, etc. How can a distributed system ensure data integrity, so that a client receives an error instead of corrupt data?

Definition

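Calculate a checksum and store it with the data.

To calculate a checksum, a hash function is applied to the data. This can be a cryptographic hash function such as MD5, SHA-1, SHA-256, or SHA-512, or a cheaper non-cryptographic code such as CRC32. The function takes input data of any size and produces a fixed-length value, usually written as a string of hexadecimal digits; this value is called the checksum. Because even a small change to the input almost always changes the output, a mismatch between a stored checksum and a freshly computed one reveals that the data has changed.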
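As an illustration, the sketch below computes two kinds of checksum with Python's standard library: `hashlib.sha256` for a cryptographic hash and `zlib.crc32` for a CRC. The helper name `compute_checksums` is hypothetical. It shows that the checksum length stays fixed no matter how large the input is, and that changing a single byte changes the result.

```python
import hashlib
import zlib


def compute_checksums(data: bytes) -> dict:
    """Return SHA-256 and CRC32 checksums of `data` as hex strings.

    compute_checksums is an illustrative helper, not part of any library.
    """
    return {
        # 256-bit digest, rendered as 64 hexadecimal characters.
        "sha256": hashlib.sha256(data).hexdigest(),
        # 32-bit CRC, rendered as 8 zero-padded hexadecimal characters.
        "crc32": format(zlib.crc32(data) & 0xFFFFFFFF, "08x"),
    }


if __name__ == "__main__":
    small = compute_checksums(b"hello")
    large = compute_checksums(b"x" * 1_000_000)
    # Checksum lengths do not depend on the size of the input.
    print(len(small["sha256"]), len(large["sha256"]))  # 64 64
    print(len(small["crc32"]), len(large["crc32"]))    # 8 8
    # Changing one byte of the input produces a different checksum.
    print(compute_checksums(b"hellp")["sha256"] != small["sha256"])  # True
```

A CRC is fast and good at catching accidental errors such as flipped bits, while a cryptographic hash costs more CPU but makes it much harder for two different inputs to share a checksum.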

Solution

When a system stores data, it computes a checksum of that data and stores the checksum alongside it. When a client retrieves the data, it recomputes the checksum over the bytes it received from the server and compares the result with the stored checksum. If the two do not match, the data is corrupt, and the client can opt to retrieve it from another replica.
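The write and read paths described above can be sketched as follows. This is a minimal, single-process illustration under stated assumptions: `Replica`, `read_verified`, and `ChecksumError` are hypothetical names, replicas are plain in-memory dictionaries, and SHA-256 stands in for whatever checksum a real system uses.

```python
import hashlib
from dataclasses import dataclass, field


class ChecksumError(Exception):
    """Raised when no replica returns data matching its stored checksum."""


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass
class Replica:
    """A hypothetical in-memory replica mapping key -> (data, checksum)."""
    name: str
    store: dict = field(default_factory=dict)

    def put(self, key: str, data: bytes) -> None:
        # Write path: compute the checksum and store it alongside the data.
        self.store[key] = (data, checksum(data))

    def get(self, key: str) -> tuple:
        return self.store[key]


def read_verified(replicas: list, key: str) -> bytes:
    """Return data from the first replica whose bytes match the stored checksum."""
    for replica in replicas:
        data, stored = replica.get(key)
        # Read path: recompute on the client and compare with the stored value.
        if checksum(data) == stored:
            return data
        # Mismatch: treat this copy as corrupt and fall back to the next replica.
    raise ChecksumError(f"all replicas returned corrupt data for {key!r}")


if __name__ == "__main__":
    replicas = [Replica("r1"), Replica("r2")]
    for r in replicas:
        r.put("k", b"payload")
    # Simulate silent corruption on r1: the bytes change, the checksum does not.
    _, stored = replicas[0].store["k"]
    replicas[0].store["k"] = (b"paxload", stored)
    print(read_verified(replicas, "k"))  # b'payload'
```

Recomputing the checksum on the client, rather than trusting the server to check, also catches corruption introduced on the network path between the two. If every replica fails verification, the client gets an error instead of corrupt data.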

Examples

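HDFS and Chubby store checksums with the data. HDFS computes checksums over small fixed-size chunks of each block and verifies them when the data is read, while Chubby keeps a checksum of each file's contents.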
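HDFS checksums data in small chunks rather than as a single value per file (by default one checksum per 512 bytes, set through `dfs.bytes-per-checksum`). The sketch below imitates that idea; it is not HDFS code, the helper names are hypothetical, and it uses plain CRC32 from `zlib` as a stand-in (HDFS defaults to the CRC32C variant).

```python
import zlib

CHUNK_SIZE = 512  # bytes covered by each checksum; mirrors HDFS's default


def chunk_checksums(data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Compute one CRC32 per fixed-size chunk (the last chunk may be shorter)."""
    return [
        zlib.crc32(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


def corrupt_chunks(data: bytes, stored: list, chunk_size: int = CHUNK_SIZE) -> list:
    """Return indices of chunks whose CRC32 no longer matches the stored value."""
    current = chunk_checksums(data, chunk_size)
    if len(current) != len(stored):
        raise ValueError("data length changed; checksum list does not line up")
    return [i for i, (a, b) in enumerate(zip(current, stored)) if a != b]


if __name__ == "__main__":
    block = bytes(range(256)) * 8        # 2048 bytes -> 4 chunks of 512
    stored = chunk_checksums(block)
    damaged = bytearray(block)
    damaged[1300] ^= 0xFF                # corrupt one byte in chunk 2 (bytes 1024-1535)
    print(len(stored))                             # 4
    print(corrupt_chunks(bytes(damaged), stored))  # [2]
```

Per-chunk checksums let a reader locate which part of a large block is damaged and verify data incrementally while streaming it, instead of reading an entire file before anything can be checked.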
