Grokking the Advanced System Design Interview

Data Integrity & Caching

Let's explore how HDFS ensures data integrity and implements caching.

Data integrity

Data integrity refers to ensuring the correctness of the data. When a client retrieves a block from a DataNode, the data may arrive corrupted. This corruption can occur because of faults in the storage device, the network, or the software itself. The HDFS client uses checksums to verify the file contents. When a client stores a file in HDFS, it computes a checksum of each block of the file and stores these checksums in a separate hidden file in the same HDFS namespace.
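
To make the idea concrete, here is a minimal sketch of checksum-based verification. It is not HDFS's actual implementation: the tiny block size, the function names (`split_blocks`, `compute_checksums`, `verify_block`), and the use of Python's `zlib.crc32` are all assumptions chosen for illustration. The checksums are computed when the data is written, kept separately from the data, and compared against freshly computed values when a block is read back.

```python
import zlib
from typing import List

# Hypothetical, deliberately tiny block size so the example is easy to follow.
BLOCK_SIZE = 8


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Split file contents into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def compute_checksums(blocks: List[bytes]) -> List[int]:
    """Write path: compute one CRC32 checksum per block.

    These values play the role of the separate checksum file.
    """
    return [zlib.crc32(block) for block in blocks]


def verify_block(block: bytes, expected_checksum: int) -> bool:
    """Read path: recompute the checksum and compare it with the stored one."""
    return zlib.crc32(block) == expected_checksum


if __name__ == "__main__":
    data = b"hello hdfs checksum demo"
    blocks = split_blocks(data)
    checksums = compute_checksums(blocks)  # stored apart from the data

    # An intact block passes verification.
    print(verify_block(blocks[0], checksums[0]))

    # A block with a flipped byte (simulated corruption) fails verification.
    corrupted = b"j" + blocks[0][1:]
    print(verify_block(corrupted, checksums[0]))
```

Keeping the checksums apart from the data they describe means that a fault damaging a block does not also silently damage the value used to detect that fault.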