Grokking the Advanced System Design Interview
Summary: HDFS

Here is a quick summary of HDFS for you!


  • HDFS is a scalable distributed file system for large, distributed data-intensive applications.
  • HDFS uses commodity hardware to reduce infrastructure costs.
  • HDFS provides APIs for the usual file operations, such as create, delete, open, close, read, and write.
  • Random writes are not possible; data can only be appended to the end of a file.
  • HDFS does not support multiple concurrent writers to the same file.
  • An HDFS cluster consists of a single NameNode and multiple DataNodes and is accessed by multiple clients.
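
The append-only and single-writer rules from the list above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not the HDFS client API: the class `AppendOnlyFile`, its methods, and the client names are hypothetical, and a real cluster enforces the single-writer rule through the NameNode rather than inside the file object.

```python
class AppendOnlyFile:
    """Toy in-memory stand-in for an HDFS file (hypothetical, not the real API)."""

    def __init__(self):
        self._data = bytearray()
        self._writer = None  # client currently allowed to write, if any

    def open_for_write(self, client):
        # Single writer: a second client is rejected while the file is open.
        if self._writer is not None and self._writer != client:
            raise PermissionError(f"file is already open for writing by {self._writer}")
        self._writer = client

    def append(self, client, payload):
        if self._writer != client:
            raise PermissionError(f"{client} has not opened the file for writing")
        # There is no offset parameter: new bytes always land at the end.
        self._data.extend(payload)
        return len(self._data)

    def close(self, client):
        # Closing releases the file so that another client may write to it.
        if self._writer == client:
            self._writer = None

    def read(self, offset=0, length=None):
        # Reads may start at any offset, unlike writes.
        end = len(self._data) if length is None else offset + length
        return bytes(self._data[offset:end])


f = AppendOnlyFile()
f.open_for_write("client-a")
f.append("client-a", b"hello ")
f.append("client-a", b"world")

try:
    f.open_for_write("client-b")  # concurrent writer on the same file
except PermissionError as err:
    print("rejected:", err)

f.close("client-a")
print(f.read())  # b'hello world'
```

Note that `append` takes no position argument, which is what rules out random writes in this model, while `read` accepts an arbitrary offset.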




