Fault Tolerance, High Availability, and Data Integrity
Let's learn how GFS implements fault tolerance, high availability, and data integrity.
To make the system fault-tolerant and available, GFS makes use of two simple strategies:
- Fast recovery in case of component failures.
- Replication for high availability.
Let's first see how GFS recovers from master or replica failure:
- On master failure: The Master being a single point of failure, can make the entire system unavailable in a short time. To handle this, all operations applied on master are saved in an operation log