Distributed Systems
Raft Consensus and Failure
A look at why the easy part of consensus is the normal case.
Raft is approachable because it gives consensus a shape that humans can hold in their heads. Leaders replicate logs, followers respond, terms advance, and majorities decide what counts as committed.
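To make "majorities decide what counts as committed" concrete, here is a minimal sketch of how a leader might advance its commit index. The names `majority_match_index`, `advance_commit_index`, `match_index`, and `log_terms` are illustrative assumptions, not taken from any particular implementation. The sketch assumes `match_index` includes the leader's own last log index and that `log_terms[i]` holds the term of the entry at index `i`, with index 0 as a sentinel.

```python
from typing import List


def majority_match_index(match_index: List[int]) -> int:
    """Return the highest log index that a majority of servers have stored."""
    ranked = sorted(match_index, reverse=True)
    majority = len(match_index) // 2 + 1
    # The value at position majority-1 is held by at least `majority` servers.
    return ranked[majority - 1]


def advance_commit_index(
    commit_index: int,
    match_index: List[int],
    log_terms: List[int],
    current_term: int,
) -> int:
    """Return the new commit index for a leader (hypothetical helper).

    Raft only commits by counting replicas when the entry belongs to the
    leader's current term; older entries are committed indirectly.
    """
    candidate = majority_match_index(match_index)
    if candidate > commit_index and log_terms[candidate] == current_term:
        return candidate
    return commit_index
```

With five servers whose match indexes are `[5, 5, 3, 2, 1]`, index 3 is the highest one present on three servers, so it is the only candidate for commitment. It is committed only if its entry was written in the leader's current term.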
The normal path is only the beginning. The interesting questions start when messages are delayed, nodes restart with stale state, elections overlap, or a leader believes it is still in charge after the rest of the cluster has moved on.
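The stale-leader case is usually handled by comparing terms on every incoming message. The following is a rough sketch under assumptions: the `Node` class, its fields, and the `observe_term` method are hypothetical names chosen for illustration, and a real node would also persist the updated term before acting on it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    current_term: int = 0
    role: str = "follower"  # "follower", "candidate", or "leader"
    voted_for: Optional[str] = None

    def observe_term(self, term: int) -> bool:
        """Apply the term rule to an incoming message (hypothetical helper).

        Returns False if the message is from a stale term and should be
        rejected, True if it is current enough to process further.
        """
        if term < self.current_term:
            # The sender is behind; reject so it can learn the newer term.
            return False
        if term > self.current_term:
            # The cluster has moved on: adopt the term and step down.
            self.current_term = term
            self.role = "follower"
            self.voted_for = None
        return True
```

A leader in term 3 that receives any message carrying term 5 becomes a follower in term 5, which is how a node that "believes it is still in charge" finds out otherwise. A late message from term 4 is then rejected as stale.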
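Good implementations are careful about persistence, term checks, idempotency, and timeouts: they write the current term, vote, and log to stable storage before replying, reject messages from stale terms, tolerate duplicated or reordered requests, and randomize election timeouts so that split votes resolve. Raft is simpler to follow than Paxos, but the operational lesson is the same: distributed state needs an explicit story about what can be trusted.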
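As one illustration of idempotency, a follower's log-append step can be written so that a duplicated or delayed request leaves the log unchanged. This is a simplified sketch, not a full AppendEntries handler: the function name `append_entries`, the `(term, command)` tuple layout, and the in-memory list are assumptions, and term checks and the write to stable storage that must happen before replying are left out.

```python
from typing import List, Tuple

Entry = Tuple[int, str]  # (term, command); an assumed layout for this sketch


def append_entries(
    log: List[Entry], prev_index: int, prev_term: int, entries: List[Entry]
) -> bool:
    """Append entries after the 1-based position prev_index, idempotently.

    Returns False if the log has no matching entry at prev_index.
    """
    if prev_index > len(log):
        return False  # the follower is missing earlier entries
    if prev_index > 0 and log[prev_index - 1][0] != prev_term:
        return False  # the entry at prev_index comes from a different term
    for offset, entry in enumerate(entries):
        index = prev_index + offset + 1  # 1-based position of this entry
        if index <= len(log):
            if log[index - 1][0] == entry[0]:
                continue  # already stored; a retry changes nothing
            # Conflict: drop this entry and everything after it.
            del log[index - 1:]
        log.append(entry)
    return True
```

Truncation happens only on a real term conflict, so an old, shorter copy of a request that arrives late does not erase entries appended since. Sending the same request twice produces the same log as sending it once.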