Raft
Distributed leader election system implementing core Raft consensus mechanics.
Re-election Time: <1 second
Cluster Size: 3 nodes
Election Timeout: 150–300 ms
Heartbeat Interval: 50 ms
Problem
Problem statement, constraint shape, and the gap this project explores.
How do you guarantee exactly one leader in a distributed cluster under concurrent failures?
Built a distributed leader election system implementing core Raft consensus mechanics, including quorum-based voting, randomized election timeouts, and heartbeat stabilization over HTTP RPC.
Distributed systems require consensus: multiple nodes must agree on a single leader even when messages are delayed, nodes crash, and elections run concurrently. Raft was designed to be understandable, but implementing it correctly still means handling a surprising number of edge cases around term numbering, vote splitting, and heartbeat timing.
Constraints
Non-negotiable boundaries that shaped the implementation.
3-node cluster
HTTP RPC over localhost
Node crash and restart
Election timeout randomized to prevent split votes
Re-election in <1 second after leader failure
Architecture
The primary design surface: flow, subsystem roles, and state boundaries.
Each node runs an Axum HTTP server exposing /vote and /heartbeat endpoints. Nodes maintain Raft state (Follower, Candidate, Leader) and a current term. A Tokio interval drives election timeout checks. The leader broadcasts heartbeats to prevent unnecessary elections.
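The per-node state described above can be sketched as a small struct with explicit transitions. This is a minimal illustration, not the project's actual code: the names `NodeState`, `become_candidate`, `become_leader`, and `observe_term` are hypothetical, and the Axum and Tokio wiring is left out.

```rust
/// The three roles a node moves between.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    Follower,
    Candidate,
    Leader,
}

/// In-memory election state for one node (field names are assumptions).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct NodeState {
    pub id: u64,
    pub role: Role,
    pub current_term: u64,
    pub voted_for: Option<u64>,
}

impl NodeState {
    /// Every node starts as a follower in term 0 with no vote cast.
    pub fn new(id: u64) -> Self {
        NodeState { id, role: Role::Follower, current_term: 0, voted_for: None }
    }

    /// Election timeout fired: start a new term and vote for ourselves.
    pub fn become_candidate(&mut self) {
        self.current_term += 1;
        self.role = Role::Candidate;
        self.voted_for = Some(self.id);
    }

    /// Promote only a candidate; any other role is left unchanged.
    pub fn become_leader(&mut self) -> bool {
        if self.role == Role::Candidate {
            self.role = Role::Leader;
            true
        } else {
            false
        }
    }

    /// A message carrying a newer term forces a step-down to Follower.
    pub fn observe_term(&mut self, term: u64) {
        if term > self.current_term {
            self.current_term = term;
            self.role = Role::Follower;
            self.voted_for = None;
        }
    }
}
```

In a server like the one described, a value of this kind would presumably sit behind a shared lock so that the HTTP handlers and the timer task see the same term and role.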
Timeout
Increment term
Broadcast RequestVote
Collect votes
If majority: become Leader
Broadcast Heartbeat
Followers reset timeout
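The "collect votes, then decide" step of the flow above can be sketched as a pure function over the replies a candidate has received. The types `VoteReply` and `Outcome` and the function `tally` are hypothetical names chosen for illustration, and the sketch assumes each reply carries the responder's term.

```rust
/// One peer's answer to a vote request (hypothetical shape).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct VoteReply {
    pub term: u64,
    pub granted: bool,
}

/// What the candidate should do once replies are in.
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    /// Majority reached: become Leader and start broadcasting heartbeats.
    Won,
    /// No majority: stay a candidate and wait for the next randomized timeout.
    Retry,
    /// A peer reported a newer term: revert to Follower at that term.
    StepDown(u64),
}

pub fn tally(candidate_term: u64, cluster_size: usize, replies: &[VoteReply]) -> Outcome {
    // A reply from a newer term means this candidacy is stale.
    let newest = replies.iter().map(|r| r.term).filter(|t| *t > candidate_term).max();
    if let Some(term) = newest {
        return Outcome::StepDown(term);
    }
    // The candidate always counts its own vote.
    let votes = 1 + replies
        .iter()
        .filter(|r| r.granted && r.term == candidate_term)
        .count();
    if votes >= cluster_size / 2 + 1 {
        Outcome::Won
    } else {
        Outcome::Retry
    }
}
```

With three nodes, a single granted reply is enough to win, because the candidate's own vote brings the total to two.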
Raft State Machine
Manages Follower → Candidate → Leader transitions with term tracking.
Election Timer
Randomized timeout (150–300 ms) triggers candidacy if no heartbeat is received.
Vote RPC
HTTP POST /vote — candidates request votes from all peers.
Heartbeat RPC
HTTP POST /heartbeat — leader suppresses elections across followers.
Quorum Checker
Grants leadership only when a majority (⌊N/2⌋+1) of votes is received.
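The voter side of the Vote RPC and the quorum rule can be sketched together. The `Voter` type and `handle_vote_request` are hypothetical names. Because log replication is out of scope here, the sketch omits the log up-to-date check that full Raft applies before granting a vote.

```rust
/// Majority size for a cluster of `cluster_size` nodes: floor(N/2) + 1.
pub fn quorum(cluster_size: usize) -> usize {
    cluster_size / 2 + 1
}

/// The part of a node's state that decides vote grants (assumed layout).
#[derive(Debug, Default)]
pub struct Voter {
    pub current_term: u64,
    pub voted_for: Option<u64>,
}

impl Voter {
    /// Returns true if the vote is granted to `candidate_id` for `term`.
    pub fn handle_vote_request(&mut self, candidate_id: u64, term: u64) -> bool {
        // Requests from an older term are always rejected.
        if term < self.current_term {
            return false;
        }
        // A newer term resets the vote cast in the previous term.
        if term > self.current_term {
            self.current_term = term;
            self.voted_for = None;
        }
        // At most one vote per term; repeating the same grant is harmless.
        match self.voted_for {
            None => {
                self.voted_for = Some(candidate_id);
                true
            }
            Some(id) => id == candidate_id,
        }
    }
}
```

The one-vote-per-term rule is what keeps two candidates from both reaching a majority in the same term. A complete node would also step down to Follower when it sees the newer term.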
Engineering Tradeoffs
Design review notes: what was optimized and what was deliberately left behind.
HTTP vs raw TCP for RPC
HTTP simplifies request routing, debugging, and future extension. For a 3-node local cluster, the latency difference is negligible.
Gave up: lower latency (raw TCP is faster)
Chose: HTTP via Axum
Randomized timeouts vs fixed
Identical fixed timeouts make every node time out at the same moment: each candidate votes for itself and no majority forms. Randomization breaks the symmetry.
Gave up: deterministic election timing
Chose: randomized election timeouts (150–300 ms)
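Drawing a timeout from the 150–300 ms window can be sketched as follows. To keep the example free of external crates it uses a tiny xorshift generator as a stand-in. That generator, and the names `XorShift64` and `election_timeout`, are assumptions for illustration. A real node would more likely use a proper RNG crate seeded from the operating system.

```rust
use std::time::Duration;

/// Minimal xorshift64 generator, used here only as a stand-in RNG.
pub struct XorShift64(u64);

impl XorShift64 {
    /// Xorshift must not start from zero, so substitute a fixed constant.
    pub fn new(seed: u64) -> Self {
        XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

pub const MIN_MS: u64 = 150;
pub const MAX_MS: u64 = 300;

/// Pick a fresh timeout in the inclusive range MIN_MS..=MAX_MS.
pub fn election_timeout(rng: &mut XorShift64) -> Duration {
    let span = MAX_MS - MIN_MS + 1;
    Duration::from_millis(MIN_MS + rng.next_u64() % span)
}
```

Each node would need a different seed (for example, one derived from its node id and the clock) and would draw a new timeout every time the timer is reset. Otherwise the nodes fall back into lockstep.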
In-memory state vs persistent log
The implementation focuses on leader election, not log replication. Persistent state is required for full Raft but out of scope here.
Gave up: crash recovery of committed log entries
Chose: in-memory state only
Failure Modes
Incident-style notes for the ways the design can break.
Split vote
FM-01: All nodes become candidates simultaneously. Each votes for itself, so no majority forms.
Mitigation: randomized timeouts make simultaneous elections statistically rare.
IPv4/IPv6 mismatch
FM-02: Axum bound to 0.0.0.0 (IPv4) while peer URLs used ::1 (IPv6), so connections failed silently.
Fixed by normalizing all addresses to 127.0.0.1.
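One way to apply that normalization is to rewrite every loopback peer address to 127.0.0.1 before building peer URLs. The helper below is a hypothetical sketch using only the standard library. The name `normalize_peer` and the `host:port` input format are assumptions.

```rust
use std::net::{IpAddr, Ipv4Addr, SocketAddr};

/// Map any loopback spelling (localhost, ::1, 127.x.x.x) to 127.0.0.1,
/// keeping the port. Non-loopback addresses pass through unchanged.
/// Returns None if the input is not a valid `host:port` socket address.
pub fn normalize_peer(addr: &str) -> Option<SocketAddr> {
    // "localhost:PORT" is not a literal socket address, so handle it by hand.
    if let Some(port) = addr.strip_prefix("localhost:") {
        let port: u16 = port.parse().ok()?;
        return Some(SocketAddr::new(IpAddr::V4(Ipv4Addr::LOCALHOST), port));
    }
    // IPv6 literals must be bracketed, e.g. "[::1]:8081".
    let parsed: SocketAddr = addr.parse().ok()?;
    if parsed.ip().is_loopback() {
        Some(SocketAddr::new(IpAddr::V4(Ipv4Addr::LOCALHOST), parsed.port()))
    } else {
        Some(parsed)
    }
}
```

Rejecting unparseable peers at startup, rather than at the first failed RPC, turns a silent connection failure into an immediate configuration error.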
Term regression on restart
FM-03: A restarted node comes back at term 0, so it accepts stale heartbeats.
Not fixed here: full Raft requires persisting currentTerm to disk.
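A minimal sketch of what persisting the term could look like is shown below, together with the term check that rejects stale heartbeats once the term survives a restart. The file layout and the names `save_term`, `load_term`, and `accept_heartbeat` are hypothetical. Full Raft also persists the vote cast in the current term, which this sketch leaves out.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Write the term to a temporary file, then rename it into place so a
/// reader never sees a half-written value. A production version would
/// also fsync the file; that step is omitted here.
pub fn save_term(path: &Path, term: u64) -> io::Result<()> {
    let tmp = path.with_extension("tmp");
    fs::write(&tmp, term.to_string())?;
    fs::rename(&tmp, path)
}

/// Load the last saved term; a missing file means a fresh node at term 0.
pub fn load_term(path: &Path) -> io::Result<u64> {
    match fs::read_to_string(path) {
        Ok(text) => text
            .trim()
            .parse::<u64>()
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e)),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(0),
        Err(e) => Err(e),
    }
}

/// A heartbeat is only valid if the sender's term is at least our own.
pub fn accept_heartbeat(current_term: u64, leader_term: u64) -> bool {
    leader_term >= current_term
}
```

The term has to be saved before the node replies to any RPC that changed it. Otherwise a crash between the reply and the write reintroduces the regression.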
Heartbeat lost under load
FM-04: Under high CPU load, HTTP timeouts caused missed heartbeats, which triggered unnecessary elections.
Addressed with explicit timeout configuration on RPC clients.
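The timing relationships behind that fix can be made explicit with a small configuration check. This is a sketch under assumptions: the `Timing` struct is hypothetical, and the rule that an RPC timeout should not outlast one heartbeat interval is a rule of thumb chosen for illustration, not something the project states.

```rust
use std::time::Duration;

/// Timing knobs for one node (hypothetical grouping).
#[derive(Debug, Clone, Copy)]
pub struct Timing {
    pub heartbeat: Duration,
    pub election_min: Duration,
    pub election_max: Duration,
    pub rpc_timeout: Duration,
}

impl Timing {
    /// Reject configurations where a slow RPC or a slow heartbeat
    /// could by itself trigger an election.
    pub fn validate(&self) -> Result<(), &'static str> {
        if self.election_min > self.election_max {
            return Err("election timeout range is inverted");
        }
        if self.heartbeat >= self.election_min {
            return Err("heartbeat interval must be shorter than the election timeout");
        }
        // Assumed rule of thumb: a stuck request must fail before the
        // next heartbeat round is due.
        if self.rpc_timeout > self.heartbeat {
            return Err("RPC timeout should not outlast a heartbeat interval");
        }
        Ok(())
    }
}
```

With the values used in this project (50 ms heartbeat, 150–300 ms election timeout), any RPC timeout of 50 ms or less passes this check. A client's default timeout, which is often far longer, would not.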
Benchmarks
Environment first, numbers second. Metrics should be inspectable, not ornamental.
Rust / Tokio
3-node cluster
Rust, Tokio, Axum, Consensus
Distributed Systems
Project-level benchmark notes
Re-election time: <1 second
Cluster size: 3 nodes
Election timeout: 150–300 ms (randomized)
Heartbeat interval: 50 ms
Quorum: 2 of 3 nodes
Lessons Learned
Engineering takeaways from the implementation, including remaining work.
Raft looks simple on paper but has many subtle correctness requirements, especially around term numbering and vote grant conditions.
Network address configuration errors (IPv4/IPv6 mismatches) produce silent failures that look like algorithm bugs.
Randomized timeouts are essential and elegant: a small implementation detail that prevents a catastrophic failure mode.
Future: implement log replication and persistence to complete the full Raft protocol and support state machine commands.