Distributed Systems

Raft

Distributed leader election system implementing core Raft consensus mechanics.

Rust, Tokio, Axum, Consensus, Distributed Systems

Re-election Time

<1 second

Cluster Size

3 nodes

Election Timeout

150–300ms (randomized)

Heartbeat Interval

50ms

Problem

Problem statement, constraint shape, and the gap this project explores.

Problem Statement

How do you guarantee exactly one leader in a distributed cluster under concurrent failures?

Challenge

Built a distributed leader election system implementing core Raft consensus mechanics, including quorum-based voting, randomized election timeouts, and heartbeat stabilization over HTTP RPC.

Why Existing Approaches Failed

Distributed systems require consensus: multiple nodes must agree on a single leader even when messages are delayed, nodes crash, and elections happen simultaneously. Raft was designed to be easier to understand than earlier consensus algorithms such as Paxos, but implementing it correctly still requires handling a surprising number of edge cases around term numbering, vote splitting, and heartbeat timing.

Constraints

Non-negotiable boundaries that shaped the implementation.

Nodes

3-node cluster

Transport

HTTP RPC over localhost

Failure Model

Node crash and restart

Election Timeout

Randomized to make split votes unlikely

Convergence

<1 second after leader failure

Architecture

The primary design surface: flow, subsystem roles, and state boundaries.

Architecture Brief

Each node runs an Axum HTTP server exposing /vote and /heartbeat endpoints. Nodes maintain Raft state (Follower, Candidate, Leader) and a current term. A Tokio interval drives election timeout checks. The leader broadcasts heartbeats to prevent unnecessary elections.

Execution Flow

Timeout

Increment term

Broadcast RequestVote

Collect votes

If majority: become Leader

Broadcast Heartbeat

Followers reset timeout
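The first five steps of this flow can be sketched as a single election round. This is a minimal in-memory sketch, not the project's code: `Node`, `run_election`, and the `peer_votes` slice (standing in for replies to the RequestVote broadcast) are hypothetical names, and the HTTP transport is left out.

```rust
/// Roles named in the architecture brief.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Role {
    Follower,
    Candidate,
    Leader,
}

/// Hypothetical in-memory node; the real project talks to peers over HTTP.
struct Node {
    term: u64,
    role: Role,
    cluster_size: usize,
}

impl Node {
    fn new(cluster_size: usize) -> Self {
        Node { term: 0, role: Role::Follower, cluster_size }
    }

    /// Runs one election round after a timeout.
    /// `peer_votes` stands in for the replies to the RequestVote broadcast.
    fn run_election(&mut self, peer_votes: &[bool]) -> Role {
        self.term += 1; // Increment term
        self.role = Role::Candidate; // The timeout turns the node into a candidate
        // Collect votes: the candidate's own vote plus every granted reply.
        let granted = 1 + peer_votes.iter().filter(|&&v| v).count();
        if granted > self.cluster_size / 2 {
            self.role = Role::Leader; // Majority reached
        }
        self.role
    }
}
```

In a 3-node cluster, one granted reply plus the candidate's own vote is enough to win. With no granted replies the node stays a candidate until its next timeout.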

Raft State Machine

Manages Follower → Candidate → Leader transitions with term tracking.
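One way to express these transitions is a small event-driven state machine. The sketch below is an assumption about the shape, not the project's actual types: `RaftState`, `Event`, and `apply` are illustrative names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Role {
    Follower,
    Candidate,
    Leader,
}

/// Hypothetical events that drive role changes.
#[derive(Debug, Clone, Copy)]
enum Event {
    ElectionTimeout,
    WonMajority,
    SawHigherTerm(u64),
}

#[derive(Debug)]
struct RaftState {
    role: Role,
    term: u64,
}

impl RaftState {
    fn apply(&mut self, event: Event) {
        match (self.role, event) {
            // A higher term seen from any peer forces a step-down to follower.
            (_, Event::SawHigherTerm(t)) if t > self.term => {
                self.term = t;
                self.role = Role::Follower;
            }
            // Followers and candidates start a new election on timeout.
            (Role::Follower | Role::Candidate, Event::ElectionTimeout) => {
                self.term += 1;
                self.role = Role::Candidate;
            }
            // Only a candidate can be promoted to leader.
            (Role::Candidate, Event::WonMajority) => self.role = Role::Leader,
            // Every other combination leaves the state unchanged.
            _ => {}
        }
    }
}
```

Keeping the transitions in one `match` makes the illegal combinations (for example, a follower jumping straight to leader) visibly absent.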

Election Timer

Randomized timeout (150–300ms) triggers candidacy if no heartbeat received.
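A randomized timer along these lines could look like the sketch below. It uses only the standard library, drawing entropy from `RandomState`. That is an assumption made to keep the example dependency-free, and the project may well use the `rand` crate instead. `ElectionTimer` and its methods are hypothetical names.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};
use std::time::{Duration, Instant};

const MIN_TIMEOUT_MS: u64 = 150;
const MAX_TIMEOUT_MS: u64 = 300;

/// Picks a timeout in the inclusive range 150..=300 ms.
/// Each `RandomState` is randomly keyed by the standard library, so the
/// output of a fresh hasher serves as a cheap source of entropy.
fn random_election_timeout() -> Duration {
    let entropy = RandomState::new().build_hasher().finish();
    let span = MAX_TIMEOUT_MS - MIN_TIMEOUT_MS + 1;
    Duration::from_millis(MIN_TIMEOUT_MS + entropy % span)
}

/// Hypothetical timer: candidacy is triggered once `expired` returns true.
struct ElectionTimer {
    deadline: Instant,
}

impl ElectionTimer {
    fn new(now: Instant) -> Self {
        ElectionTimer { deadline: now + random_election_timeout() }
    }

    /// Called when a heartbeat arrives: pick a fresh random deadline.
    fn reset(&mut self, now: Instant) {
        self.deadline = now + random_election_timeout();
    }

    fn expired(&self, now: Instant) -> bool {
        now >= self.deadline
    }
}
```

Passing `now` in explicitly, rather than calling `Instant::now()` inside the timer, keeps the logic testable without sleeping.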

Vote RPC

HTTP POST /vote — candidates request votes from all peers.
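The vote grant conditions behind such an endpoint can be sketched as a pure function over the voter's state. The struct and field names below are assumptions, not the project's actual request schema. The log up-to-date check from full Raft is omitted because this project does not yet implement log replication.

```rust
/// Hypothetical body of a POST /vote request.
struct VoteRequest {
    term: u64,
    candidate_id: u64,
}

/// Hypothetical response body.
struct VoteResponse {
    term: u64,
    vote_granted: bool,
}

/// The part of a node's state that matters when voting.
struct VoterState {
    current_term: u64,
    voted_for: Option<u64>,
}

impl VoterState {
    fn handle_vote(&mut self, req: &VoteRequest) -> VoteResponse {
        // Reject candidates from a stale term.
        if req.term < self.current_term {
            return VoteResponse { term: self.current_term, vote_granted: false };
        }
        // A newer term clears any vote cast in the older term.
        if req.term > self.current_term {
            self.current_term = req.term;
            self.voted_for = None;
        }
        // At most one vote per term; repeating the same vote is allowed.
        let granted = match self.voted_for {
            None => true,
            Some(id) => id == req.candidate_id,
        };
        if granted {
            self.voted_for = Some(req.candidate_id);
        }
        VoteResponse { term: self.current_term, vote_granted: granted }
    }
}
```

Granting at most one vote per term is what stops two candidates from both collecting a majority in the same term.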

Heartbeat RPC

HTTP POST /heartbeat — leader suppresses elections across followers.
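On the receiving side, a heartbeat handler might look like the following sketch. `Heartbeat`, `NodeState`, and the `last_heartbeat` field are hypothetical, and the election timer is assumed to measure silence from `last_heartbeat`.

```rust
use std::time::Instant;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Role {
    Follower,
    Candidate,
    Leader,
}

/// Hypothetical body of a POST /heartbeat request.
struct Heartbeat {
    term: u64,
    leader_id: u64,
}

struct NodeState {
    role: Role,
    current_term: u64,
    leader_id: Option<u64>,
    last_heartbeat: Instant,
}

impl NodeState {
    /// Returns true if the heartbeat was accepted.
    fn handle_heartbeat(&mut self, hb: &Heartbeat, now: Instant) -> bool {
        // A heartbeat from an older term comes from a stale leader: ignore it.
        if hb.term < self.current_term {
            return false;
        }
        // Accept the sender as leader; a candidate abandons its election.
        self.current_term = hb.term;
        self.role = Role::Follower;
        self.leader_id = Some(hb.leader_id);
        // Recording the arrival time is what suppresses the next election.
        self.last_heartbeat = now;
        true
    }
}
```

With a 50ms heartbeat interval and a 150–300ms election timeout, a follower can miss at least two consecutive heartbeats before it starts an election.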

Quorum Checker

Counts granted votes and confirms a majority (2 of 3 nodes) before a candidate becomes leader.
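The majority rule a quorum checker applies is small enough to write out. The function names here are illustrative rather than taken from the project.

```rust
/// Smallest number of votes that forms a strict majority of the cluster.
fn quorum(cluster_size: usize) -> usize {
    cluster_size / 2 + 1
}

/// True once the granted votes (including the candidate's own) reach quorum.
fn has_quorum(votes_granted: usize, cluster_size: usize) -> bool {
    votes_granted >= quorum(cluster_size)
}
```

For the 3-node cluster used here, `quorum(3)` is 2, so the cluster can still elect a leader with one node down. An even-sized cluster gains no extra fault tolerance: `quorum(4)` is 3, which also tolerates only one failure.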

Environment first, numbers second. Metrics should be inspectable, not ornamental.

Test Environment

Runtime

Rust / Tokio

Workload

3-node cluster

Stack

Rust, Tokio, Axum, Consensus

Scope

Distributed Systems

Evidence

Project-level benchmark notes

Performance Results

Re-election Time

<1 second

Cluster Size

3 nodes

Election Timeout

150–300ms (randomized)

Heartbeat Interval

50ms

Quorum

2/3 nodes

Lessons Learned

Engineering takeaways from the implementation, including remaining work.

Raft looks simple on paper but has many subtle correctness requirements, especially around term numbering and vote grant conditions.

Network address configuration errors (IPv4/IPv6 mismatches) produce silent failures that look like algorithm bugs.

Randomized timeouts are essential and elegant: a small implementation detail that prevents a catastrophic failure mode.

Future: implement log replication and persistence to complete the full Raft protocol and support state machine commands.
