Emergence in Complex Systems

Why distributed behavior can surprise you even when every local rule is simple.

Figure 1. A working loop for technical inquiry: observe the system, reason about its constraints, and build to test the model.

The Problem No Central Authority Can Solve

Imagine you are building a system that thousands of people depend on. It stores critical data. It must never go down. It must never lose a write. It must always return the correct answer.

Now imagine you cannot have a single server handling all of this. Not because you don't want to, but because a single server is a single point of failure. One hardware fault, one network hiccup, one bad deployment, and everything goes dark. So you do what every serious system eventually does: you distribute it. You spread the responsibility across multiple machines.

And that's where the problem starts. Because now you have five nodes, each holding a copy of the data. A write comes in. Node 1 gets it. Did Node 3 get it? Did Node 5? What if the network dropped the message halfway? What if Node 2 just crashed and came back up? How does it know what it missed?

In a single-server system, consistency is trivial. There is one source of truth. One node. One answer. In a distributed system, you've traded that simplicity away. And what you get in return is a harder question than most engineers expect when they first encounter it:

Who's in charge?

The honest answer, the one that took decades of systems research to accept, is nobody. There is no master coordinator sitting above the nodes, watching everything, issuing commands. In a real distributed system, every node is local. Every node has a partial view. Every node makes decisions based on incomplete information.

And yet these systems work. Databases reach agreement. Configs propagate correctly. Leaders get elected without anyone organizing the election. That shouldn't be possible. But it is.

The reason it's possible has a name. We'll get to it. But first, you need to feel exactly how little each node actually knows.
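One way to feel it is a toy simulation of the five-node scenario above. This is a minimal sketch, not a real replication protocol: the function name `replicate`, the `drop_rate` parameter, and the rule that Node 1 forwards every write once with no retries are all assumptions made for illustration.

```python
import random


def replicate(num_nodes=5, num_writes=10, drop_rate=0.3, seed=7):
    """Toy model (hypothetical): Node 1 receives every write and forwards it
    once to each other node over a lossy network. Returns each node's log."""
    rng = random.Random(seed)
    logs = {node: [] for node in range(1, num_nodes + 1)}
    for write_id in range(num_writes):
        logs[1].append(write_id)  # the receiving node always has the write
        for node in range(2, num_nodes + 1):
            if rng.random() >= drop_rate:
                logs[node].append(write_id)  # message delivered
            # Otherwise the message is silently dropped. Neither side
            # learns of the loss, because there are no acks or retries.
    return logs


if __name__ == "__main__":
    for node, log in replicate().items():
        print(f"Node {node} sees writes: {log}")
```

Each node's log is everything that node knows. Unless every message happens to arrive, the logs differ, and no node can tell from its own log alone which writes it is missing.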
