Distributed Systems

"A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable." — Leslie Lamport

Building reliable systems from unreliable components.

  • Consensus


    How distributed nodes agree on a single value. Paxos, Raft, and practical implementations.

    Consensus Algorithms

  • Resiliency Patterns


    Circuit breakers, bulkheads, sagas, and patterns for fault tolerance.

    Patterns Guide


The 8 Fallacies

Assumptions that engineers new to distributed systems often make, each of which leads to failures in production:

┌─────────────────────────────────────────────────────────────────────────────┐
│                    THE 8 FALLACIES OF DISTRIBUTED COMPUTING                 │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  1. The network is reliable.            5. Topology doesn't change.         │
│  2. Latency is zero.                    6. There is one administrator.      │
│  3. Bandwidth is infinite.              7. Transport cost is zero.          │
│  4. The network is secure.              8. The network is homogeneous.      │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
Fallacy                       Reality
──────────────────────────────────────────────────────────────────────────────
The network is reliable       Packets drop, connections reset, hardware fails
Latency is zero               Network calls are ~10,000x slower than local calls
Bandwidth is infinite         Serialization overhead, network saturation
The network is secure         Defense in depth, encrypt everything
Topology doesn't change       Servers are added/removed, IPs change
There is one administrator    Multiple teams, permissions, coordination
Transport cost is zero        Serialization, protocol overhead, cloud egress
The network is homogeneous    Different OSes, protocols, versions
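Fallacy #1 is the one most code silently assumes away. A minimal defensive sketch: retry a flaky call with exponential backoff and jitter (the `call_with_retries` helper and `flaky` simulator are illustrative, not from any particular library):

```python
import random
import time

def call_with_retries(fn, max_attempts=4, base_delay=0.1):
    """Retry a flaky network call with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the failure
            # Sleep base * 2^attempt, jittered so retrying clients
            # don't stampede the recovering server in lockstep.
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))

# Simulate a call that drops two packets, then succeeds.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("packet lost")
    return "ok"

print(call_with_retries(flaky))  # succeeds on the third attempt
```

Note that retries are only safe when the operation is idempotent; otherwise a lost *response* (rather than a lost request) causes the work to happen twice.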

Time and Clocks

In a distributed system, "now" is a fuzzy concept.

Physical Clocks

Clock Problems

  • Quartz crystals drift (~50 ppm ≈ 4.3 seconds/day)
  • NTP corrects drift, but corrections can step the clock backwards
  • Clocks across nodes can differ by milliseconds to seconds
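Because the wall clock can jump backwards, durations should never be measured with it. Python's standard library makes the distinction explicit, as this small sketch shows:

```python
import time

# time.time() reads the wall clock, which NTP may step backwards at any
# moment, so a duration computed from it can come out negative.
# time.monotonic() only ever moves forward and is the right tool for
# measuring elapsed time on a single node.
start = time.monotonic()
time.sleep(0.01)
elapsed = time.monotonic() - start
assert elapsed >= 0  # guaranteed; the same is NOT true of time.time()
print(f"elapsed: {elapsed:.4f}s")
```

Neither clock, of course, says anything about ordering events *across* nodes; that is what the logical clocks below are for.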

Logical Clocks

Capture causal relationships (happened-before), not physical time.

                    Process A          Process B          Process C
                        │                  │                  │
                        │  Event A1        │                  │
                        ●─────────────────>●  Event B1        │
                        │                  │  (receive and    │
                        │                  │   forward)       │
                        │                  ●─────────────────>●  Event C1
                        │                  │                  │
                        │  Event A2        │  Event B2        │
                        ●<─────────────────●                  │
                        │                  │                  │

Lamport timestamps: A1=1, B1=2, C1=3, B2=3, A2=4. Note that C1 and B2 are
concurrent (neither happened before the other), so they can legitimately get
the same timestamp: Lamport clocks order causally related events but cannot
detect concurrency.
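The timestamp assignment above follows from two rules: increment on every local event, and on receipt take the max of both clocks plus one. A minimal sketch (the `LamportClock` class is illustrative):

```python
class LamportClock:
    """Counter incremented on local events; merged on message receipt."""
    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event (including sends): advance the counter."""
        self.time += 1
        return self.time

    def receive(self, msg_time):
        """Receipt: jump past both our clock and the sender's."""
        self.time = max(self.time, msg_time) + 1
        return self.time

a, b, c = LamportClock(), LamportClock(), LamportClock()
t_a1 = a.tick()          # A1 = 1 (send to B)
t_b1 = b.receive(t_a1)   # B1 = 2 (receive and forward to C)
t_c1 = c.receive(t_b1)   # C1 = 3
t_b2 = b.tick()          # B2 = 3 (send to A) — concurrent with C1
t_a2 = a.receive(t_b2)   # A2 = 4
print(t_a1, t_b1, t_c1, t_b2, t_a2)  # 1 2 3 3 4
```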
Clock Type              Description                              Use Case
──────────────────────────────────────────────────────────────────────────────
Lamport Clocks          Simple counter, incremented on events    Ordering events
Vector Clocks           Array of counters, one per node          Detecting concurrent updates
Hybrid Logical Clocks   Combine physical + logical time          Modern databases (CockroachDB)
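The concurrency detection that vector clocks add can be sketched in a few lines. Here clocks are represented as dicts mapping node id to counter; the `vc_merge` and `vc_compare` helpers are illustrative names, not a library API:

```python
def vc_merge(local, incoming):
    """Element-wise max of two vector clocks (dicts: node id -> counter)."""
    return {n: max(local.get(n, 0), incoming.get(n, 0))
            for n in set(local) | set(incoming)}

def vc_compare(a, b):
    """Return '<', '>', '=' for ordered clocks, or '||' for concurrent ones."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "="
    if a_le_b:
        return "<"
    if b_le_a:
        return ">"
    return "||"  # concurrent: neither happened-before the other

# Two replicas that each accepted a write independently:
v1 = {"A": 2, "B": 0}
v2 = {"A": 1, "B": 1}
print(vc_compare(v1, v2))  # "||" — a genuine conflict the application must resolve
```

This is exactly what a Lamport clock cannot do: it would assign both writes some total order and silently hide the conflict.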

Consistency Models

From strongest to weakest guarantees:

STRONGEST
┌────────────────────────────────────────┐
│  STRICT CONSISTENCY                    │  Impossible in practice
│  Instant global replication            │  (speed of light)
└────────────────────────────────────────┘
┌────────────────────────────────────────┐
│  LINEARIZABILITY                       │  Operations appear instantaneous
│  "Strong consistency"                  │  Once written, all reads see it
└────────────────────────────────────────┘
┌────────────────────────────────────────┐
│  SEQUENTIAL CONSISTENCY                │  Same-process ops stay in order
│                                        │  Single global order
└────────────────────────────────────────┘
┌────────────────────────────────────────┐
│  CAUSAL CONSISTENCY                    │  Causally related ops in order
│                                        │  Concurrent ops may differ
└────────────────────────────────────────┘
┌────────────────────────────────────────┐
│  EVENTUAL CONSISTENCY                  │  Eventually all reads return
│  "BASE model"                          │  the latest value (DNS, email)
└────────────────────────────────────────┘
WEAKEST
Model             Guarantee               Example Systems
──────────────────────────────────────────────────────────
Linearizability   Real-time ordering      Spanner, etcd
Sequential        Global total order      Single-leader DBs
Causal            Respects causality      CRDT-based systems
Eventual          Convergence over time   DynamoDB, Cassandra
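Dynamo-style stores like Cassandra let you tune where a request sits on this ladder with quorums: with N replicas, W acknowledgements per write, and R per read, the read and write quorums are guaranteed to overlap (so a read sees the latest acknowledged write) exactly when R + W > N. A trivial sketch of the arithmetic:

```python
def quorums_overlap(n, r, w):
    """With N replicas, read quorum R and write quorum W, every read quorum
    intersects every write quorum iff R + W > N — i.e. at least one replica
    in any read set holds the most recently acknowledged write."""
    return r + w > n

# Common configurations for N = 3 replicas:
print(quorums_overlap(3, 2, 2))  # True  — overlapping quorums, stronger reads
print(quorums_overlap(3, 1, 1))  # False — fast, but only eventually consistent
```

Overlapping quorums still fall short of linearizability in edge cases (e.g. partial writes), but they rule out reading arbitrarily stale data.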

Key Patterns Summary

Pattern            Purpose                      Example
────────────────────────────────────────────────────────────────
Leader Election    Single coordinator           ZooKeeper, etcd
Replication        Durability, read scaling     MySQL replicas
Partitioning       Write scaling                Sharded databases
Two-Phase Commit   Distributed transactions     XA transactions
Saga               Long-running transactions    Microservices workflows
Circuit Breaker    Fail fast, prevent cascade   Netflix Hystrix
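The circuit-breaker pattern from the table can be sketched in a few dozen lines: after N consecutive failures the breaker "opens" and fails fast without touching the struggling downstream service, then lets a probe through after a cool-down. This is a minimal illustration of the state machine, not Hystrix's actual API (the `clock` parameter is injected so behavior is testable):

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: open after max_failures consecutive
    failures, allow a half-open probe after reset_after seconds."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Open: fail fast instead of piling load on a sick service.
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # cool-down elapsed: half-open, allow a probe
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # trip the breaker
            raise
        self.failures = 0  # success closes the circuit again
        return result
```

Failing fast is what prevents the cascade: callers get an immediate error they can handle (fallback, cached value) instead of queueing up on timeouts and exhausting their own threads.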

Deep Dives