System Design
"Design is not just what it looks like and feels like. Design is how it works." — Steve Jobs
System design is the process of defining the architecture, components, modules, interfaces, and data for a system to satisfy specified requirements.
- Scalability: Vertical vs horizontal scaling, load balancing, caching, and sharding strategies.
- API Design: REST, GraphQL, and gRPC patterns and best practices for service interfaces.
The Three Pillars
Every production system must balance these concerns:
┌─────────────────────────────────────────────────────────────────────────────┐
│                            SYSTEM DESIGN PILLARS                            │
└─────────────────────────────────────────────────────────────────────────────┘
    RELIABILITY          SCALABILITY        MAINTAINABILITY
┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
│ Works correctly │  │ Handles growth  │  │ Easy to operate │
│ under adversity │  │ efficiently     │  │ and evolve      │
├─────────────────┤  ├─────────────────┤  ├─────────────────┤
│ - Fault tolerant│  │ - Horizontal    │  │ - Observability │
│ - Redundancy    │  │ - Vertical      │  │ - Simplicity    │
│ - Graceful fail │  │ - Auto-scaling  │  │ - Documentation │
└─────────────────┘  └─────────────────┘  └─────────────────┘
| Pillar | Definition | Key Practices |
|---|---|---|
| Reliability | System works correctly despite faults | Redundancy, failover, testing, monitoring |
| Scalability | System handles increased load | Load balancing, caching, sharding, CDN |
| Maintainability | System is easy to operate and evolve | Clean code, documentation, observability |
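To make the redundancy practice concrete, here is a minimal sketch of how adding replicas affects availability. It assumes replicas fail independently, which real deployments only approximate (shared racks, networks, and deploys correlate failures). The function name `combined_availability` and the 99% figure are illustrative assumptions, not taken from any particular system.

```python
def combined_availability(replica_availability: float, replicas: int) -> float:
    """Availability of a service that is up while at least one replica is up.

    Assumes replica failures are independent (an idealization).
    """
    if not 0.0 <= replica_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("need at least one replica")
    # The service is down only if every replica is down at the same time.
    all_down = (1.0 - replica_availability) ** replicas
    return 1.0 - all_down


if __name__ == "__main__":
    # Hypothetical replicas that are each available 99% of the time.
    for n in (1, 2, 3):
        print(n, f"{combined_availability(0.99, n):.6f}")
```

Under the independence assumption, two 99% replicas give roughly 99.99% availability. Correlated failures make the real figure lower, which is why redundancy is usually paired with failover testing and monitoring.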
Core Concepts
CAP Theorem
A distributed data store can guarantee at most two of the following three properties at the same time:
| Property | Description |
|---|---|
| Consistency | Every read receives the most recent write or an error |
| Availability | Every request receives a response (not necessarily the latest) |
| Partition Tolerance | System works despite network failures between nodes |
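Reality Check
Network partitions cannot be ruled out in a distributed system, so partition tolerance is not optional. The real choice arises while a partition is in progress: reject or delay requests to stay consistent (CP, Consistency + Partition Tolerance), or keep answering with possibly stale data (AP, Availability + Partition Tolerance).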
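The CP versus AP choice can be illustrated with a toy model. This is a sketch under simplifying assumptions, not a real replication protocol: the `Replica` class, its `mode` labels, and the `demo` helper are all hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """A follower replica that may be cut off from the primary (toy model)."""

    mode: str                  # "CP" or "AP" (labels used only in this sketch)
    value: str = "v1"          # last value this replica received
    partitioned: bool = False  # True while the replica cannot reach the primary

    def read(self) -> str:
        if self.partitioned and self.mode == "CP":
            # CP: refuse to answer rather than risk returning stale data.
            raise RuntimeError("unavailable: cannot confirm latest write")
        # AP (or no partition): answer from the local copy, which may be stale.
        return self.value


def demo(mode: str) -> str:
    replica = Replica(mode=mode)
    # A partition starts; assume the primary then accepts a newer write "v2"
    # that this replica never receives.
    replica.partitioned = True
    try:
        return replica.read()
    except RuntimeError as exc:
        return f"error: {exc}"


if __name__ == "__main__":
    print("AP:", demo("AP"))
    print("CP:", demo("CP"))
```

In AP mode the replica stays available but returns the stale `v1`. In CP mode it returns an error, which matches the definition of consistency in the table: the most recent write or an error.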
ACID vs. BASE
| ACID (RDBMS) | BASE (NoSQL) |
|---|---|
| Atomicity | Basically Available |
| Consistency | Soft state |
| Isolation | Eventual consistency |
| Durability | |
| Favors strong consistency | Favors high availability |
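Atomicity is the easiest ACID property to demonstrate in a few lines. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `accounts` table, the `transfer` helper, and the balances are hypothetical and exist only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move `amount` from src to dst; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # This debit violates the CHECK constraint when funds are short.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


if __name__ == "__main__":
    print(transfer(conn, "alice", "bob", 500), balances(conn))
    print(transfer(conn, "alice", "bob", 30), balances(conn))
```

The credit is deliberately issued before the debit so that a failed transfer has already modified one row. The rollback undoes that partial change, which is the behavior atomicity promises.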
Back-of-the-Envelope Math
Essential numbers for capacity planning and system design interviews.
Powers of Two
| Power | Exact Value | Approximate Value | Common Name |
|---|---|---|---|
| $2^{10}$ | 1,024 | ~1 thousand | 1 KB |
| $2^{20}$ | 1,048,576 | ~1 million | 1 MB |
| $2^{30}$ | 1,073,741,824 | ~1 billion | 1 GB |
| $2^{40}$ | 1,099,511,627,776 | ~1 trillion | 1 TB |
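A short sketch shows how these powers are typically used when sizing data, and how far the "round number" approximation drifts as the exponent grows. The helper names `human_size` and `approx_error` are assumptions made for this example.

```python
UNITS = ["B", "KB", "MB", "GB", "TB"]


def human_size(num_bytes: int) -> str:
    """Format a byte count using binary (power-of-two) units."""
    size = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit that keeps the number below 1024,
        # or at the largest unit this sketch knows about.
        if size < 1024 or unit == UNITS[-1]:
            return f"{size:.1f} {unit}"
        size /= 1024
    raise AssertionError("unreachable: the loop always returns")


def approx_error(power: int) -> float:
    """Relative error of treating 2**power as 10**(3 * power / 10).

    Intended for power in (10, 20, 30, 40), as in the table.
    """
    decimal_value = 10 ** (3 * power // 10)
    return 2 ** power / decimal_value - 1


if __name__ == "__main__":
    for p in (10, 20, 30, 40):
        print(p, human_size(2 ** p), f"{approx_error(p):.1%}")
```

The approximation is off by about 2.4% at $2^{10}$ and close to 10% at $2^{40}$, which is acceptable for envelope estimates but not for billing or quota enforcement.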
Latency Numbers
┌─────────────────────────────────────────────────────────────────────────────┐
│                             LATENCY COMPARISON                              │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│ L1 Cache Reference            ████                            0.5 ns        │
│ Mutex Lock/Unlock             ████████████████                100 ns        │
│ Main Memory Reference         ████████████████                100 ns        │
│ Send 2KB over 1Gbps           ████████████████████████        20 us         │
│ Read 1MB from Memory          ████████████████████████████    250 us        │
│ Datacenter Round Trip         █████████████████████████████   500 us        │
│ Disk Seek                     ██████████████████████████████  10 ms         │
│ CA to Netherlands Round Trip  ██████████████████████████████  150 ms        │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
| Operation | Latency |
|---|---|
| L1 Cache Reference | 0.5 ns |
| Mutex Lock/Unlock | 100 ns |
| Main Memory Reference | 100 ns |
| Send 2KB over 1Gbps | 20,000 ns (20 us) |
| Read 1MB from Memory | 250,000 ns (250 us) |
| Datacenter Round Trip | 500,000 ns (500 us) |
| Disk Seek | 10,000,000 ns (10 ms) |
| CA to Netherlands Round Trip | 150,000,000 ns (150 ms) |
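These figures become useful once they are turned into rates and totals. The sketch below copies a few values from the table and derives two common estimates. The dictionary keys and function names are assumptions for this example, and the results are order-of-magnitude guides rather than measurements of any specific hardware.

```python
# Latencies in nanoseconds, copied from the table above.
LATENCY_NS = {
    "l1_cache_ref": 0.5,
    "main_memory_ref": 100,
    "read_1mb_memory": 250_000,
    "datacenter_round_trip": 500_000,
    "disk_seek": 10_000_000,
    "ca_netherlands_round_trip": 150_000_000,
}
NS_PER_SECOND = 1_000_000_000


def sequential_ops_per_second(operation: str) -> float:
    """How many times one thread can do `operation` back to back in a second."""
    return NS_PER_SECOND / LATENCY_NS[operation]


def seconds_to_read_memory(megabytes: int) -> float:
    """Time to read `megabytes` sequentially from memory."""
    return megabytes * LATENCY_NS["read_1mb_memory"] / NS_PER_SECOND


if __name__ == "__main__":
    print(sequential_ops_per_second("disk_seek"))
    print(sequential_ops_per_second("datacenter_round_trip"))
    print(seconds_to_read_memory(1024))
```

With these inputs, a single thread manages about 100 disk seeks or 2,000 datacenter round trips per second, and reading 1 GB from memory takes about a quarter of a second. Estimates like these explain why designs batch network calls and avoid random disk access on hot paths.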
System Design Interview Framework
1. CLARIFY REQUIREMENTS (5 min)
   ├── Functional requirements
   ├── Non-functional requirements (scale, latency, availability)
   └── Constraints and assumptions
2. ESTIMATE SCALE (5 min)
   ├── Users: DAU, peak concurrent users
   ├── Storage: data size, growth rate
   └── Bandwidth: read/write ratio
3. HIGH-LEVEL DESIGN (10-15 min)
   ├── Core components
   ├── Data flow
   └── API design
4. DEEP DIVE (15-20 min)
   ├── Data model
   ├── Scaling strategies
   └── Trade-offs
5. WRAP UP (5 min)
   ├── Bottlenecks
   ├── Future improvements
   └── Monitoring and alerts
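The "Estimate Scale" step usually comes down to a few multiplications. The sketch below shows one way to do it. Every input (10 million DAU, 20 requests per user per day, a 2x peak factor, 1 KB per write) is a hypothetical assumption chosen for illustration, and the function names are invented for this example.

```python
SECONDS_PER_DAY = 86_400


def average_qps(dau: int, requests_per_user_per_day: float) -> float:
    """Average queries per second implied by daily active users."""
    return dau * requests_per_user_per_day / SECONDS_PER_DAY


def peak_qps(avg_qps: float, peak_factor: float = 2.0) -> float:
    """Peak load as a multiple of the average (the factor is a guess)."""
    return avg_qps * peak_factor


def storage_bytes_per_year(writes_per_day: int, bytes_per_write: int) -> int:
    """Raw storage growth per year, before replication or compression."""
    return writes_per_day * bytes_per_write * 365


if __name__ == "__main__":
    avg = average_qps(dau=10_000_000, requests_per_user_per_day=20)
    print(f"average QPS: {avg:,.0f}")
    print(f"peak QPS:    {peak_qps(avg):,.0f}")
    yearly = storage_bytes_per_year(writes_per_day=10_000_000, bytes_per_write=1024)
    print(f"storage/yr:  {yearly / 2**40:.1f} TB")
```

With these inputs the average load is roughly 2,300 QPS and raw storage grows by a few terabytes per year. In an interview, stating the assumptions matters more than the exact figures, since the interviewer can then adjust them.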
Deep Dives
- Scalability Strategies: Vertical vs Horizontal, Load Balancing, Caching, Sharding
- API Design: REST, GraphQL, gRPC, and best practices