System Design: Building Scalable Applications
System design separates junior developers from senior engineers. It's the art of building systems that can handle growth, failure, and complexity. Let me share lessons from designing systems that now serve thousands of users.
Core Principles of System Design
1. Scalability
Design for growth from day one:
Vertical Scaling: Add more power to a single machine
└─ Cheaper initially
└─ Hardware limits (can't buy infinite RAM)
└─ Single point of failure
Horizontal Scaling: Add more machines
└─ Scales far beyond the limits of a single machine
└─ More complex (distributed systems problems)
└─ Better for high availability
2. Reliability
Systems should work even when things break:
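// Circuit breaker pattern for handling failures
class CircuitBreaker {
  private failureCount = 0;
  private lastFailureTime = 0;
  private state: "CLOSED" | "OPEN" | "HALF_OPEN" = "CLOSED";
  private readonly failureThreshold = 5; // open once failures exceed this count
  private readonly resetTimeoutMs = 30000; // cooldown before a trial call is allowed

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === "OPEN") {
      if (Date.now() - this.lastFailureTime > this.resetTimeoutMs) {
        // Cooldown elapsed: let a trial call through
        this.state = "HALF_OPEN";
      } else {
        // Fail fast instead of calling the struggling dependency
        throw new Error("Circuit breaker is open");
      }
    }
    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  private onSuccess() {
    // Any success resets the count and closes the circuit
    this.failureCount = 0;
    this.state = "CLOSED";
  }

  private onFailure() {
    this.failureCount++;
    this.lastFailureTime = Date.now();
    // After a trip the count stays above the threshold until a success,
    // so a failed trial call in HALF_OPEN reopens the circuit immediately
    if (this.failureCount > this.failureThreshold) {
      this.state = "OPEN";
    }
  }
}
3. Availability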
Keep services running (measured in "nines"):
| Availability | Downtime/Year |
|---|---|
| 99% (Two Nines) | 87.6 hours |
| 99.9% (Three Nines) | 8.76 hours |
| 99.99% (Four Nines) | 52.6 minutes |
| 99.999% (Five Nines) | 5.26 minutes |
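The downtime figures in the table follow directly from the availability percentage. A minimal sketch of that arithmetic, assuming a 365-day year; the function names are illustrative, not from any library:

```typescript
// Hours in a non-leap year, which is what the table above assumes.
const HOURS_PER_YEAR = 365 * 24; // 8760

// Allowed downtime per year, in hours, for a given availability percentage.
function downtimeHoursPerYear(availabilityPercent: number): number {
  if (availabilityPercent < 0 || availabilityPercent > 100) {
    throw new RangeError("availability must be between 0 and 100");
  }
  return HOURS_PER_YEAR * (1 - availabilityPercent / 100);
}

// When a request must pass through several components in series,
// every one of them has to be up, so their availabilities multiply.
function serialAvailability(percents: number[]): number {
  return percents.reduce((acc, p) => acc * (p / 100), 1) * 100;
}
```

The second function shows why each extra dependency hurts: two 99.9% services chained together give roughly 99.8%, which is about 17.5 hours of downtime per year rather than 8.76.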
Techniques:
- Load balancing
- Redundancy
- Health checks
- Automated failover
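Health checks and automated failover work together: probes record whether each backend is responding, and traffic moves to the next healthy backend when one goes down. The sketch below shows one way to wire that up. The `HealthTracker` class, the consecutive-failure threshold, and the backend names are assumptions for illustration; the actual probe (for example, an HTTP request to a health endpoint) is left out.

```typescript
class HealthTracker {
  // Consecutive failed probes per backend; a success resets the count.
  private consecutiveFailures = new Map<string, number>();

  constructor(private readonly unhealthyAfter = 3) {}

  // Record the outcome of one health probe for a backend.
  record(backend: string, ok: boolean): void {
    const failures = ok ? 0 : (this.consecutiveFailures.get(backend) ?? 0) + 1;
    this.consecutiveFailures.set(backend, failures);
  }

  // A backend is healthy until it fails `unhealthyAfter` probes in a row.
  isHealthy(backend: string): boolean {
    return (this.consecutiveFailures.get(backend) ?? 0) < this.unhealthyAfter;
  }

  // Failover: the first healthy backend in priority order, if any.
  pickActive(priorityOrder: string[]): string | undefined {
    return priorityOrder.find((b) => this.isHealthy(b));
  }
}
```

Requiring several consecutive failures before failing over is a deliberate choice here: a single dropped probe should not trigger a switch.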
4. Consistency
Different consistency models for different needs:
Strong Consistency
└─ Every read gets latest write
└─ Used by: Banking, stock exchanges
└─ Trade-off: Slower (requires coordination)
Eventual Consistency
└─ Reads may be stale temporarily
└─ Used by: Social media, email
└─ Trade-off: Complex to reason about
Causal Consistency
└─ Maintains causal relationships
└─ Used by: Collaborative apps
└─ Trade-off: A middle ground; cheaper than strong consistency, but the system must track which operations depend on which
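"Maintains causal relationships" is easier to picture with a concrete mechanism. Vector clocks are one common way to track causality; the source does not say which mechanism any particular system uses, so treat this as an illustrative sketch with hypothetical node names.

```typescript
// Maps a node id to the number of events seen from that node.
type VectorClock = Record<string, number>;
type Order = "before" | "after" | "equal" | "concurrent";

// Local event on `node`: increment that node's counter.
function tick(clock: VectorClock, node: string): VectorClock {
  return { ...clock, [node]: (clock[node] ?? 0) + 1 };
}

// On receiving a message: take the element-wise maximum of both clocks.
function merge(a: VectorClock, b: VectorClock): VectorClock {
  const out: VectorClock = { ...a };
  for (const node of Object.keys(b)) {
    out[node] = Math.max(out[node] ?? 0, b[node] ?? 0);
  }
  return out;
}

// Decide whether `a` happened before `b`, after it, or independently of it.
function compare(a: VectorClock, b: VectorClock): Order {
  let aBehind = false;
  let bBehind = false;
  for (const node of Object.keys(a).concat(Object.keys(b))) {
    const x = a[node] ?? 0;
    const y = b[node] ?? 0;
    if (x < y) aBehind = true;
    if (x > y) bBehind = true;
  }
  if (aBehind && bBehind) return "concurrent";
  if (aBehind) return "before";
  if (bBehind) return "after";
  return "equal";
}
```

A reply written after reading a post compares as "after" that post, so every replica can show them in order. Two posts written independently compare as "concurrent" and may be applied in either order.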
Architectural Patterns
Load Balancing
Distribute traffic across multiple servers:
              ┌───────────────┐
              │ Load Balancer │
              └───────┬───────┘
    ┌───────────┬─────┴─────┬───────────┐
    │           │           │           │
┌───▼───┐   ┌───▼───┐   ┌───▼───┐   ┌───▼───┐
│ App 1 │   │ App 2 │   │ App 3 │   │ App N │
└───┬───┘   └───┬───┘   └───┬───┘   └───┬───┘
    │           │           │           │
    └───────────┴─────┬─────┴───────────┘
                 ┌────▼─────┐
                 │ Database │
                 └──────────┘
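The diagram leaves open how the balancer chooses a server. Round-robin is one of the simplest policies and is used here purely as an assumption; the class name and server labels are illustrative.

```typescript
// Hands out targets in a fixed rotation: 1, 2, 3, 1, 2, 3, ...
class RoundRobinBalancer<T> {
  private next = 0;

  constructor(private readonly targets: T[]) {
    if (targets.length === 0) {
      throw new Error("at least one target is required");
    }
  }

  // Return the next target and advance the rotation.
  pick(): T {
    const target = this.targets[this.next];
    this.next = (this.next + 1) % this.targets.length;
    return target;
  }
}
```

Round-robin treats every request as equally expensive. If request cost varies widely, a policy that tracks open connections per server tends to spread load more evenly.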
Caching Strategy
Reduce database load with strategic caching:
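// Multi-level caching (the Redis calls assume the ioredis client)
import Redis from "ioredis";

class CacheManager {
  // L1: in-process memory, the fastest tier (no eviction or TTL in this sketch)
  private l1Cache = new Map<string, unknown>();
  // L2: Redis, shared across instances, costs one network round trip
  private l2Cache = new Redis();

  // `fn` loads the value from the source of truth (the database), the slowest tier
  async get<T>(key: string, fn: () => Promise<T>): Promise<T> {
    // L1: Check memory
    if (this.l1Cache.has(key)) {
      return this.l1Cache.get(key) as T;
    }
    // L2: Check Redis (values are stored as JSON strings)
    const redisValue = await this.l2Cache.get(key);
    if (redisValue !== null) {
      const parsed = JSON.parse(redisValue) as T;
      this.l1Cache.set(key, parsed);
      return parsed;
    }
    // L3: Database
    const value = await fn();
    await this.l2Cache.set(key, JSON.stringify(value), "EX", 3600); // 1 hour TTL
    this.l1Cache.set(key, value);
    return value;
  }

  // Removes the key from this process and from Redis; other instances
  // keep their own L1 copies until those are cleared separately
  async invalidate(key: string): Promise<void> {
    this.l1Cache.delete(key);
    await this.l2Cache.del(key);
  }
}
Database Sharding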
Distribute data across multiple databases:
// Modulo-based shard routing (`Database` is a placeholder client class)
class ShardRouter {
  private shards: Database[];

  constructor(shardCount: number) {
    this.shards = Array.from(
      { length: shardCount },
      (_, i) => new Database(`shard_${i}`),
    );
  }

  getShardId(userId: number): number {
    // The same user always maps to the same shard while the shard count is fixed.
    // Note: this is plain modulo hashing, not consistent hashing. Changing the
    // shard count remaps most users; a consistent-hash ring avoids that.
    return userId % this.shards.length;
  }

  async getUserData(userId: number) {
    const shardId = this.getShardId(userId);
    return this.shards[shardId].query(`SELECT * FROM users WHERE id = ?`, [
      userId,
    ]);
  }

  async updateUser(userId: number, data: Record<string, unknown>) {
    const shardId = this.getShardId(userId);
    // `SET ?` assumes the driver expands an object into column = value pairs
    return this.shards[shardId].query(`UPDATE users SET ? WHERE id = ?`, [
      data,
      userId,
    ]);
  }
}
Real-World Example: Scaling Floral Radiance
Journey from startup to 10k daily users:
Phase 1: Monolith (0-100 users)
Single Server
├── Express API
├── Next.js Frontend
└── SQLite Database
Phase 2: Separation (100-1k users)
Frontend (Vercel) → API (1 instance) → Database (Single)
Phase 3: Scaling (1k-10k users)
               ┌────────────┐
               │ Vercel CDN │
               └──────┬─────┘
                      │
              ┌───────▼───────┐
              │ Load Balancer │
              └───────┬───────┘
    ┌───────────┬─────┴─────┬───────────┐
┌───▼───┐   ┌───▼───┐   ┌───▼───┐   ┌───▼───┐
│ API 1 │   │ API 2 │   │ API 3 │   │ API 4 │
└───┬───┘   └───┬───┘   └───┬───┘   └───┬───┘
    │           │           │           │
┌───▼───────────▼───────────▼───────────▼───┐
│          Redis Cache (Sessions)           │
└─────────────────────┬─────────────────────┘
                      │
            ┌─────────▼────────┐
            │ Primary Database │
            └─────────┬────────┘
                      │
           ┌──────────▼─────────┐
           │ Read-Only Replicas │
           └────────────────────┘
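With a primary and read-only replicas, the application has to decide where each query goes. A minimal sketch of that routing, assuming writes always hit the primary and reads rotate across replicas; `Conn` and `ReplicaRouter` are hypothetical names, not part of the original setup.

```typescript
// `Conn` is a hypothetical stand-in for a database connection or pool.
interface Conn {
  name: string;
}

class ReplicaRouter {
  private nextReplica = 0;

  constructor(
    private readonly primary: Conn,
    private readonly replicas: Conn[],
  ) {}

  // All writes go to the primary.
  forWrite(): Conn {
    return this.primary;
  }

  // Reads rotate across replicas. Pass requireFresh = true for reads that
  // must see the latest write, since replicas can lag behind the primary.
  forRead(requireFresh = false): Conn {
    if (requireFresh || this.replicas.length === 0) {
      return this.primary;
    }
    const conn = this.replicas[this.nextReplica];
    this.nextReplica = (this.nextReplica + 1) % this.replicas.length;
    return conn;
  }
}
```

The `requireFresh` flag is where the consistency trade-off shows up in practice: replica reads are eventually consistent, so a page that displays an order right after it was placed may need to read from the primary.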
Database Design at Scale
Indexing Strategy
-- ❌ Bad: with no index on these columns, this query scans the whole table
SELECT * FROM orders WHERE customer_id = 123 AND created_at > '2026-01-01';
-- ✓ Good: a composite index (equality column first, range column second) serves the same query
CREATE INDEX idx_customer_date ON orders(customer_id, created_at);
SELECT * FROM orders WHERE customer_id = 123 AND created_at > '2026-01-01';
Query Optimization
// ❌ N+1 Query Problem: 1 query for the users, then 1 more per user
const users = await db.user.findMany();
const orders = [];
for (const user of users) {
  // This queries the database for EACH user!
  orders.push(await db.order.findMany({ where: { userId: user.id } }));
}
// ✓ Optimized: load users and their orders together. With a Prisma-style
// `include`, the ORM issues a fixed number of queries regardless of user count
const usersWithOrders = await db.user.findMany({
  include: {
    orders: true,
  },
});
Monitoring and Observability
// Three pillars of observability
// `logger`, `metrics`, `tracer`, and `db` are assumed to come from your
// logging, metrics, tracing, and data-access libraries
class Observability {
  // 1. Logging: What happened?
  async handleRequest(req, res) {
    logger.info(`Request: ${req.method} ${req.url}`, {
      userId: req.user?.id,
      timestamp: new Date(),
    });
  }

  // 2. Metrics: How much happened?
  recordMetrics(responseTime: number, activeUserCount: number) {
    metrics.recordCounter("requests.total", 1);
    metrics.recordHistogram("response.time_ms", responseTime);
    metrics.recordGauge("users.active", activeUserCount);
  }

  // 3. Traces: How did it happen?
  async traceRequest(userId: string) {
    const span = tracer.startSpan("handleRequest");
    try {
      const user = await db.user.findById(userId);
      span.setTag("user.id", user.id);
      // ... rest of operation
    } finally {
      // Always close the span, even if the operation throws
      span.finish();
    }
  }
}
Common Pitfalls
| Pitfall | Impact | Solution |
|---|---|---|
| Premature optimization | Wasted effort | Measure first, optimize second |
| Over-engineering | Complexity | Start simple, add complexity as needed |
| Ignoring monitoring | Blind failures | Instrument from day one |
| Single point of failure | Downtime | Design redundancy |
| Not planning for failure | Cascading failures | Assume failures and design around them |
Conclusion
System design is about making trade-offs. There's no perfect system, only systems optimized for specific constraints. The keys are:
- Understand your requirements - Scale, consistency, availability
- Know the trade-offs - Nothing is free
- Start simple - Add complexity only when needed
- Measure everything - Data-driven decisions
- Plan for failure - It will happen
The most scalable systems are often the simplest ones that solve the problem well.