System Design: Building Scalable Applications

April 12, 2026


System design separates junior developers from senior engineers. It's the art of building systems that can handle growth, failure, and complexity. Let me share lessons from designing systems that now serve thousands of users.

Core Principles of System Design

1. Scalability

Design for growth from day one:

Vertical Scaling: Add more power to a single machine
└─ Cheaper initially
└─ Hardware limits (can't buy infinite RAM)
└─ Single point of failure

Horizontal Scaling: Add more machines
└─ Scales far beyond a single machine's limits
└─ More complex (distributed systems problems)
└─ Better for high availability

2. Reliability

Systems should work even when things break:

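// Circuit breaker pattern for handling failures
// CLOSED: calls pass through. OPEN: calls fail fast.
// HALF_OPEN: after the reset timeout, a trial call decides whether to close or reopen.
// (This simple version lets concurrent calls through while half-open.)
class CircuitBreaker {
  private failureCount = 0;
  private lastFailureTime = 0;
  private state: "CLOSED" | "OPEN" | "HALF_OPEN" = "CLOSED";

  constructor(
    private readonly failureThreshold = 5, // consecutive failures before opening
    private readonly resetTimeoutMs = 30_000, // how long to stay open before a trial call
  ) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === "OPEN") {
      if (Date.now() - this.lastFailureTime >= this.resetTimeoutMs) {
        this.state = "HALF_OPEN"; // let a trial request through
      } else {
        throw new Error("Circuit breaker is open");
      }
    }

    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  private onSuccess() {
    this.failureCount = 0;
    this.state = "CLOSED";
  }

  private onFailure() {
    this.failureCount++;
    this.lastFailureTime = Date.now();
    // A failed trial call reopens immediately; otherwise open once the threshold is reached
    if (this.state === "HALF_OPEN" || this.failureCount >= this.failureThreshold) {
      this.state = "OPEN";
    }
  }
}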

3. Availability

Keep services running (measured in "nines"):

Availability          Downtime/Year
99% (Two Nines)       87.6 hours
99.9% (Three Nines)   8.76 hours
99.99% (Four Nines)   52.6 minutes
99.999% (Five Nines)  5.26 minutes
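The downtime column follows directly from the 8,760 hours in a (non-leap) year. The sketch below recomputes it and also shows why redundancy helps, assuming replicas fail independently (an idealization real systems rarely meet): a redundant service is down only when every replica is down at once. The function names are illustrative.

```typescript
const HOURS_PER_YEAR = 365 * 24; // 8,760 hours, ignoring leap years

// Expected downtime per year for a given availability percentage
function downtimeHoursPerYear(availabilityPercent: number): number {
  return HOURS_PER_YEAR * (1 - availabilityPercent / 100);
}

// Availability (0..1) of N redundant replicas, assuming independent failures:
// the service is down only if all replicas are down simultaneously
function redundantAvailability(single: number, replicas: number): number {
  return 1 - Math.pow(1 - single, replicas);
}

// Availability (0..1) of components that must ALL work (serial dependencies)
function serialAvailability(...components: number[]): number {
  return components.reduce((total, a) => total * a, 1);
}

console.log(downtimeHoursPerYear(99.9).toFixed(2)); // "8.76"
console.log((downtimeHoursPerYear(99.99) * 60).toFixed(1)); // "52.6" (minutes)
```

The flip side: chaining two 99.9% services in series yields roughly 99.8%, which is why a single point of failure anywhere in the request path caps the availability of the whole system.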

Techniques:

4. Consistency

Different consistency models for different needs:

Strong Consistency
└─ Every read gets latest write
└─ Used by: Banking, stock exchanges
└─ Trade-off: Slower (requires coordination)

Eventual Consistency
└─ Reads may be stale temporarily
└─ Used by: Social media, email
└─ Trade-off: Complex to reason about

Causal Consistency
└─ Maintains causal relationships
└─ Used by: Collaborative apps
└─ Trade-off: Weaker than strong consistency, but needs less coordination; requires tracking dependencies between writes
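One common way to track the causal relationships mentioned above is a vector clock: each node keeps a counter per node, and comparing two clocks tells you whether one event happened before the other or whether they were concurrent. The sketch below is illustrative and not tied to any particular database; a causally consistent store could use a comparison like this to avoid showing a reply before the message it answers.

```typescript
// Map from node ID to the number of events seen from that node
type VectorClock = Record<string, number>;
type Ordering = "before" | "after" | "equal" | "concurrent";

// Record a new local event on `node`
function increment(clock: VectorClock, node: string): VectorClock {
  return { ...clock, [node]: (clock[node] ?? 0) + 1 };
}

// Combine knowledge from two clocks (element-wise maximum)
function merge(a: VectorClock, b: VectorClock): VectorClock {
  const result: VectorClock = { ...a };
  for (const [node, count] of Object.entries(b)) {
    result[node] = Math.max(result[node] ?? 0, count);
  }
  return result;
}

// "before" means a happened-before b; "concurrent" means neither saw the other
function compare(a: VectorClock, b: VectorClock): Ordering {
  let aBehind = false; // some entry where a < b
  let bBehind = false; // some entry where b < a
  const nodes = new Set([...Object.keys(a), ...Object.keys(b)]);
  for (const node of nodes) {
    const x = a[node] ?? 0;
    const y = b[node] ?? 0;
    if (x < y) aBehind = true;
    if (y < x) bBehind = true;
  }
  if (aBehind && bBehind) return "concurrent";
  if (aBehind) return "before";
  if (bBehind) return "after";
  return "equal";
}

const post = increment({}, "alice"); // { alice: 1 }
const reply = increment(merge(post, {}), "bob"); // bob saw the post, then replied
console.log(compare(post, reply)); // "before"
```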

Architectural Patterns

Load Balancing

Distribute traffic across multiple servers:

                   ┌───────────────┐
                   │ Load Balancer │
                   └───────┬───────┘
          ┌───────────┬────┴────┬───────────┐
      ┌───▼───┐   ┌───▼───┐ ┌───▼───┐   ┌───▼───┐
      │ App 1 │   │ App 2 │ │ App 3 │   │ App N │
      └───┬───┘   └───┬───┘ └───┬───┘   └───┬───┘
          └───────────┴────┬────┴───────────┘
                   ┌───────▼───────┐
                   │   Database    │
                   └───────────────┘
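The simplest distribution policy is round robin: hand each request to the next server in turn, skipping any that fail health checks. A minimal sketch, assuming health status is kept current elsewhere (for example by periodic probes); `Backend` and `RoundRobinBalancer` are illustrative names, not a specific load balancer's API.

```typescript
// A backend server; `healthy` is assumed to be updated by a separate health checker
interface Backend {
  url: string;
  healthy: boolean;
}

class RoundRobinBalancer {
  private next = 0; // index of the backend to try first on the next request

  constructor(private readonly backends: Backend[]) {}

  // Return the next healthy backend in rotation, skipping unhealthy ones
  pick(): Backend {
    for (let i = 0; i < this.backends.length; i++) {
      const index = (this.next + i) % this.backends.length;
      const backend = this.backends[index];
      if (backend.healthy) {
        this.next = (index + 1) % this.backends.length;
        return backend;
      }
    }
    throw new Error("No healthy backends available");
  }
}
```

Round robin assumes requests cost roughly the same; when some are much heavier than others, a least-connections policy usually spreads load more evenly.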

Caching Strategy

Reduce database load with strategic caching:

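// Multi-level caching: L1 in-process memory, L2 shared Redis, then the database
// RemoteCache is a minimal interface; wrap your Redis client of choice to implement it
interface RemoteCache {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
  del(key: string): Promise<void>;
}

class CacheManager {
  // L1: in-process Map (fastest, typically well under a microsecond, but per-instance)
  private l1Cache = new Map<string, { value: unknown; expiresAt: number }>();

  constructor(
    private l2Cache: RemoteCache, // L2: Redis (a network hop, typically sub-millisecond)
    private l1TtlMs = 5_000, // short L1 TTL bounds staleness on other instances
    private l2TtlSeconds = 3600, // 1 hour TTL
  ) {}

  async get<T>(key: string, fn: () => Promise<T>): Promise<T> {
    // L1: Check memory (ignore expired entries)
    const local = this.l1Cache.get(key);
    if (local && local.expiresAt > Date.now()) {
      return local.value as T;
    }

    // L2: Check Redis (values stored as JSON; null means a miss)
    const redisValue = await this.l2Cache.get(key);
    if (redisValue !== null) {
      const parsed = JSON.parse(redisValue) as T;
      this.setLocal(key, parsed);
      return parsed;
    }

    // L3: Database (fn loads from the source of truth, typically milliseconds)
    const value = await fn();
    await this.l2Cache.set(key, JSON.stringify(value), this.l2TtlSeconds);
    this.setLocal(key, value);
    return value;
  }

  async invalidate(key: string) {
    // Clears only this instance's L1; other instances keep theirs until l1TtlMs expires
    this.l1Cache.delete(key);
    await this.l2Cache.del(key);
  }

  private setLocal(key: string, value: unknown) {
    this.l1Cache.set(key, { value, expiresAt: Date.now() + this.l1TtlMs });
  }
}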

Database Sharding

Distribute data across multiple databases:

// Hash-based (modulo) sharding: each user is routed to shard userId % shardCount
// Database is a placeholder for your DB client, assumed to expose query(sql, params)
class ShardRouter {
  private shards: Database[];

  constructor(shardCount: number) {
    this.shards = Array.from(
      { length: shardCount },
      (_, i) => new Database(`shard_${i}`),
    );
  }

  getShardId(userId: number): number {
    // The same user always maps to the same shard while shardCount stays fixed.
    // Changing shardCount remaps most users; consistent hashing limits that movement.
    return userId % this.shards.length;
  }

  async getUserData(userId: number) {
    const shardId = this.getShardId(userId);
    return this.shards[shardId].query(`SELECT * FROM users WHERE id = ?`, [
      userId,
    ]);
  }

  async updateUser(userId: number, data: Record<string, unknown>) {
    const shardId = this.getShardId(userId);
    // `SET ?` expands an object into column = value pairs (mysql-style drivers)
    return this.shards[shardId].query(`UPDATE users SET ? WHERE id = ?`, [
      data,
      userId,
    ]);
  }
}
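With modulo routing, growing from 4 to 5 shards moves every user for whom `userId % 4 !== userId % 5`, which is about 80% of them. Consistent hashing instead places each shard at many points on a hash ring and assigns a key to the first point at or after its hash, so adding a shard only moves the keys that land on its new points (roughly 1/N of them). A sketch, assuming FNV-1a plus the MurmurHash3 finalizer as the hash and 100 virtual nodes per shard; both are arbitrary choices for illustration.

```typescript
// FNV-1a string hash followed by the MurmurHash3 32-bit finalizer ("fmix32"),
// which spreads similar inputs (e.g. "shard_1#7" vs "shard_1#8") across the ring
function hash32(input: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0; // unsigned 32-bit ring position
}

class ConsistentHashRing {
  // Ring positions sorted ascending; each shard appears at `virtualNodes` points
  private ring: { point: number; shard: string }[] = [];

  constructor(shards: string[], private readonly virtualNodes = 100) {
    for (const shard of shards) this.addShard(shard);
  }

  addShard(shard: string): void {
    for (let v = 0; v < this.virtualNodes; v++) {
      this.ring.push({ point: hash32(`${shard}#${v}`), shard });
    }
    this.ring.sort((a, b) => a.point - b.point);
  }

  // A key belongs to the first shard point at or after its hash, wrapping around
  getShard(key: string): string {
    if (this.ring.length === 0) throw new Error("No shards on the ring");
    const h = hash32(key);
    let lo = 0;
    let hi = this.ring.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.ring[mid].point < h) lo = mid + 1;
      else hi = mid;
    }
    return this.ring[lo % this.ring.length].shard; // lo === length wraps to 0
  }
}
```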

Real-World Example: Scaling Floral Radiance

Journey from startup to 10k daily users:

Phase 1: Monolith (0-100 users)

Single Server
├── Express API
├── Next.js Frontend
└── SQLite Database

Phase 2: Separation (100-1k users)

Frontend (Vercel)    API (1 instance)    Database (Single)

Phase 3: Scaling (1k-10k users)

                ┌─────────────────────┐
                │     Vercel CDN      │
                └──────────┬──────────┘
                           │
                   ┌───────▼───────┐
                   │ Load Balancer │
                   └───────┬───────┘
          ┌───────────┬────┴────┬───────────┐
      ┌───▼───┐   ┌───▼───┐ ┌───▼───┐   ┌───▼───┐
      │ API 1 │   │ API 2 │ │ API 3 │   │ API 4 │
      └───┬───┘   └───┬───┘ └───┬───┘   └───┬───┘
          │           │         │           │
    ┌─────▼───────────▼─────────▼───────────▼─────┐
    │           Redis Cache (Sessions)            │
    └──────────────────────┬──────────────────────┘
                           │
                ┌──────────▼──────────┐
                │  Primary Database   │
                └──────────┬──────────┘
                           │
                ┌──────────▼──────────┐
                │ Read-Only Replicas  │
                └─────────────────────┘
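In a primary/replica layout like Phase 3, writes must go to the primary while reads can be spread across replicas. A minimal routing sketch, assuming a hypothetical `DbNode` handle; because replicas usually apply changes asynchronously and can lag behind, reads that must see a write the user just made (read-your-writes) are sent to the primary as well.

```typescript
// Hypothetical handle for a database node; a real one would wrap a connection pool
interface DbNode {
  name: string;
}

class ReadWriteRouter {
  private nextReplica = 0;

  constructor(
    private readonly primary: DbNode,
    private readonly replicas: DbNode[],
  ) {}

  // Writes, and reads that must see the latest write, go to the primary;
  // all other reads rotate across the replicas
  route(kind: "read" | "write", needsLatest = false): DbNode {
    if (kind === "write" || needsLatest || this.replicas.length === 0) {
      return this.primary;
    }
    const replica = this.replicas[this.nextReplica];
    this.nextReplica = (this.nextReplica + 1) % this.replicas.length;
    return replica;
  }
}

const router = new ReadWriteRouter({ name: "primary" }, [
  { name: "replica-1" },
  { name: "replica-2" },
]);
console.log(router.route("write").name); // "primary"
console.log(router.route("read").name); // "replica-1"
console.log(router.route("read", true).name); // "primary"
```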

Database Design at Scale

Indexing Strategy

-- ❌ Bad: with no index on these columns, the database scans the whole table
SELECT * FROM orders WHERE customer_id = 123 AND created_at > '2026-01-01';
 
-- ✓ Good: a composite index (equality column first, range column second)
CREATE INDEX idx_customer_date ON orders(customer_id, created_at);
SELECT * FROM orders WHERE customer_id = 123 AND created_at > '2026-01-01';
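Why does column order matter? A composite index keeps entries sorted by `customer_id` first and `created_at` second, so the database can binary-search to the first matching entry and scan only that customer's recent rows. The toy model below mimics that with a sorted array; it is a simplification of a real B-tree, and it assumes dates are ISO `YYYY-MM-DD` strings so they compare correctly as text.

```typescript
// One index entry: the indexed columns plus a pointer to the table row
interface IndexEntry {
  customerId: number;
  createdAt: string; // ISO date, e.g. "2026-01-15"
  rowId: number;
}

// Order by customerId, then createdAt (like ON orders(customer_id, created_at))
function compareKey(e: IndexEntry, customerId: number, createdAt: string): number {
  if (e.customerId !== customerId) return e.customerId - customerId;
  return e.createdAt < createdAt ? -1 : e.createdAt > createdAt ? 1 : 0;
}

function buildIndex(rows: IndexEntry[]): IndexEntry[] {
  return [...rows].sort((a, b) => compareKey(a, b.customerId, b.createdAt));
}

// WHERE customer_id = ? AND created_at > ?
function rangeLookup(index: IndexEntry[], customerId: number, after: string): number[] {
  // Binary search for the first entry whose key is greater than (customerId, after)
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (compareKey(index[mid], customerId, after) <= 0) lo = mid + 1;
    else hi = mid;
  }
  // Scan forward only while the entries still belong to this customer
  const rowIds: number[] = [];
  for (let i = lo; i < index.length && index[i].customerId === customerId; i++) {
    rowIds.push(index[i].rowId);
  }
  return rowIds;
}
```

With the columns reversed, `(created_at, customer_id)`, one customer's rows would be scattered across every date after the cutoff, so the same query would have to walk the index entries of all customers in that date range.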

Query Optimization

// ❌ N+1 Query Problem
const users = await db.user.findMany();
const orders = [];
for (const user of users) {
  // This queries database for EACH user!
  orders.push(await db.order.findMany({ where: { userId: user.id } }));
}
 
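// ✓ Optimized: load relations in bulk (a constant number of queries, e.g. one for
// users plus one for their orders, or a single JOIN, depending on the ORM's strategy)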
const usersWithOrders = await db.user.findMany({
  include: {
    orders: true,
  },
});
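Whatever the ORM, the underlying fix is the same: fetch all related rows in one query keyed by the parent IDs (for example `WHERE user_id IN (...)`), then group them in memory. The sketch below uses a hypothetical in-memory `FakeDb` that counts round trips, so it runs without a database; with N users it makes 2 queries instead of N + 1.

```typescript
interface User {
  id: number;
  name: string;
}
interface Order {
  id: number;
  userId: number;
}

// Hypothetical stand-in for a database client; counts each simulated round trip
class FakeDb {
  queries = 0;
  constructor(private users: User[], private orders: Order[]) {}

  async findUsers(): Promise<User[]> {
    this.queries++;
    return this.users;
  }

  // Equivalent of SELECT * FROM orders WHERE user_id IN (...)
  async findOrdersByUserIds(userIds: number[]): Promise<Order[]> {
    this.queries++;
    const wanted = new Set(userIds);
    return this.orders.filter((o) => wanted.has(o.userId));
  }
}

// Batched loading: one query for users, one for all their orders, then group in memory
async function usersWithOrders(db: FakeDb) {
  const users = await db.findUsers();
  const orders = await db.findOrdersByUserIds(users.map((u) => u.id));
  const byUser = new Map<number, Order[]>();
  for (const order of orders) {
    const list = byUser.get(order.userId) ?? [];
    list.push(order);
    byUser.set(order.userId, list);
  }
  return users.map((u) => ({ ...u, orders: byUser.get(u.id) ?? [] }));
}
```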

Monitoring and Observability

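// Three pillars of observability
// logger, metrics, tracer, and db are placeholders for your logging, metrics, tracing,
// and database clients; the span API shown (setTag/finish) is OpenTracing-style
class Observability {
  // 1. Logging: What happened?
  handleRequest(req: { method: string; url: string; user?: { id: string } }) {
    logger.info(`Request: ${req.method} ${req.url}`, {
      userId: req.user?.id,
      timestamp: new Date().toISOString(),
    });
  }

  // 2. Metrics: How much happened?
  recordMetrics(responseTimeMs: number, activeUserCount: number) {
    metrics.recordCounter("requests.total", 1);
    metrics.recordHistogram("response.time_ms", responseTimeMs);
    metrics.recordGauge("users.active", activeUserCount);
  }

  // 3. Traces: How did it happen?
  async traceRequest(userId: string) {
    const span = tracer.startSpan("handleRequest");
    try {
      span.setTag("user.id", userId); // tag first, so failed lookups are still attributed
      const user = await db.user.findById(userId);
      // ... rest of operation
      return user;
    } catch (error) {
      span.setTag("error", true); // mark the span as failed
      throw error;
    } finally {
      span.finish(); // always close the span, even on failure
    }
  }
}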
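The three metric types used above behave differently: a counter only accumulates, a gauge records the current value, and a histogram keeps a distribution so you can ask for percentiles such as p99 latency. A toy in-memory recorder, reusing the `recordCounter`/`recordGauge`/`recordHistogram` method names from the class above; the class name is hypothetical, and production metrics libraries typically aggregate histograms into buckets or sketches instead of storing every sample.

```typescript
class InMemoryMetrics {
  private counters = new Map<string, number>();
  private gauges = new Map<string, number>();
  private histograms = new Map<string, number[]>();

  // Counter: only ever accumulates (e.g. total requests)
  recordCounter(name: string, delta: number): void {
    this.counters.set(name, (this.counters.get(name) ?? 0) + delta);
  }

  // Gauge: a point-in-time value that can go up or down (e.g. active users)
  recordGauge(name: string, value: number): void {
    this.gauges.set(name, value);
  }

  // Histogram: keeps every observation so percentiles can be computed
  recordHistogram(name: string, value: number): void {
    const values = this.histograms.get(name) ?? [];
    values.push(value);
    this.histograms.set(name, values);
  }

  counter(name: string): number {
    return this.counters.get(name) ?? 0;
  }

  gauge(name: string): number | undefined {
    return this.gauges.get(name);
  }

  // Nearest-rank percentile, e.g. percentile("response.time_ms", 99) for p99
  percentile(name: string, p: number): number | undefined {
    const values = this.histograms.get(name);
    if (!values || values.length === 0) return undefined;
    const sorted = [...values].sort((a, b) => a - b);
    const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
    return sorted[rank - 1];
  }
}
```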

Common Pitfalls

Pitfall                    Impact               Solution
Premature optimization     Wasted effort        Measure first, optimize second
Over-engineering           Complexity           Start simple, add complexity as needed
Ignoring monitoring        Blind failures       Instrument from day one
Single point of failure    Downtime             Design redundancy
Not planning for failure   Cascading failures   Assume failures and design around them

Conclusion

System design is about making trade-offs. There's no perfect system, only systems optimized for specific constraints. The key is to:

  1. Understand your requirements - Scale, consistency, availability
  2. Know the trade-offs - Nothing is free
  3. Start simple - Add complexity only when needed
  4. Measure everything - Data-driven decisions
  5. Plan for failure - It will happen

The most scalable systems are often the simplest ones that solve the problem well.