Tutorials System Design Mastery

Database Sharding: Scaling writes across 100 nodes

On this page

Database Sharding

When a single database server can't handle the data volume or write throughput, we use Sharding. This is the process of splitting a single logical dataset across multiple physical database instances (Shards).

1. The Sharding Key

The most critical decision in sharding is the Sharding Key. This determines which shard a piece of data goes to.

  • Hash-based: Shard = Hash(UserID) % NumberOfShards. (Even distribution, but hard to add new shards).
  • Range-based: Shard 1: A-M, Shard 2: N-Z. (Easy to add shards, but can create "Hotspots").

2. The Penalty of Sharding

Once you shard, you lose the ability to perform Joins across shards. You also lose Global Atomicity (ACID). Your application logic becomes significantly more complex as it now has to "know" which shard to talk to for every query.

4. Interview Mastery

Q: "What is 'Resharding' and why is it scary?"

Architect Answer: "Resharding is the process of moving data between shards when a shard becomes too full or when you add new servers. It is scary because moving terabytes of live data while the app is running is like **changing an airplane engine mid-flight**. We use techniques like 'Consistent Hashing' to minimize the amount of data that needs to move during resharding."

Questions on this lesson 0

Sign in to ask a question or upvote helpful answers.

No questions yet — be the first to ask!

System Design Mastery
Course syllabus
1. Distributed Systems Fundamentals
2. Database Scalability
3. Caching & CDN Strategies
4. Event-Driven Architecture
5. High Availability & Load Balancing
6. Microservices & API Gateway
7. Monitoring & Disaster Recovery
8. FAANG System Design Interview
Toolliyo Assistant
Ask about tutorials, ebooks, training, pricing, mentor services, and support. I use public site content only—not admin or internal tools.

care@toolliyo.com

Need callback? Share your details