Database Sharding: Scaling writes across 100 nodes
Database Sharding
When a single database server can't handle the data volume or write throughput, we use Sharding. This is the process of splitting a single logical dataset across multiple physical database instances (Shards).
1. The Sharding Key
The most critical decision in sharding is the Sharding Key. This determines which shard a piece of data goes to.
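- Hash-based: Shard = Hash(UserID) % NumberOfShards. (Even distribution, but hard to add new shards, because changing NumberOfShards remaps most keys to a different shard.)
- Range-based: Shard 1: A-M, Shard 2: N-Z. (Easy to add shards by splitting a range, but can create "Hotspots" when the keys are skewed toward one range.)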
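The two strategies above can be sketched in a few lines of Python. This is a minimal illustration, not a production router: the function names, the choice of SHA-256 as the hash, and the sample user IDs are all assumptions made for the example. The last function measures what "hard to add new shards" means in practice for the modulo scheme.

```python
import hashlib
from bisect import bisect_right


def hash_shard(user_id: str, num_shards: int) -> int:
    """Hash-based: Shard = Hash(UserID) % NumberOfShards (0-based shard index)."""
    # Use a stable hash: Python's built-in hash() for strings is salted per
    # process, so it would route the same user differently after a restart.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Range-based: first letter owned by each shard (Shard 1: A-M, Shard 2: N-Z).
RANGE_STARTS = ["A", "N"]


def range_shard(name: str) -> int:
    """Range-based: return the 1-based shard number for a name."""
    # bisect_right counts how many range starts are <= the first letter.
    # Anything sorting before "A" (digits, for example) falls into shard 1.
    return max(1, bisect_right(RANGE_STARTS, name[0].upper()))


def fraction_moved(keys, old_shards: int, new_shards: int) -> float:
    """Share of keys whose hash-based shard changes when the shard count changes."""
    moved = sum(hash_shard(k, old_shards) != hash_shard(k, new_shards) for k in keys)
    return moved / len(keys)


if __name__ == "__main__":
    print(range_shard("Alice"), range_shard("Mallory"), range_shard("Nina"))
    users = [f"user-{i}" for i in range(10_000)]
    # Going from 4 to 5 shards relocates the large majority of keys.
    print(f"moved: {fraction_moved(users, 4, 5):.0%}")
```

With a uniform hash, a key keeps its shard only when its hash gives the same remainder modulo 4 and modulo 5, which happens for roughly one key in five. The rest have to be copied to another shard.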
2. The Penalty of Sharding
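Once you shard, the database can no longer perform Joins across shards for you: any query that combines rows living on different shards has to be assembled in application code. You also lose Global Atomicity (ACID). A transaction that touches a single shard is still atomic, but one that spans shards is not, unless you add a coordination protocol such as two-phase commit. Your application logic becomes significantly more complex as it now has to "know" which shard to talk to for every query.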
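To make the penalty concrete, here is a toy sketch in which each shard is an in-memory dict standing in for a separate database. The class name `ShardedUserStore`, its methods, and the `country` field are hypothetical and exist only for this example. A lookup by the sharding key touches one shard, while a query on any other attribute has to visit every shard and merge the results in the application.

```python
import hashlib


class ShardedUserStore:
    """Toy router: each 'shard' is a dict standing in for a database instance."""

    def __init__(self, num_shards: int):
        self.shards = [dict() for _ in range(num_shards)]

    def _shard_for(self, user_id: str) -> dict:
        # The application, not the database, decides where a row lives.
        digest = hashlib.sha256(user_id.encode("utf-8")).digest()
        return self.shards[int.from_bytes(digest[:8], "big") % len(self.shards)]

    def put(self, user_id: str, row: dict) -> None:
        self._shard_for(user_id)[user_id] = row

    def get(self, user_id: str):
        # Query by sharding key: exactly one shard is contacted.
        return self._shard_for(user_id).get(user_id)

    def find_by_country(self, country: str) -> list:
        # Query by a non-key attribute: no shard can answer alone, so we
        # scatter the query to every shard and gather the results here.
        matches = []
        for shard in self.shards:
            matches.extend(uid for uid, row in shard.items() if row["country"] == country)
        return sorted(matches)


if __name__ == "__main__":
    store = ShardedUserStore(num_shards=4)
    store.put("alice", {"country": "DE"})
    store.put("bob", {"country": "US"})
    store.put("carol", {"country": "DE"})
    print(store.get("bob"))
    print(store.find_by_country("DE"))
```

The scatter-gather loop costs one round trip per shard, which is why sharding keys are usually chosen to match the most frequent query pattern.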
4. Interview Mastery
Q: "What is 'Resharding' and why is it scary?"
Architect Answer: "Resharding is the process of moving data between shards when a shard becomes too full or when you add new servers. It is scary because moving terabytes of live data while the app is running is like **changing an airplane engine mid-flight**. We use techniques like 'Consistent Hashing' to minimize the amount of data that needs to move during resharding."
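The answer above mentions Consistent Hashing, and a short sketch helps to show why it limits data movement. The `HashRing` class below is an illustrative assumption, not any particular library's API: the node names, the SHA-256 hash, and the 100 virtual nodes per server are arbitrary choices for the example.

```python
import hashlib
from bisect import bisect_right


def _point(value: str) -> int:
    """Map a string to a position on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes: int = 100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (ring position, node name)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each server appears at many positions so that load spreads evenly.
        for i in range(self.vnodes):
            self._ring.append((_point(f"{node}#{i}"), node))
        self._ring.sort()

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first virtual node at or after the key's
        # position, wrapping around to the start of the ring if needed.
        idx = bisect_right(self._ring, (_point(key), "")) % len(self._ring)
        return self._ring[idx][1]


if __name__ == "__main__":
    ring = HashRing([f"shard-{i}" for i in range(4)])
    keys = [f"user-{i}" for i in range(2000)]
    before = {k: ring.node_for(k) for k in keys}
    ring.add_node("shard-4")
    moved = [k for k in keys if ring.node_for(k) != before[k]]
    print(f"moved {len(moved)} of {len(keys)} keys")
```

When a fifth server joins, only the keys that now fall just before one of its virtual nodes change owner, and they all move to the new server. That is roughly one key in five here, compared with most keys under the plain modulo scheme.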