
System Design 101: Understanding Database Sharding

In this article, we'll explore Database Sharding—what it is, why it's crucial, and how companies use it to handle massive amounts of data. We'll use real-world examples from Spotify and Zoom to show how sharding solves scalability challenges.

Hello 👋

Welcome to another week and another opportunity to become a great DevOps and software engineer.

Today’s issue is brought to you by DevOpsWeekly, a great resource for DevOps and backend engineers. We offer next-level DevOps and backend engineering resources.

PS: Before we dive into the topic of today, I have some very exciting news to share with you:

I am launching a platform called Mentoraura in March, designed to help you break into tech, grow your career, and become a world-class engineer.

Mentoraura is for:

  • Beginners who want a structured, no-BS path to mastering DevOps and software engineering

  • Career changers looking for practical, hands-on mentorship to transition into tech

  • Engineers who want to level up their skills and build real-world expertise

With Mentoraura, I’ll guide you through solving real business challenges, mastering in-demand technologies, and becoming a highly valuable engineer in the industry.

🔥 Join the waitlist now and be the first to access the platform when it launches! 👉 mentoraura.com

In our last episode, we discussed caching: how storing frequently accessed data in memory helps systems like YouTube and Twitter deliver lightning-fast performance. We saw how caching reduces database load and improves user experience.

Now, let's dive into Sharding, the technique that lets databases scale horizontally by splitting data across multiple machines.

What is Database Sharding?

Imagine Spotify's music library stored in a single database table with more than 100 million songs. Every time you searched for a track, the system would need to work through this one enormous table, making searches painfully slow.

Sharding solves this by breaking the database into smaller, more manageable pieces called shards. Each shard contains a subset of the data (like organizing a library by sections). For example:

  • Shard 1: Artists A-M

  • Shard 2: Artists N-Z

When you search for "Ed Sheeran," the name sorts under "E," so Spotify only needs to query Shard 1. That shard holds only part of the catalog, which makes your search much faster.
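Here is a minimal sketch of that kind of range-based routing in Python. It assumes a two-shard A-M / N-Z split keyed on the first letter of the lookup key. The shard names and the `shard_for_key` helper are made up for illustration and are not Spotify's actual implementation.

```python
import bisect

# Hypothetical layout: each shard owns keys whose first letter is at or
# before its upper bound (inclusive). Shard 1 covers A-M, shard 2 covers N-Z.
SHARD_UPPER_BOUNDS = ["M", "Z"]
SHARD_NAMES = ["shard_1", "shard_2"]


def shard_for_key(key: str) -> str:
    """Return the name of the shard whose letter range covers this key."""
    cleaned = key.strip()
    if not cleaned:
        raise ValueError("key must not be empty")
    first_letter = cleaned[0].upper()
    if not ("A" <= first_letter <= "Z"):
        raise ValueError(f"no shard covers keys starting with {first_letter!r}")
    # bisect_left returns the index of the first upper bound >= first_letter.
    index = bisect.bisect_left(SHARD_UPPER_BOUNDS, first_letter)
    return SHARD_NAMES[index]


if __name__ == "__main__":
    print(shard_for_key("Adele"))  # shard_1
    print(shard_for_key("Queen"))  # shard_2
```

Adding a third shard in this sketch only means adding one more bound and one more name. The catch is that rows already stored under the old ranges would have to be moved.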

Why is Sharding Important?

Let's look at Zoom, which hosts millions of concurrent meetings worldwide. Without sharding:

  • All meeting data (participants, chat logs, recordings) would be in one database

  • During peak hours, the database would become a bottleneck

  • Users might experience lag or dropped calls

With sharding:

  • Meeting data is split by region (e.g., North America, Europe, Asia)

  • When you join a call, Zoom queries only the relevant regional shard

  • The system scales seamlessly as more users join
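To make the regional idea concrete, here is a small Python sketch of region-based routing. The country codes, region names, and shard names are all assumptions for illustration, not Zoom's real topology.

```python
# Hypothetical lookup tables: country code -> region, region -> shard.
COUNTRY_TO_REGION = {
    "US": "north_america",
    "CA": "north_america",
    "DE": "europe",
    "FR": "europe",
    "JP": "asia",
    "IN": "asia",
}

REGION_TO_SHARD = {
    "north_america": "meetings_na",
    "europe": "meetings_eu",
    "asia": "meetings_asia",
}


def shard_for_country(country_code: str) -> str:
    """Pick the regional shard that stores meetings hosted in this country."""
    region = COUNTRY_TO_REGION.get(country_code.upper())
    if region is None:
        raise ValueError(f"unknown country code: {country_code!r}")
    return REGION_TO_SHARD[region]


if __name__ == "__main__":
    print(shard_for_country("JP"))  # meetings_asia
```

A meeting hosted in Japan resolves to the Asia shard, so the other regional shards never see the query.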

How Companies Implement Sharding

  1. Horizontal Partitioning: Splitting data by rows (e.g., user IDs 1 to 10M in Shard A, 10M+1 to 20M in Shard B)

  2. Vertical Partitioning: Splitting by columns (e.g., user profiles in Shard A, payment data in Shard B)

  3. Directory-Based: Using a lookup service to track which shard contains which data

  4. Hash-Based: Applying a hash function to distribute data evenly (e.g., hashing user emails to assign to shards)
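The last two strategies are easy to sketch in Python. The example below is a simplified illustration under a few assumptions: a fixed count of four shards, SHA-256 as the hash, and an in-memory dictionary standing in for the lookup service. The names `hash_shard` and `ShardDirectory` are hypothetical.

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count for this sketch


def hash_shard(email: str, num_shards: int = NUM_SHARDS) -> int:
    """Hash-based sharding: map an email to a shard index."""
    # hashlib produces the same digest on every machine and every run,
    # unlike the built-in hash(), which is randomized per process for strings.
    digest = hashlib.sha256(email.lower().encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardDirectory:
    """Directory-based sharding: an explicit key -> shard lookup table."""

    def __init__(self, default_shard: str) -> None:
        self._default_shard = default_shard
        self._assignments = {}

    def assign(self, key: str, shard: str) -> None:
        """Record (or change) which shard holds this key."""
        self._assignments[key] = shard

    def lookup(self, key: str) -> str:
        """Return the shard for this key, or the default if unassigned."""
        return self._assignments.get(key, self._default_shard)


if __name__ == "__main__":
    print(hash_shard("user@example.com"))  # an index from 0 to 3

    directory = ShardDirectory(default_shard="shard_a")
    directory.assign("tenant-42", "shard_b")
    print(directory.lookup("tenant-42"))  # shard_b
    print(directory.lookup("tenant-7"))   # shard_a
```

One design note: with plain modulo hashing, changing the shard count remaps most keys. A directory can move keys one at a time, at the cost of an extra lookup on every request.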

Real-World Example: Spotify

Challenge: Store 100M+ songs while keeping search fast
Solution:

  • Shards organized by artist name ranges

  • A search for "Ed Sheeran" hits only the shard whose range covers that artist name

  • Results return in milliseconds instead of seconds

Impact: Instant search results, even with massive catalog growth

Real-World Example: Zoom

Challenge: Handle metadata for millions of concurrent meetings
Solution:

  • Shards organized by geographic region

  • Meeting in Japan? Only the Asia shard is queried

  • An outage in one regional shard does not take down the others

Impact: Smooth performance even during global usage spikes

Tradeoffs to Consider

Pros:

  • Enables horizontal scaling: capacity grows by adding more shards

  • Improves query performance

  • Isolates failures (one shard going down doesn't crash everything)

Cons:

  • Complex to implement and maintain

  • Joins across shards are challenging

  • Requires careful planning for even data distribution
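To see why cross-shard queries take extra work, here is a small Python sketch of the "scatter-gather" pattern. The two in-memory lists stand in for real shards, and the song names and play counts are invented for illustration.

```python
import heapq

# Hypothetical stand-ins for two shards. Each shard returns its rows
# already sorted by play count, highest first.
SHARDS = {
    "shard_1": [("Song A", 900), ("Song C", 400)],
    "shard_2": [("Song B", 700), ("Song D", 100)],
}


def top_songs(limit: int) -> list:
    """Return the globally most-played songs across all shards."""
    # Scatter: ask every shard for its own top `limit` rows.
    per_shard = [rows[:limit] for rows in SHARDS.values()]
    # Gather: merge the pre-sorted partial results into one ranking.
    merged = heapq.merge(*per_shard, key=lambda row: row[1], reverse=True)
    return list(merged)[:limit]


if __name__ == "__main__":
    print(top_songs(3))  # [('Song A', 900), ('Song B', 700), ('Song C', 400)]
```

A single-database query would do this in one step. With shards, the application (or a routing layer) has to contact every shard and combine the answers, and joins are harder still because matching rows may live on different machines.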

Sharding is how Spotify serves 100M+ songs and Zoom hosts endless meetings without breaking a sweat. It transforms impossible scalability challenges into manageable solutions.

Next week, we'll explore Content Delivery Networks (CDNs)—how Netflix streams 4K video globally without buffering. Stay tuned!

P.S. If you found this helpful, share it with a friend or colleague who’s on their DevOps journey. Let’s grow together!

Got questions or thoughts? Reply to this newsletter. We’d love to hear from you!

See you next week.

Remember to get Salezoft, a comprehensive cloud-based platform designed for business management, offering solutions for retail, online stores, barbershops, salons, professional services, and healthcare. It includes tools for point-of-sale (POS), inventory management, order management, employee management, invoicing, and receipt generation.
