Microservices & Event-Driven Architecture (EDA) Mastery

Case Study: Designing a Global Notification Engine (Reliability at Scale)

Updated 5/4/2026

Case Study: Notification Engine

The challenge: deliver 1 billion notifications (Push, Email, SMS) per day worldwide, each with less than 2 seconds of latency, while tolerating vendor failures.
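To put the daily volume in per-second terms, the arithmetic can be sketched as below. This gives only the average rate; real traffic is unlikely to be evenly spread, so peak throughput would be higher by some factor the source does not state.

```python
NOTIFICATIONS_PER_DAY = 1_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Average rate if traffic were perfectly uniform across the day.
average_per_second = NOTIFICATIONS_PER_DAY / SECONDS_PER_DAY

print(f"{average_per_second:,.0f} notifications/second on average")
```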

1. Architecture: The 'Fan-Out' Pattern

A single `NotificationRequested` event is published to a Kafka topic. Each channel (Push, Email, SMS) runs its own group of workers, and each group subscribes to the topic as a separate consumer group, so every channel receives its own copy of the event. This lets us scale each channel independently: if SMS volume spikes, we add more SMS workers without affecting Email delivery.
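The fan-out described above can be sketched with an in-memory stand-in for the topic. This is not a Kafka client: `Topic`, `make_worker`, and the group names are hypothetical, and the round-robin choice inside a group is a simplification of how Kafka assigns partitions to the consumers of a group.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class NotificationRequested:
    notification_id: str
    user_id: str
    channels: tuple  # e.g. ("push", "email", "sms")


class Topic:
    """In-memory stand-in for a Kafka topic (illustrative only)."""

    def __init__(self):
        self._groups = defaultdict(list)  # group name -> worker callables
        self._cursor = defaultdict(int)   # group name -> round-robin position

    def subscribe(self, group, worker):
        self._groups[group].append(worker)

    def publish(self, event):
        # Every group sees the event; inside a group, one worker handles it.
        for group, workers in self._groups.items():
            worker = workers[self._cursor[group] % len(workers)]
            self._cursor[group] += 1
            worker(event)


def make_worker(channel, name, log):
    """Build a worker that only acts on events addressed to its channel."""
    def handle(event):
        if channel in event.channels:
            log.append((name, event.notification_id))
    return handle


log = []
topic = Topic()
topic.subscribe("push-workers", make_worker("push", "push-1", log))
topic.subscribe("email-workers", make_worker("email", "email-1", log))
topic.subscribe("sms-workers", make_worker("sms", "sms-1", log))
topic.subscribe("sms-workers", make_worker("sms", "sms-2", log))  # scale SMS only

topic.publish(NotificationRequested("n-1", "u-42", ("push", "email", "sms")))
topic.publish(NotificationRequested("n-2", "u-43", ("sms",)))

for entry in log:
    print(entry)
```

Adding `sms-2` changes nothing for the push and email groups, which is the point of keeping one consumer group per channel.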

2. Resiliency: Circuit Breakers for Vendors

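External vendors (SendGrid, Twilio) will sometimes fail or throttle us. We wrap vendor calls in circuit breakers from **Polly**, the .NET resilience library. If calls to Twilio fail 5 times in a row, the breaker opens: we stop calling Twilio for 60 seconds and automatically route "Critical" SMS through a backup vendor (e.g., AWS SNS). After the 60 seconds, a trial call checks whether Twilio has recovered. This keeps 2FA codes flowing to users during a single-vendor outage.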
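The breaker-plus-fallback behavior can be sketched in a few lines. This is a minimal, language-neutral illustration in Python, not Polly's API: `CircuitBreaker`, `send_critical_sms`, and the fake vendors are hypothetical names, and counting consecutive failures is an assumption about how "fails 5 times" is measured. The clock is injectable so the 60-second window can be exercised without waiting.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the vendor call is skipped."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=60.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0      # consecutive failures seen so far
        self._opened_at = None  # time the circuit opened, or None if closed

    def call(self, func, *args):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; skipping vendor call")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = func(*args)
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        self._failures = 0
        self._opened_at = None
        return result


def send_critical_sms(breaker, primary, backup, message):
    """Try the primary vendor through the breaker; fall back on any failure."""
    try:
        return breaker.call(primary, message)
    except Exception:
        return backup(message)


class FakeClock:
    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


clock = FakeClock()
breaker = CircuitBreaker(clock=clock)
primary_calls = []


def flaky_primary(message):
    primary_calls.append(message)
    raise TimeoutError("primary vendor timed out")


def backup_vendor(message):
    return f"backup:{message}"


results = [
    send_critical_sms(breaker, flaky_primary, backup_vendor, f"code-{i}")
    for i in range(7)
]
print(len(primary_calls), results[-1])
```

After the fifth failure the primary vendor is no longer called; the last two messages go straight to the backup. Advancing `clock.now` past 60 lets one trial call reach the primary again.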

3. Rate Limiting: The 'Priority Queue'

Not all notifications are equal. 'Password Reset' is high priority; 'Monthly Newsletter' is low. We keep a separate queue per priority level. When the system is under heavy load, consumption from the Newsletter queue is throttled, so critical system messages keep enough CPU and network bandwidth to ship without waiting behind bulk traffic.
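One way to picture the throttling is a dispatcher that always drains the high-priority queue first and caps how many low-priority items it takes per batch while under load. This is a sketch under assumptions: `PriorityDispatcher`, `batch_size`, and `low_cap_under_load` are hypothetical names and values, and a production system would more likely use separate topics or queues with their own worker pools.

```python
from collections import deque


class PriorityDispatcher:
    def __init__(self, batch_size=10, low_cap_under_load=1):
        self.high = deque()  # e.g. password resets, 2FA codes
        self.low = deque()   # e.g. newsletters
        self.batch_size = batch_size
        self.low_cap_under_load = low_cap_under_load

    def enqueue(self, notification, priority):
        queue = self.high if priority == "high" else self.low
        queue.append(notification)

    def next_batch(self, under_load):
        """High priority first; low priority fills what is left, capped under load."""
        batch = []
        while self.high and len(batch) < self.batch_size:
            batch.append(self.high.popleft())
        remaining = self.batch_size - len(batch)
        low_cap = self.low_cap_under_load if under_load else remaining
        for _ in range(min(remaining, low_cap, len(self.low))):
            batch.append(self.low.popleft())
        return batch


dispatcher = PriorityDispatcher(batch_size=10, low_cap_under_load=1)
for i in range(3):
    dispatcher.enqueue(f"password-reset-{i}", "high")
for i in range(20):
    dispatcher.enqueue(f"newsletter-{i}", "low")

loaded_batch = dispatcher.next_batch(under_load=True)
normal_batch = dispatcher.next_batch(under_load=False)
print(loaded_batch)
print(normal_batch)
```

Under load the batch carries all three password resets and a single newsletter; once load drops, newsletters fill the whole batch again.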

4. Interview Mastery

Q: "How do you prevent sending the same notification twice if a vendor is slow?"

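Architect Answer: "**External Idempotency**. We generate a `NotificationId` once, before the first attempt, and send it to the vendor with every attempt. Some vendor APIs accept an idempotency key for this purpose (Stripe's API is a well-known example of the pattern), but support varies, so check each notification vendor's documentation. When a vendor honors the key and we retry after a timeout, it recognizes the duplicate and does not deliver again, so the user does not receive two identical emails. With a vendor that has no such key, a retry after a timeout can still produce a duplicate."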
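The retry flow in that answer can be sketched against a fake vendor. Everything here is illustrative: `FakeVendor` is a hypothetical stand-in that honors an idempotency key, not any real vendor's SDK, and `send_with_retry` is an assumed helper name. The important detail is that the `NotificationId` is created once and reused on every attempt.

```python
import uuid


class FakeVendor:
    """Hypothetical vendor that deduplicates on an idempotency key."""

    def __init__(self, drop_first_response=True):
        self.delivered = {}  # idempotency key -> message
        self.deliveries = 0  # how many messages actually reached the user
        self._drop_next_response = drop_first_response

    def send(self, idempotency_key, message):
        # Deliver only the first time this key is seen.
        if idempotency_key not in self.delivered:
            self.delivered[idempotency_key] = message
            self.deliveries += 1
        # Simulate a slow vendor: the request succeeded but the reply was lost.
        if self._drop_next_response:
            self._drop_next_response = False
            raise TimeoutError("response lost after the vendor accepted the request")
        return "accepted"


def send_with_retry(vendor, notification_id, message, max_attempts=3):
    """Retry on timeout, reusing the same notification_id as the key."""
    for attempt in range(1, max_attempts + 1):
        try:
            return vendor.send(notification_id, message)
        except TimeoutError:
            if attempt == max_attempts:
                raise


vendor = FakeVendor()
notification_id = str(uuid.uuid4())  # generated once, before the first attempt
status = send_with_retry(vendor, notification_id, "Your code is 123456")
print(status, vendor.deliveries)
```

The first attempt times out from the caller's point of view even though the vendor accepted it; the retry carries the same key, so the vendor reports success without delivering a second copy.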
