Microservices & Event-Driven Architecture (EDA) Mastery

Case Study: Designing a Global Notification Engine (Reliability at Scale)

Updated 5/4/2026

Case Study: Notification Engine

The challenge: Deliver 1 billion notifications (Push, Email, SMS) per day worldwide with under 2 seconds of latency, while tolerating vendor failures.
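To put the daily target in per-second terms, the arithmetic can be sketched as below. The flat-traffic average follows directly from the stated volume; the 3x peak-to-average ratio is a purely illustrative assumption, not a figure from this case study.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate(notifications_per_day: int) -> float:
    """Average notifications per second if traffic were perfectly flat."""
    return notifications_per_day / SECONDS_PER_DAY


def peak_rate(notifications_per_day: int, peak_factor: float) -> float:
    """Peak rate under an assumed (hypothetical) peak-to-average ratio."""
    return average_rate(notifications_per_day) * peak_factor


avg = average_rate(1_000_000_000)
peak = peak_rate(1_000_000_000, peak_factor=3.0)  # 3.0 is an assumption
print(f"average: {avg:,.0f} notifications/s")
print(f"assumed 3x peak: {peak:,.0f} notifications/s")
```

Real traffic is rarely flat, so capacity planning would normally start from a measured peak rather than the daily average.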

1. Architecture: The 'Fan-Out' Pattern

A single `NotificationRequested` event is published to a Kafka topic. Each channel (Push, Email, SMS) runs its own pool of workers in its own consumer group, so every channel receives the same event and can scale independently. If SMS volume spikes, we add more SMS workers without affecting Email delivery.
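The fan-out idea can be illustrated with a small in-memory model. This is a sketch, not Kafka client code: `Topic`, `run_channel`, and the `"<channel>-workers"` group names are hypothetical, and the round-robin split stands in for the partition assignment a real broker would perform. It assumes each channel reads the topic under its own consumer group.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Append-only log; each consumer group tracks its own read offset."""
    events: list = field(default_factory=list)
    offsets: dict = field(default_factory=lambda: defaultdict(int))

    def publish(self, event: dict) -> None:
        self.events.append(event)

    def poll(self, group: str) -> list:
        """Return events this group has not seen yet and advance its offset."""
        start = self.offsets[group]
        self.offsets[group] = len(self.events)
        return self.events[start:]


def run_channel(topic: Topic, channel: str, worker_count: int) -> dict:
    """Spread one channel's unread events across its workers (round-robin)."""
    handled = defaultdict(list)
    for i, event in enumerate(topic.poll(group=f"{channel}-workers")):
        if channel in event["channels"]:
            handled[f"{channel}-{i % worker_count}"].append(event["id"])
    return dict(handled)


topic = Topic()
for n in range(4):
    topic.publish({"id": n, "channels": ["push", "email", "sms"]})

# Every group sees all four events; only the SMS pool is scaled to two workers.
email = run_channel(topic, "email", worker_count=1)
sms = run_channel(topic, "sms", worker_count=2)
print(email)
print(sms)
```

Because each group keeps its own offset, adding SMS workers changes only how the SMS group splits its work; the Email group's view of the topic is untouched.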

2. Resiliency: Circuit Breakers for Vendors

External vendors (SendGrid, Twilio) will fail or throttle us. We wrap vendor calls in **Polly Circuit Breakers**. If Twilio fails 5 times, the circuit opens: we stop calling Twilio for 60 seconds and automatically route "Critical" SMS through a backup vendor (e.g., AWS SNS). This keeps 2FA codes flowing to users even while the primary vendor is down.
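The breaker-plus-fallback behaviour can be sketched in a few lines. This is a language-neutral illustration, not Polly's API: the `CircuitBreaker` class, its parameter names, and the stub vendors are hypothetical. It assumes failures are counted consecutively, that a single trial call is allowed once the break period ends, and that every failed critical message is handed to the backup.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, break_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.break_seconds = break_seconds
        self.clock = clock            # injectable so the demo needs no sleeping
        self.failures = 0             # consecutive failures (an assumption)
        self.opened_at: Optional[float] = None

    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        if self.clock() - self.opened_at >= self.break_seconds:
            # Break period over: allow one trial call (half-open).
            self.opened_at = None
            self.failures = self.failure_threshold - 1
            return False
        return True

    def call(self, primary: Callable[[str], str],
             fallback: Callable[[str], str], message: str) -> str:
        if self.is_open():
            return fallback(message)  # skip the primary vendor entirely
        try:
            result = primary(message)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            return fallback(message)
        self.failures = 0             # any success closes the circuit fully
        return result


now = [0.0]                           # fake clock, in seconds
calls = {"primary": 0}


def flaky_primary(message: str) -> str:
    calls["primary"] += 1
    raise TimeoutError("primary vendor timed out")


def backup(message: str) -> str:
    return f"backup sent: {message}"


breaker = CircuitBreaker(clock=lambda: now[0])
results = [breaker.call(flaky_primary, backup, "2FA code") for _ in range(7)]
print(calls["primary"])  # the primary is tried on the first 5 calls only
```

In this sketch the sixth and seventh calls never reach the primary vendor, which is the point of the pattern: a struggling vendor is left alone for the break period while critical traffic goes elsewhere.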

3. Rate Limiting: The 'Priority Queue'

Not all notifications are equal. 'Password Reset' is high priority; 'Monthly Newsletter' is low. We put them on separate queues. When the system is under heavy load, the Newsletter queue is throttled so that critical system messages always have enough CPU and network bandwidth to go out without delay.
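One way to picture the two-queue throttling is the sketch below. The `PriorityDispatcher` class, the per-tick capacity, and the `low_priority_cap` knob are all hypothetical; the text above does not specify how load is detected, so this sketch simply treats "backlog larger than one tick's capacity" as heavy load.

```python
from collections import deque


class PriorityDispatcher:
    def __init__(self, capacity_per_tick: int, low_priority_cap: int):
        self.capacity = capacity_per_tick
        # Max low-priority sends per tick while overloaded (assumed policy).
        self.low_priority_cap = low_priority_cap
        self.high = deque()
        self.low = deque()

    def enqueue(self, notification: str, critical: bool) -> None:
        (self.high if critical else self.low).append(notification)

    def tick(self) -> list:
        """Send up to `capacity` notifications, critical ones first."""
        sent = []
        backlog = len(self.high) + len(self.low)
        overloaded = backlog > self.capacity
        low_budget = self.low_priority_cap if overloaded else self.capacity
        while self.high and len(sent) < self.capacity:
            sent.append(self.high.popleft())
        while self.low and len(sent) < self.capacity and low_budget > 0:
            sent.append(self.low.popleft())
            low_budget -= 1
        return sent


dispatcher = PriorityDispatcher(capacity_per_tick=5, low_priority_cap=1)
for n in range(8):
    dispatcher.enqueue(f"newsletter-{n}", critical=False)
for n in range(3):
    dispatcher.enqueue(f"reset-{n}", critical=True)

first = dispatcher.tick()
second = dispatcher.tick()
print(first)
print(second)
```

Although the newsletters were enqueued first, the password resets leave in the first tick, and the newsletter backlog drains slowly until the load subsides.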

4. Interview Mastery

Q: "How do you prevent sending the same notification twice if a vendor is slow?"

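Architect Answer: "**External Idempotency**. We generate a `NotificationId` once per notification and pass it to the vendor as an idempotency key on every attempt. Many vendor APIs accept such a key (Stripe's API is a well-known example of the pattern); confirm support in each notification vendor's documentation. If we retry the call because of a timeout, the vendor sees the same key, recognizes the request as a duplicate, and does not send again, so the user never receives two identical emails."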
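The retry-with-idempotency-key flow described in the answer can be sketched as follows. `FakeVendor` and `send_with_retry` are hypothetical stand-ins, not any real vendor SDK; the fake simulates the awkward case where the vendor processes the request but the response is lost to a timeout.

```python
import uuid


class FakeVendor:
    """Stand-in for a vendor API that honors an idempotency key."""

    def __init__(self, timeouts_before_success: int = 1):
        self.delivered = {}  # idempotency key -> message actually sent
        self.timeouts_left = timeouts_before_success

    def send(self, idempotency_key: str, message: str) -> str:
        first_time = idempotency_key not in self.delivered
        if first_time:
            self.delivered[idempotency_key] = message  # the send really happens
        if self.timeouts_left > 0:
            self.timeouts_left -= 1
            raise TimeoutError("response lost after the vendor processed the call")
        return "accepted" if first_time else "duplicate"


def send_with_retry(vendor: FakeVendor, notification_id: str,
                    message: str, max_attempts: int = 3) -> str:
    """Retry on timeout, reusing the same key on every attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            return vendor.send(notification_id, message)
        except TimeoutError:
            if attempt == max_attempts:
                raise
    raise RuntimeError("max_attempts must be at least 1")


vendor = FakeVendor(timeouts_before_success=1)
notification_id = str(uuid.uuid4())  # generated once, never per attempt
status = send_with_retry(vendor, notification_id, "Your code is 123456")
print(status, len(vendor.delivered))
```

The key detail is that the identifier is created before the first attempt and reused unchanged; generating a fresh key inside the retry loop would defeat the deduplication.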
