
Event-Driven Systems

Event-driven systems decouple producers from consumers by passing facts (events) through a broker. They’re the backbone of microservices, real-time pipelines, and any system where “things happen and other things react.”


| Term | Definition | Example |
| --- | --- | --- |
| Event | An immutable fact — something that happened | OrderPlaced, PaymentFailed |
| Command | A request for a specific action to occur | PlaceOrder, ProcessPayment |
| Message | Generic envelope — events and commands are both messages | Any payload sent over a broker |

  • Event Notification — Services emit lightweight events; consumers decide what to do. Loose coupling, but consumers must fetch state if they need details. Use when you want decoupling and don’t need full event history.

  • Event-Carried State Transfer — Events carry enough data so consumers never need to call back. Reduces coupling further but bloats event payloads. Use when consumers need state immediately and round-trips are expensive.

  • Event Sourcing — State is derived by replaying a log of events; the event store is the source of truth. Enables full audit trail and time-travel debugging. Use when auditability, compliance, or temporal queries matter.

  • CQRS (Command Query Responsibility Segregation) — Write model (commands) and read model (queries) are separate. Often paired with Event Sourcing but not required. Use when read/write access patterns diverge significantly.
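The core idea of Event Sourcing above can be sketched in a few lines: state is never stored directly, only derived by folding the event log. The event names and order fields here are illustrative, not a real event-store API.

```python
from dataclasses import dataclass, field

# Hypothetical events for an order aggregate; names are illustrative.
@dataclass(frozen=True)
class Event:
    kind: str          # e.g. "OrderPlaced", "ItemAdded", "OrderCancelled"
    data: dict = field(default_factory=dict)

def replay(events):
    """Derive current state by folding the event log, oldest first."""
    state = {"status": None, "items": []}
    for e in events:
        if e.kind == "OrderPlaced":
            state["status"] = "placed"
        elif e.kind == "ItemAdded":
            state["items"].append(e.data["sku"])
        elif e.kind == "OrderCancelled":
            state["status"] = "cancelled"
    return state

log = [
    Event("OrderPlaced"),
    Event("ItemAdded", {"sku": "A-1"}),
    Event("ItemAdded", {"sku": "B-2"}),
]
state = replay(log)
print(state)  # state rebuilt purely from the log
```

Because the log is the source of truth, replaying a prefix of it gives you the state at any earlier point in time, which is what makes audit trails and time-travel debugging possible.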


Sagas manage distributed transactions across services without two-phase commit.

| | Choreography | Orchestration |
| --- | --- | --- |
| How it works | Each service reacts to events and emits new ones | A central coordinator tells each service what to do |
| Coupling | Low — services only know their own events | Higher — coordinator knows the full flow |
| Observability | Hard — flow is implicit across services | Easy — flow is explicit in one place |
| Failure handling | Compensating events per service | Coordinator handles rollback logic |
| Best for | Simple, stable workflows | Complex, long-running, or frequently-changing flows |
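The orchestration approach can be sketched as a coordinator that runs saga steps in order and, on failure, runs each completed step's compensation in reverse. The step names (reserve stock, charge card, ship) are illustrative, not a real saga framework.

```python
# Minimal saga-orchestration sketch: run steps in order; on failure,
# compensate completed steps in reverse (no two-phase commit needed).

def run_saga(steps):
    """steps: list of (action, compensation) callables."""
    completed = []
    for action, compensate in steps:
        try:
            action()
            completed.append(compensate)
        except Exception:
            # Roll back via compensating transactions, newest first.
            for comp in reversed(completed):
                comp()
            return "rolled_back"
    return "committed"

trace = []

def fail_ship():
    raise RuntimeError("ship failed")  # simulated downstream failure

steps = [
    (lambda: trace.append("reserve_stock"), lambda: trace.append("release_stock")),
    (lambda: trace.append("charge_card"),   lambda: trace.append("refund_card")),
    (fail_ship,                             lambda: None),
]
result = run_saga(steps)
print(result, trace)
```

Note that compensations run in reverse order of completion: the card is refunded before the stock reservation is released, mirroring how the steps were applied.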

| Semantic | Guarantee | Cost | Use Case |
| --- | --- | --- | --- |
| At-most-once | May lose messages | Cheapest — fire and forget | Metrics, telemetry where loss is acceptable |
| At-least-once | No message loss; duplicates possible | Moderate — requires retry logic | Most production systems; pair with idempotent consumers |
| Exactly-once | No loss, no duplicates | Expensive — distributed coordination | Financial transactions, billing (verify broker support) |
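The "pair with idempotent consumers" advice for at-least-once delivery looks like this in miniature: the broker redelivers an event, and the consumer skips it because it has already recorded the event ID. The event shape and in-memory ID set are illustrative; in production the processed-ID store must be durable.

```python
# At-least-once delivery means duplicates can arrive: here the same event
# id is delivered twice, and the consumer applies it exactly once.

processed_ids = set()   # in production: a durable store (DB, cache)
balance = 0

def handle(event):
    global balance
    if event["id"] in processed_ids:
        return "skipped"              # duplicate: already applied
    balance += event["amount"]
    processed_ids.add(event["id"])
    return "applied"

deliveries = [
    {"id": "evt-1", "amount": 100},
    {"id": "evt-1", "amount": 100},   # redelivered by the broker
    {"id": "evt-2", "amount": 50},
]
results = [handle(e) for e in deliveries]
print(results, balance)  # duplicate skipped; each amount counted once
```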

| | Apache Kafka | Azure Service Bus |
| --- | --- | --- |
| Model | Distributed log (pull) | Message broker (push/pull) |
| Retention | Configurable — messages persist after consumption | Deleted after consumption (or DLQ) |
| Replay | ✅ Built-in — rewind consumer offset | ❌ Not supported |
| Ordering | Per-partition only | Per session (with sessions enabled) |
| Throughput | Millions of msgs/sec | Thousands of msgs/sec |
| Delivery | At-least-once; exactly-once within Kafka via transactions | At-least-once; duplicate-detection window available |
| Dead-letter | Manual (separate topic) | ✅ Built-in DLQ per queue/topic |
| Best for | Event streaming, log aggregation, replay scenarios | Workflow integration, enterprise messaging, Azure-native apps |

In distributed systems, writes propagate asynchronously — there’s a window where different nodes see different data. That’s the trade-off you accept for availability and partition tolerance (CAP theorem).

How to handle it:

  • Idempotent consumers — Process the same message twice safely; use unique event IDs.
  • Outbox pattern — Write to DB and event log atomically; avoids dual-write inconsistency.
  • Saga compensations — If a downstream step fails, emit compensating events to undo prior steps.
  • Read-your-writes consistency — Route reads to the primary (write) node briefly after a mutation if stale reads are unacceptable.
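The outbox pattern above is worth seeing concretely: the business write and the event record commit in one local transaction, so there is no dual-write gap, and a separate relay publishes from the outbox afterwards. This sketch uses SQLite for brevity; the table names and payload shape are illustrative.

```python
import json
import sqlite3

# Outbox sketch: order row and event row commit atomically together;
# a relay later reads unpublished outbox rows and hands them to the broker.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT);
CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,
                     payload TEXT, published INTEGER DEFAULT 0);
""")

def place_order(order_id):
    with db:  # one atomic transaction covers BOTH writes
        db.execute("INSERT INTO orders VALUES (?, 'placed')", (order_id,))
        db.execute(
            "INSERT INTO outbox (payload) VALUES (?)",
            (json.dumps({"type": "OrderPlaced", "order_id": order_id}),),
        )

def relay_once(publish):
    """Publish unpublished outbox rows, then mark them as sent."""
    rows = db.execute(
        "SELECT id, payload FROM outbox WHERE published = 0").fetchall()
    for row_id, payload in rows:
        publish(json.loads(payload))
        db.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    db.commit()

place_order("o-42")
sent = []                 # stand-in for a broker publish call
relay_once(sent.append)
print(sent)
```

If the process crashes between the transaction and the relay run, the event is still safely in the outbox and will be published on the next relay pass (at-least-once, so downstream consumers should still be idempotent).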

A dead-letter queue (DLQ) is where messages go when they can’t be processed — after max retries, on deserialization failure, or when they expire.

A poison message is one that causes the consumer to crash or loop repeatedly. Without a DLQ, it blocks the whole queue.

Key practices:

  • Always configure a DLQ — never let a broker silently drop failed messages.
  • Alert on DLQ depth; a growing DLQ is a canary for schema or logic bugs.
  • Include original metadata (error reason, retry count) in the DLQ message for debugging.
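These practices can be sketched as a small consumer wrapper: retry up to a limit, then move the message to a DLQ with the error reason and retry count attached. The message shapes and the in-memory DLQ list are illustrative of what a broker or consumer framework would do for you.

```python
# Poison-message handling sketch: bounded retries, then dead-letter with
# debugging metadata instead of blocking the queue.

MAX_RETRIES = 3
dlq = []

def consume(message, handler):
    last_error = None
    for _ in range(MAX_RETRIES):
        try:
            handler(message)
            return "ok"
        except Exception as exc:
            last_error = str(exc)
    dlq.append({
        "original": message,          # keep the payload for debugging
        "error": last_error,          # why it failed
        "retry_count": MAX_RETRIES,   # how many attempts were made
    })
    return "dead_lettered"

def handler(msg):
    if msg.get("body") is None:
        raise ValueError("deserialization failure")

r1 = consume({"body": "fine"}, handler)
r2 = consume({"body": None}, handler)   # poison message → DLQ, queue unblocked
print(r1, r2, dlq)
```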

  • Confusing Event Sourcing with event-driven — They often coexist but are independent concepts. Interviewers use this to separate practitioners from readers.
  • Choreography spaghetti — With enough services, choreography produces invisible workflows. No single place shows the full business process.
  • Schema evolution breaks consumers — Adding a required field or renaming a property can silently break downstream consumers. Use schema registries (Confluent Schema Registry, Azure Schema Registry) and favour additive changes.
  • Kafka ordering is per-partition — Globally ordered processing requires a single partition, which kills parallelism. Design partition keys carefully.
  • Exactly-once is harder than it sounds — Broker-level exactly-once doesn’t cover your database writes. You still need idempotent consumers for true end-to-end guarantees.
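The partition-key point can be made concrete: messages with the same key always hash to the same partition, so their relative order is preserved, while different keys spread across partitions for parallelism. The hash below is illustrative only; Kafka's default partitioner uses murmur2, not CRC32.

```python
import zlib

NUM_PARTITIONS = 6

def partition_for(key: str) -> int:
    """Illustrative key-based partitioner: same key → same partition."""
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

# All events for one order share a key, so they stay in order:
p1 = partition_for("order-123")
p2 = partition_for("order-123")
print(p1 == p2)  # True: same key, same partition, order preserved
```

This is why choosing the partition key is a design decision: key by order ID and you get per-order ordering with parallelism across orders; key by nothing (round-robin) and you get throughput but no ordering guarantee at all.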