Microservices Trade-offs
Microservices split a system into independently deployable services. Done right, they enable scale and team autonomy. Done wrong, they’re a monolith with extra network hops.
The Progression
Don’t jump straight to microservices. Most systems follow this arc:
| Stage | What It Is | When to Use |
|---|---|---|
| Monolith | Single deployable unit, all code together | MVPs, small teams, fast iteration |
| Modular Monolith | Monolith with enforced internal boundaries | Growing teams, prepping for a future split |
| Microservices | Independently deployed services per domain | Scale, multiple teams, proven domain boundaries |
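The “enforced internal boundaries” of a modular monolith can be sketched in a few lines. This is a hypothetical illustration (module and function names are invented): each module exposes a narrow public API, and other modules call only that, so a later split to a real service changes the call site, not the design.

```python
# billing "module": internals stay private, only the facade is public
def _calculate_tax(amount: float) -> float:   # internal detail, not for other modules
    return amount * 0.2

def create_invoice(amount: float) -> dict:    # the module's public API
    return {"amount": amount, "tax": _calculate_tax(amount)}

# orders "module": depends only on billing's public facade
def place_order(amount: float) -> dict:
    invoice = create_invoice(amount)  # in a future split: an HTTP/gRPC call instead
    return {"status": "placed", "invoice": invoice}

print(place_order(100.0))
```

In a real codebase the boundary is enforced with separate projects/packages and import rules, not convention alone — but the shape is the same.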
Trade-offs
| What You Gain | What You Pay |
|---|---|
| Independent deployability per service | Distributed system complexity |
| Scale each service in isolation | Network latency on every call |
| Team autonomy around domains | Operational overhead — CI/CD × N services |
| Fault isolation (one service fails, others survive) | Data consistency is now your problem |
| Tech stack flexibility per service | Observability requires serious tooling investment |
| Smaller codebases, easier to reason about | Integration testing is significantly harder |
Communication Patterns
Services need to talk. Two fundamental approaches:
| Type | Examples | When to Use |
|---|---|---|
| Sync (request/response) | REST, gRPC | Real-time queries, reads, simple commands |
| Async (event-driven) | Kafka, RabbitMQ, Azure Service Bus | Workflows, writes, cross-service side effects |
Sync chains are fragile — if A calls B calls C, one failure cascades. Async decouples services; the publisher doesn’t wait for consumers to catch up.
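A toy sketch of the contrast — in-memory only, no real broker or HTTP, and all service names are invented. The sync path blocks through the whole chain; the async publisher just enqueues and moves on.

```python
from collections import deque

# --- Sync: A calls B calls C; any failure or slowness cascades up ---
def service_c() -> str:
    return "c-result"

def service_b() -> str:
    return f"b({service_c()})"   # B blocks on C

def service_a() -> str:
    return f"a({service_b()})"   # A blocks on B, which blocks on C

# --- Async: publisher enqueues an event and returns immediately ---
events: deque = deque()

def publish(event: dict) -> None:
    events.append(event)          # publisher does not wait for consumers

def consume_all() -> list:
    handled = []
    while events:
        handled.append(events.popleft())  # consumers catch up on their own schedule
    return handled

publish({"type": "OrderPlaced", "order_id": 42})
print(service_a())       # sync path: one nested result
print(consume_all())     # async path: processed after the fact
```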
Data Patterns
Each service should own its data. A shared database creates hidden coupling — a schema change in one service can silently break others.
| Pattern | Pros | Cons |
|---|---|---|
| Database-per-service | True isolation, independent schema evolution | Cross-service joins need API calls or events |
| Shared database | Easy joins, simple to start | Tight coupling, hard to deploy independently |
Accept that data will be eventually consistent across services. Use events to synchronise state, and design your UX to tolerate slight delays.
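One way to picture event-driven synchronisation is the minimal in-memory sketch below (service names, stores, and the event schema are all made up). Each service owns its own store; the order service publishes an event, and the shipping service’s view only converges once the event is delivered — that gap is the eventual consistency your UX has to tolerate.

```python
order_db: dict = {}       # owned by the order service
shipping_db: dict = {}    # owned by the shipping service
outbox: list = []         # events awaiting delivery to consumers

def place_order(order_id: str) -> None:
    order_db[order_id] = "placed"
    outbox.append({"type": "OrderPlaced", "order_id": order_id})

def deliver_events() -> None:
    # stands in for a broker delivering to subscribers, some time later
    while outbox:
        event = outbox.pop(0)
        if event["type"] == "OrderPlaced":
            shipping_db[event["order_id"]] = "awaiting_pickup"

place_order("o-1")
print("o-1" in shipping_db)   # False: shipping hasn't seen the event yet
deliver_events()
print("o-1" in shipping_db)   # True: state has converged
```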
Resilience Patterns
Distributed systems fail in partial ways — handle it explicitly:
- Circuit Breaker — After N failures, stop calling the failing service and fail fast. In .NET, Polly implements this in a few lines.
- Retry with exponential backoff — Retry transient failures, but wait longer between each attempt. Without backoff, retries amplify load on an already struggling service.
- Timeouts — Always set timeouts on outbound calls. Without them, one slow dependency can exhaust your thread pool and take down the caller.
- Bulkhead — Isolate calls to different services in separate thread pools so one slow dependency can’t starve all others.
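The first two patterns can be hand-rolled in a few lines to show the mechanics. In .NET you’d reach for Polly rather than write this yourself; the Python sketch below is illustrative only, with simplified state (a real breaker also has a half-open state and a reset timer).

```python
import time

class CircuitBreaker:
    """After `threshold` consecutive failures, fail fast instead of calling."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.failures = 0

    def call(self, fn):
        if self.failures >= self.threshold:
            raise RuntimeError("circuit open: failing fast")  # skip the call entirely
        try:
            result = fn()
            self.failures = 0          # success closes the circuit again
            return result
        except Exception:
            self.failures += 1         # count consecutive failures
            raise

def retry_with_backoff(fn, attempts: int = 3, base_delay: float = 0.01):
    """Retry transient failures, doubling the wait each time (10ms, 20ms, ...)."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise                   # out of retries: surface the failure
            time.sleep(base_delay * (2 ** attempt))
```

Note how the two compose: retries belong on the caller’s side of the breaker, so that an open circuit short-circuits the retries too instead of hammering a dead service.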
Observability
A monolith has one log file. Microservices have N — you can’t debug what you can’t see:
- Distributed tracing — Propagate a correlation/trace ID across all service calls. Tools like Jaeger, Zipkin, or Azure Application Insights let one trace show the full request path.
- Centralised logging — Aggregate logs from all services into one place (ELK, Seq, Azure Monitor). Search by trace ID to reconstruct any request’s journey.
- Health checks — Every service exposes `/health` and `/ready` endpoints. Kubernetes and Azure Container Apps use these to route traffic and restart unhealthy instances.
Common Gotchas
- Distributed monolith — Services that are technically separate but share a database or require coordinated deploys. You’ve got all the complexity with none of the benefits.
- Eventual consistency underestimated — “The order exists in Service A but not B yet” confuses users and breaks downstream logic. Design for it deliberately, not as an afterthought.
- Operational overhead — Each service needs its own CI/CD, logging, monitoring, and on-call runbook. 10 services = 10× the operational surface area.
- Too many services too soon — “Death by a thousand cuts.” Splitting before you understand the domain gives you wrong boundaries that are painful to fix later.
- No contract testing — Consumer-driven contract tests (e.g., Pact) catch breaking API changes before production. Without them, a schema change in Service A silently breaks Service B.
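A drastically simplified stand-in for what Pact automates — the contract shape and field names below are invented. The consumer states which fields it depends on; the provider’s build verifies its responses against that contract, so a “harmless” rename fails in CI instead of in production.

```python
CONSUMER_CONTRACT = {     # fields Service B says it depends on
    "order_id": str,
    "status": str,
    "total": float,
}

def verify_contract(provider_response: dict, contract: dict) -> list:
    """Return a list of violations (empty means the contract holds)."""
    violations = []
    for field, expected_type in contract.items():
        if field not in provider_response:
            violations.append(f"missing field: {field}")
        elif not isinstance(provider_response[field], expected_type):
            violations.append(f"wrong type for {field}")
    return violations

# Provider's response after a "harmless" rename of status -> state:
response = {"order_id": "o-1", "state": "paid", "total": 9.99}
print(verify_contract(response, CONSUMER_CONTRACT))  # ['missing field: status']
```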