Scenario
The Product team wants to launch a new “Search” microservice. They ask for “100% availability” because it’s critical.
Question
How do you explain why 100% is unrealistic, and how do you help them define a meaningful SLO (Service Level Objective)?
Expected Answer
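- Education: Explain the cost of “nines”: each additional nine of availability costs substantially more in redundancy and engineering effort. A 100% target also leaves no room for change, because every deployment carries some risk of failure. An error budget makes a small, agreed amount of failure acceptable so the team can keep shipping safely.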
- Definition:
  - SLI (Indicator): The proportion of requests that both succeed (HTTP 2xx) and complete in under 200 ms.
  - SLO (Objective): 99.9% of requests meet the SLI over a 30-day window.
- Consequences: Agree in advance on what happens when the error budget is depleted (for example, freeze feature releases and focus on stability until the budget recovers).
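
The arithmetic behind the SLO and error budget above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the `Request` type, the function names, and the choice to count only HTTP 2xx responses under 200 ms as "good" are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical record of one served request (illustration only)."""
    status: int        # HTTP status code
    latency_ms: float  # time taken to serve the request


def is_good(req: Request, latency_threshold_ms: float = 200.0) -> bool:
    # Assumption: a "good" request returns HTTP 2xx in under the threshold.
    return 200 <= req.status < 300 and req.latency_ms < latency_threshold_ms


def error_budget_report(requests: list[Request], slo_target: float = 0.999) -> dict:
    """Compare the measured SLI against the SLO for one window of requests."""
    total = len(requests)
    good = sum(1 for r in requests if is_good(r))
    bad = total - good
    # The error budget is the share of requests the SLO allows to be bad.
    allowed_bad = (1 - slo_target) * total
    return {
        "sli": good / total if total else 1.0,
        "bad": bad,
        "allowed_bad": allowed_bad,
        "budget_remaining": 1 - bad / allowed_bad if allowed_bad else 0.0,
        "budget_exhausted": bad > allowed_bad,
    }


def allowed_downtime_minutes(slo_target: float, window_days: int = 30) -> float:
    """Full-outage time the SLO tolerates within the window."""
    return (1 - slo_target) * window_days * 24 * 60


if __name__ == "__main__":
    # Synthetic traffic: 9,995 good requests, 3 server errors, 2 slow responses.
    window = (
        [Request(200, 50.0)] * 9995
        + [Request(500, 30.0)] * 3
        + [Request(200, 450.0)] * 2
    )
    print(error_budget_report(window))
    print(allowed_downtime_minutes(0.999))
```

At a 99.9% target, a 30-day window tolerates roughly 43.2 minutes of total outage, and in the synthetic window above about half of the request-based budget is spent. Numbers like these tend to make the trade-off concrete for a product team: the budget is what pays for deployments, experiments, and dependency failures.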