Scenario
A bad configuration push caused a cascading failure across the entire checkout flow. You are the designated Incident Commander (IC). Multiple teams (Network, DB, App) are jumping on the call, panicking, and proposing fixes simultaneously.
Question
How do you manage this situation effectively?
Expected Answer
- Establish Command: Calm everyone down. State clearly: “I am the IC. All actions must be approved by me.”
- Roles: Assign a Scribe (record timeline), Ops Lead (execute commands), and Comms Lead (stakeholder updates).
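- Triage: Focus on mitigation (rollback, load shedding, kill switch), not root cause analysis (RCA). Because the trigger here is a known bad configuration push, rolling it back is the first option to evaluate.
- Communication: Keep the main channel clear. Spin off investigation threads if needed, but require each thread to report its findings back to the main channel.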
- Aftermath: Once the service is stable, declare “Incident Closed” and schedule the Post-Mortem.
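
The command structure above can be sketched in code to make the approval rule and the Scribe's timeline concrete. This is only an illustrative sketch: the `Incident` class, its method names, and the people named in the usage lines are hypothetical and do not correspond to any real incident-management tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Incident:
    """Hypothetical model of an incident run by a single Incident Commander."""
    commander: str
    roles: dict = field(default_factory=dict)     # role name -> person
    timeline: list = field(default_factory=list)  # (UTC timestamp, entry), the Scribe's record
    status: str = "open"

    def log(self, entry: str) -> None:
        # Every decision is timestamped so the post-mortem has a timeline.
        self.timeline.append((datetime.now(timezone.utc), entry))

    def assign(self, role: str, person: str) -> None:
        self.roles[role] = person
        self.log(f"{person} assigned as {role}")

    def approve(self, approver: str, action: str) -> None:
        # Only the IC may approve an action; other attempts are logged and refused.
        if approver != self.commander:
            self.log(f"REJECTED (not IC): {approver} tried to approve '{action}'")
            raise PermissionError("only the IC approves actions")
        self.log(f"APPROVED by IC: {action}")

    def close(self) -> None:
        self.status = "closed"
        self.log("Incident closed; post-mortem to be scheduled")


# Usage with made-up names: establish command, assign roles, mitigate, close.
incident = Incident(commander="alice")
incident.assign("Scribe", "bob")
incident.assign("Ops Lead", "carol")
incident.assign("Comms Lead", "dave")
incident.approve("alice", "roll back config push")
incident.close()
```

The one design choice worth noting is that a refused approval is still written to the timeline, so the record shows which fixes were proposed and turned down as well as which were executed.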