Operations
Building an Incident Response Playbook
Sarah Chen
2024-12-10
10 min read
Panic is the result of unpreparedness. An incident response playbook is a set of pre-agreed rules that kick in when an alert fires.
Phase 1: Triage
The first responder's only job is to confirm the issue and assess its severity. Is this a SEV-1 (wake everyone up) or a SEV-3 (fix it tomorrow)?
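Triage goes faster when the severity rules are written down before anyone needs them. Below is a minimal Python sketch of a rule-based classifier. The inputs (`confirmed`, `customer_facing`, `pct_users_affected`, `data_loss_risk`), the 25% threshold, and the SEV-2 middle tier are all hypothetical placeholders, not an industry standard. Substitute your own team's definitions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Severity(Enum):
    SEV1 = "SEV-1"  # wake everyone up
    SEV2 = "SEV-2"  # assumed middle tier: urgent, but handled by the on-call team
    SEV3 = "SEV-3"  # fix it tomorrow


@dataclass
class TriageSignals:
    # Hypothetical facts a first responder confirms before classifying.
    confirmed: bool            # has the responder verified the alert is real?
    customer_facing: bool      # are customers seeing errors or downtime?
    pct_users_affected: float  # rough share of users impacted, 0-100
    data_loss_risk: bool       # could data be lost or corrupted?


def classify(signals: TriageSignals) -> Optional[Severity]:
    """Return a severity, or None while the alert is still unconfirmed.

    The thresholds are illustrative assumptions only.
    """
    if not signals.confirmed:
        return None  # step one is confirming the issue exists
    if signals.data_loss_risk or (
        signals.customer_facing and signals.pct_users_affected >= 25
    ):
        return Severity.SEV1
    if signals.customer_facing:
        return Severity.SEV2
    return Severity.SEV3


print(classify(TriageSignals(True, True, 40.0, False)).value)  # SEV-1
```

Returning `None` for an unconfirmed alert keeps "is this real?" separate from "how bad is it?", which mirrors the two halves of the first responder's job.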
Phase 2: Assemble
Get the right people in the room (or on the Zoom call). Name a single, clearly identified Incident Commander; everyone else follows their lead.
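"One Incident Commander" is easiest to enforce when your tooling refuses to create a second one. Here is a hedged Python sketch of an incident record that allows exactly one commander and treats any change as an explicit handoff. The `Incident` class and its methods are hypothetical and not taken from any particular incident-management tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Incident:
    """Hypothetical incident record: exactly one commander, any number of responders."""

    title: str
    commander: Optional[str] = None
    responders: List[str] = field(default_factory=list)

    def assign_commander(self, person: str) -> None:
        # Refuse to silently create a second commander; changing leaders is a handoff.
        if self.commander is not None and self.commander != person:
            raise ValueError(
                f"{self.commander} is already Incident Commander; use hand_off()"
            )
        self.commander = person
        if person in self.responders:
            self.responders.remove(person)

    def hand_off(self, new_commander: str) -> None:
        # Explicit transfer: the previous commander stays on as a responder.
        previous = self.commander
        if new_commander in self.responders:
            self.responders.remove(new_commander)
        self.commander = new_commander
        if previous is not None and previous != new_commander and previous not in self.responders:
            self.responders.append(previous)

    def join(self, person: str) -> None:
        # Everyone other than the commander joins as a responder.
        if person != self.commander and person not in self.responders:
            self.responders.append(person)
```

For example, after `inc.assign_commander("alice")`, a call to `inc.assign_commander("bob")` raises an error, while `inc.hand_off("bob")` makes Bob the commander and keeps Alice on as a responder. The names are placeholders.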
Phase 3: Communicate
Update the status page within 15 minutes of declaring the incident. Even "We are investigating reports of X" is better than silence. Silence creates conspiracy theories.
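A 15-minute rule is only useful if something checks it. The Python sketch below flags an incident whose first public update is overdue and produces the bare-minimum holding message. The 15-minute window comes from this playbook. Measuring it from the moment the incident is declared, and the function names, are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Window from the playbook; the start point (incident declared) is an assumption.
STATUS_PAGE_DEADLINE = timedelta(minutes=15)


def status_update_overdue(
    declared_at: datetime, last_update_at: Optional[datetime], now: datetime
) -> bool:
    """True if no public update has been posted and the window has passed."""
    if last_update_at is not None:
        return False  # the first status page update is already out
    return now - declared_at > STATUS_PAGE_DEADLINE


def time_remaining(declared_at: datetime, now: datetime) -> timedelta:
    """Time left before the first update is due (never negative)."""
    return max(timedelta(0), declared_at + STATUS_PAGE_DEADLINE - now)


def holding_message(symptom: str) -> str:
    # Even a bare acknowledgement beats silence.
    return f"We are investigating reports of {symptom}."


declared = datetime(2024, 12, 10, 9, 0, tzinfo=timezone.utc)
print(time_remaining(declared, declared + timedelta(minutes=10)))  # 0:05:00
```

Wiring a check like `status_update_overdue` into a reminder bot means the commander does not have to remember the deadline in the middle of an incident.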
Phase 4: Mitigate (Not Fix)
Focus on stopping the bleeding. If a bad deploy caused it, roll back. Don't try to "fix forward" unless it's the only option. Restore service first; diagnose the root cause later.
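The mitigation rule boils down to a short decision, sketched in Python below. The two inputs are hypothetical flags: whether a deploy caused the incident, and whether a rollback is actually available. The point is the ordering. Rolling back comes first, and fixing forward is reserved for when rollback is not an option.

```python
from enum import Enum


class Mitigation(Enum):
    ROLLBACK = "roll back the deploy"
    FIX_FORWARD = "fix forward"
    OTHER = "stop the bleeding by other means"


def choose_mitigation(caused_by_deploy: bool, rollback_available: bool) -> Mitigation:
    """Pick a mitigation under the playbook's 'restore first, diagnose later' rule.

    Both flags are hypothetical inputs the Incident Commander would confirm.
    """
    if caused_by_deploy and rollback_available:
        return Mitigation.ROLLBACK  # fastest, best-understood path back to a good state
    if caused_by_deploy:
        return Mitigation.FIX_FORWARD  # only because rollback is not an option
    return Mitigation.OTHER  # not deploy-related; mitigate now, find root cause later


print(choose_mitigation(True, True).value)  # roll back the deploy
```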