Observer

Incident SLO impact

How the auto-impact panel computes burn rate and time to budget exhaustion.

When an incident lists affected services, every SLO bound to those services contributes to the auto-impact panel. The panel updates every 30 seconds while the incident is open and freezes when the incident is resolved.

What gets computed

For each affected SLO:

  • Burn during incident: total seconds the metric was unhealthy between the incident's published_at and either resolved_at or now, whichever is earlier.
  • Percent of budget consumed: burn seconds divided by the SLO's total budget seconds. Total budget = window seconds × (1 − target), with the target expressed as a fraction (for example, 0.999 for a 99.9% SLO).
  • Total budget remaining: read from slos.error_budget_remaining_pct (populated by the SLO eval scheduler tick, not recomputed in the panel).
  • Time to exhaust: at the current burn rate (burn seconds / incident duration seconds), how long until the remaining budget reaches zero. Reported in minutes; null when the burn rate is zero.
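The four values above can be sketched in a few lines of Python. This is an illustrative sketch, not the panel's actual code: the function name compute_impact, the SloImpact type, and all parameter names are hypothetical. It assumes the target is passed as a fraction, that error_budget_remaining_pct is a 0 to 100 percentage of the total budget, and that a negative remaining budget is clamped to zero.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SloImpact:
    burn_seconds: float
    pct_of_budget_consumed: float
    budget_remaining_pct: float
    time_to_exhaust_minutes: Optional[float]


def compute_impact(
    burn_seconds: float,
    incident_duration_seconds: float,
    window_seconds: float,
    target: float,                # assumed to be a fraction, e.g. 0.999
    budget_remaining_pct: float,  # assumed 0-100, as read from the SLO eval tick
) -> SloImpact:
    # Total budget = window seconds x (1 - target).
    total_budget_seconds = window_seconds * (1.0 - target)
    if total_budget_seconds <= 0:
        raise ValueError("a target of 100% leaves no error budget")

    pct_consumed = 100.0 * burn_seconds / total_budget_seconds

    # Burn rate = burn seconds / incident duration seconds.
    if incident_duration_seconds > 0:
        burn_rate = burn_seconds / incident_duration_seconds
    else:
        burn_rate = 0.0

    if burn_rate == 0:
        # No burn means no projected exhaustion.
        time_to_exhaust = None
    else:
        # Assumption: clamp an already-overspent budget to zero.
        remaining_seconds = max(0.0, total_budget_seconds * budget_remaining_pct / 100.0)
        time_to_exhaust = remaining_seconds / burn_rate / 60.0

    return SloImpact(burn_seconds, pct_consumed, budget_remaining_pct, time_to_exhaust)
```

For a 30-day window at a 99.9% target, the total budget is about 2,592 seconds. An incident that has burned 600 seconds over 20 minutes, with half the budget remaining, has consumed roughly 23% of the budget and projects to exhaustion in about 43 minutes.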

Caching

Repeated panel polls within 30 seconds reuse the same computation (an in-memory cache keyed by incident id). This keeps open dashboards, each polling every 30 seconds, from hammering the SLO eval pipeline.
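A minimal sketch of this kind of cache follows, assuming a simple time-to-live policy. The class name ImpactCache and its methods are hypothetical, and the real cache may differ in eviction and concurrency handling.

```python
import time
from typing import Any, Callable, Dict, Tuple


class ImpactCache:
    """In-memory cache keyed by incident id, with a fixed time-to-live."""

    def __init__(self, ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # incident id -> (time the value was computed, cached value)
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_compute(self, incident_id: str, compute: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(incident_id)
        if entry is not None and now - entry[0] < self._ttl:
            # Still fresh: reuse the earlier computation.
            return entry[1]
        value = compute()
        self._entries[incident_id] = (now, value)
        return value
```

Injecting the clock keeps the sketch testable without sleeping. A production version would likely also drop entries for resolved incidents.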

Sources of error

  • The metric history table is the source of truth for burn. If the agent missed pushes during the incident, those gaps are not counted as unhealthy.
  • The remaining-budget % comes from the most recent SLO eval tick. If the scheduler fell behind, the value can be stale by a few minutes. The burn-during-incident value is always fresh.
  • Time-to-exhaust extrapolates a linear burn rate. Real systems rarely sustain a linear rate; treat the number as a rough estimate rather than a precise countdown.
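To make the first caveat concrete, the sketch below counts burn from pushed samples only. It rests on assumptions that are not taken from the product: each metric history row is modeled as a (timestamp, healthy) pair, every push stands for a fixed push_interval_seconds of state, and the function name burn_seconds is hypothetical.

```python
from typing import Iterable, Optional, Tuple


def burn_seconds(
    samples: Iterable[Tuple[float, bool]],  # (timestamp in seconds, healthy flag)
    published_at: float,
    resolved_at: Optional[float],
    now: float,
    push_interval_seconds: float,
) -> float:
    # The window ends at resolved_at or now, whichever is earlier.
    window_end = now if resolved_at is None else min(resolved_at, now)
    unhealthy_pushes = sum(
        1
        for ts, healthy in samples
        if published_at <= ts < window_end and not healthy
    )
    # Only rows that exist are counted. A missed push adds nothing,
    # so gaps are never treated as unhealthy time.
    return unhealthy_pushes * push_interval_seconds
```

With pushes every 60 seconds and the push at t=180 missing, an incident whose samples at t=0, 60, and 240 are unhealthy reports 180 seconds of burn, even if the service was also unhealthy during the gap.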

Public visibility

The auto-impact panel is console-only by default. A per-incident toggle exposes a slimmed view (burn % only, no time-to-exhaust) on the public incident page. Some operators choose to surface it for transparency; others view it as internal-only. The default is off.
