SLOs and error budgets
How service level objectives translate metric status into a contractual signal.
A Service Level Objective (SLO) is a commitment that a metric will remain healthy for a defined fraction of a rolling window. SLOs turn the binary "is this healthy right now" question into a running balance: the error budget, which is the remaining allowance of unhealthy time.
Definition
An SLO has three core fields:
- Metric: which metric the SLO observes.
- Target percentage: the fraction of the window the metric must be healthy. Common values: 99, 99.5, 99.9, 99.95, 99.99.
- Window in days: the rolling period the target applies to. Common values: 7, 30, 90.
The window is rolling: at any instant, the SLO looks back N days and computes the fraction of that time the metric was healthy. There is no calendar boundary that resets the budget.
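The rolling look-back can be sketched as follows. This is a minimal illustration, not the product's implementation: the function name, the interval representation, and the choice to count every state other than unhealthy as healthy time are assumptions made for the example.

```python
from datetime import datetime, timedelta

def healthy_fraction(unhealthy_intervals, now, window_days):
    """Fraction of the trailing window in which the metric was not unhealthy.

    unhealthy_intervals: (start, end) datetime pairs, assumed non-overlapping.
    An end of None means the interval is still open at `now`.
    """
    window = timedelta(days=window_days)
    window_start = now - window
    unhealthy = timedelta(0)
    for start, end in unhealthy_intervals:
        # Clip each interval to [window_start, now]; time outside the window is ignored.
        clipped_start = max(start, window_start)
        clipped_end = min(end if end is not None else now, now)
        if clipped_end > clipped_start:
            unhealthy += clipped_end - clipped_start
    return 1 - unhealthy / window
```

Because the window start is recomputed from `now` on every call, an old interval ages out gradually as the window slides past it, rather than disappearing at a calendar boundary.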
Error budget
Given a 99.9% target over 30 days, the budget allowance is:
allowance = 30 days * (1 - 99.9 / 100)
= 30 days * 0.001
= 0.03 days
= 43.2 minutes per 30-day window
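The same arithmetic generalizes to any target and window. The helper below is a small sketch; the function name is hypothetical and not part of any API described here.

```python
def budget_allowance_minutes(target_percent: float, window_days: int) -> float:
    """Minutes of unhealthy time the SLO tolerates per rolling window."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (100 - target_percent) / 100

# The worked example above: 99.9% over 30 days.
print(round(budget_allowance_minutes(99.9, 30), 3))  # 43.2
```

Tightening the target by one "nine" divides the allowance by ten, so a 99.99% target over the same 30 days leaves roughly 4.3 minutes.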
The budget burns whenever the metric is in the unhealthy state. It does not burn for degraded, no_data, or unknown (the threshold operators reference covers each).
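To make the burn rule concrete, the sketch below totals only unhealthy time against the allowance. The input shape (status and duration pairs already clipped to the window), the function name, and the decision to clamp the remaining budget at zero are assumptions for illustration; the source does not say how an exhausted budget is reported.

```python
# Only this status consumes budget; degraded, no_data, and unknown do not.
BURNING_STATUSES = {"unhealthy"}

def budget_remaining_percent(samples, allowance_minutes):
    """samples: iterable of (status, duration_minutes) within the window."""
    burned = sum(minutes for status, minutes in samples if status in BURNING_STATUSES)
    # Assumption: remaining budget is floored at zero rather than going negative.
    remaining = max(allowance_minutes - burned, 0.0)
    return 100 * remaining / allowance_minutes

samples = [
    ("healthy", 1000.0),
    ("degraded", 30.0),   # ignored
    ("unhealthy", 10.8),  # burns a quarter of a 43.2-minute allowance
    ("no_data", 5.0),     # ignored
    ("unknown", 2.0),     # ignored
]
print(round(budget_remaining_percent(samples, 43.2), 1))  # 75.0
```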
Burn events
A burn event opens when the metric flips to unhealthy and the SLO
drops below 100% remaining. It closes when the metric returns to
healthy. Each burn event records its start, end, and the percent of
the budget it consumed.
Webhook subscribers receive slo.burn_started when an event opens
and slo.burn_resolved when it closes. Pair the two by their
burn_event_id.
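A subscriber might pair the two notifications along these lines. The event names and burn_event_id come from the description above; the payload layout (a dict with "type" and "burn_event_id" keys) is an assumption, so adjust the field access to the real webhook schema.

```python
def pair_burn_events(events):
    """Match slo.burn_started with slo.burn_resolved by burn_event_id.

    Returns (pairs, still_open): pairs is a list of (started, resolved) tuples,
    still_open maps burn_event_id to a started event with no resolution yet.
    """
    still_open = {}
    pairs = []
    for event in events:
        event_id = event["burn_event_id"]
        if event["type"] == "slo.burn_started":
            still_open[event_id] = event
        elif event["type"] == "slo.burn_resolved":
            # started is None if the resolution arrives without a matching start.
            started = still_open.pop(event_id, None)
            pairs.append((started, event))
    return pairs, still_open

events = [
    {"type": "slo.burn_started", "burn_event_id": "a"},
    {"type": "slo.burn_started", "burn_event_id": "b"},
    {"type": "slo.burn_resolved", "burn_event_id": "a"},
]
pairs, still_open = pair_burn_events(events)
```

Keying on burn_event_id rather than arrival order keeps the pairing correct when several burn events overlap across different SLOs.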
Picking a target
The right SLO target reflects the system's actual achieved availability over the prior 90 days, plus a margin for the behaviour you want to drive. Three common starting points:
- 99.5% for a new service or unknown baseline. Loose enough that noise does not drive false alerts.
- 99.9% for a service with a stable history and a reasonable remediation pipeline.
- 99.99% for systems where customers feel every minute of unhealthy time. Requires investment in error handling and rapid remediation; otherwise the target produces churn rather than signal.
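One way to ground the choice is to compare the three starting points with what the service achieved over the prior 90 days. The sketch below is illustrative only: the function names are hypothetical, and it does not model the margin for the behaviour you want to drive, which remains a judgment call.

```python
STARTING_TARGETS = (99.5, 99.9, 99.99)

def achieved_availability(unhealthy_minutes: float, lookback_days: int = 90) -> float:
    """Percent of the look-back period that was not unhealthy."""
    total_minutes = lookback_days * 24 * 60
    return 100 * (1 - unhealthy_minutes / total_minutes)

def strictest_target_met(achieved_percent: float, targets=STARTING_TARGETS):
    """Highest candidate target the history would have satisfied, or None."""
    met = [t for t in targets if achieved_percent >= t]
    return max(met) if met else None

# 100 unhealthy minutes in 90 days is about 99.92% availability.
print(strictest_target_met(achieved_availability(100)))  # 99.9
```

A result of None suggests that even the loosest starting point would have been breached, which is worth knowing before committing to any of them.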
Per-customer targets
Different customers can sign different SLO targets against the same underlying metric. The model and configuration steps live in Customer scopes.