Observer

Incidents and metrics

How customer-facing incidents relate to metric-driven status.

Observer's status model has two layers:

  1. Metrics drive status by default. A metric flips to unhealthy when its measured value crosses the threshold; the page status rolls up from the worst metric. No human action required.
  2. Incidents are the customer communication layer on top. An incident is what the operator publishes to explain context: what is broken, what we know, and what we are doing about it.

Both layers can fire independently, and they often do.
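The metric layer can be sketched in a few lines of Python. This is an illustration only, not Observer's implementation: the Status enum, the function names, and the degraded threshold are assumptions made for the example. Only the "over 2%" unhealthy rule and the worst-metric roll-up come from this page.

```python
from enum import IntEnum


class Status(IntEnum):
    # Ordered so that a higher value means a worse status.
    HEALTHY = 0
    DEGRADED = 1
    UNHEALTHY = 2


def evaluate(value, degraded_over, unhealthy_over):
    """Map a measured value to a status using 'over' thresholds.

    The degraded threshold is hypothetical; the page only describes
    an unhealthy rule.
    """
    if value > unhealthy_over:
        return Status.UNHEALTHY
    if value > degraded_over:
        return Status.DEGRADED
    return Status.HEALTHY


def page_status(metric_statuses):
    """Roll the page up to the worst metric status.

    Treating a page with no metrics as healthy is an assumption.
    """
    return max(metric_statuses, default=Status.HEALTHY)


# An error rate of 4.2% against an "over 2%" unhealthy rule.
error_rate = evaluate(4.2, degraded_over=1.0, unhealthy_over=2.0)
latency = Status.HEALTHY
print(page_status([error_rate, latency]).name)
```

No incident appears anywhere in this calculation, which is the point of the two-layer split.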

Why two layers

A metric flip is an automated signal. For example: the threshold breach happened at 14:32:18 because the agent reported a 4.2% error rate and the unhealthy rule was set to "over 2%". That is precise, but it is not customer communication. Customers want to know:

  • Are you aware?
  • What is the impact?
  • When will it be fixed?
  • How will I know it is fixed?

Those are operator-authored sentences. The metric flip cannot answer them on its own.

How they relate at runtime

The page status that customers see is driven only by metrics. Posting an incident does not change page status, and neither does resolving one. Status is the measured truth; incidents are the human commentary.

The exception is manual metrics (see Manual metrics): when an open incident lists a service, the manual metrics on that service automatically set their status to mirror the incident severity. This is the one case where incidents drive status, by design, because manual metrics have no probe to measure them.
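A rough sketch of that rule, under stated assumptions: the Metric and Incident classes, the severity-to-status mapping (major to unhealthy, minor to degraded, inferred from the draft pre-fill rules), and the choice to take the worst severity when several open incidents list the same service are all hypothetical, not Observer's documented behavior.

```python
from dataclasses import dataclass

# Assumed mapping, inferred from the draft pre-fill rules on this page.
SEVERITY_TO_STATUS = {"minor": "degraded", "major": "unhealthy"}
RANK = {"healthy": 0, "degraded": 1, "unhealthy": 2}


@dataclass
class Metric:
    service: str
    manual: bool
    status: str = "healthy"  # measured (probe) or operator-set (manual)


@dataclass
class Incident:
    services: list
    severity: str
    is_open: bool = True


def effective_status(metric, incidents):
    """Return the status a metric contributes to the page."""
    if not metric.manual:
        # Probe-backed metrics ignore incidents entirely.
        return metric.status
    mirrored = [
        SEVERITY_TO_STATUS[i.severity]
        for i in incidents
        if i.is_open and metric.service in i.services
    ]
    if not mirrored:
        return metric.status
    # Assumption: with several open incidents, the worst severity wins.
    return max(mirrored, key=RANK.__getitem__)


incident = Incident(services=["api"], severity="major")
print(effective_status(Metric("api", manual=True), [incident]))
print(effective_status(Metric("api", manual=False), [incident]))
```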

The "draft from metric" flow

When a metric flips unhealthy, the metric edit page surfaces a Draft incident CTA. One click pre-fills:

  • Title: Investigating: <metric-title>
  • Severity: major if the metric is unhealthy, minor if it is degraded
  • Affected services: every service that has an SLO bound to this metric
  • Initial message: Investigating <metric>. Current status: <status>

The operator reviews, edits if needed, and publishes. The "metric flipped → I need to update status" loop drops from minutes to one click.
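The pre-fill rules above can be sketched as a small function. The function name, the dictionary keys, and the decision to reject a healthy metric are assumptions for illustration; only the field contents follow the list above.

```python
def draft_from_metric(metric_title, status, bound_services):
    """Build a pre-filled incident draft for a flipped metric.

    bound_services stands for the services that have an SLO bound
    to the metric; how they are looked up is outside this sketch.
    """
    if status == "unhealthy":
        severity = "major"
    elif status == "degraded":
        severity = "minor"
    else:
        # Assumption: a healthy metric has nothing to draft.
        raise ValueError(f"no draft for status {status!r}")
    return {
        "title": f"Investigating: {metric_title}",
        "severity": severity,
        "affected_services": sorted(bound_services),
        "message": f"Investigating {metric_title}. Current status: {status}",
    }


draft = draft_from_metric("API error rate", "unhealthy", {"api", "checkout"})
print(draft["title"])
```

The draft stays a draft until the operator publishes it, so every field here is a starting point rather than a final value.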

The CTA is idempotent within 30 minutes: a second click on the same metric in the same window surfaces the existing draft instead of creating a duplicate.
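One way to picture the 30-minute idempotency window is shown below. The DraftStore class and its in-memory dictionary are hypothetical, and measuring the window from the creation time of the existing draft is an assumption; the page does not say how the window is anchored.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=30)


class DraftStore:
    """In-memory stand-in for wherever drafts are actually kept."""

    def __init__(self):
        self._latest = {}  # metric_id -> (created_at, draft_id)
        self._next_id = 1

    def draft_for(self, metric_id, now):
        """Return (draft_id, created), reusing a draft inside the window."""
        entry = self._latest.get(metric_id)
        if entry is not None and now - entry[0] < WINDOW:
            return entry[1], False  # surface the existing draft
        draft_id = self._next_id
        self._next_id += 1
        self._latest[metric_id] = (now, draft_id)
        return draft_id, True


store = DraftStore()
t0 = datetime(2024, 1, 1, 14, 32, 18)
first = store.draft_for("error-rate", t0)
second = store.draft_for("error-rate", t0 + timedelta(minutes=10))
print(first, second)
```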

Auto-drafts (opt-in)

The same "draft from metric" path can run automatically. Opt a metric in via the Automatic incident creation section on its edit form (Pro+). When the metric flips unhealthy, Observer creates the draft for you and emails your org owners with publish / dismiss buttons.

The auto flow uses the same dedup rule as the manual CTA: if an open incident already affects the metric's service, a message is appended to the existing incident instead of opening a new one. The per-metric cooldown is one hour. Drafts that go unactioned for 24 hours expire automatically.
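The three rules in the previous paragraph can be read as a small decision function. This is a sketch under assumptions: the function names, the returned action labels, the order in which the checks run, and the idea that a flip inside the cooldown is simply skipped are not stated on this page.

```python
from datetime import datetime, timedelta

COOLDOWN = timedelta(hours=1)    # per-metric cooldown
DRAFT_TTL = timedelta(hours=24)  # unactioned drafts expire after this


def auto_action(now, service_has_open_incident, last_auto_draft_at):
    """Decide what an unhealthy flip does for an opted-in metric."""
    if service_has_open_incident:
        # Dedup: append to the existing incident, never open a second one.
        return "append_message"
    if last_auto_draft_at is not None and now - last_auto_draft_at < COOLDOWN:
        # Assumption: inside the cooldown, nothing new is drafted.
        return "skip"
    return "create_draft"


def is_expired(draft_created_at, now):
    """True once a draft has gone unactioned for the full TTL."""
    return now - draft_created_at >= DRAFT_TTL


now = datetime(2024, 1, 1, 15, 0)
print(auto_action(now, False, now - timedelta(minutes=20)))
print(is_expired(now - timedelta(hours=25), now))
```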

See the full setup walkthrough at Auto-incident creation.
