Auto-incident creation
Opt a metric in to automatic draft-incident creation when it flips unhealthy. Drafts ship with email CTAs so a human always verifies before customers see the incident.
When a metric flips unhealthy in the middle of the night, the on-call already knows. The question is whether the customer-facing status page should be updated to reflect that. Auto-incident creation does the typing for you, and it never publishes anything until a human presses a button.
How it works
- You opt a metric in to the feature on its edit form (Pro+).
- The metric flips unhealthy (with dwell gating, exactly as a manual status change would).
- The auto-incident worker creates a draft incident on the metric's bound service.
- Observer emails your org owners with two buttons: Publish (flip to published; customers see it) and Dismiss (soft-delete the draft).
- If neither button is clicked within 24 hours, the draft auto-expires. Nothing ever reaches the public page without a human action.
Enable for a metric
- Open Console → Metrics → <your metric> → Edit.
- Scroll to the Automatic incident creation section.
- Pick a Policy:
  - Off: auto-creation is disabled for this metric.
  - On — create immediately: a draft is created the moment the metric flips unhealthy.
  - On — wait then re-check: Observer waits the configured number of seconds, then re-checks the metric's current status. If it's still unhealthy, the draft is created. If the metric recovered during the dwell window, nothing happens. This is the recommended setting for metrics that occasionally flap.
- Pick a Severity (minor/major/critical). This value is stamped on every auto-drafted incident.
- For dwell mode, pick a Dwell seconds value between 60 and 3600. Defaults to 300 (5 minutes).
- Save.
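The settings above can be sketched as a small validated config plus the re-check decision. This is a minimal illustration, not Observer's implementation: the class name, the function name, and the internal policy identifiers ("off", "immediate", "dwell") are hypothetical, while the severity values and the 60 to 3600 range with a 300-second default come from the form described above.

```python
from dataclasses import dataclass

# Hypothetical internal names for the three Policy options on the edit form.
POLICIES = {"off", "immediate", "dwell"}
SEVERITIES = {"minor", "major", "critical"}
DWELL_MIN, DWELL_MAX, DWELL_DEFAULT = 60, 3600, 300


@dataclass(frozen=True)
class AutoIncidentConfig:
    policy: str
    severity: str
    dwell_seconds: int = DWELL_DEFAULT

    def __post_init__(self):
        if self.policy not in POLICIES:
            raise ValueError(f"unknown policy: {self.policy!r}")
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")
        # The dwell range only matters when the policy actually waits.
        if self.policy == "dwell" and not (DWELL_MIN <= self.dwell_seconds <= DWELL_MAX):
            raise ValueError("dwell_seconds must be between 60 and 3600")


def should_create_draft(config: AutoIncidentConfig, still_unhealthy_after_dwell: bool) -> bool:
    """Decide whether a flip to unhealthy results in a draft incident."""
    if config.policy == "off":
        return False
    if config.policy == "immediate":
        return True
    # Dwell mode: create the draft only if the re-check still sees unhealthy.
    return still_unhealthy_after_dwell
```

In dwell mode, a metric that recovers before the re-check yields `False`, which matches the "nothing happens" case above.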
What gets created
When the worker fires, you get:
- A new incident row with:
  - title: Investigating elevated errors on <metric title>
  - severity: as configured on the metric
  - affected_services: every service that has an SLO pointing at the metric
  - is_auto_drafted: true
- An initial Information message describing the value vs the threshold and the timestamp.
- An audit row (incident.auto_drafted on the metric, plus the parent row on the incident itself).
- A webhook event incident.auto_drafted (separate from the manual incident.created so you can listen specifically).
- An email to every org owner who hasn't opted out (see Notification preferences).
Email CTAs
Each email has two buttons:
- Publish incident: GET /api/incidents/auto-action?token=…&action=publish (the action is carried inside the signed token). Flips the draft to published. Fires incident.auto_published.
- Dismiss draft: GET /api/incidents/auto-action?token=…&action=dismiss. Soft-deletes the row. Fires incident.auto_dismissed with reason: "operator_dismiss".
The token format is base64url(body) + "." + base64url(sig) with
body <incidentId>|<action>|<expiresAtMs> and signature
HMAC-SHA-256(server_secret, body). Action is part of the signed
body, not the URL — you can't flip a publish link to dismiss (or
vice versa) by editing the URL. Tokens expire after 24 hours.
Both endpoints are idempotent. Re-clicking publish after the incident is already published returns a success page. Re-clicking dismiss after it's already dismissed returns a success page.
Dedup, cooldown, and expiry
Three guardrails keep the auto-incident flow from spamming you:
- Dedup against open incidents on the service. If you (or a prior auto-draft) have already filed an incident affecting the metric's service, the worker appends a new Information message to the existing incident instead of creating a duplicate. Message text: Metric <name> is now unhealthy (auto-detected).
- One auto-draft per metric per hour. If a metric was already auto-drafted or auto-dismissed in the last hour, the worker skips. Flapping metrics never produce more than one draft per hour.
- 24-hour auto-expiry. Drafts older than 24 hours that haven't been published or dismissed are soft-deleted by a 15-minute cron, audited as incident.auto_expired, and fire incident.auto_dismissed with reason: "auto_expired".
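One way to picture how the three guardrails combine is the sketch below. The function and enum names are hypothetical, and checking dedup before the cooldown is an assumption, since the order is not specified above. Only the one-hour and 24-hour windows come from the text.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional

COOLDOWN = timedelta(hours=1)    # one auto-draft per metric per hour
DRAFT_TTL = timedelta(hours=24)  # unpublished drafts expire after 24 hours


class Decision(Enum):
    APPEND_MESSAGE = "append_message"  # add an Information message to the open incident
    SKIP_COOLDOWN = "skip_cooldown"    # metric was auto-drafted or auto-dismissed recently
    CREATE_DRAFT = "create_draft"


def decide(now: datetime,
           has_open_incident_on_service: bool,
           last_auto_event_at: Optional[datetime]) -> Decision:
    """Pick what the worker does when a metric flips unhealthy."""
    # Assumed order: dedup first, then cooldown.
    if has_open_incident_on_service:
        return Decision.APPEND_MESSAGE
    if last_auto_event_at is not None and now - last_auto_event_at < COOLDOWN:
        return Decision.SKIP_COOLDOWN
    return Decision.CREATE_DRAFT


def is_expired(now: datetime, drafted_at: datetime) -> bool:
    """True when a still-pending draft is older than 24 hours."""
    return now - drafted_at > DRAFT_TTL
```

A periodic job would call `is_expired` on each pending draft and soft-delete the ones that return `True`.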
Notification preferences
Per-user opt-out lives at Console → Settings → Notifications → Auto-incident draft emails. Default is ON for org owners. Owners who toggle this off do not receive auto-incident emails (any other type of email is unaffected).
Toggling it off stores users.notification_preferences.autoIncidentDrafts = false on the user row.
Plan gate
This feature is Pro+ only. Free and Starter plans see a locked-feature card on the metric edit form. On lower plans, leave the metric's policy set to Off (the default), or upgrade.
Webhook events
Three event types fire from the auto flow:
- incident.auto_drafted: fires when the draft is created.
- incident.auto_published: fires when the draft is published via the email link (or the equivalent API endpoint).
- incident.auto_dismissed: fires for both the email-dismiss and the 24h auto-expiry paths. reason distinguishes them.
Payloads are documented at Webhook payload reference.
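A webhook consumer might branch on these events roughly as follows. The payload shape here is an assumption for illustration: the "type" key is hypothetical and "reason" is assumed to sit at the top level, so check the payload reference for the real field names. The event names and reason values are the ones listed on this page.

```python
def summarize_auto_event(event: dict) -> str:
    """Map an auto-incident webhook event to a short human-readable summary."""
    event_type = event.get("type")  # hypothetical key name
    if event_type == "incident.auto_drafted":
        return "draft created, awaiting review"
    if event_type == "incident.auto_published":
        return "draft published"
    if event_type == "incident.auto_dismissed":
        reason = event.get("reason")
        if reason == "operator_dismiss":
            return "draft dismissed by an operator"
        if reason == "auto_expired":
            return "draft expired after 24 hours"
        return f"draft dismissed (unrecognized reason: {reason!r})"
    # Anything else, including the manual incident.created, is not part of the auto flow.
    return "ignored"
```

Handling unrecognized reasons explicitly keeps the consumer from silently misclassifying a dismissal if new reason values appear later.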
Recommended setup
For most teams:
- Dwell mode with 300 seconds for any latency or error-rate metric. The dwell window catches noisy alarms before they generate an email.
- Immediate mode for binary signals (TLS expiry hit zero, a service is unreachable). These should not flap, so dwell adds nothing.
- Leave auto-creation off for noisy dashboards that are not customer-visible. The console already shows unhealthy metrics; not every internal alarm deserves a draft.