Observer

Why metrics, not pings

The case for metric-based status over availability pings.

Most status page tools assert availability with periodic pings: a GET request every 60 seconds against a public endpoint, with a green check when the response code is 2xx. Observer's default is to compute status from metrics you already collect, with pings as one source among many. The reasoning:

Pings only see the public envelope

A ping confirms a single endpoint accepted a single request at a single moment. It does not see:

  • The error rate served to actual customers in the last five minutes.
  • The 95th percentile latency under real load.
  • The depth of an internal queue draining slower than its inflow.
  • A degraded backend that has been masked by retries upstream.

A page that reads green from pings while customers are filing support tickets is the standard failure mode of ping-based status.
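To make that failure mode concrete, here is a minimal Python sketch of the two checks side by side. The `Request` type, the function names, and the 1% / 5% error-rate thresholds are hypothetical and chosen only for illustration. They are not Observer's configuration or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical record of one customer request: its HTTP status code."""
    status: int


def ping_status(ping_code: int) -> str:
    # Ping-style check: green whenever the single probe receives a 2xx.
    return "operational" if 200 <= ping_code < 300 else "outage"


def error_rate_status(requests: list[Request],
                      degraded_at: float = 0.01,
                      outage_at: float = 0.05) -> str:
    # Metric-style check: the fraction of real requests that returned a 5xx.
    if not requests:
        return "unknown"
    errors = sum(1 for r in requests if r.status >= 500)
    rate = errors / len(requests)
    if rate >= outage_at:
        return "outage"
    if rate >= degraded_at:
        return "degraded"
    return "operational"


# Last five minutes of customer traffic: 92 successes, 8 server errors.
traffic = [Request(200)] * 92 + [Request(503)] * 8
print(ping_status(200))            # operational (health endpoint answers 200)
print(error_rate_status(traffic))  # outage (8% of customers saw errors)
```

The ping and the error rate disagree because they measure different things: one request from the prober versus every request from customers.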

Metrics see the actual signal

Observer's primary data sources are the signals you already measure: Prometheus queries, HTTP probes that include response-body checks, TCP connection times, DNS resolution times, and TLS certificate expiry. The status shown on the public page is computed from the same numbers your on-call team already trusts on its internal Grafana dashboards.

The result: when customers see red, the on-call engineer's dashboard shows the same red, with the same threshold semantics. There is no gap between what the public sees and what the team sees.
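A minimal Python sketch of that idea, assuming a Prometheus server's HTTP API (`/api/v1/query`, which returns instant-vector values as `[timestamp, "string value"]` pairs). The `Sample` and `Thresholds` types, the function names, and the 0.5 s / 2 s latency thresholds are hypothetical. This illustrates the shared-threshold principle, not how Observer implements it.

```python
import json
import urllib.parse
import urllib.request
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One numeric observation at a point in time (hypothetical type)."""
    timestamp: float
    value: float


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical threshold pair for a metric where higher is worse."""
    degraded: float
    outage: float


def sample_from_prometheus(body: str) -> Sample:
    # Parse a Prometheus /api/v1/query response holding a single-series
    # instant vector. Prometheus encodes sample values as strings.
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    data = payload["data"]
    if data["resultType"] != "vector" or len(data["result"]) != 1:
        raise ValueError("expected exactly one series in an instant vector")
    ts, raw = data["result"][0]["value"]
    return Sample(timestamp=float(ts), value=float(raw))


def query_prometheus(base_url: str, promql: str) -> Sample:
    # Run an instant query over HTTP; not exercised in the example below.
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return sample_from_prometheus(resp.read().decode("utf-8"))


def evaluate(sample: Sample, t: Thresholds) -> str:
    # One threshold definition, shared by the on-call alert and the public
    # page, so both change state at exactly the same value.
    if sample.value >= t.outage:
        return "outage"
    if sample.value >= t.degraded:
        return "degraded"
    return "operational"


# Illustrative response for a p95 latency query (value in seconds).
body = json.dumps({
    "status": "success",
    "data": {"resultType": "vector",
             "result": [{"metric": {}, "value": [1700000000.0, "0.85"]}]},
})
p95 = sample_from_prometheus(body)
print(evaluate(p95, Thresholds(degraded=0.5, outage=2.0)))  # degraded
```

Against a live server, the same path would be something like `query_prometheus("http://prometheus.example:9090", "histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))")`, where the host and the metric name are placeholders for your own.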

Pings still have a place

For systems that do not emit metrics (third-party APIs, public DNS, certificates issued by external CAs), the agent supports HTTP, TCP, DNS, and TLS-cert probes directly. These produce a metric in the same shape as a Prometheus query: a numeric value with a timestamp, evaluated against thresholds.
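As a sketch of what "the same shape" can mean, the Python below turns a TLS certificate's expiry into a timestamped numeric sample (days remaining) using only the standard library. `probe_tls`, `Sample`, `evaluate_lower_is_worse`, and the 14-day / 3-day thresholds are hypothetical. Observer's own probe and threshold configuration may differ.

```python
import socket
import ssl
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Same shape as a query result: a timestamp and a numeric value."""
    timestamp: float
    value: float


def cert_expiry_sample(not_after: str, now: float) -> Sample:
    # not_after uses the format SSLSocket.getpeercert() returns,
    # e.g. "Jan  5 09:34:43 2018 GMT". Value is days until expiry.
    expires_at = ssl.cert_time_to_seconds(not_after)
    return Sample(timestamp=now, value=(expires_at - now) / 86400)


def probe_tls(host: str, port: int = 443, timeout: float = 5.0) -> Sample:
    # Hypothetical probe: handshake with verification and read the leaf cert.
    # An already-expired certificate fails verification and raises
    # ssl.SSLCertVerificationError instead of returning a negative value.
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    return cert_expiry_sample(cert["notAfter"], time.time())


def evaluate_lower_is_worse(sample: Sample, degraded_days: float,
                            outage_days: float) -> str:
    # Days remaining shrink toward failure, so the comparisons flip.
    if sample.value <= outage_days:
        return "outage"
    if sample.value <= degraded_days:
        return "degraded"
    return "operational"


# Offline example with fixed timestamps: ten days of validity left.
now = float(ssl.cert_time_to_seconds("Jan  5 09:34:43 2018 GMT"))
sample = cert_expiry_sample("Jan 15 09:34:43 2018 GMT", now)
print(sample.value, evaluate_lower_is_worse(sample, 14, 3))  # 10.0 degraded
```

The threshold direction is the main difference from a latency query: for certificate expiry a smaller number is worse, but the sample itself has the same timestamp-plus-value shape.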
