Configure a Grafana Loki source

Turn a LogQL aggregation into a metric: error rates, event counts, and other signals that live only in logs.

The Loki source runs one LogQL instant query per interval and reports the single numeric result as a metric. Use it for production signals that live only in logs: error counts, specific pattern frequencies, business event rates.

Observer is not a log store. The Loki source extracts one number from a LogQL aggregation; it never reads or keeps log lines. This is the same shape as the SQL probe: the query returns a number, that number is the metric value.
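To make the "one instant query per interval" idea concrete, here is a minimal sketch of how the request URL could be assembled. The /loki/api/v1/query path and the query parameter are Loki's instant-query API; the function name build_instant_query_url is hypothetical and is not the agent's actual code.

```python
from urllib.parse import urlencode

# Path the agent appends to base_url (Loki's instant-query endpoint).
QUERY_PATH = "/loki/api/v1/query"


def build_instant_query_url(base_url: str, query: str) -> str:
    """Return the instant-query URL for one LogQL expression.

    Hypothetical helper for illustration only.
    """
    # Strip a trailing slash so the path is not doubled up.
    return base_url.rstrip("/") + QUERY_PATH + "?" + urlencode({"query": query})


url = build_instant_query_url(
    "https://loki.internal.example.com/",
    'sum(rate({app="checkout"} |= "ERROR" [5m]))',
)
print(url)
```

The LogQL text is percent-encoded, so quotes, braces, and pipes in the query survive the trip unchanged.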

Loki vs Prometheus

  • Prometheus: numeric time series your services already export.
  • Loki: a number you compute from log content at query time (for example, how many lines matched ERROR in the last 5 minutes).

If a signal is already a Prometheus metric, use the Prometheus source. Reach for Loki when the signal exists only in the logs.

The query must aggregate

A raw log query returns log lines, not a number, so it is rejected. The query has to use a metric aggregation. Observer checks this when you save, and the agent checks again at run time (a log-stream result is reported as loki_not_aggregation).

Sample queries by use case:

  • Error rate: sum(rate({app="checkout"} |= "ERROR" [5m]))
  • Event count: count_over_time({service="payments"} |~ "charge_failed" [1h])
  • Parsed and filtered: sum(rate({app="api"} | json | __error__="" [5m]))

The query must resolve to a single value. If it returns several label sets, the probe reports loki_multiple_series; aggregate further (for example sum without (instance) (...)) so it collapses to one number.
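The run-time check described above can be sketched as a small classifier over Loki's JSON response. The response shape (data.resultType, data.result, and a [timestamp, "value"] pair per sample) follows Loki's instant-query API; the function name classify_result and the order of the checks are assumptions, not the agent's implementation.

```python
from typing import Optional, Tuple


def classify_result(body: dict) -> Tuple[Optional[str], Optional[float]]:
    """Return (reason_code, value); exactly one of the two is None.

    Hypothetical helper for illustration only.
    """
    data = body.get("data", {})
    result = data.get("result", [])
    # A raw log query comes back as log streams, not as a vector of samples.
    if data.get("resultType") == "streams":
        return "loki_not_aggregation", None
    if not result:
        return "loki_no_data", None
    if len(result) > 1:
        return "loki_multiple_series", None
    # A vector sample looks like {"metric": {...}, "value": [<unix ts>, "<number>"]}.
    _timestamp, raw_value = result[0]["value"]
    return None, float(raw_value)


single = {"status": "success", "data": {"resultType": "vector", "result": [
    {"metric": {}, "value": [1700000000, "0.25"]}]}}
per_instance = {"status": "success", "data": {"resultType": "vector", "result": [
    {"metric": {"instance": "a"}, "value": [1700000000, "1"]},
    {"metric": {"instance": "b"}, "value": [1700000000, "2"]}]}}

print(classify_result(single))        # (None, 0.25)
print(classify_result(per_instance))  # ('loki_multiple_series', None)
```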

Configuration shape

{
  "base_url": "https://loki.internal.example.com",
  "query": "sum(rate({app=\"checkout\"} |= \"ERROR\" [5m]))",
  "auth_mode": "bearer",
  "token_ref": "OBSERVER_LOKI_TOKEN",
  "tenant_id": "team-a",
  "timeout_ms": 10000
}

Field reference

Field                    Default   Notes
base_url                 required  Loki's HTTP endpoint. The agent appends /loki/api/v1/query.
query                    required  A LogQL metric aggregation. Raw log streams are rejected.
auth_mode                none      none, bearer, or basic.
token_ref                none      Env var NAME holding the bearer token. Required when auth_mode: bearer.
username / password_ref  none      Basic auth username + env var NAME of the password. Both required when auth_mode: basic.
tenant_id                none      Sent as X-Scope-OrgID for multi-tenant Loki.
timeout_ms               10000     100 to 60000 ms.
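The field rules above can be checked before a config is saved. The following is a rough sketch of such a check, assuming the config arrives as a plain dict; validate_config and its error strings are hypothetical and do not reflect Observer's real validation messages.

```python
ALLOWED_AUTH_MODES = {"none", "bearer", "basic"}
DEFAULT_TIMEOUT_MS = 10000


def validate_config(cfg: dict) -> list:
    """Return a list of problems; an empty list means the config looks valid.

    Hypothetical helper for illustration only.
    """
    errors = []
    for field in ("base_url", "query"):
        if not cfg.get(field):
            errors.append(f"{field} is required")
    mode = cfg.get("auth_mode", "none")
    if mode not in ALLOWED_AUTH_MODES:
        errors.append("auth_mode must be none, bearer, or basic")
    if mode == "bearer" and not cfg.get("token_ref"):
        errors.append("token_ref is required when auth_mode is bearer")
    if mode == "basic" and not (cfg.get("username") and cfg.get("password_ref")):
        errors.append("username and password_ref are required when auth_mode is basic")
    timeout = cfg.get("timeout_ms", DEFAULT_TIMEOUT_MS)
    if not isinstance(timeout, int) or not 100 <= timeout <= 60000:
        errors.append("timeout_ms must be between 100 and 60000")
    return errors


sample = {
    "base_url": "https://loki.internal.example.com",
    "query": 'sum(rate({app="checkout"} |= "ERROR" [5m]))',
    "auth_mode": "bearer",
    "token_ref": "OBSERVER_LOKI_TOKEN",
    "tenant_id": "team-a",
    "timeout_ms": 10000,
}
print(validate_config(sample))  # []
```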

Authentication

Auth secrets stay on the agent. For bearer or basic auth you store the NAME of an environment variable on the agent host, not the secret itself. The agent reads the value at query time, so the token or password never reaches the cloud and never appears in logs.

Export the variable in the agent's environment, then reference it by name in token_ref (or password_ref for basic auth).
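As an illustration of reference-by-name, the sketch below resolves the variable when the request is built and produces the Authorization header. The names resolve_ref, auth_header, and AuthRefMissing are hypothetical; only the config fields and the loki_auth_ref_missing reason code come from this page.

```python
import base64
import os

# On the agent host the secret is exported under the referenced name, e.g.:
#   export OBSERVER_LOKI_TOKEN='<your token>'
# The config then carries only "token_ref": "OBSERVER_LOKI_TOKEN".


class AuthRefMissing(Exception):
    """Raised when the referenced env var is unset (hypothetical type)."""
    reason = "loki_auth_ref_missing"


def resolve_ref(name, env=os.environ):
    """Look up the secret by env var NAME; the value is never stored in config."""
    value = env.get(name)
    if not value:
        raise AuthRefMissing(name)
    return value


def auth_header(cfg, env=os.environ):
    """Build the Authorization header for the configured auth_mode."""
    mode = cfg.get("auth_mode", "none")
    if mode == "bearer":
        return {"Authorization": "Bearer " + resolve_ref(cfg["token_ref"], env)}
    if mode == "basic":
        password = resolve_ref(cfg["password_ref"], env)
        pair = f"{cfg['username']}:{password}".encode()
        return {"Authorization": "Basic " + base64.b64encode(pair).decode()}
    return {}
```

Passing env explicitly, for example auth_header(cfg, {"OBSERVER_LOKI_TOKEN": "..."}), makes the lookup easy to exercise without touching the real environment.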

Multi-tenant Loki

A multi-tenant Loki requires the X-Scope-OrgID header on every query; without it Loki returns 401 even with valid credentials. Set tenant_id to the org/tenant the logs live under. If your Loki runs in single-tenant mode (auth_enabled: false), leave tenant_id blank. A missing or wrong tenant is the most common cause of an unexpected loki_unauthorized.
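The tenant rule can be sketched as a tiny helper: send X-Scope-OrgID only when tenant_id is set. The function name tenant_headers is hypothetical, and treating a whitespace-only value as blank is an assumption.

```python
def tenant_headers(tenant_id):
    """Return the multi-tenant header, or nothing for single-tenant Loki.

    Hypothetical helper for illustration only.
    """
    # Assumption: None, "" and whitespace-only values all count as "blank".
    tenant = (tenant_id or "").strip()
    return {"X-Scope-OrgID": tenant} if tenant else {}


print(tenant_headers("team-a"))  # {'X-Scope-OrgID': 'team-a'}
print(tenant_headers(""))        # {}
```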

Testing the query

Observer cannot reach a private Loki from the cloud, so there is no live "run query" button here. Validate the query in Grafana's Explore view (or logcli) against the same Loki, confirm it returns a single number, then paste it in. The first real value arrives on the next agent tick.

Reason codes

  • loki_not_aggregation: the query returned log lines. Wrap it in count_over_time / rate / sum(...).
  • loki_no_data: no events matched in the window. A label/filter typo, or genuinely nothing happened.
  • loki_multiple_series: more than one label set. Aggregate further.
  • loki_unreachable: couldn't connect to base_url (refused, DNS, network).
  • loki_timeout: the query didn't finish in time. Narrow the range.
  • loki_unauthorized: 401/403. Check the credentials and the tenant_id.
  • loki_auth_ref_missing: the token/password env var isn't set on the agent.
  • loki_query_error: Loki rejected the LogQL (400/422). The error text is in the metadata; test the query in Grafana.
  • loki_server_error: Loki returned a 5xx. Usually transient.
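The HTTP-status-based codes in the list above can be read as a simple lookup. This sketch covers only the statuses named on this page; the function name reason_for_status is hypothetical, and returning None for any other status is an assumption.

```python
from typing import Optional


def reason_for_status(status: int) -> Optional[str]:
    """Map an HTTP status from Loki to a reason code, or None if unmapped.

    Hypothetical helper for illustration only.
    """
    if status in (401, 403):
        return "loki_unauthorized"
    if status in (400, 422):
        return "loki_query_error"
    if 500 <= status <= 599:
        return "loki_server_error"
    # Assumption: anything else (for example 200) carries no reason code here.
    return None


for code in (200, 400, 401, 503):
    print(code, reason_for_status(code))
```

The remaining codes (loki_unreachable, loki_timeout, loki_auth_ref_missing, and the result-shape codes) do not come from an HTTP status, so they are outside this lookup.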

Troubleshooting

  • loki_not_aggregation on a query that works in Grafana. Grafana's log view accepts raw streams; the metric probe needs an aggregation. Add count_over_time(... [5m]) or wrap in sum(rate(...)).
  • loki_unauthorized with a valid token. Almost always a missing or wrong tenant_id on multi-tenant Loki.
  • loki_multiple_series. The query returns one series per label value. Use sum(...) or sum without (label) (...) to collapse it.