Observer
Observer Agent

Heartbeat payload

JSON shape the agent posts to /api/agent/heartbeat.

The agent emits a heartbeat to the cloud on a fixed interval (typically every 30 seconds). The payload is the agent's view of its own runtime state. The cloud uses the payload to compute 24-hour uptime, restart counts, and lag alerts.

Endpoint

POST <CLOUD_SERVER_URL>/api/agent/heartbeat
Authorization: provided via Agent-Key header (key transport detail)
Content-Type: application/json

Body

{
  "version": "1.2.3",
  "uptime_seconds": 12345,
  "buffer_size": 0,
  "buffer_oldest_age_seconds": 0,
  "queue_depth": 0,
  "queue_oldest_age_seconds": 0,
  "queue_capacity": 10000,
  "agent_started_at": "2026-05-09T12:00:00Z",
  "source_types_active": ["prometheus", "http", "tcp"]
}

Field reference

FieldTypeMeaning
versionstringBuild-time version of the agent.
uptime_secondsintegerWall-clock seconds since process start.
buffer_sizeintegerDeprecated alias for queue_depth; kept for older cloud versions.
buffer_oldest_age_secondsintegerLegacy alias for queue_oldest_age_seconds.
queue_depthintegerPushes currently waiting in the local queue.
queue_oldest_age_secondsintegerAge of the oldest pending push, in seconds.
queue_capacityintegerHard cap on the queue (BUFFER_MAX_ROWS).
agent_started_atISO timestampWhen the process started. The cloud uses changes to this value to detect restarts.
source_types_activestring[]Distinct source types the agent has actually run since boot.

Lag and uptime alerts

The queue and uptime numbers in the heartbeat drive two alerts:

  • agent.lag_high: opens when queue_depth is above 1000 or queue_oldest_age_seconds is above 300.
  • agent.uptime_degraded: opens when uptime over the last 24 hours falls below 95%.

These alerts surface on the agent detail page and as webhook events when subscribed. A separate agent.offline event fires when the cloud stops receiving heartbeats from the agent.

Duplicate key detection

The agent sends its start time in an Agent-Started-At header on every request. When two processes share one agent key, the cloud treats the most recently started process as the owner:

  • Heartbeats from the older process are acknowledged but ignored. They do not count toward uptime and do not update the dashboard.
  • Pushes from the older process are rejected with HTTP 409. The agent drops the rejected samples instead of retrying and logs a warning at most once every five minutes.
  • The dashboard shows a duplicate key warning on the agent, and an agent.duplicate_key webhook event fires (state open). Once the older process goes quiet for ten minutes, the warning clears and a cleared event follows.

Restart detection is monotonic: only a strictly newer agent_started_at records a restart. To resolve a duplicate key situation, stop one of the two processes, or rotate the key and give each deployment its own agent.

Versioning

Field additions to the payload are additive. Older agents that omit newer fields stay compatible; older cloud versions ignore fields they do not recognise.

Was this page helpful?