Write custom probes

The escape hatch. Register a probe function in your agent's codebase and reference it by name from Observer.

A custom probe is a function you write in your agent's codebase, register by name, and reference from Observer. The agent runs it on the schedule you set and uses its return value as the metric. This is the escape hatch for monitoring that no standard probe covers: proprietary APIs with bespoke auth, calculations across several internal sources, custom protocols, or systems with no public client library.

How the trust model works

The probe code lives in your agent, deployed by you, running with whatever privileges you granted the agent. There is no sandbox and nothing to review, because it is your own trusted code, the same as the rest of the agent. Observer stores only a reference: which registered probe to run and an optional config object. Your code is never sent to or stored by Observer.

The trade-off is friction: adding a custom probe means editing the agent and redeploying. That friction is intentional. If a standard probe (HTTP, TCP, DNS, SQL, CloudWatch, and so on) fits, use it. Reach for a custom probe only when none do.

Quickstart

Custom probes live under src/sources/custom/probes/ in your agent checkout. Add a file, register a probe, and import it from the barrel.

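// src/sources/custom/probes/queue-depth.ts
// NOTE: this import path is an assumption; point it at wherever your agent checkout exports registerCustomProbe.
import { registerCustomProbe } from "../registry";

registerCustomProbe({
  name: "internal-queue-depth",
  description: "Depth of our internal work queue",
  async run({ env, log, signal }) {
    // Pass the signal so the request is cancelled at the probe timeout.
    const res = await fetch(`${env.INTERNAL_API_URL}/queue/stats`, {
      headers: { Authorization: `Bearer ${env.INTERNAL_API_TOKEN}` },
      signal,
    });
    // Fail loudly on a non-2xx response instead of parsing an error body.
    if (!res.ok) throw new Error(`queue stats request failed: HTTP ${res.status}`);
    const data = await res.json();
    log(`queue depth ${data.depth}`);
    return data.depth; // must be a finite number
  },
});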

Then import it so it registers at boot:

// src/sources/custom/probes/index.ts
import "./queue-depth"; // side-effect import: loading the module registers the probe

Redeploy the agent. On its next heartbeat the probe appears in the Observer metric form's custom-probe dropdown. Create a metric, pick internal-queue-depth, set thresholds, and save.

The probe contract

interface CustomProbe {
  name: string;            // unique; referenced from Observer
  description?: string;    // shown in the console dropdown
  configSchema?: ZodSchema; // optional; validates probe_config
  run(ctx: CustomProbeContext): Promise<number | { value: number; metadata?: object }>;
}

interface CustomProbeContext {
  config: Record<string, unknown>; // probe_config from Observer
  env: AgentEnv;                    // the agent's environment
  log: (msg: string, meta?: object) => void;
  signal: AbortSignal;              // aborted at the timeout
}

run returns either a bare number or { value, metadata }. Anything else (a string, an object without a numeric value, a non-finite number) is reported as a probe error, not a metric value.
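
The return rule can be sketched as a small normalizer. This is not the agent's actual code: normalizeProbeResult and ProbeOutcome are hypothetical names used only for illustration, and only the custom_probe_bad_return reason code comes from this page. How the agent treats a non-object metadata field is not documented, so the sketch simply drops it.

```typescript
// Hypothetical sketch: how a runner might classify a probe's return value.
type ProbeOutcome =
  | { ok: true; value: number; metadata?: object }
  | { ok: false; reason: "custom_probe_bad_return" };

function normalizeProbeResult(result: unknown): ProbeOutcome {
  // A bare number is accepted only if it is finite (no NaN or Infinity).
  if (typeof result === "number") {
    return Number.isFinite(result)
      ? { ok: true, value: result }
      : { ok: false, reason: "custom_probe_bad_return" };
  }
  // Otherwise expect an object carrying a finite numeric `value`.
  if (typeof result === "object" && result !== null) {
    const { value, metadata } = result as { value?: unknown; metadata?: unknown };
    if (typeof value === "number" && Number.isFinite(value)) {
      // Assumption: metadata that is not an object is ignored rather than rejected.
      return typeof metadata === "object" && metadata !== null
        ? { ok: true, value, metadata }
        : { ok: true, value };
    }
  }
  // Strings, objects without a numeric value, and everything else.
  return { ok: false, reason: "custom_probe_bad_return" };
}
```

Under this sketch a probe that returns "42" (a string) is rejected, so convert API responses with Number(...) and check the result before returning.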

Registering two probes with the same name throws at boot, so a copy-paste mistake fails fast rather than silently shadowing.
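
The fail-fast behaviour can be illustrated with a minimal in-memory registry. This is a sketch under assumptions, not the agent's implementation: registerProbe, ProbeDefinition, and the error text are hypothetical stand-ins for registerCustomProbe and its internals.

```typescript
// Hypothetical in-memory registry; the real agent's internals may differ.
interface ProbeDefinition {
  name: string;
  description?: string;
  run(ctx: { config: Record<string, unknown> }): Promise<number>;
}

const registry = new Map<string, ProbeDefinition>();

function registerProbe(probe: ProbeDefinition): void {
  // Refuse duplicates instead of letting the second definition win.
  if (registry.has(probe.name)) {
    throw new Error(`custom probe "${probe.name}" is already registered`);
  }
  registry.set(probe.name, probe);
}

registerProbe({ name: "internal-queue-depth", run: async () => 0 });

// A copy-pasted file that forgot to change the name fails immediately.
let duplicateError = "";
try {
  registerProbe({ name: "internal-queue-depth", run: async () => 1 });
} catch (err) {
  duplicateError = (err as Error).message;
}
```

Throwing at registration time means the mistake surfaces in the deploy logs rather than as a metric that quietly runs the wrong code.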

Passing config from Observer

The metric form has a JSON config editor. Whatever object you enter there arrives as ctx.config at runtime. Use it for per-metric parameters so one probe serves several metrics:

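registerCustomProbe({
  name: "endpoint-latency",
  async run({ config }) {
    // config is untyped JSON from the metric form, so check it before use.
    const url = config.url;
    if (typeof url !== "string") throw new Error("config.url must be a string");
    const start = Date.now();
    await fetch(url);
    return Date.now() - start; // elapsed milliseconds
  },
});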
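
To illustrate one probe serving several metrics, here is an offline sketch. The queue names, the queueDepths lookup table, and the config key queue are all made up for the example; a real probe would call your own system instead of reading a constant.

```typescript
type ProbeConfig = Record<string, unknown>;

// Stand-in for the data a real probe would fetch; the values are invented.
const queueDepths: Record<string, number> = { emails: 12, invoices: 3 };

// Body of a hypothetical "queue-depth" probe, parameterised by config.queue.
async function run({ config }: { config: ProbeConfig }): Promise<number> {
  const queue = config.queue;
  if (typeof queue !== "string") throw new Error("config.queue must be a string");
  const depth = queueDepths[queue];
  if (depth === undefined) throw new Error(`unknown queue "${queue}"`);
  return depth;
}

// Two metrics in Observer, each with its own JSON config, share the probe.
const emailMetricConfig: ProbeConfig = { queue: "emails" };
const invoiceMetricConfig: ProbeConfig = { queue: "invoices" };
```

Each metric keeps its own thresholds and schedule in Observer; only the config object differs between them.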

Type-safe config with a schema

Declare a configSchema to validate ctx.config before run is called. The agent rejects a metric whose config fails the schema and reports it as a config error. Any validator exposing safeParse(value) works, so a Zod schema fits directly:


import { z } from "zod";

const schema = z.object({ url: z.string().url(), warn_ms: z.number().default(500) });

registerCustomProbe({
  name: "endpoint-latency",
  configSchema: schema,
  async run({ config }) {
    const c = schema.parse(config);
    const start = Date.now();
    await fetch(c.url);
    return Date.now() - start;
  },
});
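
Because any object exposing safeParse(value) is accepted, a hand-written validator can stand in for Zod. The sketch below assumes the agent expects Zod's result shape ({ success, data } or { success, error }); this page does not spell that out, so treat the shape, the LatencyConfig type, and the name latencyConfigSchema as assumptions.

```typescript
interface LatencyConfig {
  url: string;
  warn_ms: number;
}

// Assumed result shape, modelled on Zod's safeParse.
type SafeParseResult<T> =
  | { success: true; data: T }
  | { success: false; error: Error };

// Hypothetical hand-rolled schema with the same rules as the Zod example.
const latencyConfigSchema = {
  safeParse(value: unknown): SafeParseResult<LatencyConfig> {
    if (typeof value !== "object" || value === null) {
      return { success: false, error: new Error("config must be an object") };
    }
    // warn_ms falls back to 500 when it is absent.
    const { url, warn_ms = 500 } = value as { url?: unknown; warn_ms?: unknown };
    if (typeof url !== "string" || !/^https?:\/\//.test(url)) {
      return { success: false, error: new Error("url must be an http(s) URL") };
    }
    if (typeof warn_ms !== "number" || !Number.isFinite(warn_ms)) {
      return { success: false, error: new Error("warn_ms must be a number") };
    }
    return { success: true, data: { url, warn_ms } };
  },
};
```

A library schema is usually the better choice; the hand-written form is mainly useful when you want to avoid adding a dependency to the agent.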

Secrets

Read secrets from the agent environment (ctx.env), not from the Observer config. The config object is stored by Observer and visible to anyone who can read the metric; the agent environment stays on your host. Put API keys, tokens, and connection strings in the agent's env and reference them in run.
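
A small helper can make a missing secret fail with a clear message instead of sending "Bearer undefined". This is a sketch: requireEnv and authHeaders are hypothetical helpers, and AgentEnv is modelled here as a plain string map, which may not match its real type.

```typescript
// Assumption: the agent environment behaves like a map of optional strings.
type EnvLike = Record<string, string | undefined>;

function requireEnv(env: EnvLike, key: string): string {
  const value = env[key];
  if (!value) {
    // Name the variable in the error, but never echo a secret's value.
    throw new Error(`missing required environment variable ${key}`);
  }
  return value;
}

// Builds request headers from a secret that lives only on the agent host.
function authHeaders(env: EnvLike): Record<string, string> {
  return { Authorization: `Bearer ${requireEnv(env, "INTERNAL_API_TOKEN")}` };
}
```

Inside run, this would be called as authHeaders(env), with non-secret parameters such as URLs or queue names still coming from config.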

Timeout and cancellation

Each probe has a hard timeout: 30 seconds by default, configurable per metric up to that same 30-second maximum. At the deadline the agent aborts ctx.signal and reports custom_probe_timeout. Pass ctx.signal to anything cancellable so the probe stops cleanly:

async run({ config, signal }) {
  const res = await fetch(String(config.url), { signal });
  return res.status;
}

If your code ignores the signal, the agent still moves on at the deadline; a late return is discarded.
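
The deadline behaviour can be pictured as a race between the probe and a timer. The sketch below is one plausible way to get that behaviour, not the agent's real scheduler; runWithDeadline and TimedOutcome are hypothetical names.

```typescript
type Run = (ctx: { signal: AbortSignal }) => Promise<number>;

type TimedOutcome =
  | { ok: true; value: number }
  | { ok: false; reason: "custom_probe_timeout" };

async function runWithDeadline(run: Run, timeoutMs: number): Promise<TimedOutcome> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;

  const deadline = new Promise<TimedOutcome>((resolve) => {
    timer = setTimeout(() => {
      controller.abort(); // cooperative probes stop here
      resolve({ ok: false, reason: "custom_probe_timeout" });
    }, timeoutMs);
  });

  const attempt = run({ signal: controller.signal }).then(
    (value): TimedOutcome => ({ ok: true, value }),
  );

  try {
    // Whichever settles first wins; a result arriving after the deadline is ignored.
    return await Promise.race([attempt, deadline]);
  } finally {
    clearTimeout(timer);
  }
}
```

In this sketch a probe that ignores the signal keeps running in the background until it finishes on its own, which is why passing the signal to fetch and similar calls still matters even though the metric has already been reported as timed out.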

Errors and logs

A probe that throws is reported as custom_probe_error with the message and a short stack in the metric's metadata. It never crashes the agent. Use ctx.log for diagnostics; the last several lines ride along in the probe metadata so you can see them in Observer.
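
As a rough picture of that behaviour, the sketch below wraps a probe in try/catch and keeps a bounded tail of log lines. The names runGuarded and ProbeReport, the five-line log limit, and the four-line stack limit are assumptions; this page only says "the last several lines" and "a short stack".

```typescript
interface ProbeReport {
  value?: number;
  reason?: "custom_probe_error";
  metadata: { message?: string; stack?: string; logs: string[] };
}

const MAX_LOG_LINES = 5; // assumed limit, not a documented value

async function runGuarded(
  run: (ctx: { log: (msg: string) => void }) => Promise<number>,
): Promise<ProbeReport> {
  const logs: string[] = [];
  const log = (msg: string): void => {
    logs.push(msg);
    if (logs.length > MAX_LOG_LINES) logs.shift(); // keep only the tail
  };
  try {
    const value = await run({ log });
    return { value, metadata: { logs } };
  } catch (err) {
    // Anything thrown becomes a report; it never propagates to the caller.
    const error = err instanceof Error ? err : new Error(String(err));
    return {
      reason: "custom_probe_error",
      metadata: {
        message: error.message,
        // First few stack lines only, to keep the metadata small.
        stack: (error.stack ?? "").split("\n").slice(0, 4).join("\n"),
        logs,
      },
    };
  }
}
```

Because only a tail of the log survives, put the most useful diagnostic (the response status, the offending id) in the last log call before a likely failure point.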

Reason codes

custom_probe_not_found: no probe registered under that name on the agent. Confirm the agent was redeployed with the probe.
custom_probe_config_invalid: the probe's configSchema rejected the config. Check the JSON against what the probe expects.
custom_probe_timeout: the probe didn't finish in time.
custom_probe_error: the probe threw. See the metadata for the message and stack.
custom_probe_bad_return: the probe returned something other than a finite number or { value, metadata }.
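
If you consume these codes in your own tooling, a string union keeps the handling exhaustive. The five codes are the ones listed above; the type name CustomProbeReason and the troubleshootingHint helper are hypothetical.

```typescript
type CustomProbeReason =
  | "custom_probe_not_found"
  | "custom_probe_config_invalid"
  | "custom_probe_timeout"
  | "custom_probe_error"
  | "custom_probe_bad_return";

// Maps each reason code to the first thing to check.
function troubleshootingHint(reason: CustomProbeReason): string {
  switch (reason) {
    case "custom_probe_not_found":
      return "Redeploy the agent with the probe and check the metric's agent assignment.";
    case "custom_probe_config_invalid":
      return "Compare the metric's JSON config with the probe's configSchema.";
    case "custom_probe_timeout":
      return "Make the probe faster and pass ctx.signal to cancellable calls.";
    case "custom_probe_error":
      return "Read the message and stack in the metric's metadata.";
    case "custom_probe_bad_return":
      return "Return a finite number or { value, metadata }.";
  }
}
```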

Deployment notes

Probes are code. Deploying a new or changed probe means rebuilding and redeploying the agent; a restart re-runs the registrations. There is no hot reload. Each agent has its own registered probes, so the console dropdown shows the union across the agents in your organisation.

Assign the metric to the right agent

A metric runs only on the agent you assign it to (the Agent field in the metric form's Schedule section). A custom-probe metric must be assigned to an agent that has registered that probe. If you assign it to a different agent, or leave the agent unset, the probe isn't there to run and the metric reports custom_probe_not_found. The metric form scopes the probe dropdown to the selected agent and warns when the pairing is wrong.
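
The pairing rule amounts to a lookup in the assigned agent's own set of registered probes. The sketch below models that with made-up agent ids (agent-eu, agent-us); probesByAgent, checkAssignment, and probesForDropdown are hypothetical names, not Observer APIs.

```typescript
// Hypothetical model: agent id -> names of the probes that agent registered.
const probesByAgent = new Map<string, Set<string>>([
  ["agent-eu", new Set(["internal-queue-depth"])],
  ["agent-us", new Set(["endpoint-latency"])],
]);

// A metric can run only if its assigned agent registered the probe.
function checkAssignment(probe: string, agent?: string): "ok" | "custom_probe_not_found" {
  const registered = agent ? probesByAgent.get(agent) : undefined;
  return registered?.has(probe) ? "ok" : "custom_probe_not_found";
}

// What a dropdown scoped to one agent would list.
function probesForDropdown(agent: string): string[] {
  return Array.from(probesByAgent.get(agent) ?? []).sort();
}
```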

When a standard probe is better

Prefer a built-in source whenever one fits:

Checking that an HTTP endpoint is up or fast: use the HTTP probe.
Checking that a port is open: use the TCP probe.
Reading a value from a SQL query: use the SQL probe.
Reading a CloudWatch metric: use the CloudWatch source.

Standard sources need no code, no redeploy, and carry richer built-in error reporting. Custom probes are for the cases those cannot reach.