Write custom probes
The escape hatch. Register a probe function in your agent's codebase and reference it by name from Observer.
A custom probe is a function you write in your agent's codebase, register by name, and reference from Observer. The agent runs it on the schedule you set and uses its return value as the metric. This is the escape hatch for monitoring that no standard probe covers: proprietary APIs with bespoke auth, calculations across several internal sources, custom protocols, or systems with no public client library.
How the trust model works
The probe code lives in your agent, deployed by you, running with whatever privileges you granted the agent. There is no sandbox and nothing to review, because it is your own trusted code, the same as the rest of the agent. Observer stores only a reference: which registered probe to run and an optional config object. Your code is never sent to or stored by Observer.
The trade-off is friction: adding a custom probe means editing the agent and redeploying. That friction is intentional. If a standard probe (HTTP, TCP, DNS, SQL, CloudWatch, and so on) fits, use it. Reach for a custom probe only when none fits.
Quickstart
Custom probes live under src/sources/custom/probes/ in your agent
checkout. Add a file, register a probe, and import it from the barrel.
// src/sources/custom/probes/queue-depth.ts
registerCustomProbe({
  name: "internal-queue-depth",
  description: "Depth of our internal work queue",
  async run({ env, log }) {
    const res = await fetch(`${env.INTERNAL_API_URL}/queue/stats`, {
      headers: { Authorization: `Bearer ${env.INTERNAL_API_TOKEN}` },
    });
    // Throw on a non-2xx response so the failure is reported as a probe error
    if (!res.ok) throw new Error(`queue stats request failed: HTTP ${res.status}`);
    const data = await res.json();
    log(`queue depth ${data.depth}`);
    return data.depth; // a number
  },
});
Then import it so it registers at boot:
// src/sources/custom/probes/index.ts
import "./queue-depth";
Redeploy the agent. On its next heartbeat the probe appears in the
Observer metric form's custom-probe dropdown. Create a metric, pick
internal-queue-depth, set thresholds, and save.
The probe contract
interface CustomProbe {
  name: string; // unique; referenced from Observer
  description?: string; // shown in the console dropdown
  configSchema?: ZodSchema; // optional; validates probe_config
  run(ctx: CustomProbeContext): Promise<number | { value: number; metadata?: object }>;
}
interface CustomProbeContext {
  config: Record<string, unknown>; // probe_config from Observer
  env: AgentEnv; // the agent's environment
  log: (msg: string, meta?: object) => void;
  signal: AbortSignal; // aborted at the timeout
}
run returns either a bare number or { value, metadata }. Anything
else (a string, an object without a numeric value, a non-finite
number) is reported as a probe error, not a metric value.
Registering two probes with the same name throws at boot, so a copy-paste mistake fails fast rather than silently shadowing.
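The two rules above (what counts as a valid return value, and duplicate names failing at boot) can be sketched as follows. This is a minimal illustration, not the agent's actual code: `normalizeResult`, `registerSketch`, and the result shapes are hypothetical names chosen for this example.

```typescript
// Hypothetical sketch; the real agent's registry and validation may differ.
type Normalized =
  | { ok: true; value: number; metadata?: object }
  | { ok: false; reason: "custom_probe_bad_return" };

// Accept a finite number or { value: <finite number>, metadata? }; reject everything else.
function normalizeResult(raw: unknown): Normalized {
  if (typeof raw === "number") {
    return Number.isFinite(raw)
      ? { ok: true, value: raw }
      : { ok: false, reason: "custom_probe_bad_return" };
  }
  if (typeof raw === "object" && raw !== null) {
    const { value, metadata } = raw as { value?: unknown; metadata?: unknown };
    if (typeof value === "number" && Number.isFinite(value)) {
      return typeof metadata === "object" && metadata !== null
        ? { ok: true, value, metadata }
        : { ok: true, value };
    }
  }
  return { ok: false, reason: "custom_probe_bad_return" };
}

// A name-keyed registry that refuses duplicates, so a name clash fails at boot.
const registry = new Map<string, { name: string }>();

function registerSketch(probe: { name: string }): void {
  if (registry.has(probe.name)) {
    throw new Error(`custom probe "${probe.name}" is already registered`);
  }
  registry.set(probe.name, probe);
}
```

Under these assumptions, a probe that returns `"7"`, `NaN`, or `{ depth: 3 }` is classified as a bad return rather than recorded as a metric value.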
Passing config from Observer
The metric form has a JSON config editor. Whatever object you enter
there arrives as ctx.config at runtime. Use it for per-metric
parameters so one probe serves several metrics:
registerCustomProbe({
  name: "endpoint-latency",
  async run({ config }) {
    const url = String(config.url);
    const start = Date.now();
    await fetch(url);
    return Date.now() - start;
  },
});
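One thing to watch in the example above: `String(config.url)` turns a missing `url` into the literal string `"undefined"`, which then fails later in `fetch` with a less obvious message. A small guard makes the failure explicit. `requireString` is a hypothetical helper, not part of the agent, and the two config objects use made-up URLs.

```typescript
// Hypothetical helper: read a required, non-empty string from ctx.config.
function requireString(config: Record<string, unknown>, key: string): string {
  const value = config[key];
  if (typeof value !== "string" || value.length === 0) {
    // Throwing inside run() is reported as a probe error.
    throw new Error(`probe config is missing a string "${key}"`);
  }
  return value;
}

// Two metrics can share one probe by entering different JSON in the config editor.
// These URLs are placeholders.
const checkoutConfig = { url: "https://example.internal/checkout/health" };
const searchConfig = { url: "https://example.internal/search/health" };

const checkoutUrl = requireString(checkoutConfig, "url");
const searchUrl = requireString(searchConfig, "url");
```

Inside `run`, `requireString(config, "url")` would replace `String(config.url)`.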
Type-safe config with a schema
Declare a configSchema to validate ctx.config before run is
called. The agent rejects a metric whose config fails the schema and
reports it as a config error. Any validator exposing
safeParse(value) works, so a Zod schema fits directly:
import { z } from "zod";

const schema = z.object({ url: z.string().url(), warn_ms: z.number().default(500) });
registerCustomProbe({
  name: "endpoint-latency",
  configSchema: schema,
  async run({ config }) {
    const c = schema.parse(config);
    const start = Date.now();
    await fetch(c.url);
    return Date.now() - start;
  },
});
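Because any object exposing `safeParse(value)` is accepted, a hand-written validator is also an option when you would rather not add a dependency. The sketch below assumes the agent expects the same result shape as Zod (`{ success: true, data }` or `{ success: false, error }`); that shape, `latencyConfigSchema`, and `checkConfig` are assumptions for illustration.

```typescript
// Assumed result shape, mirroring Zod's safeParse.
type SafeParseResult<T> =
  | { success: true; data: T }
  | { success: false; error: { message: string } };

interface LatencyConfig {
  url: string;
  warn_ms: number;
}

// Hand-rolled validator; hypothetical, for illustration only.
const latencyConfigSchema = {
  safeParse(value: unknown): SafeParseResult<LatencyConfig> {
    if (typeof value !== "object" || value === null) {
      return { success: false, error: { message: "config must be an object" } };
    }
    // warn_ms falls back to 500 when absent, like z.number().default(500).
    const { url, warn_ms = 500 } = value as { url?: unknown; warn_ms?: unknown };
    if (typeof url !== "string" || !/^https?:\/\//.test(url)) {
      return { success: false, error: { message: "url must be an http(s) URL" } };
    }
    if (typeof warn_ms !== "number") {
      return { success: false, error: { message: "warn_ms must be a number" } };
    }
    return { success: true, data: { url, warn_ms } };
  },
};

// Sketch of the pre-run check: a failed parse becomes a config error and run is skipped.
function checkConfig(raw: unknown) {
  const parsed = latencyConfigSchema.safeParse(raw);
  return parsed.success
    ? { ok: true as const, config: parsed.data }
    : {
        ok: false as const,
        reason: "custom_probe_config_invalid" as const,
        detail: parsed.error.message,
      };
}
```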
Secrets
Read secrets from the agent environment (ctx.env), not from the
Observer config. The config object is stored by Observer and visible to
anyone who can read the metric; the agent environment stays on your
host. Put API keys, tokens, and connection strings in the agent's env
and reference them in run.
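A sketch of that split: credentials come from the agent environment, and only non-secret parameters come from the Observer config. `AgentEnv` is modelled here as a plain string map, which is an assumption; `buildStatsRequest`, the `queue` parameter, and the `?name=` query string are hypothetical.

```typescript
// Sketch only. Modelling the agent environment as a string map is an assumption.
type EnvSketch = Record<string, string | undefined>;

// Hypothetical helper: secrets from env, per-metric parameters from config.
function buildStatsRequest(env: EnvSketch, config: Record<string, unknown>) {
  const baseUrl = env["INTERNAL_API_URL"];
  const token = env["INTERNAL_API_TOKEN"];
  if (!baseUrl || !token) {
    // Throwing inside run() is reported as a probe error.
    throw new Error("INTERNAL_API_URL and INTERNAL_API_TOKEN must be set on the agent");
  }
  // "queue" is a hypothetical, non-secret parameter that is safe to store in Observer.
  const rawQueue = config["queue"];
  const queue = typeof rawQueue === "string" ? rawQueue : "default";
  return {
    url: `${baseUrl}/queue/stats?name=${encodeURIComponent(queue)}`,
    headers: { Authorization: `Bearer ${token}` },
  };
}
```

In `run`, the result would be passed to `fetch(request.url, { headers: request.headers })`. The token never appears in the metric's config.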
Timeout and cancellation
Each probe has a hard timeout: 30 seconds by default, and a per-metric
setting can lower it but not raise it above 30 seconds. At the deadline
the agent aborts ctx.signal and reports custom_probe_timeout. Pass
ctx.signal to cancellable calls such as fetch for clean cancellation:
async run({ config, signal }) {
  const res = await fetch(String(config.url), { signal });
  return res.status;
}
If your code ignores the signal, the agent still moves on at the deadline; a late return is discarded.
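One way to picture this behaviour is a race between the probe and a timer. The sketch below is an assumption about how such a deadline could be enforced, not the agent's actual implementation; `runWithDeadline` and `TimedResult` are hypothetical names.

```typescript
type TimedResult<T> =
  | { ok: true; value: T }
  | { ok: false; reason: "custom_probe_timeout" };

// Race the probe against a timer. The signal lets cooperative probes stop early;
// an uncooperative probe simply loses the race and its late result is ignored.
async function runWithDeadline<T>(
  run: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
): Promise<TimedResult<T>> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<TimedResult<T>>((resolve) => {
    timer = setTimeout(() => {
      controller.abort();
      resolve({ ok: false, reason: "custom_probe_timeout" });
    }, timeoutMs);
  });
  const attempt = run(controller.signal).then(
    (value): TimedResult<T> => ({ ok: true, value }),
  );
  try {
    return await Promise.race([attempt, deadline]);
  } finally {
    clearTimeout(timer); // no stray timer once the race is decided
  }
}
```

A probe that throws before the deadline rejects the returned promise, which a caller could then map to a probe error.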
Errors and logs
A probe that throws is reported as custom_probe_error with the
message and a short stack in the metric's metadata. It never crashes
the agent. Use ctx.log for diagnostics; the last several lines ride
along in the probe metadata so you can see them in Observer.
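The behaviour described here can be sketched as a wrapper that catches whatever the probe throws and keeps a short tail of log lines. The limit of five lines, the four-line stack, the metadata field names, and `runCaptured` itself are assumptions for illustration; the source says only "the last several lines" and "a short stack".

```typescript
const MAX_LOG_LINES = 5; // assumed limit

// Run a probe, keep only the most recent log lines, and turn a throw into a result.
async function runCaptured(
  run: (log: (msg: string) => void) => Promise<number>,
) {
  const lines: string[] = [];
  const log = (msg: string): void => {
    lines.push(msg);
    if (lines.length > MAX_LOG_LINES) lines.shift(); // drop the oldest line
  };
  try {
    const value = await run(log);
    return { ok: true as const, value, metadata: { logs: lines } };
  } catch (err) {
    const e = err instanceof Error ? err : new Error(String(err));
    return {
      ok: false as const,
      reason: "custom_probe_error" as const,
      metadata: {
        message: e.message,
        stack: (e.stack ?? "").split("\n").slice(0, 4).join("\n"), // short stack
        logs: lines,
      },
    };
  }
}
```

Because the throw is caught inside the wrapper, a failing probe yields an error result instead of taking the process down.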
Reason codes
- custom_probe_not_found: no probe registered under that name on the agent. Confirm the agent was redeployed with the probe.
- custom_probe_config_invalid: the probe's configSchema rejected the config. Check the JSON against what the probe expects.
- custom_probe_timeout: the probe didn't finish in time.
- custom_probe_error: the probe threw. See the metadata for the message and stack.
- custom_probe_bad_return: the probe returned something other than a finite number or { value, metadata }.
Deployment notes
Probes are code. Deploying a new or changed probe means rebuilding and redeploying the agent; a restart re-runs the registrations. There is no hot reload. Each agent has its own registered probes, so the console dropdown shows the union across the agents in your organisation.
Assign the metric to the right agent
A metric runs only on the agent you assign it to (the Agent field in the
metric form's Schedule section). A custom-probe metric must be assigned
to an agent that has registered that probe. Assign it to a different
agent, or leave the agent unset, and the probe isn't there to run, so
the metric reports custom_probe_not_found. The metric form scopes the
probe dropdown to the selected agent and warns when the pairing is
wrong.
When a standard probe is better
Prefer a built-in source whenever one fits:
- To check that an HTTP endpoint is up or fast, use the HTTP probe.
- To check that a port is open, use the TCP probe.
- To read a value from a SQL query, use the SQL probe.
- To read a CloudWatch metric, use the CloudWatch source.
Standard sources need no code, no redeploy, and carry richer built-in error reporting. Custom probes are for the cases those cannot reach.