Configure gRPC health-check probes

Probe a gRPC service using the standard gRPC Health Checking Protocol. Reports SERVING / NOT_SERVING or Check latency.

gRPC probes call grpc.health.v1.Health/Check on the configured interval and report one of two values: the serving status, as 1 (SERVING) or 0 (NOT_SERVING), or the Check round-trip latency in milliseconds.

This is the right probe for gRPC services used in service-to-service communication. It does not invoke arbitrary methods on your service; it calls only the well-known health-check RPC, which most gRPC frameworks expose with a few lines of setup.

What it requires of your service

Your server must implement the standard health service. Most gRPC stacks ship a ready-made implementation:

  • Go: google.golang.org/grpc/health + grpc_health_v1.RegisterHealthServer.
  • Java: grpc-services HealthStatusManager.
  • Node: grpc-health-check.
  • Python: grpcio-health-checking.

If the server does not implement it, the probe reports grpc_unimplemented rather than a connection error, so you can tell the two apart.

Overall health vs a named service

The health service tracks status per service name, plus an overall server status under the empty name:

  • Leave Service name blank to check the overall server health. This is the common case.
  • Set a registered service name (for example my.package.MyService) to check just that service. If the name is not registered, the probe reports grpc_service_unknown.
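
The lookup described above can be modelled in a few lines. This is a minimal sketch of the idea, not the code of any gRPC library: the check function and the ServiceNotRegistered exception are hypothetical names, and the exception stands in for the NOT_FOUND status that the health protocol returns for an unregistered name.

```python
class ServiceNotRegistered(KeyError):
    """Stands in for the NOT_FOUND status returned for an unknown service name."""


def check(statuses, service=""):
    """Return the serving status for `service`; "" means overall server health."""
    try:
        return statuses[service]
    except KeyError:
        # The probe would report this case as grpc_service_unknown.
        raise ServiceNotRegistered(service) from None


# Example registry: overall health under the empty name, plus one named service.
statuses = {
    "": "SERVING",
    "my.package.MyService": "NOT_SERVING",
}

overall = check(statuses)                          # blank Service name
named = check(statuses, "my.package.MyService")    # one registered service
```

Note that the overall status and a named service's status are independent entries, so a server can report SERVING overall while one of its services reports NOT_SERVING.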

Transport security

Three modes:

  • Plaintext (plaintext): no TLS; the connection is cleartext HTTP/2 (h2c). Use inside a trusted cluster network.
  • TLS (tls): server-authenticated TLS. Set the CA certificate env var if the server uses a private CA; leave it blank to use the system trust store.
  • mTLS (mtls): mutual TLS. The agent presents a client certificate. This reuses the same env-var-reference mechanism as the HTTP probe's mTLS: each field is the NAME of an environment variable on the agent host whose value is the PEM material (or a path to a PEM file). The cloud stores only the variable name, never the certificate or key.

Set the cert / key env vars on the agent and reference them by name. See the HTTP probe guide's mTLS section for the env-var patterns (systemd, Docker, Kubernetes secret mounts).
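
To make the reference mechanism concrete, here is a rough sketch of how a value could be resolved from an env var name. It assumes the rule stated above (the value is either PEM material or a path to a PEM file); the function name resolve_pem_ref and the "-----BEGIN" check used to tell the two apart are illustrative assumptions, not the agent's actual implementation.

```python
import os
from pathlib import Path

PEM_MARKER = "-----BEGIN"  # assumed way to recognise inline PEM material


def resolve_pem_ref(ref_name, environ=None):
    """Return PEM text for the environment variable called `ref_name`.

    Raises LookupError if the variable is unset or empty, and OSError if
    the value is treated as a path and the file cannot be read.
    """
    environ = os.environ if environ is None else environ
    value = environ.get(ref_name)
    if not value:
        raise LookupError(f"environment variable {ref_name} is not set")
    if PEM_MARKER in value:
        return value  # the variable holds the PEM material itself
    return Path(value).read_text()  # otherwise treat the value as a file path
```

Only the name (for example OBSERVER_GRPC_CA) would appear in the probe configuration; the certificate or key is read on the agent host at probe time.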

Configuration shape

Plaintext overall-health check:

{
  "host": "grpc.internal",
  "port": 50051,
  "tls_mode": "plaintext",
  "interpretation": "health_state",
  "timeout_ms": 5000
}

mTLS check of a named service with an auth token:

{
  "host": "grpc.internal",
  "port": 443,
  "service": "my.package.MyService",
  "tls_mode": "mtls",
  "client_cert_ref": "OBSERVER_GRPC_CLIENT_CERT",
  "client_key_ref": "OBSERVER_GRPC_CLIENT_KEY",
  "ca_cert_ref": "OBSERVER_GRPC_CA",
  "metadata": { "authorization": "Bearer ..." },
  "interpretation": "health_state",
  "timeout_ms": 5000
}

Field reference

| Field | Default | Notes |
|---|---|---|
| host | required | Hostname or IP of the gRPC server. |
| port | required | gRPC port (often 50051, or 443 behind TLS). |
| service | "" | Empty checks overall server health; a name checks one registered service. |
| tls_mode | plaintext | plaintext, tls, or mtls. |
| client_cert_ref / client_key_ref | none | Env var names for the client cert + key. Required for mtls. |
| ca_cert_ref | none | Env var name for a CA cert PEM to verify the server. Optional for tls / mtls. |
| metadata | none | gRPC call metadata (for example an authorization token). Never logged or surfaced. |
| timeout_ms | 5000 | Check deadline, 100 to 30000 ms. |
| interpretation | health_state | health_state (1 / 0) or latency (ms). |
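
The defaults and constraints in the field reference can be checked before a configuration is saved. The sketch below is one possible pre-flight validator built only from the rules listed above; normalize_probe_config is a hypothetical helper, not part of the agent or its API.

```python
TLS_MODES = {"plaintext", "tls", "mtls"}
INTERPRETATIONS = {"health_state", "latency"}
DEFAULTS = {
    "service": "",
    "tls_mode": "plaintext",
    "timeout_ms": 5000,
    "interpretation": "health_state",
}


def normalize_probe_config(raw):
    """Apply the documented defaults and return (config, list_of_errors)."""
    cfg = {**DEFAULTS, **raw}
    errors = []
    for field in ("host", "port"):
        if cfg.get(field) in (None, ""):
            errors.append(f"{field} is required")
    if cfg["tls_mode"] not in TLS_MODES:
        errors.append("tls_mode must be plaintext, tls, or mtls")
    if cfg["tls_mode"] == "mtls":
        # mTLS needs both references; each is an env var NAME, not PEM text.
        for field in ("client_cert_ref", "client_key_ref"):
            if not cfg.get(field):
                errors.append(f"{field} is required for mtls")
    timeout = cfg["timeout_ms"]
    if not isinstance(timeout, int) or not 100 <= timeout <= 30000:
        errors.append("timeout_ms must be between 100 and 30000")
    if cfg["interpretation"] not in INTERPRETATIONS:
        errors.append("interpretation must be health_state or latency")
    return cfg, errors
```

For example, passing {"host": "grpc.internal", "port": 50051} yields a config with tls_mode set to plaintext, timeout_ms set to 5000, and an empty error list.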

Interpretations

| Interpretation | Value | Threshold idea |
|---|---|---|
| health_state | 1 for SERVING, 0 for NOT_SERVING. UNKNOWN or an unregistered service is no_data. | Healthy: over 0, Unhealthy: under 1 |
| latency | Check round-trip in ms (reported on any successful Check). | Healthy: under 200, Unhealthy: over 1000 |
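
As an illustration of the two interpretations, the following sketch turns a Check result into a metric value and applies the suggested latency thresholds. The function names are hypothetical, None stands in for no_data, and the "degraded" label for the band between the two thresholds is an assumption made for this example, since the table only suggests the healthy and unhealthy bounds.

```python
HEALTH_VALUES = {"SERVING": 1, "NOT_SERVING": 0}


def interpret(interpretation, serving_status, latency_ms):
    """Return the metric value, or None for no_data.

    `serving_status` is None when the Check itself failed (no response).
    """
    if serving_status is None:
        return None
    if interpretation == "health_state":
        # UNKNOWN (or anything else) has no numeric value, so it is no_data.
        return HEALTH_VALUES.get(serving_status)
    if interpretation == "latency":
        # Latency is reported for any successful Check, whatever the status.
        return latency_ms
    raise ValueError(f"unknown interpretation: {interpretation}")


def classify_latency(latency_ms, healthy_below=200, unhealthy_above=1000):
    """Apply the threshold idea from the table; the middle band is an assumption."""
    if latency_ms < healthy_below:
        return "healthy"
    if latency_ms > unhealthy_above:
        return "unhealthy"
    return "degraded"
```

With these definitions, a NOT_SERVING response that took 12.5 ms produces 0 under health_state and 12.5 under latency.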

Reason codes

  • grpc_unimplemented: the server doesn't implement the health service. Add grpc.health.v1.Health to the server.
  • grpc_service_unknown: the named service isn't registered. Check the name, or leave it blank for overall health.
  • grpc_health_unknown: the server answered UNKNOWN (often mid-startup).
  • grpc_unavailable: connection failed (refused, DNS, network). Distinct from auth and TLS errors.
  • grpc_timeout: the Check didn't complete within the timeout.
  • grpc_unauthenticated: the server rejected the call as unauthenticated. Check the metadata token.
  • grpc_permission_denied: authenticated but not authorized.
  • grpc_tls_failed: TLS handshake failed. Check the TLS mode and set the CA cert env var for a private CA.
  • grpc_ca_unreadable: the CA cert env var is unset or points at an unreadable file on the agent.
  • grpc_error: an uncategorised gRPC error. Check the agent log for the status code.
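
Several of these reason codes line up with standard gRPC status codes. The table-driven sketch below shows one plausible correspondence; it is an assumption about how the codes relate, not the agent's documented logic, and reason_for is a hypothetical name. The TLS and CA reason codes are left out because those failures happen while the connection is being set up, before any status code is returned.

```python
# Assumed mapping from gRPC status code names to the reason codes listed above.
STATUS_TO_REASON = {
    "UNIMPLEMENTED": "grpc_unimplemented",
    "NOT_FOUND": "grpc_service_unknown",
    "UNAVAILABLE": "grpc_unavailable",
    "DEADLINE_EXCEEDED": "grpc_timeout",
    "UNAUTHENTICATED": "grpc_unauthenticated",
    "PERMISSION_DENIED": "grpc_permission_denied",
}


def reason_for(status_code, serving_status=None):
    """Return a reason code for a Check outcome, or None when there is none."""
    if status_code == "OK":
        # The call succeeded; only an UNKNOWN answer carries a reason code.
        return "grpc_health_unknown" if serving_status == "UNKNOWN" else None
    # Anything not listed falls through to the uncategorised code.
    return STATUS_TO_REASON.get(status_code, "grpc_error")
```

For instance, reason_for("DEADLINE_EXCEEDED") gives grpc_timeout, while an unlisted code such as INTERNAL falls back to grpc_error.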

Troubleshooting

  • grpc_unimplemented on a service you know is up. The server is reachable but doesn't register the health service. This is a server change, not a probe setting.
  • grpc_unavailable only over TLS. If plaintext works and TLS doesn't, the listener may not be terminating TLS on that port, or the port differs. Confirm the TLS port.
  • grpc_tls_failed against a private CA. Set ca_cert_ref to an env var holding the CA PEM so the agent can verify the server.
  • mTLS reports mtls_ref_missing. The field wants the NAME of an env var, not the certificate text. Set the env var on the agent and reference it by name.