Configure gRPC health-check probes
Probe a gRPC service using the standard gRPC Health Checking Protocol. Reports SERVING / NOT_SERVING or Check latency.
gRPC probes call the standard gRPC Health Checking Protocol RPC
(grpc.health.v1.Health/Check) on the configured interval and report
either the serving status, as 1 (SERVING) or 0 (NOT_SERVING), or the
Check round-trip latency in milliseconds.
This is the right probe for gRPC services used in service-to-service communication. It does not invoke arbitrary methods on your service: only the well-known health-check RPC, which most gRPC frameworks expose with a few lines of setup.
What it requires of your service
Your server must implement the standard health service. Most gRPC stacks ship a ready-made implementation:
- Go: google.golang.org/grpc/health + grpc_health_v1.RegisterHealthServer.
- Java: grpc-services HealthStatusManager.
- Node: grpc-health-check.
- Python: grpcio-health-checking.
If the server does not implement it, the probe reports
grpc_unimplemented rather than a connection error, so you can tell the
two apart.
Overall health vs a named service
The health service tracks status per service name, plus an overall server status under the empty name:
- Leave Service name blank to check the overall server health. This is the common case.
- Set a registered service name (for example my.package.MyService) to check just that service. If the name is not registered, the probe reports grpc_service_unknown.
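The lookup rule above can be pictured with a small model. This is only an illustration: the dict stands in for the server's health registry, and the check function and its return shape are hypothetical, not the agent's or any gRPC library's API.

```python
# Illustrative model only: a plain dict stands in for the server's
# per-service health registry. "" is the overall server status.
SERVING, NOT_SERVING = "SERVING", "NOT_SERVING"


def check(registry, service=""):
    """Return (status, reason) for a service name.

    An empty name asks for overall server health. A name that was never
    registered yields the grpc_service_unknown reason code instead of a status.
    """
    if service not in registry:
        return None, "grpc_service_unknown"
    return registry[service], None


registry = {
    "": SERVING,                          # overall server status
    "my.package.MyService": NOT_SERVING,  # one registered service
}

print(check(registry))                          # ('SERVING', None)
print(check(registry, "my.package.MyService"))  # ('NOT_SERVING', None)
print(check(registry, "my.package.Missing"))    # (None, 'grpc_service_unknown')
```

Note that the overall status and a named service's status are independent entries, so one can be SERVING while the other is not.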
Transport security
Three modes:
- Plaintext (plaintext): no TLS. Use inside a trusted cluster network (h2c).
- TLS (tls): server-authenticated TLS. Set the CA certificate env var if the server uses a private CA; leave it blank to use the system trust store.
- mTLS (mtls): mutual TLS. The agent presents a client certificate. This reuses the same env-var-reference mechanism as the HTTP probe's mTLS: each field is the NAME of an environment variable on the agent host whose value is the PEM material (or a path to a PEM file). The cloud stores only the variable name, never the certificate or key.
Set the cert / key env vars on the agent and reference them by name. See the HTTP probe guide's mTLS section for the env-var patterns (systemd, Docker, Kubernetes secret mounts).
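To make the env-var-reference idea concrete, here is a minimal sketch of how a reference could be resolved on the agent host. The function name resolve_pem_ref, the RefError type, and the rule for telling inline PEM from a file path (a leading "-----BEGIN") are all assumptions for illustration; the agent's actual logic may differ.

```python
import os


class RefError(Exception):
    """Raised when a referenced env var is unset or its file is unreadable."""


def resolve_pem_ref(ref_name, environ=os.environ):
    """Resolve an env-var NAME to PEM bytes.

    The variable's value is treated as inline PEM when it starts with a
    PEM header, and as a path to a PEM file otherwise (assumed heuristic).
    """
    value = environ.get(ref_name)
    if not value:
        raise RefError(f"env var {ref_name!r} is unset or empty")
    if value.lstrip().startswith("-----BEGIN"):
        return value.encode("utf-8")
    try:
        with open(value, "rb") as fh:
            return fh.read()
    except OSError as exc:
        raise RefError(f"cannot read PEM file named by {ref_name!r}") from exc
```

With this shape, the probe configuration carries only a name such as OBSERVER_GRPC_CLIENT_CERT, and the secret material never leaves the agent host.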
Configuration shape
Plaintext overall-health check:
{
  "host": "grpc.internal",
  "port": 50051,
  "tls_mode": "plaintext",
  "interpretation": "health_state",
  "timeout_ms": 5000
}
mTLS check of a named service with an auth token:
{
  "host": "grpc.internal",
  "port": 443,
  "service": "my.package.MyService",
  "tls_mode": "mtls",
  "client_cert_ref": "OBSERVER_GRPC_CLIENT_CERT",
  "client_key_ref": "OBSERVER_GRPC_CLIENT_KEY",
  "ca_cert_ref": "OBSERVER_GRPC_CA",
  "metadata": { "authorization": "Bearer ..." },
  "interpretation": "health_state",
  "timeout_ms": 5000
}
Field reference
| Field | Default | Notes |
|---|---|---|
| host | required | Hostname or IP of the gRPC server. |
| port | required | gRPC port (often 50051, or 443 behind TLS). |
| service | "" | Empty checks overall server health; a name checks one registered service. |
| tls_mode | plaintext | plaintext, tls, or mtls. |
| client_cert_ref / client_key_ref | none | Env var names for the client cert + key. Required for mtls. |
| ca_cert_ref | none | Env var name for a CA cert PEM to verify the server. Optional for tls / mtls. |
| metadata | none | gRPC call metadata (for example an authorization token). Never logged or surfaced. |
| timeout_ms | 5000 | Check deadline, 100 to 30000 ms. |
| interpretation | health_state | health_state (1 / 0) or latency (ms). |
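The defaults and constraints in the table can be checked before a probe is saved. The sketch below is one way to do that. normalize_probe_config is a hypothetical helper, not part of the product, and the 1 to 65535 port range is a general TCP assumption rather than something this guide states.

```python
TLS_MODES = {"plaintext", "tls", "mtls"}
INTERPRETATIONS = {"health_state", "latency"}
# Defaults as listed in the field reference table.
DEFAULTS = {
    "service": "",
    "tls_mode": "plaintext",
    "timeout_ms": 5000,
    "interpretation": "health_state",
}


def normalize_probe_config(raw):
    """Apply the documented defaults and return (config, errors)."""
    cfg = {**DEFAULTS, **raw}
    errors = []
    if not cfg.get("host"):
        errors.append("host is required")
    port = cfg.get("port")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(port, int) or isinstance(port, bool) or not 1 <= port <= 65535:
        errors.append("port is required and must be an integer from 1 to 65535")
    if cfg["tls_mode"] not in TLS_MODES:
        errors.append("tls_mode must be plaintext, tls, or mtls")
    if cfg["interpretation"] not in INTERPRETATIONS:
        errors.append("interpretation must be health_state or latency")
    timeout = cfg["timeout_ms"]
    if not isinstance(timeout, int) or isinstance(timeout, bool) or not 100 <= timeout <= 30000:
        errors.append("timeout_ms must be an integer from 100 to 30000")
    if cfg["tls_mode"] == "mtls":
        for field in ("client_cert_ref", "client_key_ref"):
            if not cfg.get(field):
                errors.append(f"{field} is required for mtls")
    return cfg, errors


cfg, errors = normalize_probe_config({"host": "grpc.internal", "port": 50051})
print(cfg["tls_mode"], cfg["timeout_ms"], errors)  # plaintext 5000 []
```

Returning a list of errors rather than raising on the first one lets a form or CLI report every problem in a single pass.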
Interpretations
| Interpretation | Value | Threshold idea |
|---|---|---|
| health_state | 1 for SERVING, 0 for NOT_SERVING. UNKNOWN or an unregistered service is no_data. | Healthy: over 0, Unhealthy: under 1 |
| latency | Check round-trip in ms (reported on any successful Check). | Healthy: under 200, Unhealthy: over 1000 |
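The two interpretations can be summarised as a small function. This is a sketch under stated assumptions: interpret is a hypothetical name, None stands in for no_data, and a status of None models a Check RPC that failed outright (for example an unregistered service). Reporting latency for an UNKNOWN response follows from reading "any successful Check" literally.

```python
def interpret(interpretation, status, elapsed_ms):
    """Turn one Check result into a metric value, or None for no_data.

    status is the health status string from the response, or None when the
    Check RPC itself failed (assumed convention for this sketch).
    """
    if status is None:
        return None  # no successful Check, so nothing to report
    if interpretation == "latency":
        return elapsed_ms  # any successful Check reports its round-trip
    # health_state: only SERVING / NOT_SERVING map to a value.
    return {"SERVING": 1, "NOT_SERVING": 0}.get(status)


print(interpret("health_state", "SERVING", 12.5))      # 1
print(interpret("health_state", "NOT_SERVING", 12.5))  # 0
print(interpret("health_state", "UNKNOWN", 12.5))      # None
print(interpret("latency", "SERVING", 12.5))           # 12.5
```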
Reason codes
- grpc_unimplemented: the server doesn't implement the health service. Add grpc.health.v1.Health to the server.
- grpc_service_unknown: the named service isn't registered. Check the name, or leave it blank for overall health.
- grpc_health_unknown: the server answered UNKNOWN (often mid-startup).
- grpc_unavailable: connection failed (refused, DNS, network). Distinct from auth and TLS errors.
- grpc_timeout: the Check didn't complete within the timeout.
- grpc_unauthenticated: the server rejected the call as unauthenticated. Check the metadata token.
- grpc_permission_denied: authenticated but not authorized.
- grpc_tls_failed: TLS handshake failed. Check the TLS mode and set the CA cert env var for a private CA.
- grpc_ca_unreadable: the CA cert env var is unset or points at an unreadable file on the agent.
- grpc_error: an uncategorised gRPC error. Check the agent log for the status code.
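Most of these reason codes line up with standard gRPC status codes. The mapping below is an assumption based on the status code names and on the health protocol returning NOT_FOUND for an unregistered service; this guide does not spell out the agent's exact mapping. grpc_tls_failed and grpc_ca_unreadable are left out because they arise while setting up the connection, before any RPC status exists.

```python
# Assumed mapping from gRPC status code names to the reason codes above.
STATUS_TO_REASON = {
    "UNIMPLEMENTED": "grpc_unimplemented",
    "NOT_FOUND": "grpc_service_unknown",
    "UNAVAILABLE": "grpc_unavailable",
    "DEADLINE_EXCEEDED": "grpc_timeout",
    "UNAUTHENTICATED": "grpc_unauthenticated",
    "PERMISSION_DENIED": "grpc_permission_denied",
}


def reason_for(status_code_name, health_status=None):
    """Return a reason code, or None when the Check gave a usable status."""
    if status_code_name == "OK":
        # The RPC succeeded; only an UNKNOWN health status needs a reason.
        return "grpc_health_unknown" if health_status == "UNKNOWN" else None
    # Anything not listed falls through to the catch-all code.
    return STATUS_TO_REASON.get(status_code_name, "grpc_error")


print(reason_for("OK", "SERVING"))   # None
print(reason_for("OK", "UNKNOWN"))   # grpc_health_unknown
print(reason_for("NOT_FOUND"))       # grpc_service_unknown
print(reason_for("INTERNAL"))        # grpc_error
```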
Troubleshooting
- grpc_unimplemented on a service you know is up. The server is reachable but doesn't register the health service. This is a server change, not a probe setting.
- grpc_unavailable only over TLS. If plaintext works and TLS doesn't, the listener may not be terminating TLS on that port, or the port differs. Confirm the TLS port.
- grpc_tls_failed against a private CA. Set ca_cert_ref to an env var holding the CA PEM so the agent can verify the server.
- mTLS reports mtls_ref_missing. The field wants the NAME of an env var, not the certificate text. Set the env var on the agent and reference it by name.