Read metrics from AWS CloudWatch
Configure the agent to pull a single CloudWatch metric per cron tick using GetMetricData, with optional cross-account role assumption.
The agent runs a CloudWatch GetMetricData query on the configured
cron interval and reports the most recent data point. One metric
definition maps to one
(region, namespace, metric_name, dimensions, statistic, period)
tuple; create separate definitions for separate metrics or regions.
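A minimal sketch of what one definition could turn into, assuming the agent's query resembles a plain boto3 CloudWatch get_metric_data call (Observer's own implementation isn't shown here). The helper name build_get_metric_data_params is hypothetical. Region does not appear in the request body; it selects which regional endpoint the client talks to. The 5-period lookback mirrors the lag handling described under When NOT to use this.

```python
from datetime import datetime, timedelta, timezone

LOOKBACK_PERIODS = 5  # look back 5 periods to absorb CloudWatch publishing lag


def build_get_metric_data_params(namespace, metric_name, dimensions, statistic, period, now=None):
    """Hypothetical helper: keyword arguments for boto3's cloudwatch.get_metric_data().

    Region is not part of the request; pass it as region_name when creating the client.
    """
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(seconds=period * LOOKBACK_PERIODS)
    return {
        "MetricDataQueries": [
            {
                "Id": "m1",  # query IDs must start with a lowercase letter
                "MetricStat": {
                    "Metric": {
                        "Namespace": namespace,
                        "MetricName": metric_name,
                        # Sorted only for deterministic output; CloudWatch matches the full set.
                        "Dimensions": [{"Name": k, "Value": v} for k, v in sorted(dimensions.items())],
                    },
                    "Period": period,
                    "Stat": statistic,
                },
                "ReturnData": True,
            }
        ],
        "StartTime": start,
        "EndTime": end,
        "ScanBy": "TimestampDescending",  # newest data point first
    }
```

With boto3 installed, the call itself would be boto3.client("cloudwatch", region_name="us-east-1").get_metric_data(**params).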
Pick this source when your workload already publishes to CloudWatch (AWS-managed services, custom EMF metrics, vendor agents pushing to CloudWatch) and you don't want to stand up a CloudWatch exporter. For everything else, the Prometheus source and OTLP receiver are cheaper.
AWS credentials come from the agent's environment, not from this configuration. Set them at the agent process level (env vars on a container, EC2 instance role, EKS IRSA, or ECS task role); the form only carries the target of the read.
When NOT to use this
- The metric is already in Prometheus via a CloudWatch exporter (yace, cloudwatch_exporter). Use the prometheus source. You get one query against your Prometheus instead of one CloudWatch API call per tick, and the CloudWatch billing surface stays at your exporter.
- You need sub-minute granularity. This source supports periods of 60s, 300s, 900s, or 3600s. If you need 10-second resolution, send the metric over OTLP instead.
- You need to alert on the absence of a metric. CloudWatch can take 1-3 periods to publish, so the agent looks back 5 periods to absorb that lag. If a metric stops emitting, cloudwatch_no_data surfaces after ~5 periods, not immediately.
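Reporting "the most recent data point" from a GetMetricData result could look roughly like this. Each entry of the response's MetricDataResults list carries parallel Timestamps and Values lists; the function name and the returned dict shape are hypothetical, and only the cloudwatch_no_data reason string comes from this page.

```python
from datetime import datetime, timezone


def latest_point(result):
    """Hypothetical helper: pick the newest point from one MetricDataResults entry."""
    points = list(zip(result.get("Timestamps", []), result.get("Values", [])))
    if not points:
        # Nothing was published inside the lookback window.
        return {"status": "no_data", "reason": "cloudwatch_no_data"}
    # Take the max timestamp instead of trusting the response ordering.
    ts, value = max(points, key=lambda p: p[0])
    return {"status": "ok", "timestamp": ts, "value": value}


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    t1 = datetime(2024, 1, 1, 12, 1, tzinfo=timezone.utc)
    print(latest_point({"Timestamps": [t1, t0], "Values": [42.0, 40.0]})["value"])  # 42.0
```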
AWS credentials
The agent uses the standard AWS SDK credential provider chain. In order of precedence:
- Environment variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and optionally AWS_SESSION_TOKEN.
- EKS IRSA (the agent runs as a Kubernetes pod whose service account is annotated with an IAM role; the SDK exchanges the pod's web identity token for credentials).
- AWS shared credentials file (~/.aws/credentials), with an optional AWS_PROFILE env var to pick a profile.
- Container credentials: an ECS task role (the agent runs as an ECS task with a task role) or EKS Pod Identity (the agent runs as a pod with a Pod Identity association).
- EC2 instance metadata (the agent runs on an EC2 instance with an attached IAM role).
Pick whichever fits your deployment. For Kubernetes deployments, IRSA avoids handling access keys: associate the IAM role with the pod's service account and the agent picks up credentials through it automatically.
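To get a rough idea of which source an agent process will end up using, a hypothetical helper can inspect the environment variables AWS tooling injects: IRSA sets AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE, ECS sets AWS_CONTAINER_CREDENTIALS_RELATIVE_URI, and EKS Pod Identity sets AWS_CONTAINER_CREDENTIALS_FULL_URI. This is a sketch under stated assumptions: it follows the ordering used by the common AWS SDKs (web identity before the shared file, container credentials before EC2 instance metadata), never contacts AWS, and doesn't validate anything. aws sts get-caller-identity remains the authoritative check.

```python
import os


def guess_credential_source(env=None, shared_file_exists=None):
    """Hypothetical helper: best-effort guess at the credential source an SDK would pick.

    Only looks at environment variables and whether ~/.aws/credentials exists.
    """
    env = os.environ if env is None else env
    if shared_file_exists is None:
        shared_file_exists = os.path.exists(os.path.expanduser("~/.aws/credentials"))

    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "environment variables"
    if env.get("AWS_ROLE_ARN") and env.get("AWS_WEB_IDENTITY_TOKEN_FILE"):
        return "EKS IRSA (web identity)"
    if shared_file_exists:
        return "shared credentials file"
    if env.get("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI"):
        return "ECS task role"
    if env.get("AWS_CONTAINER_CREDENTIALS_FULL_URI"):
        return "EKS Pod Identity"
    return "EC2 instance metadata (if reachable)"


if __name__ == "__main__":
    print(guess_credential_source())
```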
Minimum-permissions IAM policy
Attach this policy to the role the agent assumes (or to the access key's user, if you're using static credentials):
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ObserverAgentReadMetrics",
      "Effect": "Allow",
      "Action": [
        "cloudwatch:GetMetricData",
        "cloudwatch:ListMetrics"
      ],
      "Resource": "*"
    }
  ]
}
```
GetMetricData and ListMetrics do not support resource-level
constraints, so Resource must be *. Tighten the scope at the
role's trust policy instead. ListMetrics is required for the
console's Fetch from AWS affordance; omit it if you only want
read access for probes and accept the curated catalog for discovery.
Cross-account access
When the metric lives in account B and the agent runs in account A:
1. In account B, create a role (e.g. observer-cloudwatch-read) with the policy above. Add a trust policy permitting account A's role or user to assume it:

   ```json
   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Effect": "Allow",
         "Principal": { "AWS": "arn:aws:iam::AAAAAAAAAAAA:role/observer-agent" },
         "Action": "sts:AssumeRole",
         "Condition": { "StringEquals": { "sts:ExternalId": "your-external-id" } }
       }
     ]
   }
   ```

2. In Observer, set the metric definition's Role ARN to arn:aws:iam::BBBBBBBBBBBB:role/observer-cloudwatch-read and External ID to your-external-id. The agent calls sts:AssumeRole with its ambient credentials before each GetMetricData call.

3. In account A, attach this inline policy to the role the agent uses (the same identity named in the Principal of step 1's trust policy). That's usually the IRSA service-account role, the EC2 instance role, or the ECS task role:

   ```json
   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Effect": "Allow",
         "Action": "sts:AssumeRole",
         "Resource": "arn:aws:iam::BBBBBBBBBBBB:role/observer-cloudwatch-read"
       }
     ]
   }
   ```
The external ID is optional from AWS's side but recommended; it mitigates the "confused deputy" problem, where a party that knows the role's ARN (but not the external ID) gets a trusted intermediary to assume the role on its behalf.
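Both policies have to agree on the same ARNs and external ID, so generating them from one set of inputs avoids copy-paste drift. A minimal standard-library sketch; cross_account_policies is a hypothetical helper and the account IDs in the example are placeholders.

```python
import json


def cross_account_policies(source_role_arn, target_account_id, target_role_name, external_id):
    """Hypothetical helper: build the account B trust policy and the account A inline policy."""
    target_role_arn = f"arn:aws:iam::{target_account_id}:role/{target_role_name}"
    # Trust policy on the target role in account B: who may assume it, and with which external ID.
    trust_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": source_role_arn},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
        }],
    }
    # Inline policy on the agent's identity in account A: permission to call AssumeRole on that role.
    source_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": target_role_arn,
        }],
    }
    return target_role_arn, trust_policy, source_policy


if __name__ == "__main__":
    arn, trust, source = cross_account_policies(
        "arn:aws:iam::111111111111:role/observer-agent",
        "222222222222", "observer-cloudwatch-read", "your-external-id")
    print(arn)  # value for the metric definition's Role ARN field
    print(json.dumps(trust, indent=2))
    print(json.dumps(source, indent=2))
```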
Configure a metric in the Observer console
Create a metric, pick AWS CloudWatch, fill in:
- Region: AWS region code (us-east-1, eu-west-2, etc.).
- Period: 60s, 300s, 900s, or 3600s. A shorter period means more API calls per hour (one per cron tick) and finer-grained alerting. See Cost considerations.
- Namespace: AWS/RDS, AWS/Lambda, AWS/ApplicationELB, or your custom namespace. See Common namespaces and metrics.
- Metric name: e.g. CPUUtilization. Case-sensitive.
- Dimensions: Key=Value lines that scope the metric to a single resource (e.g. DBInstanceIdentifier=prod-db).
- Statistic: Average, Sum, Minimum, Maximum, SampleCount, or a percentile (p50, p95, p99.9). See Statistic reference.
- Role ARN / External ID (optional): for cross-account reads. See Cross-account access.
Statistic reference
Pick a statistic that maps your metric's meaning to a single number per period:
| Metric shape | Recommended statistic | Example |
|---|---|---|
| Gauge / utilization (CPU, memory, queue depth) | Average | AWS/RDS CPUUtilization Average |
| Counter (requests, errors, invocations) | Sum | AWS/Lambda Invocations Sum |
| Latency-style with skew | p95 or p99 | AWS/ApplicationELB TargetResponseTime p95 |
| Spike detection | Maximum | AWS/Lambda ConcurrentExecutions Maximum |
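Not every statistic is meaningful for every metric. For count metrics such as AWS/ApplicationELB HTTPCode_Target_5XX_Count, the AWS documentation notes that Minimum, Maximum, and Average all return 1, so use Sum. Percentiles are only available for metrics CloudWatch keeps percentile data for; a percentile on any other metric can return no data points, which surfaces as cloudwatch_no_data. Each AWS service's CloudWatch metrics documentation lists the recommended statistics for its metrics.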
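A quick syntax check for the Statistic field, limited to the statistics this page lists. Assumptions: percentiles are written as p followed by a value from 0 to 100 with up to two decimal places; CloudWatch accepts other extended statistics (such as trimmed means) that this hypothetical helper deliberately rejects. It checks spelling only, not whether the metric has data for that statistic.

```python
import re

BASIC_STATISTICS = {"Average", "Sum", "Minimum", "Maximum", "SampleCount"}
# p50, p95, p99.9, p99.99 ...
PERCENTILE_RE = re.compile(r"^p\d{1,3}(\.\d{1,2})?$")


def is_valid_statistic(stat):
    """Hypothetical helper: True for a basic statistic or a pNN(.NN) percentile in [0, 100]."""
    if stat in BASIC_STATISTICS:
        return True
    if PERCENTILE_RE.match(stat):
        return 0 <= float(stat[1:]) <= 100
    return False
```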
Common namespaces and metrics
A non-exhaustive starting list; the AWS Console is authoritative.
| Service | Namespace | Useful metrics |
|---|---|---|
| RDS | AWS/RDS | CPUUtilization, FreeableMemory, DatabaseConnections, ReadLatency, WriteLatency |
| Lambda | AWS/Lambda | Invocations, Errors, Duration, ConcurrentExecutions, Throttles |
| Application Load Balancer | AWS/ApplicationELB | RequestCount, HTTPCode_Target_5XX_Count, TargetResponseTime, HealthyHostCount |
| SQS | AWS/SQS | ApproximateNumberOfMessagesVisible, ApproximateAgeOfOldestMessage |
| API Gateway | AWS/ApiGateway | Count, 4XXError, 5XXError, Latency |
| ECS | AWS/ECS | CPUUtilization, MemoryUtilization |
For latency-sensitive services, prefer p95 / p99 statistics over Average. Average hides tail-latency regressions.
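To see why, compare the mean and a 95th percentile over the same requests. The latencies below are made-up numbers for illustration, and the nearest-rank percentile is a textbook method, not necessarily how CloudWatch computes percentiles internally.

```python
import math
import statistics


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# 100 requests: 94 fast ones at 100 ms and a slow tail of 6 at 2000 ms.
latencies_ms = [100] * 94 + [2000] * 6
print(statistics.mean(latencies_ms))              # 214
print(nearest_rank_percentile(latencies_ms, 95))  # 2000
```

The average barely moves while 6% of users wait 20× longer; p95 reports the tail directly.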
Cost considerations
CloudWatch GetMetricData is billed per metric retrieved (see the
AWS CloudWatch pricing page for current rates; at the time of
writing it's roughly $0.01 per 1,000 metrics retrieved). One
Observer metric definition issues one GetMetricData call per cron
tick, retrieving one metric.
Back-of-envelope at 60s period and 1 metric def: 43,200 calls per 30-day month. The agent batches dimensions inside one query but does not batch across metric defs; if you have 50 CloudWatch-backed metrics with 60s periods, you're at ~2.16M calls per month.
Two ways to keep the bill bounded:
- Raise the period for metrics that don't need 1-minute granularity. A 300s period cuts the call rate by 5×.
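- Move stable metrics to Prometheus via a CloudWatch exporter. The exporter consolidates many CloudWatch metrics into one exporter scrape, and Observer reads them through the Prometheus source, so Observer's own reads add no CloudWatch charges. The exporter's CloudWatch calls are still billed, at whatever scrape interval you configure for it.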
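The back-of-envelope numbers above can be reproduced in a few lines. This sketch assumes one cron tick per period (which is what the 43,200-calls figure implies) and hard-codes the approximate $0.01 per 1,000 metrics rate quoted above; check the CloudWatch pricing page before relying on the dollar figures.

```python
SECONDS_PER_30_DAY_MONTH = 30 * 24 * 3600
ASSUMED_USD_PER_1000_METRICS = 0.01  # approximate rate; verify against current AWS pricing


def monthly_calls(period_s, metric_defs):
    # One GetMetricData call per metric definition per cron tick, one tick per period.
    return SECONDS_PER_30_DAY_MONTH // period_s * metric_defs


def monthly_cost_usd(period_s, metric_defs, usd_per_1000=ASSUMED_USD_PER_1000_METRICS):
    # Each call retrieves exactly one metric, so metrics retrieved == calls.
    return monthly_calls(period_s, metric_defs) / 1000 * usd_per_1000


print(monthly_calls(60, 1))                 # 43200
print(monthly_calls(60, 50))                # 2160000
print(round(monthly_cost_usd(60, 50), 2))   # 21.6
print(round(monthly_cost_usd(300, 50), 2))  # 4.32
```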
Reason codes specific to CloudWatch
The reason field on no_data results carries one of:
- cloudwatch_no_data: GetMetricData returned an empty value list. The metric is not publishing, or no data point exists in the lookback window (5 periods).
- cloudwatch_access_denied: the agent's credentials cannot call GetMetricData against this metric. Check the IAM policy on the role or user the agent assumes.
- cloudwatch_throttled: AWS is rate-limiting the agent. Raise the period, or split the metric definitions across multiple agents in different AWS accounts.
- cloudwatch_invalid_parameter: the request was malformed. Common causes: a dimension name CloudWatch doesn't recognize, or a statistic the metric doesn't support.
- cloudwatch_resource_not_found: the namespace, metric name, and dimension combination doesn't exist (and never has) in this region and account.
- cloudwatch_expired_credentials: an STS session expired. The agent refreshes automatically; this should self-heal on the next tick.
- cloudwatch_server_error: AWS returned a 5xx. Transient; usually clears within minutes.
- cloudwatch_error: an uncategorized error. Check the AWS service health dashboard.
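If you map AWS SDK errors to these reason codes in your own tooling, a classifier could look like the sketch below. Both the mapping and the error-code strings are assumptions: the strings are common AWS error codes, but the exact ones depend on the SDK and API protocol, and this is not Observer's documented logic.

```python
# Hypothetical mapping from AWS error codes to this page's reason codes.
REASON_BY_ERROR_CODE = {
    "AccessDenied": "cloudwatch_access_denied",
    "AccessDeniedException": "cloudwatch_access_denied",
    "Throttling": "cloudwatch_throttled",
    "ThrottlingException": "cloudwatch_throttled",
    "InvalidParameterValue": "cloudwatch_invalid_parameter",
    "InvalidParameterCombination": "cloudwatch_invalid_parameter",
    "ResourceNotFound": "cloudwatch_resource_not_found",
    "ResourceNotFoundException": "cloudwatch_resource_not_found",
    "ExpiredToken": "cloudwatch_expired_credentials",
    "ExpiredTokenException": "cloudwatch_expired_credentials",
}


def classify(error_code, http_status):
    """Return a reason code for a failed call: known codes first, then any 5xx, else generic."""
    if error_code in REASON_BY_ERROR_CODE:
        return REASON_BY_ERROR_CODE[error_code]
    if 500 <= http_status <= 599:
        return "cloudwatch_server_error"
    return "cloudwatch_error"
```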
Troubleshooting
Each entry leads with the symptom and the action to take.
- cloudwatch_access_denied and the IAM policy looks right. Check the role's trust policy. The agent's identity must be a principal named in the target role's AssumeRolePolicyDocument. Run aws sts get-caller-identity from the agent's host to confirm which identity it's using.
- cloudwatch_no_data but the metric is visible in the AWS Console. Check the dimensions exactly. CloudWatch matches on the full dimension set: a metric published with {DBInstanceIdentifier=prod-db} is not the same metric as {DBInstanceIdentifier=prod-db, EngineName=postgres}. The Console shows you the dimensions when you click a metric.
- cloudwatch_throttled repeatedly. Multiple metric definitions on the same agent share an account-wide TPS limit. Either raise the period for non-critical metrics or split agents per account.
- Wrong region. A metric in eu-west-1 is invisible from a us-east-1 query. Each region needs its own metric definition.
- Cross-account read returns cloudwatch_no_data but works for the IAM user directly. The assumed-role session inherits the role's policy, not the trust-policy principal's. Add cloudwatch:GetMetricData to the target role's policy, not the source role's.
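When the agent runs under a role, aws sts get-caller-identity reports an assumed-role ARN (arn:aws:sts::ACCOUNT:assumed-role/ROLE/SESSION), while a trust policy's Principal names the IAM role ARN (arn:aws:iam::ACCOUNT:role/ROLE). The hypothetical helper below converts one into the other so you can compare them directly. Assumption: the role sits at the default path "/", because assumed-role ARNs omit the role path.

```python
def principal_arn_for(caller_arn):
    """Hypothetical helper: map a get-caller-identity Arn to the ARN a trust policy would name."""
    parts = caller_arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {caller_arn!r}")
    _, partition, service, _region, account, resource = parts
    if service == "sts" and resource.startswith("assumed-role/"):
        # assumed-role/<role-name>/<session-name>; the role path is not included.
        role_name = resource.split("/")[1]
        return f"arn:{partition}:iam::{account}:role/{role_name}"
    if service == "iam":
        return caller_arn  # IAM users and roles are already principal ARNs
    raise ValueError(f"unexpected caller identity: {caller_arn!r}")


if __name__ == "__main__":
    caller = "arn:aws:sts::111122223333:assumed-role/observer-agent/i-0abc"
    principal_in_trust_policy = "arn:aws:iam::111122223333:role/observer-agent"
    print(principal_arn_for(caller) == principal_in_trust_policy)  # True
```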
Known limits
- One metric per definition. This source does not batch across metric defs. If you need 500 metrics in one API call, write a custom collector against GetMetricData directly and forward aggregates as OTLP.
- Static credentials are not stored. If you cannot use a role-based credential source (instance role, ECS task role, IRSA), set static access keys as environment variables on the agent process. Per-metric access keys would require an encryption and rotation surface that v1 deliberately omits.
- Two ways to discover metric names. The namespace and metric-name inputs are pre-populated from a curated catalog covering the AWS services most operators instrument (RDS, Lambda, ApplicationELB, EC2, SQS, ApiGateway, ECS, DynamoDB, S3, CloudFront, SNS, Step Functions, Kinesis, Network LB). For custom namespaces or AWS services outside the catalog, click Fetch from AWS next to the metric name input: the agent runs cloudwatch:ListMetrics against the configured region (using its own AWS credentials, including any cross-account role ARN you set) and returns the live list within ~5 seconds. Click a row to fill the metric name and dimensions in one step.
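For scripting the same discovery outside the console, a ListMetrics response is a Metrics list (each entry has Namespace, MetricName, and Dimensions) plus an optional NextToken for paging. The sketch below pages through results and turns one entry into the console's fields. iter_metrics and to_form_fields are hypothetical helpers; fetch_page is any function that returns one ListMetrics page for a given token.

```python
def iter_metrics(fetch_page):
    """Yield every metric across ListMetrics pages, following NextToken until it runs out."""
    token = None
    while True:
        page = fetch_page(token)
        yield from page.get("Metrics", [])
        token = page.get("NextToken")
        if not token:
            return


def to_form_fields(metric):
    """Turn one ListMetrics entry into namespace, metric name, and Key=Value dimension lines."""
    dimensions = "\n".join(f"{d['Name']}={d['Value']}" for d in metric.get("Dimensions", []))
    return {
        "namespace": metric["Namespace"],
        "metric_name": metric["MetricName"],
        "dimensions": dimensions,
    }
```

With boto3, fetch_page could be lambda t: client.list_metrics(Namespace="AWS/RDS", **({"NextToken": t} if t else {})); boto3's client.get_paginator("list_metrics") handles the same paging for you.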