Ingest AWS CloudWatch metrics
Chronosphere Observability Platform supports receiving AWS CloudWatch metrics through CloudWatch Metric Streams. To continually stream metrics from Amazon Web Services (AWS) to Observability Platform, configure CloudWatch Metric Streams either manually in the AWS Management Console or by using Terraform.
Metric naming conventions
Metric names of ingested CloudWatch metrics in Observability Platform follow the prefix naming pattern:
<namespace>_<MetricName>_<statistic>
- <namespace>: The namespace is lowercased, and Observability Platform replaces all forward slash (/) and period (.) characters in the CloudWatch namespace with underscores (_).
  All AWS service namespaces follow the naming convention AWS/<ServiceName>, where <ServiceName> is replaced with the service name. In Observability Platform, the ingested metrics therefore begin with aws_<servicename>. For a list of AWS services and their respective namespaces, see AWS services that publish CloudWatch metrics.
  If you create custom metrics, the namespace you set for the metric correspondingly becomes the metric name prefix in Observability Platform.
- <MetricName>: Observability Platform preserves the CloudWatch metric name's case.
- <statistic>: Observability Platform appends the CloudWatch statistic's name (count, sum, maximum, minimum, average). If you define additional statistics for a metric, Observability Platform appends the corresponding CloudWatch metric statistic name (pXX).
For examples, see Example metric names.
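The naming rules above can be sketched as a small Python function. This is only an illustration of the documented convention, not Observability Platform's implementation; the function name `cloudwatch_metric_name` is hypothetical.

```python
import re


def cloudwatch_metric_name(namespace: str, metric_name: str, statistic: str) -> str:
    """Build <namespace>_<MetricName>_<statistic> from CloudWatch values."""
    # Lowercase the namespace and replace "/" and "." with underscores.
    prefix = re.sub(r"[/.]", "_", namespace.lower())
    # The metric name keeps its original case; the statistic name is appended.
    return f"{prefix}_{metric_name}_{statistic}"


if __name__ == "__main__":
    print(cloudwatch_metric_name("AWS/EBS", "VolumeReadBytes", "sum"))
    print(cloudwatch_metric_name("Buildkite", "RunningJobsCount", "average"))
```

Under these rules, the AWS/EBS namespace yields the prefix aws_ebs, and a custom namespace such as Buildkite yields buildkite.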
Metric labeling conventions
Observability Platform adds CloudWatch metric dimensions as labels to the time series, following the pattern dimension_<DimensionName>. For examples, see Example metric names.
Stream resource attributes
CloudWatch Metric Streams include OpenTelemetry Protocol (OTLP) resource attributes, which Observability Platform merges into the time series. Observability Platform replaces periods (.) with underscores (_) in attribute key names.
Amazon Data Firehose includes the following resource attributes in every post:
- aws_exporter_arn: The Amazon Resource Name (ARN) of the CloudWatch Metric Stream, which serves as the unique metric writer instance identifier.
- cloud_account_id: The account ID of the Amazon Data Firehose sending the stream, such as 123456789.
- cloud_provider: The value is always aws.
- cloud_region: The AWS region of the Amazon Data Firehose sending the stream, such as us-east-2.
For examples, see Example metric names.
Add custom resource attributes using stream parameters
You can define custom key:value pairs as parameters for Amazon Data Firehose to include in each HTTP call. Observability Platform treats all additional parameters as resource attributes and merges them into the time series.
Your custom parameters take precedence over the default CloudWatch metrics resource attributes. To avoid accidentally overwriting CloudWatch-provided values, do not add custom parameters with key names that conflict with default CloudWatch key names.
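The following Python sketch illustrates how the labeling and precedence rules described so far could combine. It is not Observability Platform's implementation. The function names are hypothetical, the dotted input keys (such as cloud.region) are assumed from the underscore forms listed above, and applying the period replacement to custom parameter keys is also an assumption.

```python
# Label names of the default resource attributes listed above.
DEFAULT_ATTRIBUTE_LABELS = {
    "aws_exporter_arn",
    "cloud_account_id",
    "cloud_provider",
    "cloud_region",
}


def normalize_key(key: str) -> str:
    """Replace periods with underscores in an attribute key name."""
    return key.replace(".", "_")


def build_labels(dimensions: dict, resource_attributes: dict, custom_parameters: dict) -> dict:
    """Merge dimensions, resource attributes, and custom parameters into labels."""
    # Dimensions become labels named dimension_<DimensionName>.
    labels = {f"dimension_{name}": value for name, value in dimensions.items()}
    labels.update({normalize_key(k): v for k, v in resource_attributes.items()})
    # Custom parameters are applied last, so they win on a key conflict.
    labels.update({normalize_key(k): v for k, v in custom_parameters.items()})
    return labels


def conflicting_parameters(custom_parameters: dict) -> list:
    """Return custom parameter keys that would overwrite a default attribute."""
    return sorted(k for k in custom_parameters if normalize_key(k) in DEFAULT_ATTRIBUTE_LABELS)


if __name__ == "__main__":
    print(build_labels(
        {"VolumeId": "xyz"},
        {"cloud.provider": "aws", "cloud.region": "us-east-2"},
        {"environment": "staging"},
    ))
    print(conflicting_parameters({"environment": "staging", "cloud_region": "eu-west-1"}))
```

A check like `conflicting_parameters` can be run against a planned parameter set before deployment to catch keys that would overwrite CloudWatch-provided values.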
You can configure additional parameters using either the AWS Management Console or Terraform.
To configure parameters using Terraform:
- Edit the Terraform module.
- In the aws_kinesis_firehose_delivery_stream resource definition, modify the request_configuration block to define additional common_attributes. The AWS Management Console names this setting Parameters, while the API name is common_attributes.
  For example, this configures the content_encoding parameter to GZIP and defines two common_attributes, testname and testname2:

request_configuration {
  content_encoding = "GZIP"
  common_attributes {
    name  = "testname"
    value = "testvalue"
  }
  common_attributes {
    name  = "testname2"
    value = "testvalue2"
  }
}
Example metric names
Given a CloudWatch metric with the following attributes:
- The namespace AWS/EBS (AWS Service)
- The metric name VolumeReadBytes
- The dimension VolumeId
- The custom Firehose destination parameter environment
- The CloudWatch metric resource attributes aws_exporter_arn, cloud_account_id, cloud_provider, and cloud_region
Observability Platform creates metrics with these names and labels:
aws_ebs_VolumeReadBytes_count{dimension_VolumeId="xyz", aws_exporter_arn="abc123", cloud_account_id="123", cloud_provider="aws", cloud_region="us-east-2", environment="staging"}
aws_ebs_VolumeReadBytes_sum{dimension_VolumeId="xyz", aws_exporter_arn="abc123", cloud_account_id="123", cloud_provider="aws", cloud_region="us-east-2", environment="staging"}
aws_ebs_VolumeReadBytes_maximum{dimension_VolumeId="xyz", aws_exporter_arn="abc123", cloud_account_id="123", cloud_provider="aws", cloud_region="us-east-2", environment="staging"}
aws_ebs_VolumeReadBytes_minimum{dimension_VolumeId="xyz", aws_exporter_arn="abc123", cloud_account_id="123", cloud_provider="aws", cloud_region="us-east-2", environment="staging"}
aws_ebs_VolumeReadBytes_average{dimension_VolumeId="xyz", aws_exporter_arn="abc123", cloud_account_id="123", cloud_provider="aws", cloud_region="us-east-2", environment="staging"}
Given a CloudWatch metric with the following attributes:
- The namespace Buildkite (custom metrics)
- The metric name RunningJobsCount
- The custom Firehose destination parameter environment
- No dimension
- The CloudWatch metric resource attributes aws_exporter_arn, cloud_account_id, cloud_provider, and cloud_region
Observability Platform creates metrics with these names and labels:
buildkite_RunningJobsCount_count{aws_exporter_arn="abc123", cloud_account_id="123", cloud_provider="aws", cloud_region="us-east-2", environment="staging"}
buildkite_RunningJobsCount_sum{aws_exporter_arn="abc123", cloud_account_id="123", cloud_provider="aws", cloud_region="us-east-2", environment="staging"}
buildkite_RunningJobsCount_maximum{aws_exporter_arn="abc123", cloud_account_id="123", cloud_provider="aws", cloud_region="us-east-2", environment="staging"}
buildkite_RunningJobsCount_minimum{aws_exporter_arn="abc123", cloud_account_id="123", cloud_provider="aws", cloud_region="us-east-2", environment="staging"}
buildkite_RunningJobsCount_average{aws_exporter_arn="abc123", cloud_account_id="123", cloud_provider="aws", cloud_region="us-east-2", environment="staging"}
Drop CloudWatch Metric Stream metrics
When you ingest CloudWatch Metric Streams, you generate metrics that consume some of your Standard Metrics License capacity. To determine how this might affect license consumption, configure a drop rule before configuring CloudWatch Metric Stream ingestion.
Create rules to drop CloudWatch Metrics
This example Chronoctl YAML resource definition creates a drop rule that drops all metrics from CloudWatch Metric Streams except for metrics about the Metric Stream itself.
api_version: v1/config
kind: DropRule
spec:
  slug: drop-cloudwatch-metric-stream-metrics
  name: Drop CloudWatch Metric Stream metrics
  mode: ENABLED
  filters:
    - name: aws_exporter_arn
      value_glob: arn:aws:cloudwatch:*
    - name: Namespace!
      value_glob: '{AWS/CloudWatch/MetricStreams}'
You can modify the rule to allow additional metrics from additional AWS namespaces. This example allows all metrics from the CloudWatch Metric Streams and AWS ECS namespaces.
api_version: v1/config
kind: DropRule
spec:
  slug: drop-cloudwatch-metric-stream-metrics
  name: Drop CloudWatch Metric Stream metrics
  mode: ENABLED
  filters:
    - name: aws_exporter_arn
      value_glob: arn:aws:cloudwatch:*
    - name: Namespace!
      value_glob: '{AWS/CloudWatch/MetricStreams,AWS/ECS}'
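If you maintain the list of allowed namespaces in code, a small generator can produce the rule definition. The following Python sketch builds the same YAML text as the examples above from a list of namespaces to keep. The function name `drop_rule_yaml` is hypothetical, the indentation of the generated YAML is an assumption, and the script only prints text; it does not call Chronoctl.

```python
def drop_rule_yaml(allowed_namespaces: list) -> str:
    """Render the example drop rule, keeping only the given namespaces."""
    if not allowed_namespaces:
        raise ValueError("provide at least one namespace to keep")
    # The examples above list the kept namespaces as a brace-delimited glob.
    namespace_glob = "{" + ",".join(allowed_namespaces) + "}"
    lines = [
        "api_version: v1/config",
        "kind: DropRule",
        "spec:",
        "  slug: drop-cloudwatch-metric-stream-metrics",
        "  name: Drop CloudWatch Metric Stream metrics",
        "  mode: ENABLED",
        "  filters:",
        "    - name: aws_exporter_arn",
        "      value_glob: arn:aws:cloudwatch:*",
        "    - name: Namespace!",
        f"      value_glob: '{namespace_glob}'",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(drop_rule_yaml(["AWS/CloudWatch/MetricStreams", "AWS/ECS"]))
```

Review the generated file before applying it, because a mistake in the namespace list changes which metrics are dropped.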
You can also configure CloudWatch Metric Streams to include or exclude specific namespaces to reduce AWS costs associated with the streaming of unwanted metrics.
- In the AWS Management Console, go to Streams.
- Edit the Metric Stream.
- Under Metrics to be streamed, include or exclude namespaces.
View drop rule metrics
To view how many data points per second Observability Platform is dropping with the example CloudWatch Metric Streams drop rule, use the following PromQL query:
sum by (policy_name) (rate(chrono_policies_count{dropped="yes", policy_name="Drop CloudWatch Metric Stream metrics"}[5m]))
Configure CloudWatch Metric Streams
The following diagram shows the architecture and data flow from your AWS account to Chronosphere. In each AWS region that you want to stream data from, a CloudWatch Metric Streams instance sends data to an Amazon Data Firehose stream, which forwards that data to the Amazon Data Firehose ingest endpoint running in your Observability Platform tenant.
Observability Platform processes the CloudWatch metrics and makes them available for use in queries, monitors, and dashboards.
CloudWatch roles and permissions
To use CloudWatch Metric Streams in Observability Platform, you must configure a CloudWatch Metric Stream in each AWS account and region. The account you use to set up the CloudWatch Metric Stream must either have the CloudWatchFullAccess policy and iam:PassRole permission, or it must have the following list of permissions:
iam:PassRole
cloudwatch:PutMetricStream
cloudwatch:DeleteMetricStream
cloudwatch:GetMetricStream
cloudwatch:ListMetricStreams
cloudwatch:StartMetricStreams
cloudwatch:StopMetricStreams
iam:CreateRole
iam:PutRolePolicy
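As a quick pre-flight check, the permission list above can be compared against the actions you believe an account has. This Python sketch is illustrative only: `missing_permissions` is a hypothetical helper that does plain string comparison, and it does not evaluate IAM policies, wildcards, or conditions.

```python
# The permissions listed above, in the same order.
REQUIRED_PERMISSIONS = [
    "iam:PassRole",
    "cloudwatch:PutMetricStream",
    "cloudwatch:DeleteMetricStream",
    "cloudwatch:GetMetricStream",
    "cloudwatch:ListMetricStreams",
    "cloudwatch:StartMetricStreams",
    "cloudwatch:StopMetricStreams",
    "iam:CreateRole",
    "iam:PutRolePolicy",
]


def missing_permissions(granted: list) -> list:
    """Return the required permissions that are absent from `granted`."""
    granted_set = set(granted)
    return [p for p in REQUIRED_PERMISSIONS if p not in granted_set]


if __name__ == "__main__":
    granted = ["iam:PassRole", "cloudwatch:PutMetricStream", "cloudwatch:GetMetricStream"]
    for permission in missing_permissions(granted):
        print(f"missing: {permission}")
```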
Observability Platform authentication
You must also create or use the API token of an Observability Platform restricted service account with write-only permission. For more information, see Create a restricted service account.
You must also provide your Observability Platform organization name, which is the name of the subdomain that you use to access Observability Platform. For example, if your team uses example.chronosphere.io, your team's organization name is example.
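The relationship between the tenant address, the organization name, and the Firehose destination can be sketched in Python. The endpoint path is taken from the Terraform configuration in Apply the configuration; the function names `organization_name` and `firehose_endpoint` are hypothetical.

```python
from urllib.parse import urlparse

DOMAIN_SUFFIX = ".chronosphere.io"


def organization_name(tenant_address: str) -> str:
    """Extract the organization name (subdomain) from a tenant address."""
    # Accept both "example.chronosphere.io" and "https://example.chronosphere.io".
    if "://" not in tenant_address:
        tenant_address = f"https://{tenant_address}"
    host = urlparse(tenant_address).hostname or ""
    name = host[: -len(DOMAIN_SUFFIX)] if host.endswith(DOMAIN_SUFFIX) else ""
    if not name:
        raise ValueError(f"not an Observability Platform tenant address: {tenant_address}")
    return name


def firehose_endpoint(org_name: str) -> str:
    """Build the HTTP endpoint URL used as the Firehose destination."""
    return f"https://{org_name}{DOMAIN_SUFFIX}/data/metrics/api/v1/cloudwatch/firehose"


if __name__ == "__main__":
    org = organization_name("example.chronosphere.io")
    print(org)
    print(firehose_endpoint(org))
```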
AWS resources and IAM roles
The AWS setup process automatically creates the following resources and IAM roles as part of creating a metric stream:
AWS resources
- S3 Bucket: A bucket will be created to store data processed by the Kinesis Firehose delivery stream.
- CloudWatch Log Group: A log group will be created to capture logs related to the Kinesis Firehose delivery stream.
- Kinesis Firehose Delivery Stream: A Kinesis Firehose delivery stream will be created with configurations to send data to Observability Platform through an HTTP endpoint, and store backup data in the S3 bucket.
- IAM Role for S3: An IAM role will be created with the following permissions for the Kinesis Firehose to access the S3 bucket and CloudWatch Logs:
s3:AbortMultipartUpload
s3:GetBucketLocation
s3:GetObject
s3:ListBucket
s3:ListBucketMultipartUploads
s3:PutObject
logs:PutLogEvents
IAM roles
An IAM role to allow CloudWatch Metric Streams to publish data to the Kinesis Firehose delivery stream will be created with the following permissions:
firehose:PutRecord
firehose:PutRecordBatch
Apply the configuration
You can configure CloudWatch Metric Streams either manually in the AWS Management Console or by using Terraform.
Before configuring metric ingestion, you can set up a drop rule to drop all metrics sent by CloudWatch Metric Streams. Doing this avoids unexpected license consumption changes. For examples, see Drop CloudWatch Metric Stream metrics.
- Set values for the following Terraform variables, and modify the following Terraform data sources and resources to apply the required settings.
# Variables
variable "chronosphere_org_name" {
  type        = string
  description = "The name of your Observability Platform organization, which is the subdomain name before .chronosphere.io."
}

variable "chronosphere_api_token" {
  type        = string
  sensitive   = true
  description = "The API token for an Observability Platform Restricted Service Account with write-only permission."
}

variable "failed_data_bucket_name" {
  type        = string
  description = "The name of the S3 bucket to create to store data that couldn't be delivered to Observability Platform. If not specified, a random name will be generated."
  default     = ""
}

variable "common_resource_attributes" {
  type        = map(string)
  description = "Key-value pairs to apply as OpenTelemetry Resource Attributes on all metrics in this stream."
  default     = {}
}

# IAM Policy Documents
data "aws_iam_policy_document" "kinesis-firehose-stream-role-trust-policy" {
  statement {
    sid     = "AllowRoleAssumptionByKinesisFirehose"
    effect  = "Allow"
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = ["firehose.amazonaws.com"]
    }
  }
}

data "aws_iam_policy_document" "kinesis-firehose-stream-role-s3-policy" {
  statement {
    sid    = "AllowFirehoseS3Access"
    effect = "Allow"
    actions = [
      "s3:AbortMultipartUpload",
      "s3:GetBucketLocation",
      "s3:GetObject",
      "s3:ListBucket",
      "s3:ListBucketMultipartUploads",
      "s3:PutObject"
    ]
    resources = [
      aws_s3_bucket.kinesis-firehose-stream-failed-data.arn,
      "${aws_s3_bucket.kinesis-firehose-stream-failed-data.arn}/*"
    ]
  }
}

data "aws_iam_policy_document" "cloudwatch-metric-stream-role-trust-policy" {
  statement {
    sid     = "AllowRoleAssumptionByCloudWatchMetricStream"
    effect  = "Allow"
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = ["streams.metrics.cloudwatch.amazonaws.com"]
    }
  }
}

data "aws_iam_policy_document" "cloudwatch-metric-stream-role-firehose-policy" {
  statement {
    sid    = "AllowCloudWatchFirehoseAccess"
    effect = "Allow"
    actions = [
      "firehose:PutRecord",
      "firehose:PutRecordBatch"
    ]
    resources = [
      aws_kinesis_firehose_delivery_stream.kinesis-firehose-stream.arn
    ]
  }
}

# S3
resource "random_id" "default_bucket_name_suffix" {
  byte_length = 8
}

resource "aws_s3_bucket" "kinesis-firehose-stream-failed-data" {
  bucket = var.failed_data_bucket_name != "" ? var.failed_data_bucket_name : "chronosphere-cw-stream-failed-data-${random_id.default_bucket_name_suffix.hex}"
}

resource "aws_s3_bucket_public_access_block" "kinesis-firehose-stream-failed-data" {
  bucket                  = aws_s3_bucket.kinesis-firehose-stream-failed-data.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_s3_bucket_server_side_encryption_configuration" "kinesis-firehose-stream-failed-data" {
  bucket = aws_s3_bucket.kinesis-firehose-stream-failed-data.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
    bucket_key_enabled = false
  }
}

resource "aws_s3_bucket_lifecycle_configuration" "kinesis-firehose-stream-failed-data" {
  bucket = aws_s3_bucket.kinesis-firehose-stream-failed-data.id
  rule {
    id = "Cleanup"
    expiration {
      days = 90
    }
    status = "Enabled"
  }
}

# Kinesis Firehose Delivery Stream
resource "aws_iam_role" "kinesis-firehose-stream-role" {
  name               = "cloudwatch-firehose-stream-role"
  assume_role_policy = data.aws_iam_policy_document.kinesis-firehose-stream-role-trust-policy.json
  tags = {
    Name = "cloudwatch-firehose-stream-role"
  }
}

resource "aws_iam_role_policy" "kinesis-firehose-stream-role-s3-policy" {
  name   = "KinesisFirehose-S3Access"
  role   = aws_iam_role.kinesis-firehose-stream-role.id
  policy = data.aws_iam_policy_document.kinesis-firehose-stream-role-s3-policy.json
}

resource "aws_kinesis_firehose_delivery_stream" "kinesis-firehose-stream" {
  name        = "chronosphere-cloudwatch-metric-stream"
  destination = "http_endpoint"

  http_endpoint_configuration {
    name               = "chronosphere-http-endpoint"
    url                = "https://${var.chronosphere_org_name}.chronosphere.io/data/metrics/api/v1/cloudwatch/firehose"
    access_key         = var.chronosphere_api_token
    buffering_size     = 1   # MiB
    buffering_interval = 60  # seconds
    role_arn           = aws_iam_role.kinesis-firehose-stream-role.arn
    s3_backup_mode     = "FailedDataOnly"
    retry_duration     = 300 # seconds

    s3_configuration {
      role_arn           = aws_iam_role.kinesis-firehose-stream-role.arn
      bucket_arn         = aws_s3_bucket.kinesis-firehose-stream-failed-data.arn
      buffering_size     = 10  # MiB
      buffering_interval = 300 # seconds
      compression_format = "GZIP"
    }

    request_configuration {
      content_encoding = "GZIP"
      dynamic "common_attributes" {
        for_each = var.common_resource_attributes
        iterator = attribute
        content {
          name  = attribute.key
          value = attribute.value
        }
      }
    }
  }

  server_side_encryption {
    enabled = true
  }
}

# CloudWatch Metric Stream
resource "aws_iam_role" "cloudwatch-metric-stream-role" {
  name               = "cloudwatch-metric-stream-role"
  assume_role_policy = data.aws_iam_policy_document.cloudwatch-metric-stream-role-trust-policy.json
  tags = {
    Name = "cloudwatch-metric-stream-role"
  }
}

resource "aws_iam_role_policy" "cloudwatch-metric-stream-role-firehose-policy" {
  name   = "MetricStreams-FirehosePutRecords"
  role   = aws_iam_role.cloudwatch-metric-stream-role.id
  policy = data.aws_iam_policy_document.cloudwatch-metric-stream-role-firehose-policy.json
}

resource "aws_cloudwatch_metric_stream" "cloudwatch-metric-stream" {
  name          = "chronosphere-metric-stream"
  role_arn      = aws_iam_role.cloudwatch-metric-stream-role.arn
  firehose_arn  = aws_kinesis_firehose_delivery_stream.kinesis-firehose-stream.arn
  output_format = "opentelemetry1.0"
}
Verify CloudWatch Metric Stream ingestion
After setup, data can take from 5 to 10 minutes to arrive in Observability Platform. To verify functionality, check the operational dashboards in the AWS Management Console for the Metric Stream and Amazon Data Firehose.
Check the CloudWatch Metrics Ingestion & Health dashboard
The CloudWatch Metrics Ingestion & Health dashboard displays operational information about the health of your CloudWatch Metric Streams integration with Observability Platform.
The CloudWatch Metric Streams and Data Firehose panel groups rely on CloudWatch metrics sent from those services to Observability Platform. To populate these charts, include metrics from the AWS/Firehose and AWS/CloudWatch/MetricStreams namespaces in your CloudWatch Metric Streams configuration.
- In Observability Platform, go to Dashboards.
- In the search bar, enter CloudWatch, and then click the CloudWatch Metrics Ingestion & Health dashboard.
  The Observability Platform metrics ingestion panel group displays information about the CloudWatch metrics Observability Platform received.
- Check the Data Firehose records received by Amazon Resource Name chart to confirm that the ingestion API received Data Firehose records.
- Check the CloudWatch metric updates received chart to confirm the number of CloudWatch metric updates Observability Platform extracts from the Data Firehose records.
- Confirm that the Transformed metrics chart shows no Rejected Data Points.
- Check the Unique time series by AWS metric namespace chart to confirm that metrics from the AWS namespaces you want are in the stream that Observability Platform received.
Check the Metric Stream Dashboard
In the AWS Management Console, check the Metric Updates chart for specific metrics to validate that metrics are streaming.
- In the AWS Management Console, go to CloudWatch > Metric Streams.
- Click chronosphere-cloudwatch-metric-stream to view the status and operational statistics.
- Verify that the Status is Running.
- Verify how many updates have been sent in the Metric Updates chart. If the stream is working, the chart should report a non-zero number of updates.
- Verify whether any errors were reported in the Errors chart. The value should be 0.
Check the Amazon Data Firehose status
In the AWS Management Console, check the status of several charts to ensure that metrics are streaming.
- In the AWS Management Console, go to Amazon Data Firehose > Firehose Streams.
- Select PUT-CW-STREAM-CHRONOSPHERE to view status and operational statistics for the Amazon Data Firehose.
- The Incoming bytes, Incoming put requests, and Incoming records charts should all report non-zero values.
- The HTTP endpoint delivery success chart should report a 100% successful metric count.
- The Records delivered to HTTP endpoint chart should report a non-zero value.
Query metrics about AWS CloudWatch Metric Streams
You can also query for metrics about AWS CloudWatch Metric Streams in Observability Platform. These metrics won't appear in the Observability Platform Metrics Explorer if you've defined a drop rule to drop all AWS metrics. Modify the rule to allow some metrics, such as all metrics from the AWS/CloudWatch/MetricStreams namespace. For examples, see Drop CloudWatch Metric Stream metrics.
- In Observability Platform, go to Explore > Metrics Explorer.
- In the query box, enter aws_cloudwatch_metricstreams to view a list of AWS metrics received from the CloudWatch Metric Stream.
- Select the metric you want to query to add it to the query prompt.
You can then write a query around the selected metric. For example, run this query to report the rate of CloudWatch metric updates:
rate(aws_cloudwatch_metricstreams_MetricUpdate_sum[5m])