Recording rules

Recording rules primarily improve performance on commonly run queries. A recording rule runs on a fixed interval and saves its results into a new time series. Raw metric data is ingested into the database before the rule reads it. The rule then produces the aggregated and downsampled metric data, and adds it back into the database.

⚠️ Recording rules support Prometheus metrics only.

Configure a recording rule with a PromQL statement. The statement is executed against the metrics data, and the result is stored in a new time series with a unique metric name. PromQL statements in recording rules can include any PromQL function.

Rule fields:

  • expr: The PromQL expression to evaluate.
  • interval: How often to evaluate the rule. Default is 1 minute.
  • labels: The labels to add to the output metric. If you attempt to add a label that has already been added, it isn't added a second time. For example, add (instance123:instance).
  • name: The name of the rule. If metric_name is set, this is the human-readable name. Otherwise, it's the time series to output to.
  • metric_name: The time series to output to.
  • slug: The slug for the rule. This can't change after rule creation.

Recording rules natively support adding labels to the resultant aggregated metrics. Rollup rules don't support adding labels to aggregated metrics. To accomplish the same goal with a rollup rule, you must combine it with either a Prometheus relabel rule or a derived metric that uses the label_replace function.

Due to architectural differences between Chronosphere and Prometheus, defining recording rules is sometimes different, especially for expensive recording rules that span many metrics.

Chronosphere uses a single data store. To enhance performance, use the following recommendations:

  • Break up the recording rules to scope to different clusters, or another label that scopes your metrics.
  • Set the same metric_name on each of the resulting rules so that all of their results are written back to the same metric name.
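
As a minimal sketch of these recommendations, the following splits one expensive rule into two rules that are each scoped to a single cluster and that both write to the same metric_name. The cluster label, its values (us-east and us-west), and the rule names and slugs are hypothetical. The field layout follows the RecordingRule example later on this page.

```yaml
# Rule 1: evaluates only series from the (hypothetical) us-east cluster.
api_version: v1/config
kind: RecordingRule
spec:
  name: cpu-usage-seconds-sum-rate-1m-us-east
  slug: cpu-usage-seconds-sum-rate1m-us-east
  prometheus_expr: sum by (cluster, instance, container) (rate(container_cpu_usage_seconds_total{cluster="us-east"}[1m]))
  # Same output metric as rule 2, so queries read a single metric name.
  metric_name: instance_container:cpu_usage_seconds:sum_rate1m
  interval_secs: 60
---
# Rule 2: same expression, scoped to the (hypothetical) us-west cluster.
api_version: v1/config
kind: RecordingRule
spec:
  name: cpu-usage-seconds-sum-rate-1m-us-west
  slug: cpu-usage-seconds-sum-rate1m-us-west
  prometheus_expr: sum by (cluster, instance, container) (rate(container_cpu_usage_seconds_total{cluster="us-west"}[1m]))
  metric_name: instance_container:cpu_usage_seconds:sum_rate1m
  interval_secs: 60
```

Keeping cluster in the by clause means the series written by the two rules stay distinct even though they share a metric name.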

With a Prometheus or Thanos setup, Chronosphere recommends scoping the rules to the local Prometheus server to avoid cross-Prometheus queries.

If you have issues with late-arriving data, consider using rollup rules instead.

In Prometheus, it's possible to chain recording rules by ordering them within a group.
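
For illustration, chaining in a Prometheus rule file might look like the following sketch. Prometheus evaluates the rules in a group sequentially, so the second rule can read the series that the first rule records. The group name and the second rule are hypothetical, and this is Prometheus rule-file syntax, not a Chronosphere RecordingRule definition.

```yaml
groups:
  - name: cpu-usage-chained   # hypothetical group name
    interval: 1m
    rules:
      # Evaluated first: per-instance, per-container CPU rate.
      - record: instance_container:cpu_usage_seconds:sum_rate1m
        expr: sum by (instance, container) (rate(container_cpu_usage_seconds_total{node!=""}[1m]))
      # Evaluated second: aggregates the series recorded by the rule above.
      - record: instance:cpu_usage_seconds:sum_rate1m
        expr: sum by (instance) (instance_container:cpu_usage_seconds:sum_rate1m)
```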

Users cannot modify Terraform-managed resources in the Chronosphere app, with Chronoctl, or by using the API.

The following YAML defines three rules. Each rule calculates the per-second average rate of increase, measured over one minute, for series that have a non-empty node label, and sums the result by instance and container. The example uses metric_name for the name of the output time series and name for the human-readable name. For backwards compatibility, if metric_name isn't specified, name is used as the time series to output to (see the third rule).

api_version: v1/config
kind: RecordingRule
spec:
  name: cpu-usage-seconds-sum-rate-1m
  slug: instance-container-cpu-usage-seconds-sum-rate1m
  prometheus_expr: sum(rate(container_cpu_usage_seconds_total{node!=""}[1m])) by (instance,
    container)
  metric_name: instance_container:cpu_usage_seconds:sum_rate1m
  interval_secs: 60
  label_policy:
    add:
      resource: cpu
---
api_version: v1/config
kind: RecordingRule
spec:
  name: network-receive-bytes-sum-rate-1m
  slug: instance-container-network-receive-bytes-sum-rate1m
  metric_name: instance_container:network_receive_bytes:sum_rate1m
  prometheus_expr: sum(rate(container_network_receive_bytes_total{node!=""}[1m])) by (instance,
    container)
  interval_secs: 60
  label_policy:
    add:
      resource: network-receive
---
api_version: v1/config
kind: RecordingRule
spec:
  name: instance_container:network_transmit_bytes:sum_rate1m
  slug: instance-container-network-transmit-bytes-sum-rate1m
  prometheus_expr: sum(rate(container_network_transmit_bytes_total{node!=""}[1m])) by (instance,
    container)
  interval_secs: 60
  label_policy:
    add:
      resource: network-transmit