instance or pod labels don’t often add value on their own,
but removing these labels from the client side isn’t always possible. You can
use rollup rules to avoid storing these labels.
Rollup rules support both Prometheus and Graphite metrics.
View rollup rules
Select from the following methods to view rollup rules.
- Web
- Chronoctl
- API
In Observability Platform, view rollup rules in the
Aggregation rules UI.
Create a rollup rule
Select from the following methods to apply rollup rules. Observability Platform doesn’t limit the number of rollup rules a system can have.
If you define a rollup rule using the Observability Platform app, you must download
the rule configuration and apply it with one of the supported methods.
- Web
- Chronoctl
- Terraform
- API
Create rollup rule configurations in Observability Platform from the
Aggregation rules UI.
When creating a rule configuration, the Visual Editor displays by default.
When creating a rule in Metrics Analyzer, the dialog pre-populates fields based
on the user’s selected data.
To create a rule configuration:
- Enter or edit data for the following fields:
  - Rule Name: Add or edit the name of the rule.
  - Rule Details: Either Rule Preview or Rule Enabled.
  - Matching Time Series: Time series the rule applies to. You must include a Label, an operator (= or !=), and a Value. The value you enter maps to the filters section of the CreateRollupRule endpoint. For example, if you want the rollup rule to match on Prometheus gauge metrics, enter __m3_prom_type__ as the label to match on, and gauge as the value. The resulting filter pairs that label with that value. Separate multiple values with a comma. You can use glob syntax, including matching multiple patterns with an OR, such as service:{svc1,svc2}. Click Add to add another time series.
  - Labels to Roll Up: Discard Labels or Keep Labels. Add labels to the Input Labels text box.
  - Output Metric: The new metric’s name and aggregation configuration.
    - Output Metric Name: Edit the output metric name. Clear the checkbox for Include metric name to remove the original name.
    - Input Metric Type: Select a metric type, which determines how the rollup rule interprets all matching data points. For example, if you select Gauge, the rollup rule interprets all matching data points as that data type, even if the original source isn’t a gauge metric. This behavior means that the metric type you choose doesn’t have to match the data type of the incoming data.
      If you want to match the incoming metric to a specific type, enter two matching time series in the rollup rule: one to match the metric, and another to match the metric type. Use __metric_type__ to define the type of metric you want to match on. For example, if you want to match a time series named agg_write_latency that’s a cumulative exponential histogram, define one series that matches the name agg_write_latency and another that matches __metric_type__ against the cumulative exponential histogram type.
    - Aggregation: Select an aggregation operation.
    - Sample Interval: The length of time between samples.
  - Raw Data: Select the toggle to drop the raw input data after aggregation.
- When finished, click Code Config.
- Choose your rule creation method from these options:
  - Chronoctl
  - Terraform
  - API
- Apply the changes based on your selected method.
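To make the glob behavior of the Matching Time Series field more concrete, the following Python sketch checks a label:glob filter such as service:{svc1,svc2} against one series' labels. It is only an illustration: expand_braces and filter_matches are hypothetical helpers, the matching relies on Python's fnmatch rather than the platform's own matcher, and the handling of a missing label is an assumption.

```python
import fnmatch
import re


def expand_braces(pattern: str) -> list[str]:
    """Expand each {a,b,...} group into separate plain glob patterns."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded = []
    for option in match.group(1).split(","):
        expanded.extend(expand_braces(head + option + tail))
    return expanded


def filter_matches(filter_expr: str, labels: dict[str, str]) -> bool:
    """Check one 'label:glob' filter; a leading '!' on the glob negates it."""
    label, _, glob = filter_expr.partition(":")
    negate = glob.startswith("!")
    if negate:
        glob = glob[1:]
    value = labels.get(label)
    if value is None:
        return False  # assumption: a series without the label never matches
    hit = any(fnmatch.fnmatchcase(value, p) for p in expand_braces(glob))
    return hit != negate


series = {"__name__": "http_requests", "service": "svc2"}
print(filter_matches("service:{svc1,svc2}", series))  # True
print(filter_matches("service:!svc2", series))        # False
```

Use Live Telemetry Analyzer, not a local sketch like this one, to confirm how a real filter behaves.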
Best practices for rule creation
Following these guidelines helps ensure your rollup rules work as intended:
- Use Live Telemetry Analyzer to verify your glob syntax and ensure your query matches the correct metrics.
- Before using a rollup rule to group labels, be sure those labels aren’t used in other places, such as dashboards, monitors, or the queries you use to debug issues.
- Filters using curly braces ({}) shouldn’t use a dash (-) in the filter for label names. The filter interprets a dash inside the braces as a range. For example, service_cluster: !{my-label} fails. Rewrite the filter to service_cluster:!human-label instead.
- Metrics can match more than one rule. Matching multiple rules can affect data retention. If any matching rule sets drop_raw=true, raw metrics are dropped.
- If a single output series receives more than 10 million unique input series, Observability Platform might stop accepting new input series specified in the rollup rule, which could result in partially aggregated metrics. To avoid this behavior, choose a label policy that writes more output series by removing fewer labels.
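The interaction between multiple matching rules and raw data retention can be sketched in a few lines of Python. The Rule class and its exact-name matching are hypothetical simplifications; only the drop_raw behavior comes from the guideline above.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    slug: str       # hypothetical rule identifier
    metric: str     # toy matcher: exact metric name only
    drop_raw: bool  # whether this rule drops the raw input data


def raw_is_dropped(metric: str, rules: list[Rule]) -> bool:
    """Raw data is dropped when any rule matching the metric sets drop_raw."""
    return any(rule.drop_raw for rule in rules if rule.metric == metric)


rules = [
    Rule("keep-raw", "permits_blocked", drop_raw=False),
    Rule("drop-raw", "permits_blocked", drop_raw=True),
]
print(raw_is_dropped("permits_blocked", rules))  # True
print(raw_is_dropped("other_metric", rules))     # False
```

One rule with drop_raw enabled is enough to remove the raw series, even when another matching rule leaves it disabled.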
Chronoctl rollup rule example
Here’s an example of a rollup rule that matches time series with the value permits_blocked, while discarding any labels matching instance and job. It uses
a counter type metric, and aggregates as a sum using a 30-second interval.
Terraform rollup rule example
Here’s an example of a rollup rule that matches time series with the value permits_blocked, while discarding any labels matching instance and job.
It uses a counter type metric, and aggregates as a sum using a 30-second interval.
Delete a rollup rule
- Chronoctl
- Terraform
- API
To delete rollup rules with Chronoctl, use this command:
Replace SLUG with the rule’s slug.
For example, to delete the http_request_duration_by_service_and_status rule,
use this command:
If your slug starts with a dash (-), use double quotes (") around the slug
name.
Rollup rule attributes
Aggregating your data accurately requires both configuring multiple fields and understanding aggregation operations. See the CreateRollupRule API documentation for the complete list of fields that are part of the rollup_rule
object that you define when creating a rollup rule with any
of the supported methods.
Label policies
Use label policies to define which labels to preserve in the resulting metric. In the rollup rule definition, add the appropriate field to specify which labels to retain or discard.
You can set only one of
group_by or exclude_by per rollup rule. Graphite metrics
support only the exclude_by rule type.
Keep specified labels
To aggregate only metrics that contain all of the specified labels and discard all other labels, use group_by (Terraform) or keep. When using these rollup rules,
you must specify the labels to aggregate the metrics by. If a metric doesn’t include
all of the specified labels, the metric isn’t included in the rule.
If a rollup rule uses group_by or keep, the rule matches only metrics that
carry all of the specified labels, even if the label filters would otherwise
have matched metrics without them.
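As a rough illustration of the keep policy, the Python sketch below groups input series by the kept labels and skips any series that lacks one of them. The group_by function and the sample labels are hypothetical; the sketch models only the inclusion rule described above, not the aggregation itself.

```python
from collections import defaultdict


def group_by(series: list[dict[str, str]], keep: list[str]) -> dict[tuple, int]:
    """Count how many input series feed each output series under a keep policy."""
    groups: dict[tuple, int] = defaultdict(int)
    for labels in series:
        if not all(name in labels for name in keep):
            continue  # a series missing any kept label isn't included
        key = tuple((name, labels[name]) for name in keep)
        groups[key] += 1
    return dict(groups)


inputs = [
    {"service": "api", "status": "200", "instance": "a"},
    {"service": "api", "status": "200", "instance": "b"},
    {"service": "api", "instance": "c"},  # no status label, so it's skipped
]
print(group_by(inputs, ["service", "status"]))
# {(('service', 'api'), ('status', '200')): 2}
```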
Remove specified labels
To target a group of metrics for a particular service, team, or other higher-level set of metrics, use exclude_by (Terraform) or discard. When using these rollup
rules, you specify which labels to remove from the aggregated metric, while keeping
all other labels.
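The discard policy can be pictured as removing the named labels and letting every series with the same remaining labels collapse into one output series. In this Python sketch, exclude_by is a hypothetical helper and the region label is invented for illustration.

```python
def exclude_by(labels: dict[str, str], discard: list[str]) -> dict[str, str]:
    """Return the labels that remain after removing the discarded ones."""
    return {name: value for name, value in labels.items() if name not in discard}


inputs = [
    {"__name__": "permits_blocked", "instance": "a", "job": "x", "region": "us"},
    {"__name__": "permits_blocked", "instance": "b", "job": "y", "region": "us"},
]

# Series with identical remaining labels roll up into the same output series.
outputs = {tuple(sorted(exclude_by(l, ["instance", "job"]).items())) for l in inputs}
print(len(outputs))  # 1
print(exclude_by(inputs[0], ["instance", "job"]))
# {'__name__': 'permits_blocked', 'region': 'us'}
```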
Set a Graphite label policy
For Graphite metrics, you can use the graphite_label_policy parameter to also
set a Graphite-specific label policy. This lets you define replacements for label
values without changing their positions, which can reduce cardinality without breaking
Graphite metrics’ preferred positional indexing.
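For example, assume you have raw metric names whose fourth dot-separated position holds
a per-instance value. A Graphite label policy can replace the label at that position
(__g3__) with a new string value (INSTANCE).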
This replacement aggregates these metrics as
cluster.production.instance.INSTANCE.requests_count, without changing their positional
indexing.
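The positional replacement can be sketched in Python as follows. The apply_graphite_policy helper and the host-01 and host-02 names are hypothetical, and the sketch assumes that __gN__ refers to the zero-based Nth dot-separated position of the metric name.

```python
def apply_graphite_policy(name: str, replace: dict[int, str]) -> str:
    """Replace dot-separated positions of a Graphite name; key N stands for __gN__."""
    parts = name.split(".")
    for position, new_value in replace.items():
        if position < len(parts):
            parts[position] = new_value  # the position count stays unchanged
    return ".".join(parts)


raw_names = [
    "cluster.production.instance.host-01.requests_count",
    "cluster.production.instance.host-02.requests_count",
]
rolled_up = {apply_graphite_policy(name, {3: "INSTANCE"}) for name in raw_names}
print(rolled_up)  # {'cluster.production.instance.INSTANCE.requests_count'}
```

Both raw names map to a single output name with the same number of positions, which is what keeps positional queries working.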
- Chronoctl
- Terraform
The output of the chronoctl rollup-rules scaffold command includes the graphite_label_policy
parameter.
To implement the rule from the example scenario as a Chronoctl YAML resource, define
the name and new_value in the list of replace values.
Define multiple replacements in a single rollup rule by adding more pairs of name
and new_value to the replace list.
Aggregation operations
Some operations can change the type of the metric during aggregation. The resulting metric type of an aggregation is called the output metric type. Even if you are ingesting data with the wrong metric type, configure your rollup rule with the metric type that the ingested data should have. For example, if Chronosphere Observability Platform ingests metrics with type GAUGE, but the values actually
represent DELTA_COUNTER, use a metric_type=DELTA_COUNTER rollup rule to aggregate
them.
Rollup rules support the following aggregation operations:
CUMULATIVE_COUNTER
Cumulative counters support these aggregations:
- SUM: Takes the increase of each individual input series within the configured interval, then sums the increases together according to the configured label policy. The output is the cumulative summed increase across all input series.
- COUNT: Counts the number of unique input series matched by the configured label policy (that is, the cardinality).
The output metric type is CUMULATIVE_COUNTER.
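The SUM behavior for cumulative counters can be approximated with the Python sketch below. The increase and rollup_sum functions are hypothetical, and treating a decrease as a counter reset is an assumption borrowed from common Prometheus practice, not something the text above specifies.

```python
def increase(samples: list[float]) -> float:
    """Increase of one cumulative counter over an interval.

    Assumption: a drop in value is a counter reset, so the new value counts in full.
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total


def rollup_sum(interval: dict[str, list[float]], running_total: float = 0.0) -> float:
    """Add the summed per-series increases for one interval to the cumulative output."""
    return running_total + sum(increase(s) for s in interval.values())


interval = {
    "instance=a": [100, 104, 110],  # increase of 10
    "instance=b": [7, 9, 2],        # increase of 2, then a reset to 2
}
print(rollup_sum(interval))        # 14.0
print(rollup_sum(interval, 14.0))  # 28.0
```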
GAUGE
Gauges support the following aggregation methods:
- SUM: Takes the max value of each individual input series within the configured interval, then sums the resulting values together by the configured label policy.
- COUNT: Counts the number of unique input series matched by the configured label policy (that is, the cardinality).
- MIN: Takes the minimum value of all data points within the configured interval across all series matched by the configured label policy.
- MAX: Takes the maximum value of all data points within the configured interval across all series matched by the configured label policy.
- PXX, MEAN, MEDIAN, STDEV, SUMSQ: Takes the maximum value of each individual input series within the configured interval, and then computes the value distribution.
The output metric type is GAUGE.
When querying a gauge metric with a range vector included in the query,
downsampling might impact the accuracy of the query result. Most use cases that
fit these criteria can be converted to use counters instead, which avoids the issue.
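The gauge aggregations in this section can be sketched in Python for a single interval. The function names and sample data are hypothetical; each function follows the wording of the corresponding list item.

```python
def gauge_sum(series: dict[str, list[float]]) -> float:
    """Take the max of each input series, then sum those per-series values."""
    return sum(max(samples) for samples in series.values())


def gauge_count(series: dict[str, list[float]]) -> int:
    """Number of unique input series in the group."""
    return len(series)


def gauge_min(series: dict[str, list[float]]) -> float:
    """Minimum across all data points of all series."""
    return min(min(samples) for samples in series.values())


def gauge_max(series: dict[str, list[float]]) -> float:
    """Maximum across all data points of all series."""
    return max(max(samples) for samples in series.values())


interval = {"pod=a": [3, 5, 4], "pod=b": [10, 8]}
print(gauge_sum(interval))    # 15
print(gauge_count(interval))  # 2
print(gauge_min(interval))    # 3
print(gauge_max(interval))    # 10
```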
DELTA_COUNTER
Supported aggregations:
- SUM: Sums all values of all series matched by the configured label policy. All values must be nonnegative.
- COUNT_SAMPLES: Counts the number of input samples matched by the configured label policy.
The output metric type is DELTA_COUNTER.
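For delta counters, both aggregations operate on every sample, which the following Python sketch illustrates. The delta_sum and count_samples functions are hypothetical, and raising an error on a negative value is only one possible way to model the nonnegative requirement.

```python
def delta_sum(series: dict[str, list[float]]) -> float:
    """Sum every value of every matched series; values must be nonnegative."""
    values = [v for samples in series.values() for v in samples]
    if any(v < 0 for v in values):
        raise ValueError("delta counter values must be nonnegative")
    return sum(values)


def count_samples(series: dict[str, list[float]]) -> int:
    """Count the input samples across all matched series."""
    return sum(len(samples) for samples in series.values())


interval = {"instance=a": [2, 0, 5], "instance=b": [1, 4]}
print(delta_sum(interval))      # 12
print(count_samples(interval))  # 5
```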
Exceptions for DELTA_COUNTER metrics
DELTA_COUNTER metrics don’t require the following fields for rollup rules:
- name
- aggregation
- keep
- discard
MEASUREMENT
A key feature of MEASUREMENT aggregations lies in how they treat individual
samples. Unlike other types such as GAUGE
and CUMULATIVE_COUNTER,
MEASUREMENT metrics aggregate all at once, across all samples of your matching
time series within the aggregated time interval. This enables calculation of
accurate statistics server-side, within Observability Platform.
A typical use case for MEASUREMENT aggregations is calculating statistics across
raw request latencies across all instances. This can be correctly performed through
metric_type=MEASUREMENT and aggregation=P95. Using metric_type=GAUGE in this
scenario produces results you don’t want, discarding all samples except the
per-instance max value, then computing the ninety-fifth percentile across these
per-instance max values.
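The difference between the two metric types in this scenario can be shown with a small Python sketch. The percentile function uses the nearest-rank definition, which is an assumption about how P95 is computed, and the latency values are invented for illustration.

```python
import math


def percentile(values: list[float], p: int) -> float:
    """Nearest-rank percentile: the value at rank ceil(p * n / 100) in sorted order."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


latencies = {
    "instance=a": [10, 11, 12, 13, 14, 15, 16, 17, 18, 900],
    "instance=b": [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
}

# MEASUREMENT: every sample from every series takes part in the statistic.
all_samples = [v for samples in latencies.values() for v in samples]
measurement_p95 = percentile(all_samples, 95)

# GAUGE: only the per-instance max survives before the percentile is computed.
gauge_p95 = percentile([max(samples) for samples in latencies.values()], 95)

print(measurement_p95)  # 19
print(gauge_p95)        # 900
```

A single outlier dominates the gauge-style result because each instance contributes only its maximum.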
Measurements support all aggregation methods:
- SUM: Sums all values of all series matched by the configured label policy. All values must be nonnegative. The output metric type is a DELTA_COUNTER.
- COUNT_SAMPLES: Counts the number of input samples matched by the configured label policy. The output metric type is a DELTA_COUNTER.
- LAST: Takes the last value of all samples matched by the configured label policy. The output metric type is a GAUGE.
- MIN: Takes the minimum value of all samples matched by the configured label policy. The output metric type is a GAUGE.
- MAX: Takes the maximum value of all samples matched by the configured label policy. The output metric type is a GAUGE.
- PXX, MEAN, MEDIAN, STDEV, SUMSQ: Computes the value distribution across all samples matched by the configured label policy. The output metric type is a GAUGE.
- HISTOGRAM: Summarizes the distribution of values as an exponential histogram with a scale of 3. The output type is a DELTA_EXPONENTIAL_HISTOGRAM.
Histogram aggregation operations
If either the input histogram or resulting aggregation exceeds the 160-bucket limit,
Observability Platform decreases the exponential histogram scale until the bucket
count is within the limit. Downscaling reduces the exponential histogram’s resolution.
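Downscaling can be sketched in Python using the OpenTelemetry convention that lowering the scale by one merges each adjacent pair of buckets. The function names are hypothetical, and measuring the bucket count as the span between the lowest and highest populated index is an assumption; the platform's exact procedure isn't described here.

```python
def downscale(buckets: dict[int, int], scale: int) -> tuple[dict[int, int], int]:
    """Lower the scale by one: buckets 2k and 2k+1 merge into bucket k."""
    merged: dict[int, int] = {}
    for index, count in buckets.items():
        merged[index >> 1] = merged.get(index >> 1, 0) + count
    return merged, scale - 1


def bucket_span(buckets: dict[int, int]) -> int:
    """Bucket count, assumed here to be the span from lowest to highest index."""
    return max(buckets) - min(buckets) + 1 if buckets else 0


def fit_to_limit(buckets: dict[int, int], scale: int, limit: int = 160):
    """Downscale repeatedly until the bucket count is within the limit."""
    while bucket_span(buckets) > limit:
        buckets, scale = downscale(buckets, scale)
    return buckets, scale


wide = {index: 1 for index in range(400)}  # 400 buckets at scale 3
fitted, new_scale = fit_to_limit(wide, 3)
print(new_scale, bucket_span(fitted), sum(fitted.values()))  # 1 100 400
```

The total sample count is preserved, but each remaining bucket covers a wider value range, which is the loss of resolution described above.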
CUMULATIVE_EXPONENTIAL_HISTOGRAM
Cumulative exponential histogram aggregations operate on OpenTelemetry exponential
histograms with cumulative temporality, and on Prometheus native histograms with
an exponential bucket layout.
Cumulative exponential histograms support this aggregation method:
- SUM: Merges input cumulative exponential histograms by the configured label policy. The output metric type is a CUMULATIVE_EXPONENTIAL_HISTOGRAM.
DELTA_EXPONENTIAL_HISTOGRAM
Delta exponential histogram aggregations operate on OpenTelemetry exponential histograms
with delta temporality.
Delta exponential histograms support this aggregation method:
- SUM: Merges input delta exponential histograms by the configured label policy. The output metric type is a DELTA_EXPONENTIAL_HISTOGRAM.
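Merging exponential histograms can be sketched in Python as aligning both inputs to the lower scale and then adding bucket counts index by index. The downscale_to and merge helpers are hypothetical, the sketch covers only positive-value buckets, and the alignment step follows the OpenTelemetry bucket-mapping convention, which is an assumption about how the platform performs the merge.

```python
def downscale_to(buckets: dict[int, int], scale: int, target: int) -> dict[int, int]:
    """Re-bucket a histogram from its scale down to a lower target scale."""
    shift = scale - target
    out: dict[int, int] = {}
    for index, count in buckets.items():
        out[index >> shift] = out.get(index >> shift, 0) + count
    return out


def merge(a: tuple[dict[int, int], int], b: tuple[dict[int, int], int]):
    """Merge two (buckets, scale) histograms at the lower of the two scales."""
    target = min(a[1], b[1])
    merged = downscale_to(a[0], a[1], target)
    for index, count in downscale_to(b[0], b[1], target).items():
        merged[index] = merged.get(index, 0) + count
    return merged, target


first = ({0: 1, 1: 2, 4: 1}, 3)  # scale 3
second = ({0: 5, 2: 1}, 2)       # scale 2
print(merge(first, second))      # ({0: 8, 2: 2}, 2)
```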