Reduce cardinality
When you’re first using Chronosphere Observability Platform, or a new app or service comes online, you might see cardinality spikes. Cardinality spikes can occur when:
- A metric or group of metrics has unexpectedly large numbers of labels.
- A process or service creates many similarly named metrics.
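To make the first cause concrete, here is a minimal sketch of how label values multiply into time series. The metric labels and their values are hypothetical, and the function computes only a worst-case upper bound (every combination of values occurring), not the count Observability Platform reports.

```python
def worst_case_series(label_values: dict[str, list[str]]) -> int:
    """Upper bound on series count: the product of distinct values per label."""
    total = 1
    for values in label_values.values():
        total *= len(set(values))
    return total


# Hypothetical labels attached to a single metric.
labels = {
    "method": ["GET", "POST", "PUT"],
    "status": ["200", "404", "500"],
    "pod": [f"pod-{i}" for i in range(50)],
}
print(worst_case_series(labels))  # 3 * 3 * 50 = 450

# Adding one unbounded label (assumed here: a per-request ID) multiplies the bound.
labels["request_id"] = [f"req-{i}" for i in range(1000)]
print(worst_case_series(labels))  # 450 * 1000 = 450000
```

A single label with many distinct values dominates the product, which is why one new label can cause a spike.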
Cardinality spikes can cause storage and licensing issues. To reduce cardinality, or to reduce the data stored for less important metrics:
- Find a problematic metric or label.
- Review that metric or label’s usage.
- Decide what to do with it (drop, rollup).
Find a metric and inspect the associated labels
Observability Platform provides the following tools to help you understand the impact of metric growth, identify problematic metrics and labels, and assess the impact of existing aggregation rules:
- The Metric Growth dashboard includes metrics and labels that have recently increased in cardinality.
- The Live Telemetry Analyzer provides real-time insight into current incoming metrics. Sort metrics by Unique value to find potential high cardinality.
- The Aggregation Rules UI visualizes existing shaping rules and how they affect your environment. Review these rules to understand their impact.
If you want to reduce the cardinality of a metric, you must first understand the targeted metric and its associated labels.
If you have administrative privileges:
- In the navigation menu, click Go to Admin and then select Analyzers > Live Telemetry.
- The analyzer defaults to __name__. Sort Label values by name, or add a label filter.
- In the Labels section, inspect the incoming label keys. The Unique Values column shows how many distinct values are incoming for a given label (cardinality), and Appears In shows how frequently that label is attached to the metric.
When the number of unique values for a label is high, that label contributes significantly to the cardinality of the metric.
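The two columns described above can be approximated offline from a sample of series. The following is a rough sketch, assuming each series is represented as a dictionary of label keys to values. The sample data and the `label_stats` helper are hypothetical and are not part of the Live Telemetry Analyzer.

```python
from collections import defaultdict


def label_stats(series: list[dict[str, str]]) -> dict[str, tuple[int, float]]:
    """Return {label: (unique value count, fraction of series carrying the label)}."""
    values: dict[str, set[str]] = defaultdict(set)
    counts: dict[str, int] = defaultdict(int)
    for labels in series:
        for key, value in labels.items():
            values[key].add(value)
            counts[key] += 1
    total = len(series)
    return {key: (len(values[key]), counts[key] / total) for key in values}


# Hypothetical label sets for one metric.
sample = [
    {"method": "GET", "user_id": "u1"},
    {"method": "GET", "user_id": "u2"},
    {"method": "POST", "user_id": "u3"},
    {"method": "POST"},
]

stats = label_stats(sample)
# Sort by unique value count, highest first, to surface likely offenders.
for key, (unique, appears_in) in sorted(
    stats.items(), key=lambda item: item[1][0], reverse=True
):
    print(f"{key}: {unique} unique values, appears in {appears_in:.0%} of series")
```

In this sample, `user_id` ranks first with three unique values even though it appears in only 75% of the series, which marks it as the label to investigate.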
Review metric and label usage
After identifying a high-cardinality label, you need to understand whether this label is meaningful, or if it can be safely removed.
To verify dropping a label is safe, use one of the following methods:
- The Telemetry Usage Analyzer displays a Utility score, providing insight into which metrics users find important.
- Chronoctl includes a search command to filter previously defined configurations for references to a metric regular expression. This lets you determine whether a metric is referenced within the organization, and where it’s used.
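The idea behind a reference search can be illustrated with a small sketch that scans configuration text for a metric regular expression. This is not the Chronoctl command or its syntax. The `find_references` helper, the configuration names, and the query strings are all hypothetical, and the sketch assumes you have the definitions available as plain text.

```python
import re


def find_references(metric_pattern: str, configs: dict[str, str]) -> list[str]:
    """Return the names of configurations whose text matches the metric pattern."""
    regex = re.compile(metric_pattern)
    return sorted(name for name, body in configs.items() if regex.search(body))


# Hypothetical dashboard and alert definitions, keyed by name.
configs = {
    "dashboard/api-overview": "sum(rate(http_requests_total[5m])) by (method)",
    "alert/high-latency": "histogram_quantile(0.99, rate(http_latency_bucket[5m]))",
    "dashboard/queues": "max(queue_depth) by (queue)",
}

print(find_references(r"http_requests_\w+", configs))
# ['dashboard/api-overview']
print(find_references(r"legacy_metric_\w+", configs))
# []
```

An empty result suggests the metric isn't referenced in the scanned definitions, but it doesn't cover ad hoc queries, so pair it with usage data before dropping anything.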
Remove the identified label
If you identify a label that isn’t used in any dashboards or alerts, consider reducing or removing the label using these methods:
- Create drop, mapping, or rollup rules to reduce stored metrics by aggregating, downsampling, or dropping unneeded metric data.
- Use the Recommendations page to identify metrics and labels with no usage or utility over the past 30 days. Apply the suggested recommendations to reduce the impact on persisted writes and persisted cardinality.
Validation
For rollup rules, preview the shaping impact to review and confirm your changes, so that you don’t delete metrics and labels that still matter.
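To build intuition for what a rollup changes, the following sketch aggregates away one label and compares series counts before and after. It assumes a sum aggregation and hypothetical sample data. It illustrates the concept only and isn't how Observability Platform evaluates rollup rules or previews them.

```python
from collections import defaultdict

Sample = tuple[dict[str, str], float]


def roll_up(samples: list[Sample], drop_label: str) -> dict[tuple, float]:
    """Sum values across series after removing drop_label from each label set."""
    rolled: dict[tuple, float] = defaultdict(float)
    for labels, value in samples:
        # The remaining labels, sorted, identify the rolled-up series.
        key = tuple(sorted((k, v) for k, v in labels.items() if k != drop_label))
        rolled[key] += value
    return dict(rolled)


# Hypothetical raw series: one per pod.
raw = [
    ({"service": "api", "pod": "a"}, 3.0),
    ({"service": "api", "pod": "b"}, 5.0),
    ({"service": "web", "pod": "c"}, 2.0),
]

rolled = roll_up(raw, drop_label="pod")
print(f"series before: {len(raw)}, after: {len(rolled)}")
# series before: 3, after: 2
print(rolled[(("service", "api"),)])
# 8.0
```

The per-service totals survive the rollup, but any query that groups or filters by `pod` would no longer work, which is the kind of impact to check before you apply a rule.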
Post-validation tasks
Return to the Live Telemetry Analyzer and search for your metric. If you’ve used a rollup rule, it can take some time before your rolled-up metric appears.
For rolled-up metrics, it often makes sense to drop the raw data if that data isn’t needed. This reduces cardinality and data storage requirements.
After you’ve validated your rule, apply the rule using your selected method.