Analyze usage metrics
The Telemetry Usage Analyzer lets you view all metrics in your system ranked from least-used to most-used, alongside information about the persisted Data Points Per Second (DPPS) and cardinality of each metric. You can sort the list to find unused metrics to drop or roll up, and also find highly requested but not yet ingested metrics you can add to the system to help boost valuable signal.
Usage analysis gives you insight into the usage of the data in your system whether by dashboard, monitor, shaping rule, or Explorer query execution, so you can more confidently identify never-used data, make decisions about the shape of that data, and understand the impact of a proposed shaping rule to users of that data.
The Telemetry Usage Analyzer supports only Prometheus metrics.
In the navigation menu select Exploring > Telemetry Usage Analyzer.
Use multiple filters to search for subsets of metrics. For example, to find metrics that have high data points but aren't important to any of your users, filter by Highest persisted and then sort the table by Utility score.
The Metrics or Labels section lists the recorded metrics or labels in your system.
Click Group by metric or Group by label to group your data. The display varies depending on the grouping you select.
Sort metrics and labels by the following values:
- Least valuable: Metrics with high DPPS but low utilization.
- Most valuable: Metrics with a high utilization to DPPS ratio.
- Most utilized: Metrics and labels with the highest utility score.
- Least utilized: Metrics and labels with the lowest utility score.
- Highest persisted: (Metrics only) Metrics with the most DPPS.
- Lowest persisted: (Metrics only) Metrics with the fewest DPPS.
- Missing metrics: (Metrics only) Metrics with both zero DPPS and the highest utility scores.
- Highest cardinality: (Labels only) Labels with the highest number of unique values.
- Lowest cardinality: (Labels only) Labels with the fewest unique values.
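For instance, the "valuable" sort orders weigh utilization against DPPS. The following is a minimal Python sketch of that ranking logic using made-up metrics and an assumed utility-to-DPPS ratio; Chronosphere's actual formulas aren't published.

```python
# Hypothetical sketch of the sort options above. The "value" ratio is an
# illustrative assumption, not Chronosphere's actual math.
metrics = [
    {"name": "http_requests_total", "dpps": 5000, "utility": 90},
    {"name": "container_sockets",   "dpps": 7516, "utility": 0},
    {"name": "queue_depth",         "dpps": 12,   "utility": 75},
]

def value_ratio(m):
    # High utility relative to DPPS = more valuable to keep.
    return m["utility"] / max(m["dpps"], 1)

least_valuable = sorted(metrics, key=value_ratio)            # high DPPS, low use
most_valuable = sorted(metrics, key=value_ratio, reverse=True)
highest_persisted = sorted(metrics, key=lambda m: m["dpps"], reverse=True)

print(least_valuable[0]["name"])     # container_sockets: 7516 DPPS, unused
print(most_valuable[0]["name"])      # queue_depth: high utility per data point
print(highest_persisted[0]["name"])  # container_sockets: most DPPS
```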
Each card in the list contains the metric name or label, and other values defined in the Summary.
Usage data is based on activity from the previous 30 days to capture both current and cyclical usage patterns. Click a metric or label name to view a Summary of details about that item.
The Summary section contains a set of cards with sum totals for usage details. The cards and table vary slightly depending on what Group by you've selected.
The total of associated metrics or labels won't always match the total for the Summary cards. Summary cards are based on the selected metric or label, each of which can have multiple references in its group.
- Associated Label Keys: All labels associated with the metric name.
- Associated Metric: Each metric that uses this label.
- Configuration References: Number of times this metric appears in a dashboard, monitor configuration, or shaping rule in the previous 30 days.
- Direct Query Executions: The number of times in the past 30 days the label was used as part of a backend query, such as an Explorer query, a query executed from a dashboard panel, or a query from an external source.
- Unique or Unique Users: The number of individual users querying this metric.
- Utility Score: An aggregate number indicating relative usefulness of this metric, determined by the number of References, Query Executions, and unique users across internal and external queries. A higher score means users include this metric and its associated labels in their workflows. For example, two metrics with the same number of executions but different numbers of unique users have different utility scores. Metrics with more unique users have a higher utility score.
- Metric DPPS or DPPS: Total data points per second (DPPS) for this metric. DPPS is the average DPPS over the previous five minutes to capture the current volume.
- Unique Label Values or Unique Values: The number of unique entries for this label.
- Appears in: The percentage of incoming metrics using the selected label.
Click the three vertical dots icon at the end of each line to:
- View Usage Details for that metric or label.
- Analyze Label or Analyze Metric, depending on the selected grouping.
Clicking an analyze option selects that label or metric in the Metrics and Labels section, starting a search for the corresponding data.
For example, if you have a metric with the label error_type, and you want to find other metrics using that label:
- Select the three vertical dots icon at the end of the row for error_type.
- Click Analyze Label.
Clicking Analyze Label adds the label as a search term to the labels list in the left sidebar, and the Summary table displays all Associated Metrics for that label.
Use the Search box to find one or more associated labels for a selected metric, or to find metrics associated with a selected label.
Each metric and each label associated with a metric has Usage Details.
For overall details about a metric, click Usage Details next to the metric name in the body of the page. A dialog displays specific details for this metric and where it's used.
You can select a value from the Label menu to filter to a specific label.
You can also click Usage Details next to an Associated Label Key to open the dialog, filtered to that label.
For each label, the following details display, with a numerical count of occurrences:
- Configuration References: Configurations with the metric and label explicitly referenced. These include Dashboards, Monitors, Recording Rules, Drop Rules, and Aggregator Rules.
- Direct Query Executions: Query executions with the metric and label explicitly referenced. These include Metrics Explorer, External Sources, Dashboards, and Unique Users.
- Utility Score: An aggregate number indicating the relative usefulness of this metric, determined by the number of References, Executions, and unique users. A higher score means users include this metric in their workflows.
- Metric DPPS: Cached data points per second for the last five minutes.
- Unique Label Values: The number of unique values for the selected label.
These details are also broken down by where and how many times they're used:
- Dashboard and Monitor configurations show the number of references, when they were Added, and their name. Click the name to open that dashboard or monitor. Queries from Grafana dashboards include the UUID of the Grafana dashboard.
- Recording, Drop, and Aggregator rule configurations show when they were Added or Created, and the Metric Name.
- Metrics Explorer and Dashboard direct queries show how many Executions, the Query, Date, and User. Dashboards also display the Dashboard and Panel the query executed from.
- Unique Users show the User, the Total Executions, Dashboard Executions, and Explorer Executions.
When you select Direct Query Executions from Metrics Explorer or Dashboards, a chart displays showing the distribution of queries made across the selected time range. Hold the pointer over a bar in the graph to display a tooltip with the number of queries for that day.
In the User search box, enter a user's email address to filter query executions to a particular user.
Click a date bar in the chart to filter the table to queries made on that date. A date filter displays next to the user search box, followed by the number of executions for that date out of the total executions. To remove the filter, click the X on the date filter.
You can analyze the incoming data for a specific metric. Select a metric from the Metrics list, and then click View Incoming Metrics. The Live Telemetry Analyzer opens and actively profiles the selected metric.
When you find a metric you don't want to track, create a drop rule to stop collecting data for that metric.
Use one of the following methods to create a drop rule from Usage Analyzer:
- Select a metric from the Metrics list, and then click Add Drop Rule.
- Select the checkboxes for multiple metrics in the Metrics list, and then click + Drop rule.
Follow the process to create a drop rule. When selecting metrics from the Usage Analyzer, the Add Drop Rule dialog pre-populates the Name, Key, and Value fields in the Visual Editor tab. Click Code Config for code to copy or download for Chronoctl, Terraform, or the Chronosphere API.
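The generated Code Config is the authoritative source, but a Chronoctl drop rule generally takes a shape like the following sketch. The exact field names here are assumptions; always prefer the config the dialog generates.

```yaml
# Hypothetical sketch of a Chronoctl drop rule; field names are assumptions.
# Copy the actual config from the Code Config tab rather than hand-writing it.
api_version: v1/config
kind: DropRule
spec:
  name: drop-container-sockets
  active: true
  filters:
    - name: __name__        # match on the metric name
      value_glob: container_sockets
```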
The following examples are ways you can use the Usage Analyzer to improve your Chronosphere experience.
When reviewing your metrics, look for items with high DPPS and a low Utility Score. Use drop rules to discard these metrics and reduce metric noise. Review the usage table to determine whether the DPPS is concentrated in a particular label, and consider writing rollup rules to drop labels with high DPPS but low usage.
For example, the metric container_sockets has a DPPS of 7516, while one of its labels has 156187 unique values. This metric isn't used in any queries or references and therefore might not be very valuable. Consider using a rollup rule to reduce the number of data points kept. Use the Aggregation Rules UI to review the impact of rollup rules.
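The following is a back-of-the-envelope Python sketch of the potential savings for this example, under the assumption that each unique label value corresponds to one active series emitting one point per scrape interval. Real savings depend on your actual label combinations; review them in the Aggregation Rules UI.

```python
# Rough savings estimate for the container_sockets example above.
# Assumptions: one series per unique label value, one point per scrape,
# and an assumed series count remaining after the rollup.
dpps = 7516                 # persisted data points per second today
unique_values = 156_187     # cardinality of the high-DPPS label
remaining_series = 50       # assumed series left after rolling up that label

scrape_interval = unique_values / dpps        # implied seconds between points
rolled_up_dpps = remaining_series / scrape_interval

savings = 1 - rolled_up_dpps / dpps
print(f"{scrape_interval:.1f}s interval, ~{rolled_up_dpps:.2f} DPPS after rollup")
print(f"~{savings:.2%} of data points dropped")
```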
Increase your dashboards' value to your operators by identifying and adding metrics that people are looking for but which don't exist as displayed results.
Metrics with high Direct Query Executions indicate users want the data from that metric and are looking for the data in ways that aren't presented to them. If these metrics also have low DPPS, they're highly valuable to display.
Reviewing usage data can help you understand patterns in your system to help guide other system decisions, such as which high-value metrics to include in template dashboards that every team can use.
For example, you review the usage details of a particular metric and determine that several teams are using that metric in their individual dashboards. Consider creating a single dashboard for multiple teams.
As another example, identify expensive (high DPPS) metrics with few users. You can work with those users directly to reduce the cost of metrics in use, or to raise a metric's value to other teams. You can see what teams are using expensive metrics and encourage them to create shared dashboards.
Use the Missing metrics sorting option to identify highly utilized metrics that haven't reported any data. These metrics may no longer be useful and can be dropped, or the metric might be reporting under a different key:value, and references to that metric need updating.
If your teams are using similar labels that should be the same, consider using derived labels to search for similar labels and consolidate them.
For example, you might have metrics whose names start with grpc_ and use the label_0 label. Use the Live Telemetry Analyzer to search for label_0 and decide whether a derived label can help consolidate data from multiple metrics in a single label.