Telemetry Usage Analyzer

Analyze usage metrics

The Telemetry Usage Analyzer lets you view all metrics in your system ranked from least-used to most-used, alongside information about the persisted Data Points Per Second (DPPS) and cardinality of each metric. You can sort the list to find unused metrics to drop or roll up, and also find highly requested but not yet ingested metrics you can add to the system to help boost valuable signal.

Usage analysis gives you insight into the usage of the data in your system whether by dashboard, monitor, shaping rule, or Explorer query execution. You can more confidently identify never-used data, make decisions about the shape of that data, and understand the impact of a proposed shaping rule to users of that data.

The Telemetry Usage Analyzer supports only Prometheus metrics.

In the navigation menu, click Go to Admin, and then select Analyzers > Telemetry Usage Analyzer.

Use multiple filters to search for subsets of metrics. For example, to find metrics that have high data points but aren't important to any of your users, filter by Highest persisted and then sort the table by Utility score.

Metrics and labels data

The Metrics or Labels section is a list of recorded items.

Click Group by metric or Group by label to group your data.

The following sort options are available depending on your grouping selection:

Sort order           Metrics  Labels  Description
Least valuable       Yes      Yes     Metrics with high DPPS but low utilization.
Most valuable        Yes      Yes     Metrics with a high utilization to DPPS ratio.
Most utilized        Yes      Yes     Metrics and labels with the highest utility score.
Least utilized       Yes      Yes     Metrics and labels with the lowest utility score.
Highest persisted    Yes      No      Metrics with the most DPPS.
Lowest persisted     Yes      No      Metrics with the fewest DPPS.
Missing metrics      Yes      No      Metrics with both zero DPPS and highest utility scores.
Highest cardinality  No       Yes     Metrics with the highest number of unique labels.
Lowest cardinality   No       Yes     Metrics with the fewest unique labels.

Each card in the list contains the metric name or label, and other values defined in the Summary.

Usage data is based on activity from the previous 30 days to capture both current and cyclical usage patterns. Click a metric or label name to view a Summary of details about that item.

Summary

The Summary section contains a set of cards with sum totals for usage details. The cards and table vary slightly depending on the Group by option you've selected.

The total of associated metrics or labels won't always match the total for the Summary cards. Summary cards are based on the selected metric or label, each of which can have multiple references in its group.

  • Associated Label Keys: All labels associated with the metric name.

  • Associated Metric: Each metric that uses this label.

  • Configuration References: Number of times this metric appears in a dashboard, monitor configuration, or shaping rule in the previous 30 days.

  • Direct Query Executions: The number of times in the past 30 days the label was used as part of a backend query, such as an Explorer query, a query executed from a dashboard panel, or a query from an external source.

  • Unique or Unique Users: The number of individual users querying this metric.

  • Utility Score: An aggregate number indicating relative usefulness of this metric, determined by the number of References, Query Executions, and unique users across internal and external queries. A higher score means users include this metric and its associated labels in their workflows. For example, two metrics with the same number of executions but different numbers of unique users have different utility scores. Metrics with more unique users have a higher utility score.

    Utility scores match specific metric names or labels when calculating, and not by regular expression.

    Internal queries are queries made within the Chronosphere app. External queries are queries originating from other systems against Chronosphere data. For example, querying Chronosphere data for use in a non-Chronosphere application.

    The utility score is calculated with the following formula: Utility score = (number of references) + (number of executions * %(unique users / total users)).

  • Metric DPPS or DPPS: Total data points per second (DPPS) for this metric. DPPS is the average DPPS over the previous five minutes to capture the current volume.

  • Unique Label Values or Unique Values: The number of unique entries for this label.

  • Appears in: The percentage of incoming metrics using the selected label.
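
As a rough illustration of the formula given under Utility Score, the following Python sketch computes a score from the four inputs the formula names. It is not Chronosphere's implementation. The function name and parameters are hypothetical, and the sketch assumes that "%(unique users/total users)" means the share of users expressed as a fraction between 0 and 1. If the product uses a 0 to 100 percentage instead, the second term scales accordingly.

```python
def utility_score(references: int, executions: int,
                  unique_users: int, total_users: int) -> float:
    """Sketch of: references + executions * (unique users / total users).

    Assumption: the user share is a fraction in [0, 1], not a 0-100 percentage.
    """
    if total_users <= 0:
        raise ValueError("total_users must be positive")
    user_share = unique_users / total_users
    return references + executions * user_share


# Two hypothetical metrics with the same references and executions,
# but queried by a different number of unique users.
few_users = utility_score(references=3, executions=100, unique_users=5, total_users=20)
many_users = utility_score(references=3, executions=100, unique_users=10, total_users=20)
print(few_users, many_users)  # 28.0 53.0
```

Under this reading, the metric queried by more unique users scores higher even though both metrics have the same number of executions, which matches the behavior described for the Utility Score.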

Click the three vertical dots icon at the end of each line to:

  • View Usage Details for that metric or label.
  • Analyze Label or Analyze Metric, depending on the selected grouping.

Clicking an analyze option selects that label or metric in the Metrics and Labels section, starting a search for the corresponding data.

For example, if you have a metric with the label error_type, and you want to find other metrics using that label:

  1. Select the three vertical dots icon at the end of the row for error_type.
  2. Click Analyze Label.

The Summary table displays a list of all Associated metrics for that label by adding the label as a search term to the labels list in the left sidebar.

Search

Use the Search box to find one or more associated labels for a selected metric, or to find metrics associated with a selected label. Metric and label pairs that have low utility scores and no references are good candidates for a rollup rule to reduce cardinality.

Usage details

Each metric and each label associated with a metric has Usage Details.

Label usage details

For overall details about a metric, in the body of the page, next to the metric name, click Usage Details. A dialog opens with specific details about this metric and where it's used.

You can select a value from the Label menu to filter to a specific label.

You can also click Usage Details next to an Associated Label Key to open the dialog, filtered to that label.

For each label, the following details display, with a numerical count of occurrences:

  • Configuration References: Configurations with the metric and label explicitly referenced. These include Dashboards, Monitors, Recording Rules, Drop Rules, and Aggregator Rules.
  • Direct Query Executions: Query executions with the metric and label explicitly referenced. These include Metrics Explorer, External Sources, Dashboards, and Unique Users.
  • Utility Score: An aggregate number indicating the relative usefulness of this metric, determined by the number of References, Executions, and unique users. A higher score means users include this metric in their workflows.
  • Metric DPPS: Cached data points per second for the last five minutes.
  • Unique Label Values: Selecting a label displays the number of unique values that label has.

These details are also broken down by where and how many times they're used:

  • Dashboard and Monitor configurations show the number of references, when they were Added, and their name. Click the name to open that dashboard or monitor. Queries from Grafana dashboards include the UUID of the Grafana dashboard.
  • Recording, Drop, and Aggregator rule configurations show when they were Added or Created, and the Metric Name.
  • Metrics Explorer and Dashboard direct queries show how many Executions, the Query, Date, and User. Dashboards also display the Dashboard and Panel a query executed from.
  • Unique Users show the User, the Total Executions, Dashboard Executions, and Explorer Executions.

Histogram chart

When you select Direct Query Executions from Metrics Explorer or Dashboards, a chart shows the distribution of queries made across the selected time range. Hold the pointer over a bar in the chart to display a tooltip with the number of queries for that day.

In the User search box, enter a user's email address to filter query executions to a particular user.

Click a date bar in the chart to filter the table to queries made on that date. A date filter displays next to the user search box, followed by the number of executions for that date out of the total executions. To remove the filter, click the x.
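
The per-day bars and the user filter can be pictured as a simple grouping of query executions by calendar date. The following Python sketch assumes you have execution records available outside the UI. The QueryExecution type, its fields, and the sample email addresses are hypothetical and don't reflect an actual export format.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, datetime
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class QueryExecution:
    # Hypothetical record shape for one direct query execution.
    user: str
    executed_at: datetime
    query: str


def executions_per_day(executions: Iterable[QueryExecution],
                       user: Optional[str] = None) -> Dict[date, int]:
    """Count executions per calendar day, optionally for a single user."""
    counts = Counter(
        e.executed_at.date()
        for e in executions
        if user is None or e.user == user
    )
    # Return the days in chronological order, like bars on a time axis.
    return dict(sorted(counts.items()))


sample = [
    QueryExecution("ana@example.com", datetime(2024, 5, 1, 9, 30), "up"),
    QueryExecution("ben@example.com", datetime(2024, 5, 1, 14, 0), "up"),
    QueryExecution("ana@example.com", datetime(2024, 5, 2, 8, 15), "up"),
]
all_users = executions_per_day(sample)
only_ana = executions_per_day(sample, user="ana@example.com")
```

Each key in the result corresponds to one bar in the chart, and passing a user narrows the counts the same way the User search box narrows the table.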

Analyze incoming metrics

You can analyze the incoming data for a specific metric. Select a metric from the Metrics list, and then click View Incoming Metrics. The Live Telemetry Analyzer opens, actively profiling the selected metric.

Create drop rules

When you find a metric you don't want to track, create a drop rule to stop collecting data for that metric.

Use one of the following methods to create a drop rule from Usage Analyzer:

  • Select a metric from the Metrics list, and then click Add Drop Rule.
  • Select the checkboxes for multiple metrics in the Metrics list, and then click + Drop rule.

Follow the process to create a drop rule. When selecting metrics from the Usage Analyzer, the Add Drop Rule dialog pre-populates the Name, Key, and Value fields in the Visual Editor tab. Click Code Config for code to copy or download for Chronoctl, Terraform, or the Chronosphere API.

Workflows

The following examples are ways you can use the Usage Analyzer to improve your Chronosphere experience.

High-volume, low-utility metrics

When reviewing your metrics, look for items with a high DPPS and a low Utility Score. Use drop rules to discard these metrics and reduce metric noise. Review the usage table to determine if the DPPS are for a particular label, and consider writing rollup rules to drop labels with high DPPS but low usage.

For example, the metric container_sockets has a DPPS of 7516, while the id label has 156187 unique values. This metric isn't used in any queries or references and therefore might not be very valuable. Consider using a rollup rule to reduce the number of data points kept. Use the Aggregation Rules UI to review the impact of rollup rules.
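
If you keep a copy of the usage data outside the UI, the same screening can be sketched as a filter on DPPS and utility score. The following Python example is a minimal sketch, not a Chronosphere API. The MetricUsage type, the thresholds, and every metric other than container_sockets are hypothetical. The utility score of 0 for container_sockets follows from it having no references or query executions.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class MetricUsage:
    # Hypothetical record: one row of usage data for a metric.
    name: str
    dpps: float
    utility_score: float


def drop_or_rollup_candidates(metrics: Iterable[MetricUsage],
                              min_dpps: float,
                              max_utility: float) -> List[MetricUsage]:
    """Return high-volume, low-utility metrics, largest DPPS first."""
    candidates = [
        m for m in metrics
        if m.dpps >= min_dpps and m.utility_score <= max_utility
    ]
    return sorted(candidates, key=lambda m: m.dpps, reverse=True)


usage = [
    MetricUsage("container_sockets", dpps=7516, utility_score=0),     # from the example above
    MetricUsage("http_requests_total", dpps=4200, utility_score=310),  # hypothetical
    MetricUsage("debug_trace_events", dpps=15, utility_score=0),       # hypothetical
]

# Thresholds are illustrative; choose values that fit your own system.
candidates = drop_or_rollup_candidates(usage, min_dpps=1000, max_utility=1)
print([m.name for m in candidates])  # ['container_sockets']
```

The low-volume, unused metric is excluded because dropping it saves little, while the heavily used metric is excluded because users depend on it.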


Metrics to add

Increase your dashboards' value to your operators by identifying and adding metrics that people are looking for but which don't exist as displayed results.

Metrics with high Direct Query Executions indicate users want the data from that metric and are looking for the data in ways that aren't presented to them. If these metrics also have low DPPS, they're highly valuable to display.

Usage patterns

Reviewing usage data can help you understand patterns in your system to help guide other system decisions, such as which high-value metrics to include in template dashboards that every team can use.

For example, you review the usage details of a particular metric and determine that several teams are using that metric in their individual dashboards. Consider creating a single dashboard for multiple teams.

As another example, identify expensive (high DPPS) metrics with few users. You can work with those users directly to reduce the cost of metrics in use, or to raise a metric's value to other teams. You can see what teams are using expensive metrics and encourage them to create shared dashboards.

Missing metrics

Use the Missing metrics sorting option to identify highly utilized metrics that haven't reported any data. This list typically includes:

  • Metrics that likely aren't useful anymore and can be dropped.
  • Metrics that might be reporting under a different key:value pair and need their references updated.
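
The Missing metrics sort can be approximated as "zero DPPS, ordered by utility score". The following Python sketch shows that selection on plain dictionaries. The field names and metric names are hypothetical, and the product's exact ranking might differ.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[str, float]]


def missing_metrics(rows: List[Row], limit: int = 10) -> List[Row]:
    """Metrics that report no data but are still referenced or queried."""
    missing = [
        r for r in rows
        if r["dpps"] == 0 and r["utility_score"] > 0
    ]
    # Highest utility first: these are the most sought-after missing metrics.
    return sorted(missing, key=lambda r: r["utility_score"], reverse=True)[:limit]


rows = [
    {"name": "checkout_latency_seconds", "dpps": 0, "utility_score": 42.0},   # hypothetical
    {"name": "orders_total", "dpps": 950, "utility_score": 80.0},             # hypothetical
    {"name": "legacy_queue_depth", "dpps": 0, "utility_score": 5.0},          # hypothetical
    {"name": "unused_gauge", "dpps": 0, "utility_score": 0.0},                # hypothetical
]
result = missing_metrics(rows)
print([r["name"] for r in result])  # ['checkout_latency_seconds', 'legacy_queue_depth']
```

Metrics with zero DPPS and zero utility are left out because nothing references or queries them, so there is nothing to fix or update.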

Use derived labels

If your teams are using similar labels that should be the same, consider using derived labels to search for similar labels and consolidate them.

For example, you might have metrics whose names start with grpc_ and use the label_0 label. Use the Live Telemetry Analyzer to search for label_0 and decide whether a derived label can help consolidate data from multiple metrics in a single label.