Live Telemetry Analyzer

Analyze live traffic metrics

The Live Telemetry Analyzer provides a real-time view of incoming metrics grouped by label, along with their relative frequency. This view helps you understand how often your applications emit metrics, troubleshoot spikes in ingest rates, and confirm that the Collector is aware of particular metrics. Use the Live Telemetry Analyzer as a first step in identifying opportunities to reduce the overall volume of metrics.

Use the telemetry analyzer

To use the Live Telemetry Analyzer, you must have administrative privileges.

In the navigation menu, click Go to Admin, and then select Analyzers > Live Telemetry Analyzer. The page provides these controls:

  • Capture Live Data: Click to begin gathering statistics. Click again to pause.
  • Copy link: After selecting one or more labels, click to copy the URL to share with other users.

To reduce the data displayed by the capture, use one or more of the following filters:

  • Data Phase: Select a single phase, or select Ingestion or Persistence to filter on all phases in that group.
  • Pool: Select a metric pool.
  • Priority: Select a pool priority.
  • Add Label Filter: Type in the text box to choose from an autocomplete list of labels, and then add a value to filter for a specific label.

Ingestion stages and phases

Chronosphere profiles metrics in these stages:

  • Ingestion: Metrics sent directly from the Collector.
  • Persistence: Metrics sent to the database. This stage includes aggregated metrics.

The phases are:

  • Received: Not selectable.
  • Rejected By Drop Rule: Toggle metrics dropped due to drop rules. This option is relevant only for the Ingestion stage.
  • Rejected by Ingest limit: Metrics dropped because they exceeded the ingestion or persistence rate limit.
  • Accepted for Matching: Metrics accepted into the system by matching an ingestion rule.
  • Rejected by Persist limit: Metrics not sent to permanent storage due to persistence limits.
  • Accepted for Storage: Metrics sent to storage.
  • Stored: Not selectable.

Group and filter metrics

The initial view displays two tables that cover all labels for all metrics. The Labels table, on the left, lists every label collected during the capture.

Use the Search text box to find a specific label. The Search text box filters as you type, reducing the label list displayed.

Select the checkbox next to any label to filter the Label Values table by the selected label.

The right table shows the Label Values. Click a label value to add it to the Add Label Filter text box.

Filter both tables by adding label key:value pairs to the Add Label Filter field, either by selecting them from the table on the left or by typing in the field. Typing in the field displays a Label and a Value text box. The Label text box displays a matching list of label keys as you type, and you can select an option from the list at any time. Click the check icon when finished. Click any label value to edit it.

Profile metrics

Click Capture Live Data to start or pause the profiling of matching metrics. You can make changes to the groupings and filters while profiling.

Here's a guide to the column headings and what they mean:

  • Unique Values: Number of unique values for the respective label key.
  • Appears In: The percentage of metrics you're viewing that have the matching label key.
  • Avg. DPPS: Average data points per second, calculated over the previous 15 seconds.
  • Current DPPS: Current data points per second.

Click any column heading to sort by that column, which can help you interpret the results. For example, a label with a high percentage in the Appears In column and a low number of unique values gives you a high-level breakdown of where to attribute metrics. Sorting by the Unique Values column helps identify high-cardinality labels.
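The difference between the Avg. DPPS and Current DPPS columns can be illustrated with a small sliding-window sketch. This is not the product's implementation. It assumes one data point count is recorded per second and that the average is the mean of the counts in the previous 15 seconds. The class name DppsWindow and its methods are hypothetical.

```python
from collections import deque


class DppsWindow:
    """Hypothetical sliding window of per-second data point counts."""

    def __init__(self, window_seconds=15):
        self.window_seconds = window_seconds
        self.samples = deque()  # (second, datapoint_count) pairs, oldest first

    def record(self, second, count):
        """Record the count for one second and discard samples outside the window."""
        self.samples.append((second, count))
        while self.samples and self.samples[0][0] <= second - self.window_seconds:
            self.samples.popleft()

    def current(self):
        """Most recent per-second count (the 'Current DPPS' idea)."""
        return self.samples[-1][1] if self.samples else 0

    def average(self):
        """Mean per-second count over the window (the 'Avg. DPPS' idea)."""
        if not self.samples:
            return 0.0
        return sum(count for _, count in self.samples) / len(self.samples)


# A steady 100 data points per second, followed by a one-second spike.
window = DppsWindow()
for second in range(14):
    window.record(second, 100)
window.record(14, 1600)
print(window.current(), window.average())
```

In this sketch a short spike shows up immediately in the current value while the average moves far less, which is why comparing the two columns can help when troubleshooting spikes in ingest rates.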

As an example, review the following three emitted metrics:

sign_up{location="placeA"}
sign_up{location="placeB"}
login{version="v0.1.0"}

With these metrics, the Live Telemetry Analyzer generates three rows, one for each of the three labels (__name__, location, and version). Because every metric has a __name__ label, its Appears In percentage is 100%. There are only two unique values for __name__ (sign_up and login), so the Unique Values column displays 2. Two of the three metrics have the location label, which is 66%, and there are two unique values for this label (placeA and placeB). Only one metric has the version label, which is 33%, and it has a single unique value (v0.1.0).

Label Keys    Unique Values    Appears In
__name__      2                100%
location      2                66%
version       1                33%
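The arithmetic behind this example can be reproduced with a short sketch. It is illustrative only and not how the analyzer is implemented. The function names parse_metric and label_stats are hypothetical, the parser is deliberately naive (it does not handle commas or quotes inside label values), and percentages are truncated to whole numbers on the assumption that this matches the 66% shown in the example.

```python
from collections import defaultdict


def parse_metric(text):
    """Parse 'name{key="value",...}' into a label dict that includes __name__."""
    name, _, rest = text.partition("{")
    labels = {"__name__": name.strip()}
    body = rest.rstrip("}").strip()
    if body:
        for pair in body.split(","):
            key, _, value = pair.partition("=")
            labels[key.strip()] = value.strip().strip('"')
    return labels


def label_stats(metrics):
    """Return {label_key: (unique_values, appears_in_percent)}."""
    values = defaultdict(set)   # label key -> distinct values seen
    counts = defaultdict(int)   # label key -> number of metrics carrying it
    for labels in metrics:
        for key, value in labels.items():
            values[key].add(value)
            counts[key] += 1
    total = len(metrics)
    # Integer division truncates 66.67% to 66, as in the example table.
    return {key: (len(values[key]), 100 * counts[key] // total) for key in values}


emitted = [
    'sign_up{location="placeA"}',
    'sign_up{location="placeB"}',
    'login{version="v0.1.0"}',
]
stats = label_stats([parse_metric(m) for m in emitted])
for key, (unique, percent) in stats.items():
    print(f"{key}: unique={unique} appears_in={percent}%")
```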

The Live Telemetry Analyzer also generates rows for special non-label request metadata:

  • __metric_type__ displays the incoming metric's Chronosphere metric type. Valid values are cumulative_counter, delta_counter, gauge, or measurement. This is the recommended method for determining an incoming metric's type.
  • When ingesting with Prometheus, __m3_prom_type__ displays the incoming metric's Prometheus metric type. Valid values are counter, gauge, histogram, gauge_histogram, summary, info, state_set, or quantile.
  • When ingesting with OpenTelemetry, __otel_type__ displays the incoming metric's OpenTelemetry metric type. Valid values are sum, monotonic_sum, gauge, histogram, exp_histogram, or summary.
  • When ingesting with OpenTelemetry, __otel_temporality__ displays the incoming metric's OpenTelemetry temporality. Valid values are delta or cumulative.
  • DEPRECATED: __m3_type__ displays the incoming metric's legacy M3 type, if any. Valid values are counter, gauge, or timer.

This special non-label request metadata is available in the Live Telemetry Analyzer and for matching in rollup rules, but isn't stored.

Analyze metrics

When analyzing traffic, the following scenarios can guide you to finding the right information:

  • Review the metric names that generate the most data points per second (Avg. DPPS or Current DPPS). If those metrics are unfamiliar to you or are expensive, these might be candidates to roll up or drop.
  • Group metrics by job to identify the specific scrape jobs generating the most metrics. Filter for each job, and analyze the job's individual metrics to find opportunities for reduction. Metrics from the same job are often used together, letting you investigate metrics from a single job more quickly.
  • Review individual clusters, or data-plane versus control-plane clusters to optimize specific areas.
  • Review metrics isolated to a single environment, such as metrics available only in development or only in production. These environments are likely to have different metric workload shapes from each other.
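
The grouping approach in these scenarios amounts to summing data points per second by one label and ranking the totals. The following minimal sketch shows that idea offline, for example against numbers copied by hand from a capture. The function top_by_label, the row format, and the sample job names and rates are all hypothetical and are not exported by the product.

```python
from collections import defaultdict


def top_by_label(rows, label, limit=3):
    """Sum DPPS per value of `label` and return the largest totals first.

    `rows` is a list of (labels_dict, dpps) pairs; rows that lack the
    label are grouped under "<missing>".
    """
    totals = defaultdict(float)
    for labels, dpps in rows:
        totals[labels.get(label, "<missing>")] += dpps
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:limit]


# Made-up sample data for illustration only.
rows = [
    ({"__name__": "http_requests_total", "job": "api"}, 1200.0),
    ({"__name__": "http_request_duration", "job": "api"}, 800.0),
    ({"__name__": "queue_depth", "job": "worker"}, 150.0),
    ({"__name__": "up"}, 5.0),
]
for job, dpps in top_by_label(rows, "job"):
    print(f"{job}: {dpps:.0f} DPPS")
```

Passing "__name__" instead of "job" ranks individual metric names, which corresponds to the first scenario in the list.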

Troubleshoot missing metrics

If metrics don't display when running the Live Telemetry Analyzer:

  • Examine the filters to ensure they're not excluding the metrics you're searching for.
  • Review the Collectors dashboard and ensure metrics are being scraped by the Collectors.
  • Check the troubleshooting page for more help.