Analyze live traffic metrics
The Live Telemetry Analyzer provides a real-time view of incoming metrics grouped by label, along with their relative frequency. This helps you understand how often your applications emit metrics, troubleshoot spikes in ingest rates, and ensure that the Collector is aware of particular metrics. Use the Live Telemetry Analyzer as a first step in identifying opportunities to reduce the overall volume of metrics.
Use the telemetry analyzer
In the navigation menu, click Go to Admin and then select Analyzers > Live Telemetry Analyzer.
- Capture live data: Click to begin gathering statistics. Click again to pause.
- Copy link: After selecting one or more labels, click to copy the URL to share with other users.
To reduce the data displayed by the capture, use one or more of the following filters:
- Data Phase: Select a single phase, or select Ingestion or Persistence to filter on all phases in that group.
- Pool: Select a metric pool.
- Priority: Select a pool priority.
- Type in the Add Label Filter text box to choose from an autocomplete list of labels, and then add a value to filter for a specific label.
Ingestion stages and phases
Chronosphere profiles metrics in these stages:
- Ingestion: Metrics sent directly from the Collector.
- Persistence: Metrics sent to the database. This stage includes aggregated metrics.
The phases are:
- Received: Not selectable.
- Rejected By Drop Rule: Toggle metrics dropped due to drop rules. This option is relevant only for the Ingestion stage.
- Rejected by Ingest limit: Metrics that were dropped for exceeding the ingestion or persistence phase rate limit.
- Accepted for Matching: Metrics that aren't dropped prior to ingestion.
- Rejected by Persist limit: Metrics not sent to permanent storage due to persistence limits.
- Accepted for Storage: Metrics sent to storage.
- Stored: Not selectable.
Group and filter metrics
The initial view displays two tables, which list all labels for all metrics. The Labels table lists all labels collected during the capture.
Use the Search text box to find a specific label. The Search text box filters as you type, reducing the label list displayed. Live Telemetry Analyzer uses glob syntax.
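As a rough illustration of how a glob pattern narrows a label list, the following sketch uses Python's standard fnmatch module. The label names are invented for the example, and the analyzer's exact glob dialect might differ from Python's, so treat this as an approximation rather than a description of the product's matching rules.

```python
from fnmatch import fnmatchcase


def filter_labels(labels, pattern):
    """Return the labels whose names match a glob pattern (case-sensitive)."""
    return [label for label in labels if fnmatchcase(label, pattern)]


# Hypothetical label names, used only for illustration.
labels = ["__name__", "location", "version", "k8s_cluster", "k8s_namespace"]

prefix_matches = filter_labels(labels, "k8s_*")    # * matches any run of characters
suffix_matches = filter_labels(labels, "*tion")
single_char = filter_labels(labels, "?ersion")     # ? matches exactly one character

print(prefix_matches)  # ['k8s_cluster', 'k8s_namespace']
print(suffix_matches)  # ['location']
print(single_char)     # ['version']
```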
Select the checkbox next to any label to filter the Label Values table by the selected label.
The right table shows the Label Values. Click a label value to add it to the Add Label Filter text box.
Filter both tables by adding label key:value pairs to the Add Label Filter field, either by selecting them from the table on the left or by typing in the field. Typing in the field displays a Label and Value text box. The Label field displays a matching list of label keys as you type. Select an option from the list at any time. Click the check icon when finished. Click any label value to edit it.
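Conceptually, a set of label key:value filters selects the metrics that carry those pairs. The following minimal sketch models that idea in Python. It assumes that multiple filters combine with AND semantics, which this page doesn't state, and the function name matches_filters is hypothetical.

```python
def matches_filters(metric_labels, filters):
    """Return True when the metric carries every key:value pair in filters.

    Assumption: multiple filters are combined with AND.
    """
    return all(metric_labels.get(key) == value for key, value in filters.items())


# Each metric is modeled as a dict of its labels.
metrics = [
    {"__name__": "sign_up", "location": "placeA"},
    {"__name__": "sign_up", "location": "placeB"},
    {"__name__": "login", "version": "v0.1.0"},
]

filters = {"__name__": "sign_up", "location": "placeA"}
selected = [m for m in metrics if matches_filters(m, filters)]
print(selected)  # [{'__name__': 'sign_up', 'location': 'placeA'}]
```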
Profile metrics
Click Capture Live Data to start or pause the profiling of matching metrics. You can make changes to the groupings and filters while profiling.
Here's a guide to the column headings and what they mean:
- Unique Values: Number of unique values for the respective label key.
- Appears In: The percentage of metrics you're viewing that have the matching label key.
- Avg. DPPS: Average data points per second, calculated over the previous 15 seconds.
- Current DPPS: Current data points per second.
Click any column heading to sort by that column, which can help you interpret the results. For example, a high percentage in the Appears In column combined with a low number of unique values gives you a high-level breakdown of where to attribute metrics. You can also sort by the Unique Values column, which helps identify high-cardinality labels.
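To make the difference between Avg. DPPS and Current DPPS concrete, the following sketch keeps a 15-second sliding window of per-second data point counts. The class DppsWindow and its methods are hypothetical, and the way the analyzer actually samples and averages isn't documented on this page, so the sketch assumes one count is recorded per second.

```python
from collections import deque


class DppsWindow:
    """Sliding window of (timestamp_seconds, datapoint_count) samples."""

    def __init__(self, window_seconds=15):
        self.window_seconds = window_seconds
        self.samples = deque()

    def record(self, timestamp, count):
        """Add a per-second count and evict samples older than the window."""
        self.samples.append((timestamp, count))
        cutoff = timestamp - self.window_seconds
        while self.samples and self.samples[0][0] <= cutoff:
            self.samples.popleft()

    def average_dpps(self):
        """Total data points in the window divided by the window length."""
        return sum(count for _, count in self.samples) / self.window_seconds

    def current_dpps(self):
        """Most recent per-second count, or 0 if nothing was recorded."""
        return self.samples[-1][1] if self.samples else 0


window = DppsWindow()
for second in range(1, 21):
    window.record(second, 100)      # steady 100 data points per second
steady_average = window.average_dpps()
print(steady_average)               # 100.0

window.record(21, 1600)             # a one-second spike
print(window.average_dpps())        # 200.0
print(window.current_dpps())        # 1600
```

Under these assumptions, a short spike shows up immediately in the current rate but is smoothed in the average, which is why comparing the two columns can help when troubleshooting ingest spikes.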
As an example, review the following three emitted metrics:
sign_up{location="placeA"}
sign_up{location="placeB"}
login{version="v0.1.0"}
With these metrics, the Live Telemetry Analyzer generates three rows, based on the three labels (__name__, location, and version). Because every metric has a __name__ label, its percentage is 100%. There are only two unique values for __name__ (sign_up and login), so the Unique Values column displays 2. Only two of the three metrics have the location label, which is 66%, and there are two unique values for this label (placeA and placeB). The same reasoning applies to version: one of the three metrics has the label, which is 33%, and it has one unique value (v0.1.0).
Label Keys | Unique Values | Appears In |
---|---|---|
__name__ | 2 | 100% |
location | 2 | 66% |
version | 1 | 33% |
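The table values can be reproduced with a few lines of Python. This is a sketch of the arithmetic only, not the analyzer's implementation. The function name summarize is hypothetical, and percentages are truncated to whole numbers because the table shows 66% rather than 67%.

```python
from collections import defaultdict

# The three emitted metrics from the example, each modeled as a dict of labels.
metrics = [
    {"__name__": "sign_up", "location": "placeA"},
    {"__name__": "sign_up", "location": "placeB"},
    {"__name__": "login", "version": "v0.1.0"},
]


def summarize(metrics):
    """Map each label key to (unique value count, Appears In percentage)."""
    values = defaultdict(set)
    appearances = defaultdict(int)
    for labels in metrics:
        for key, value in labels.items():
            values[key].add(value)
            appearances[key] += 1
    # Integer division truncates, matching the 66% and 33% shown in the table.
    return {
        key: (len(values[key]), 100 * appearances[key] // len(metrics))
        for key in values
    }


summary = summarize(metrics)
for key, (unique_values, appears_in) in summary.items():
    print(f"{key} | {unique_values} | {appears_in}% |")
```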
The Live Telemetry Analyzer also generates rows for special non-label request metadata:
- __metric_type__ displays the incoming metric's Chronosphere metric type. Valid values are cumulative_counter, delta_counter, gauge, or measurement. This is the recommended method for determining an incoming metric's type.
- __metric_source__ displays the incoming metric's source format. Valid values are carbon, chrono_gcp, dogstatsd, open_metrics, open_telemetry, prometheus, signalfx, statsd, or wavefront.
- When ingesting with Prometheus, __m3_prom_type__ displays the incoming metric's Prometheus metric type. Valid values are counter, gauge, histogram, gauge_histogram, summary, info, state_set, or quantile.
- When ingesting with OpenTelemetry, __otel_type__ displays the incoming metric's OpenTelemetry metric type. Valid values are sum, monotonic_sum, gauge, histogram, exp_histogram, or summary.
- When ingesting with OpenTelemetry, __otel_temporality__ displays the incoming metric's OpenTelemetry temporality. Valid values are delta or cumulative.
- DEPRECATED: __m3_type__ displays the incoming metric's legacy M3 type, if any. Valid values are counter, gauge, or timer.
This special non-label request metadata is available in the Live Telemetry Analyzer and for matching in rollup rules, but isn't stored.
Analyze metrics
When analyzing traffic, the following scenarios can guide you to finding the right information:
- Review the metric names that generate the most data points per second (Avg. DPPS or Current DPPS). If those metrics are unfamiliar to you or are expensive, these might be candidates to roll up or drop.
- Ensure your drop and rollup rules are working as expected by reviewing your rolled up metrics, or ensuring that a dropped metric no longer displays.
- Group metrics by job to identify the specific scrape jobs generating the most metrics. Filter for each job, and analyze the job's individual metrics to find opportunities for reduction. Metrics from the same job are often used together, letting you investigate metrics from a single job more quickly.
- Review individual clusters, or data-plane versus control-plane clusters to optimize specific areas.
- Review metrics isolated to single environments. For example, compare metrics available only in development environments with metrics available only in production environments. These are likely to have different metric workload shapes from each other.
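The grouping idea behind these scenarios can be sketched offline as well. The example below assumes you have already copied a few rows of label sets and their DPPS values into a list by hand. The metric names, job names, numbers, and the function dpps_by_label are all invented for illustration.

```python
from collections import Counter


def dpps_by_label(rows, label):
    """Sum DPPS per value of the given label, highest total first."""
    totals = Counter()
    for labels, dpps in rows:
        # Metrics without the label are grouped under a placeholder key.
        totals[labels.get(label, "<none>")] += dpps
    return totals.most_common()


# Hypothetical (labels, DPPS) rows.
rows = [
    ({"__name__": "http_requests_total", "job": "api"}, 1200.0),
    ({"__name__": "http_request_duration", "job": "api"}, 800.0),
    ({"__name__": "queue_depth", "job": "worker"}, 150.0),
]

ranking = dpps_by_label(rows, "job")
print(ranking)  # [('api', 2000.0), ('worker', 150.0)]
```

The same function works for any grouping label, such as a cluster or environment label, if your metrics carry one.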
Troubleshoot missing metrics
If metrics don't display when running the Live Telemetry Analyzer:
- Examine the filters to ensure they're not excluding the metrics you're searching for.
- Review the Collectors dashboard and ensure metrics are being scraped by the Collectors.
- Check the troubleshooting page for more help.