Analyze live traffic metrics
The Live Telemetry Analyzer provides a real-time view of incoming metrics grouped by label, and their relative frequency. This view helps you understand how often your applications emit metrics, troubleshoot spikes in ingest rates, and ensure that the Chronosphere Collector is aware of particular metrics. Use the Live Telemetry Analyzer as a first step in identifying opportunities to reduce the overall volume of metrics.
Capture and analyze live profiling data
To capture live profiling data with the Live Telemetry Analyzer:
- In the navigation menu, click Go to Admin and then select Analyzers > Live Telemetry Analyzer.
- Click Capture live data to begin gathering statistics for data that Observability Platform accepted for matching. Two tables display with the following columns:
  - Unique Values: Number of unique values for the respective label key.
  - Appears In: The percentage of metrics you’re viewing that have the matching label key.
  - Avg. DPPS: Average data points per second (DPPS), calculated over the previous 15 seconds.
  - Current DPPS: Current data points per second.
  You can make changes to the groupings and filters while profiling.
- To modify the displayed data, select an option from the Data phase menu to show data in different stages along the pathway from ingestion to persistence.
  For example, select Rejected by drop rule to view all data that Observability Platform dropped because of a configured drop rule.
- To display data for a specific pool only, select that pool from the Pool menu.
  Use the Priority menu to narrow the filter to a specific pool priority.
  To filter on specific labels, start typing a label name in the Add Label Filter text box, choose from an autocomplete list of labels, and then add a value to filter for a specific label.
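To make the difference between the Avg. DPPS and Current DPPS columns concrete, the following is a minimal sketch of a 15-second sliding window. Only the 15-second window length comes from the column description above. The DppsWindow class, its method names, and the one-sample-per-second bookkeeping are assumptions for illustration, not the way Observability Platform computes these values.

```python
from collections import deque

WINDOW_SECONDS = 15  # taken from the Avg. DPPS column description


class DppsWindow:
    """Hypothetical helper: per-second data point counts for one label value."""

    def __init__(self, window_seconds=WINDOW_SECONDS):
        # A bounded deque discards the oldest sample once the window is full.
        self.samples = deque(maxlen=window_seconds)

    def record_second(self, datapoints):
        """Append the number of data points seen during the latest second."""
        self.samples.append(datapoints)

    @property
    def current_dpps(self):
        """Most recent one-second count, or 0 before any sample arrives."""
        return self.samples[-1] if self.samples else 0

    @property
    def avg_dpps(self):
        """Mean of the counts still inside the window."""
        return sum(self.samples) / len(self.samples) if self.samples else 0.0


# Fourteen quiet seconds followed by a spike in the latest second.
window = DppsWindow()
for count in [100] * 14 + [1600]:
    window.record_second(count)

print(window.current_dpps, window.avg_dpps)  # prints "1600 200.0"
```

In this sketch, a Current DPPS value far above Avg. DPPS indicates a spike that started within the last few seconds, while two similar values indicate a steady rate.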
Analyze metrics
When analyzing traffic, use the following methods to help narrow your analysis and find the information you need:
- Review the metric names that generate the most data points per second (Avg. DPPS or Current DPPS). If those metrics are unfamiliar to you or are expensive, these might be candidates to roll up or drop.
- Ensure your drop and rollup rules are working as expected by reviewing your rolled up metrics, or ensuring that a dropped metric no longer displays.
- Group metrics by job to identify the specific scrape jobs generating the most metrics. Filter for each job, and analyze the job’s individual metrics to find opportunities for reduction. Metrics from the same job are often used together, letting you investigate metrics from a single job more quickly.
- Review individual clusters, or data-plane versus control-plane clusters to optimize specific areas.
- Review metrics isolated to single environments, such as metrics available only in development or only in production. These environments are likely to have different metric workload shapes from each other.
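The grouping approach in the list above can be sketched offline. The example assumes you have copied rows out of the analyzer into a list of dictionaries. The row shape, the sample metric names, and the dpps_by_label function are all hypothetical and aren't part of any Chronosphere API.

```python
from collections import defaultdict


def dpps_by_label(rows, label):
    """Sum Avg. DPPS per value of `label`, highest total first."""
    totals = defaultdict(float)
    for row in rows:
        # Rows that lack the label are grouped under a placeholder value.
        totals[row["labels"].get(label, "<missing>")] += row["avg_dpps"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical rows; the names and rates are invented for illustration.
rows = [
    {"labels": {"__name__": "http_requests_total", "job": "api"}, "avg_dpps": 120.0},
    {"labels": {"__name__": "http_request_duration_bucket", "job": "api"}, "avg_dpps": 900.0},
    {"labels": {"__name__": "node_cpu_seconds_total", "job": "node"}, "avg_dpps": 300.0},
]

for job, total in dpps_by_label(rows, "job"):
    print(job, total)
# prints "api 1020.0" then "node 300.0"
```

Calling the same function with "__name__" instead of "job" ranks individual metric names, which matches the first method in the list.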
Ingestion stages and phases
Chronosphere profiles metrics in the ingestion and persistence stages, both of which include several phases.
Ingestion: Metrics sent directly from the Chronosphere Collector. Ingestion includes these phases:
- Received: Not selectable.
- Rejected By Drop Rule: Toggle metrics dropped due to drop rules. This option is relevant only for the Ingestion stage.
- Rejected by Ingest limit: Metrics dropped because they exceeded the ingestion or persistence phase rate limit.
- Accepted for Matching: Metrics that aren’t dropped prior to ingestion.
Persistence: Metrics sent to the database. This stage includes aggregated metrics and the following phases:
- Rejected by Persist limit: Metrics not sent to permanent storage due to persistence limits.
- Accepted for Storage: Metrics sent to storage.
- Stored: Not selectable.
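As a mental model, the phases can be read as an ordered path that a metric follows until it's rejected or stored. The sketch below assumes the phases occur in the order they're listed above, which this page doesn't state explicitly, and it ignores aggregation. The Phase enum and the phases_reached function are hypothetical and only illustrate that reading.

```python
from enum import Enum


class Phase(Enum):
    # Values copy the phase names as they appear in the lists above.
    RECEIVED = "Received"
    REJECTED_BY_DROP_RULE = "Rejected By Drop Rule"
    REJECTED_BY_INGEST_LIMIT = "Rejected by Ingest limit"
    ACCEPTED_FOR_MATCHING = "Accepted for Matching"
    REJECTED_BY_PERSIST_LIMIT = "Rejected by Persist limit"
    ACCEPTED_FOR_STORAGE = "Accepted for Storage"
    STORED = "Stored"


def phases_reached(matches_drop_rule, over_ingest_limit, over_persist_limit):
    """Return the phases a metric passes through, assuming list order is pipeline order."""
    path = [Phase.RECEIVED]
    # Ingestion stage: a rejection here ends the path.
    if matches_drop_rule:
        return path + [Phase.REJECTED_BY_DROP_RULE]
    if over_ingest_limit:
        return path + [Phase.REJECTED_BY_INGEST_LIMIT]
    path.append(Phase.ACCEPTED_FOR_MATCHING)
    # Persistence stage.
    if over_persist_limit:
        return path + [Phase.REJECTED_BY_PERSIST_LIMIT]
    return path + [Phase.ACCEPTED_FOR_STORAGE, Phase.STORED]


for phase in phases_reached(False, False, True):
    print(phase.value)
# prints "Received", "Accepted for Matching", "Rejected by Persist limit"
```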
Special request metadata
The Live Telemetry Analyzer generates rows for the following special non-label request metadata. This metadata is available in the Live Telemetry Analyzer and for matching in rollup rules, but isn’t stored.
The following label keys display for all incoming metrics:
- __metric_type__ displays the incoming metric’s Chronosphere metric type. Valid values are cumulative_counter, delta_counter, gauge, or measurement. This is the recommended method for determining an incoming metric’s type.
- __metric_source__ displays the incoming metric’s source format. Valid values are carbon, chrono_gcp, cloudwatch_metric_stream, dogstatsd, open_metrics, open_telemetry, prometheus, signalfx, statsd, or wavefront.
When ingesting data with Prometheus, the following label keys display:
- __m3_prom_type__ displays the incoming metric’s Prometheus metric type. Valid values are counter, gauge, histogram, gauge_histogram, summary, info, state_set, or quantile.
When ingesting data with OpenTelemetry, the following label keys display:
- __otel_type__ displays the incoming metric’s OpenTelemetry metric type. Valid values are sum, monotonic_sum, gauge, histogram, exp_histogram, or summary.
- __otel_temporality__ displays the incoming metric’s OpenTelemetry temporality. Valid values are delta or cumulative.
- DEPRECATED: __m3_type__ displays the incoming metric’s legacy M3 type, if any. Valid values are counter, gauge, or timer.
Group and filter metrics
The initial view displays two tables, which list all labels for all metrics. The Labels table lists all labels collected during the capture.
Use the Search text box to find a specific label. The Search text box filters as you type, reducing the label list displayed. Live Telemetry Analyzer uses glob syntax.
Observability Platform glob syntax doesn’t support using two asterisks where one of them is in the middle of a string. For example, *k8s*staging isn’t valid.
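The following sketch shows one reading of that restriction, using Python's fnmatch module as a stand-in for the Search text box. It assumes a pattern is unsupported only when it contains two or more asterisks and at least one of them is neither the first nor the last character. The is_supported_pattern and filter_labels functions are hypothetical, and fnmatch accepts wildcards such as ? and [seq] that the Live Telemetry Analyzer might not.

```python
from fnmatch import fnmatchcase


def is_supported_pattern(pattern):
    """Assumed rule: reject two or more asterisks when any sits mid-string."""
    if pattern.count("*") < 2:
        return True
    # With several asterisks, each must be the first or last character.
    return "*" not in pattern[1:-1]


def filter_labels(labels, pattern):
    """Return the labels that match a supported glob pattern."""
    if not is_supported_pattern(pattern):
        raise ValueError(f"unsupported glob pattern: {pattern!r}")
    return [label for label in labels if fnmatchcase(label, pattern)]


labels = ["k8s_cluster", "k8s_namespace", "job", "instance"]

print(filter_labels(labels, "k8s*"))          # prints "['k8s_cluster', 'k8s_namespace']"
print(filter_labels(labels, "*name*"))        # prints "['k8s_namespace']"
print(is_supported_pattern("*k8s*staging"))   # prints "False"
```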
Select the checkbox next to any label to filter the Label Values table by the selected value.
The right table shows the Label Values. Click a label value to add it to the Add Label Filter text box.
Filter both tables by adding label key:value pairs to the Add Label Filter field, either by selecting them from the table on the left or by typing in the field. Typing in the field displays a Label and Value text box. The Label field displays a matching list of label keys as you type. Select an option from the list at any time. Click the check icon when finished. Click any label value to edit it.
Click the arrow in any of the columns to sort by that data to help interpret the results. For example, a high total percentage in the Appears In column with low unique values gives you a high-level breakdown of where to attribute metrics. You can also sort by the Unique Values column, which helps identify high-cardinality labels.
Consider the following metrics as an example:
sign_up{location="placeA"}
sign_up{location="placeB"}
login{version="v0.1.0"}
With these metrics, the Live Telemetry Analyzer generates three rows, based on the three labels: __name__, location, and version. Because every metric has a __name__ label, the percentage for that label is 100%. There are only two unique values for __name__, which are sign_up and login, causing the Unique Values column to display 2. Only two of the three metrics have the location label, which is 66%, and there are two unique values for this label (placeA and placeB). The same logic applies to version, which appears in one of the three metrics (33%) and has one unique value (v0.1.0).
Label Keys | Unique Values | Appears In |
---|---|---|
__name__ | 2 | 100% |
location | 2 | 66% |
version | 1 | 33% |
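The table can be reproduced with a short calculation. This is a minimal sketch rather than the analyzer's implementation: the label_stats function is hypothetical, and rounding the percentage down is an assumption chosen because it matches the 66% and 33% shown in the table.

```python
from collections import defaultdict


def label_stats(series):
    """Compute Unique Values and Appears In for each label key."""
    total = len(series)
    values = defaultdict(set)  # label key -> distinct values seen
    counts = defaultdict(int)  # label key -> number of series with that key
    for labels in series:
        for key, value in labels.items():
            values[key].add(value)
            counts[key] += 1
    return {
        key: {
            "unique_values": len(values[key]),
            # Integer division rounds down, so 2 of 3 becomes 66.
            "appears_in_pct": 100 * counts[key] // total,
        }
        for key in values
    }


# The three example metrics, with the metric name stored under __name__.
series = [
    {"__name__": "sign_up", "location": "placeA"},
    {"__name__": "sign_up", "location": "placeB"},
    {"__name__": "login", "version": "v0.1.0"},
]

for key, stats in label_stats(series).items():
    print(key, stats["unique_values"], f'{stats["appears_in_pct"]}%')
# prints "__name__ 2 100%", "location 2 66%", "version 1 33%"
```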
Identify rules that generate metrics
When using the Live Telemetry Analyzer, you can view which metrics were rejected by a drop rule or impacted by an aggregation rule. You can also view the specific rollup rule or drop rule that caused a metric to be aggregated or dropped. Use this information to help understand why metrics are missing, and why results are formatted in a particular way. You can also click the rule name to go directly to the rule in Observability Platform.
The Live Telemetry Analyzer displays only the first rule that affects the metric, because many rules can impact an individual metric.
To understand which drop rules are dropping certain metrics:
- In the Data phase menu, select Rejected by drop rule.
- Under the Labels section, select the __drop_rule_slug__ label. The drop rule slug names display in the Label values table.
- Click the arrow icon to navigate directly to the drop rule that caused the metrics to be dropped.
To understand which aggregation rules are producing aggregated metrics:
- In the Data phase menu, select Accepted for storage.
- Under the Labels section, select the __rollup_rule_slug__ label. The aggregation rule slug names display in the Label values table.
- Click the arrow icon to navigate directly to the aggregation rule that produced the aggregated metric.
Troubleshoot missing metrics
If metrics don’t display when running the Live Telemetry Analyzer:
- Examine the filters to ensure they’re not dropping the metrics you’re searching for.
- Review the Collectors dashboard and ensure metrics are being scraped by the Collectors.
- See the troubleshooting page for more help.