This dictionary defines metrics that are created by and specific to Chronosphere
Observability Platform. These metrics often appear in default dashboards, and
you can search for them anywhere you use metrics. These curated metrics help you
track essential information about your Observability Platform instance.
The Chronosphere Health Check dashboard includes links to the Collectors,
Usage Dashboard, and Licensing Information dashboards.
Use the Metrics Catalog to learn what metrics are in your tenant.
Query these metrics as their respective Prometheus type.
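As a rough illustration of querying by type, the following Python sketch builds a typical PromQL expression for each metric type listed in this dictionary. The helper name `promql_for`, the default `5m` window, and the `0.99` quantile value are assumptions made for the example. Which quantiles a summary actually exposes depends on your data.

```python
def promql_for(metric: str, metric_type: str, window: str = "5m") -> str:
    """Return a typical PromQL expression for a metric of the given Prometheus type."""
    kind = metric_type.lower()
    if kind == "counter":
        # Counters only increase, so query their per-second rate over a window.
        return f"rate({metric}[{window}])"
    if kind == "gauge":
        # Gauges are point-in-time values and can be queried directly.
        return metric
    if kind == "summary":
        # Summaries expose precomputed quantiles through the `quantile` label.
        # The 0.99 value is an assumption for illustration.
        return f'{metric}{{quantile="0.99"}}'
    raise ValueError(f"unsupported metric type: {metric_type}")


print(promql_for("chronocollector_gateway_push_errors", "Counter"))
print(promql_for("chronocollector_k8s_gatherer_sink_targets_active", "Gauge"))
print(promql_for("chronocollector_gateway_push_latency", "Summary"))
```

The sketch only builds query strings. Run the resulting expressions wherever you normally query metrics.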
Collectors
The Collectors dashboard includes the following metrics that Collectors generate.
Use this dashboard to monitor the health of your Collectors.
See the Collectors tags that these metrics create during dashboard creation.
| Metric name | Metric type | Description |
|---|---|---|
| chronocollector_build_information | Gauge | Metrics relating to current build of Collectors. |
| chronocollector_gateway_push_errors | Counter | Current total number of push errors from Collector. |
| chronocollector_gateway_push_latency | Summary | Latency of pushed writes by Collector. |
| chronocollector_gateway_push_success | Counter | Total count of requests and RPC calls successfully pushed to the Chronosphere gateway. |
| chronocollector_gateway_write_success | Counter | Total count of each time series that are successfully written to the Chronosphere gateway. Counts only the metrics that the Collector scrapes, and not metrics from push-based protocols such as tracing data. |
| chronocollector_k8s_gatherer_sink_targets_active | Gauge | Current number of active targets Collector is scraping. |
| process_cpu_seconds_total | Counter | Current total number of seconds of CPU processing time. |
Prior to Collector v0.104.0, chronocollector_k8s_gatherer_sink_targets_active
was named chronocollector_k8s_gatherer_processor_targets_active.
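If you run Collectors on both sides of this rename, a small helper can choose the metric name to query. The sketch below is illustrative only. The function name `targets_active_metric` is hypothetical, and it assumes plain `MAJOR.MINOR.PATCH` version strings with an optional leading `v`.

```python
def targets_active_metric(collector_version: str) -> str:
    """Pick the active-targets metric name for a Collector version such as 'v0.104.0'."""
    # Assumes a plain MAJOR.MINOR.PATCH version; suffixes such as '-rc1' are not handled.
    # Compare numerically so that, for example, 0.99.0 sorts before 0.104.0.
    parts = tuple(int(p) for p in collector_version.lstrip("v").split("."))
    if parts < (0, 104, 0):
        return "chronocollector_k8s_gatherer_processor_targets_active"
    return "chronocollector_k8s_gatherer_sink_targets_active"


print(targets_active_metric("v0.103.2"))
print(targets_active_metric("v0.104.0"))
```

Comparing integer tuples instead of raw strings avoids the lexical trap where `"0.99.0"` would sort after `"0.104.0"`.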
The following metrics create these tags during dashboard creation:
| Metric name | Tags |
|---|---|
| chronocollector_build_information | branch, build_date, build_version, chronosphere_k8s_cluster, chronosphere_k8s_container_port, chronosphere_k8s_namespace, cluster, go_version, hostname, instance, job, k8s_cluster_id, pod_name, namespace, region, revision, service |
| chronocollector_gateway_push_errors | chronosphere_k8s_cluster, chronosphere_k8s_container_port, chronosphere_k8s_namespace, component, environment, hostname, instance, job, k8s_cluster_id, namespace, region, service |
| chronocollector_gateway_push_latency | chronosphere_k8s_cluster, chronosphere_k8s_container_port, chronosphere_k8s_namespace, component, environment, instance, job, k8s_cluster_id, namespace, quantile, region, service |
| chronocollector_gateway_push_success | annotationsPrefix, cluster, component, env, environment, instance, job, node, region, service, service_account, team, version |
| chronocollector_gateway_write_success | annotationsPrefix, cluster, component, env, environment, instance, job, node, region, service, service_account, team, version |
| chronocollector_k8s_gatherer_sink_targets_active | environment, instance, job, k8s_cluster_id, namespace, region, service |
| process_cpu_seconds_total | environment, instance, job, k8s_cluster_id, namespace, node, region, service |
Impact statistics
The following usage metric applies to impact statistics.
This metric creates the following tags during dashboard creation:
| Metric name | Metric type | Description |
|---|---|---|
| chrono_metrics_drop_rule_dpps_matched | Counter | Provides visibility into drop rule behavior across enabled, disabled, and preview rules. |
Usage statistics
Observability Platform provides different usage metrics you can use to understand and
manage your overall metric data usage.
Shaping usage statistics
The following usage metrics apply to shaping statistics.
| Metric name | Metric type | Description | Tags provided during dashboard creation |
|---|---|---|---|
| chrono_poolstats_count | Counter | Shaping statistics that include pool information. | drop_reason, dropped, type |
| chrono_poolstats_total | Counter | Total shaping statistics without any tag information. | drop_reason, dropped, type |
| chrono_poolstats_sampling | Counter | Emitted only when the number of unique usage statistics values exceeds the configured maximum allowed tags. | node, type |
Usage metrics
The Usage Dashboard includes the following usage statistics metrics. Use this
dashboard to identify who is contributing most to your Chronosphere usage and manage
your overall usage.
| Metric name | Metric type | Description | Tags provided during dashboard creation |
|---|---|---|---|
| chronosphere_uptime | Counter | Represents the Service Level Agreement (SLA) results for SLA checks. | none |
| chrono_usagestats_count | Counter | Usage statistics grouped by tags. | drop_reason, dropped, type |
| chrono_usagestats_total | Counter | Total usage statistics without any grouping. | drop_reason, dropped, type |
| chrono_usagestats_count_sampling | Counter | Emitted only when the number of unique usage statistics values exceeds the configured maximum allowed tags. | node, type |
| chrono_metrics_persisted_cardinality_license_top_usage | Gauge | Approximate cardinality of persisted metrics, grouped by tag. | top_label_name, top_label_value |
Other usage statistics track usage by label name and by metric name.
| Metric name | Metric type | Description |
|---|---|---|
| chrono_datapoints_by_metric_per_second | Gauge | Contains the metric_name label. Emits the average data points per second over the last two minutes by metric name. |
| chrono_datapoints_by_label_per_second | Gauge | Contains the label_name label. Emits the average data points per second over the last two minutes by label name. |
| chrono_unique_label_values_count | Gauge | Contains the label_name label. Emits the unique values seen over the last two minutes, by label name. |
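One way to use these gauges is to rank the largest contributors. The following Python sketch assumes you have already fetched the current value of chrono_datapoints_by_metric_per_second for each metric_name label value into a dictionary. The function name `top_contributors` and the sample values are hypothetical.

```python
from typing import Dict, List, Tuple


def top_contributors(samples: Dict[str, float], n: int = 3) -> List[Tuple[str, float]]:
    """Return the n entries with the highest data points per second, largest first."""
    return sorted(samples.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Hypothetical gauge values keyed by the metric_name label.
example = {
    "http_requests_total": 1200.0,
    "node_cpu_seconds_total": 450.5,
    "up": 12.0,
    "go_goroutines": 80.0,
}
print(top_contributors(example, n=2))
```

The same helper works for chrono_datapoints_by_label_per_second if the dictionary is keyed by label_name instead.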
Service token usage
The following metric applies to service account tokens:
| Metric name | Metric type | Description |
|---|---|---|
| chrono_api_token_requests_total | Counter | Monitors the number of requests made with a service account token. Can be inaccurate if more than 1000 service accounts are in use. |
The email label for this metric corresponds to the email field queryable in the
service accounts API.
Alerts
To learn about alert metrics, see Querying metrics about alerts.
Google Cloud integration
The following metrics apply to Observability Platform’s Google Cloud integration:
| Metric name | Metric type | Description |
|---|---|---|
| chrono_gcp_integration_shards_total | Gauge | The number of Google Cloud metric shards successfully ingested. |
| chrono_gcp_integration_active_shards_total | Gauge | The number of active Google Cloud metric shards successfully ingested. |
| chrono_gcp_integration_data_points_total | Counter | The total number of data points ingested. |
| chrono_gcp_integration_metric_descriptors | Gauge | List of all metric descriptors ingested, indicated by value of 1. |
Service level objective (SLO) metrics
The following metrics apply to service level objectives (SLOs) defined
in Observability Platform.
Derived metrics
These metrics are derived from recording rules over various
time windows. DURATION values are passed in time-unit syntax,
such as 5m or 28d.
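To make the time-unit syntax concrete, the following Python sketch converts a single-unit duration such as 5m or 28d into seconds. The function name `duration_seconds` is hypothetical, and the sketch assumes only the s, m, h, d, and w units with no compound values such as 1h30m.

```python
import re

# Seconds per supported unit (assumed subset of the time-unit syntax).
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def duration_seconds(duration: str) -> int:
    """Convert a single-unit duration such as '5m' or '28d' to seconds."""
    match = re.fullmatch(r"(\d+)([smhdw])", duration)
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_SECONDS[unit]


print(duration_seconds("5m"))   # 300
print(duration_seconds("28d"))  # 2419200
```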
Each metric includes slo_id and slo_name labels to identify their associated
SLO, and any signal or dimensional grouping labels defined in the SLO.
| Metric name | Metric type | Description |
|---|---|---|
| lens:slo:burn_rate:ratio<DURATION> | Gauge | Each SLO’s burn rate over the <DURATION>, calculated as sum_over_time(lens:slo:errors[<DURATION>]) / sum_over_time(lens:slo:total[<DURATION>]). |
| lens:slo:error_ratio:rate<DURATION> | Gauge | Each SLO’s error rate over the <DURATION>, calculated as sum_over_time(lens:slo:errors[<DURATION>]) / sum_over_time(lens:slo:total[<DURATION>]). |
| lens:slo:error_budget | Gauge (static) | Each SLO’s total error budget, as calculated by lens:slo:info. This static value doesn’t change over time. |
| lens:slo:error_budget_minutes_<DURATION> | Gauge | Each SLO’s total error budget, calculated as minutes within the DURATION. |
| lens:slo:budget_remaining_ratio<DURATION> | Gauge | Each SLO’s remaining error budget, calculated as a ratio from 0 (exhausted) to 1 (unused). This is the inverse of the reliability:ratio metric. |
| lens:slo:budget_remaining_minutes<DURATION> | Gauge | Each SLO’s remaining error budget, calculated as minutes. |
| lens:slo:reliability:ratio<DURATION> | Gauge | Each SLO’s performance, calculated as a ratio from 0 (0% reliability) to 1 (100%). This is the inverse of the budget_remaining_ratio metric. |
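The error ratio formula in the table can be worked through by hand. The following Python sketch mirrors sum_over_time(lens:slo:errors[<DURATION>]) / sum_over_time(lens:slo:total[<DURATION>]) over two lists of samples from the same window. The function name `error_ratio` and the sample values are hypothetical, and the sketch is not the platform's implementation.

```python
import math
from typing import Sequence


def error_ratio(errors: Sequence[float], total: Sequence[float]) -> float:
    """Sum of error samples divided by sum of total samples over one window."""
    total_sum = sum(total)
    if total_sum == 0:
        # No measurements in the window; PromQL division would also yield NaN for 0/0.
        return math.nan
    return sum(errors) / total_sum


# Hypothetical samples: 3 errors out of 300 measurements in the window.
ratio = error_ratio(errors=[1, 0, 2], total=[100, 100, 100])
print(f"{ratio:.2%}")
```

With these sample values, 3 errors across 300 measurements give an error ratio of 1%.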
Recording rules
Gauges apply sum_over_time over five-minute intervals. Counters report the rate over
a five-minute period.
| Metric name | Metric type | Description |
|---|---|---|
| lens:slo:info | Gauge (static) | List of SLOs as time series with related facts, including type, team, and objective values, as a component of lens:slo:error_budget. This static value doesn’t change over time. |
| lens:slo:errors | Gauge | Each SLO’s error count, used as the numerator in lens:slo:error_ratio. |
| lens:slo:total | Gauge | Each SLO’s total count of measurements, used as the denominator in lens:slo:error_ratio. |