Metrics dictionary
This dictionary defines metrics created by and specific to Chronosphere Observability Platform. Default dashboards often include these metrics, and you can search for them anywhere you use metrics. These curated metrics help you track key information about your Observability Platform instance.
The Chronosphere Health Check dashboard includes links to the Collectors dashboard, the Usage Dashboard, and the Licensing Information dashboard.
Query each of these metrics according to its Prometheus type.
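In practice, querying by type usually means wrapping counters in rate(), reading gauges directly, selecting a quantile label on summaries, and applying histogram_quantile() to the _bucket series of classic histograms. The following minimal sketch shows that mapping. The function name promql_for, the 5m window, and the 0.99 quantile are illustrative assumptions, not Chronosphere recommendations.

```python
# Hypothetical helper: pick a PromQL expression shape based on a metric's
# Prometheus type. Window and quantile defaults are illustrative assumptions.

def promql_for(metric: str, metric_type: str, window: str = "5m", quantile: float = 0.99) -> str:
    kind = metric_type.lower()
    if kind == "counter":
        # Counters only increase (except on reset), so query their per-second rate.
        return f"rate({metric}[{window}])"
    if kind == "gauge":
        # Gauges are point-in-time values; query them directly.
        return metric
    if kind == "summary":
        # Summaries expose precomputed quantiles through the quantile label.
        return f'{metric}{{quantile="{quantile}"}}'
    if kind == "histogram":
        # Classic histograms expose cumulative _bucket series with an le label.
        return f"histogram_quantile({quantile}, sum by (le) (rate({metric}_bucket[{window}])))"
    raise ValueError(f"unknown metric type: {metric_type}")
```

For example, promql_for("chrono_api_token_requests_total", "Counter") returns rate(chrono_api_token_requests_total[5m]).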
Capacity limits
Licensing capacity is based on your telemetry types and usage. The following tables describe metrics used for specific sections of your license.
Chronosphere recommends creating alerts on the existing capacity limit metrics, which also power the Metrics License Consumption dashboard. Alerts notify you when you're close to or over 100% of your license limit, and therefore at risk of having data dropped.
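Such an alert compares a consumed metric with its limit. In PromQL this is typically a ratio such as chrono_metrics_persisted_writes_license_dpps_consumed / chrono_metrics_persisted_writes_license_dpps_limit, assuming both series carry identical labels. The following sketch shows the same arithmetic in Python. The license_status function and its 80% warning threshold are hypothetical choices for illustration.

```python
# Hypothetical helper: classify license consumption against a limit.
# The 0.8 warning ratio is an illustrative assumption, not a product default.

def license_status(consumed: float, limit: float, warn_ratio: float = 0.8) -> tuple[float, str]:
    """Return (percent of limit used, status)."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    percent = 100.0 * consumed / limit
    if percent >= 100.0:
        status = "over"      # at or above the license limit: data may be dropped
    elif percent >= 100.0 * warn_ratio:
        status = "warning"   # close to the limit
    else:
        status = "ok"
    return percent, status
```

For example, license_status(45_000, 50_000) returns (90.0, "warning").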
Persisted cardinality
The following metrics apply to persisted cardinality. Persisted cardinality is a cumulative measure of the number of unique time series in the persisted writes that Observability Platform stores, counted over the last 2.5 hours only.
Metric name | Description |
---|---|
chrono_metrics_persisted_cardinality_license_limit | License limit for active persisted time series cardinality. |
chrono_metrics_persisted_cardinality_license_capacity | Capacity limit for active persisted time series cardinality. |
chrono_metrics_persisted_cardinality_license_consumed | Consumption of the persisted write cardinality limit by datapoint type. |
Query the following metric to understand if data is actively being dropped:
chrono_metrics_persisted_license_dpps_dropped{limit="persisted_cardinality"}
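To illustrate the trailing 2.5-hour window in the definition of persisted cardinality, the following sketch counts unique series IDs in a sliding window. It is a conceptual model only and does not reflect how Observability Platform computes the metric internally. The class name and the series IDs are hypothetical, and observations are assumed to arrive in timestamp order.

```python
from collections import deque

CARDINALITY_WINDOW_SECONDS = 2.5 * 60 * 60  # 2.5 hours, per the definition above


class UniqueSeriesWindow:
    """Counts unique series IDs seen within a trailing time window (conceptual sketch)."""

    def __init__(self, window: float = CARDINALITY_WINDOW_SECONDS):
        self.window = window
        self.last_seen: dict[str, float] = {}          # series ID -> latest write timestamp
        self.events: deque = deque()                   # (timestamp, series ID), oldest first

    def observe(self, series_id: str, ts: float) -> None:
        # Timestamps are assumed to be non-decreasing.
        self.last_seen[series_id] = ts
        self.events.append((ts, series_id))

    def cardinality(self, now: float) -> int:
        cutoff = now - self.window
        # Evict expired events; forget a series only if its latest write has expired.
        while self.events and self.events[0][0] <= cutoff:
            ts, sid = self.events.popleft()
            if self.last_seen.get(sid) == ts:
                del self.last_seen[sid]
        return len(self.last_seen)
```

A series written again within the window keeps counting once, which is why the map stores only its latest timestamp.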
Persisted writes
The following metrics apply to persisted writes, which are writes to the Observability Platform database.
Metric name | Description |
---|---|
chrono_metrics_persisted_writes_license_dpps_limit | License limit for persisted write DPPS by datapoint type. |
chrono_metrics_persisted_writes_license_dpps_capacity | Capacity limit for persisted write DPPS by datapoint type. |
chrono_metrics_persisted_writes_license_dpps_consumed | Consumption rate in DPPS of persisted write license by datapoint type. |
Query the following metric to understand if data is actively being dropped:
chrono_metrics_persisted_license_dpps_dropped{limit="persisted_writes"}
Matched writes
The following metrics apply to matched writes, which are writes that the Observability Platform aggregation tier matches for transformation and reshaping, measured in writes per second.
Metric name | Description |
---|---|
chrono_metrics_matched_writes_license_dpps_limit | License limit for matched write DPPS by datapoint type. |
chrono_metrics_matched_writes_license_dpps_capacity | Capacity limit for matched write DPPS by datapoint type. |
chrono_metrics_matched_writes_license_dpps_consumed | Consumption rate in DPPS of matched write license by datapoint type. |
Query the following metric to understand if data is actively being dropped:
chrono_metrics_matched_license_dpps_dropped
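The three dropped-data queries in this section can be kept in one place and turned into alert conditions. This sketch wraps each query in sum(...) > 0. It assumes that the dropped metrics report a per-second drop rate (as the dpps suffix suggests), so any positive value means data is currently being dropped. The dictionary keys and the function name are hypothetical.

```python
# Dropped-data queries from the tables above, keyed by license section.
# The section keys are hypothetical labels chosen for this sketch.
DROPPED_QUERIES = {
    "persisted_cardinality": 'chrono_metrics_persisted_license_dpps_dropped{limit="persisted_cardinality"}',
    "persisted_writes": 'chrono_metrics_persisted_license_dpps_dropped{limit="persisted_writes"}',
    "matched_writes": "chrono_metrics_matched_license_dpps_dropped",
}


def dropped_alert_expr(section: str) -> str:
    """Return a PromQL condition that is true while data is being dropped.

    Assumes the metric is a DPPS value, so any value above 0 indicates drops.
    """
    return f"sum({DROPPED_QUERIES[section]}) > 0"
```

For example, dropped_alert_expr("matched_writes") returns sum(chrono_metrics_matched_license_dpps_dropped) > 0.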
Legacy licensing metrics
The following table lists legacy metrics that might still be present in your environment. The persisted cardinality, persisted writes, and matched writes metrics described in the preceding sections are replacing them. Where a replacement exists, the table names it.
Metric name | Metric type | Description | Tags provided during dashboard creation |
---|---|---|---|
limit_service_cardinality_count replaced by chrono_metrics_persisted_cardinality_license_consumed | Counter | Current cardinality count across all Collectors. | chronosphere_service |
limit_service_licensed_cardinality_limit replaced by chrono_metrics_persisted_cardinality_license_limit | Counter | Current cardinality limit across all Collectors. | chronosphere_service |
limit_service_licensed_persist_limit replaced by chrono_metrics_persisted_writes_license_dpps_limit | Counter | Current limit for data points persisted in the database across all Collectors, as defined in the contract. | chronosphere_service |
limit_service_capacity_limit | Counter | Current capacity limit for data points persisted in the database across all Collectors, based on grant by Chronosphere. | chronosphere_service |
limit_service_licensed_processing_limit | Counter | Current limit for processed data points across all Collectors. | chronosphere_service |
limit_service_persisted_count replaced by chrono_metrics_persisted_writes_license_dpps_consumed | Counter | Total number of data points persisted in database. | chronosphere_service |
limit_service_processed_count | Counter | Current count of processed data points across all Collectors. | chronosphere_service |
limit_service_matched_limit replaced by chrono_metrics_matched_writes_license_dpps_limit | Counter | Current license limit for matched write DPPS by datapoint type. | chronosphere_service |
limit_service_capacity_limit replaced by chrono_metrics_matched_writes_license_dpps_capacity | Counter | Current capacity limit for matched write DPPS by datapoint type. | chronosphere_service |
chronosphere_rule_metrics_matched replaced by chrono_metrics_matched_writes_license_dpps_consumed | Counter | Consumption rate in DPPS of matched write license by datapoint type. | chronosphere_service |
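If saved queries or dashboards still reference legacy names, the renames in the table can be applied mechanically. The following sketch rewrites only the unambiguous renames. It leaves out limit_service_capacity_limit because the table lists that name twice with different meanings. A rename alone might not preserve query semantics. For example, limit_service_persisted_count is a counter, while its replacement reports a DPPS consumption rate, so a wrapping rate() might need to be removed. Review each migrated query by hand. The function name migrate_query is hypothetical.

```python
import re

# Unambiguous legacy -> replacement names taken from the table above.
LEGACY_TO_NEW = {
    "limit_service_cardinality_count": "chrono_metrics_persisted_cardinality_license_consumed",
    "limit_service_licensed_cardinality_limit": "chrono_metrics_persisted_cardinality_license_limit",
    "limit_service_licensed_persist_limit": "chrono_metrics_persisted_writes_license_dpps_limit",
    "limit_service_persisted_count": "chrono_metrics_persisted_writes_license_dpps_consumed",
    "limit_service_matched_limit": "chrono_metrics_matched_writes_license_dpps_limit",
    "chronosphere_rule_metrics_matched": "chrono_metrics_matched_writes_license_dpps_consumed",
}

# \b ensures only whole metric names match, not substrings of longer names.
_LEGACY_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, LEGACY_TO_NEW)) + r")\b")


def migrate_query(query: str) -> str:
    """Replace legacy metric names in a PromQL query string (names only)."""
    return _LEGACY_PATTERN.sub(lambda m: LEGACY_TO_NEW[m.group(1)], query)
```

Metrics without a replacement, such as limit_service_processed_count, pass through unchanged.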
Collectors
The Collectors dashboard includes the following metrics that Collectors generate. Use this dashboard to monitor the health of your Collectors.
Metric name | Metric type | Description | Tags provided during dashboard creation |
---|---|---|---|
chronocollector_build_information | Gauge | Information about the current build of the Collector. | branch build_date build_version chronosphere_k8s_cluster chronosphere_k8s_container_port chronosphere_k8s_namespace cluster go_version hostname instance job k8s_cluster_id pod_name namespace region revision service |
chronocollector_gateway_push_errors | Counter | Total number of push errors from the Collector. | chronosphere_k8s_cluster chronosphere_k8s_container_port chronosphere_k8s_namespace component environment hostname instance job k8s_cluster_id namespace region service |
chronocollector_gateway_push_latency | Summary | Latency of writes pushed by the Collector. | chronosphere_k8s_cluster chronosphere_k8s_container_port chronosphere_k8s_namespace component environment instance job k8s_cluster_id namespace quantile region service |
chronocollector_gateway_push_success | Counter | Total number of metrics successfully pushed to the Chronosphere gateway. | annotationsPrefix cluster component env environment instance job node region service service_account team version |
chronocollector_gateway_write_success | Counter | Total number of metrics successfully written to the Chronosphere gateway. | annotationsPrefix cluster component env environment instance job node region service service_account team version |
chronocollector_k8s_gatherer_processor_targets_active | Gauge | Current number of active targets the Collector is scraping. | environment instance job k8s_cluster_id namespace region service |
process_cpu_seconds_total | Counter | Total user and system CPU time spent, in seconds. | environment instance job k8s_cluster_id namespace node region service |
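Several Collector metrics, such as chronocollector_gateway_push_errors and process_cpu_seconds_total, are counters, so they are most useful as per-second rates. The following sketch computes a rate from two samples and treats a decrease as a counter reset. This is a simplified version of what PromQL rate() does, without its extrapolation. The function name and sample values are hypothetical.

```python
def counter_rate(prev_value: float, curr_value: float, prev_ts: float, curr_ts: float) -> float:
    """Per-second increase of a counter between two samples (simplified rate())."""
    if curr_ts <= prev_ts:
        raise ValueError("samples must be in increasing time order")
    # A counter never decreases unless the process restarted; after a reset,
    # the current value is the increase since the reset.
    increase = curr_value if curr_value < prev_value else curr_value - prev_value
    return increase / (curr_ts - prev_ts)
```

Applied to process_cpu_seconds_total, the result is the average number of CPU cores in use. For example, counter_rate(100.0, 130.0, 0, 60) returns 0.5, or half a core.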
Query overview
The Chronosphere Query Overview dashboard includes the following metrics. Use this dashboard to identify resource-intensive alert or recording groups.
Metric name | Metric type | Description | Tags provided during dashboard creation |
---|---|---|---|
permits_quota | Counter | Amount of resources used for querying time series. | chronosphere_k8s_namespace endpoint instance job permit pod_name source |
permits_throttled | Counter | Amount of throttling applied to queries. | chronosphere_k8s_namespace endpoint instance job permit pod_name source |
permits_wait_total | Counter | Amount of time spent waiting to access querying resources. | chronosphere_k8s_cluster chronosphere_k8s_namespace endpoint instance job permit pod_name source |
prometheus_rule_group_last_duration_seconds | Histogram | The total time the group took to complete its last iteration, in seconds. | chronosphere_k8s_cluster chronosphere_k8s_namespace instance job pod_name rule_group |
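To find resource-intensive alert or recording groups, a common starting point in PromQL is topk(5, prometheus_rule_group_last_duration_seconds), which returns the groups whose last iteration took longest. The following sketch performs the same ranking on values you have already retrieved. It assumes one last-duration value per rule_group. The function name and example data are hypothetical.

```python
def slowest_rule_groups(last_durations: dict[str, float], n: int = 3) -> list[tuple[str, float]]:
    """Return the n rule groups with the longest last iteration, slowest first."""
    ranked = sorted(last_durations.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]
```

For example, slowest_rule_groups({"a": 0.5, "b": 12.0, "c": 3.2}, n=2) returns [("b", 12.0), ("c", 3.2)].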
Policy statistics
The following usage metrics apply to policy statistics.
Metric name | Metric type | Description | Tags provided during dashboard creation |
---|---|---|---|
chrono_policies_count | Counter | Tracks actions for ingestion policies, grouped by the name of the policy. | dropped policy_name type |
chrono_policies_total | Counter | Tracks actions for ingestion policies with any naming policy. | dropped policy_name type |
Shaping usage statistics
The following usage metrics apply to shaping statistics.
Metric name | Metric type | Description | Tags provided during dashboard creation |
---|---|---|---|
chrono_poolstats_count | Counter | Shaping statistics that include pool information. | drop_reason dropped type |
chrono_poolstats_total | Counter | Total shaping statistics without any tag information. | drop_reason dropped type |
chrono_poolstats_sampling | Counter | Emitted only when the number of unique usage statistics values exceeds the configured maximum allowed tags. | node type |
Usage statistics
The Usage Dashboard includes the following usage statistics metrics. Use this dashboard to identify who is contributing most to your Chronosphere usage and manage your overall usage.
Metric name | Metric type | Description | Tags provided during dashboard creation |
---|---|---|---|
chrono_usagestats_count | Counter | Usage statistics grouped by tags. | drop_reason dropped type |
chrono_usagestats_total | Counter | Total usage statistics without any grouping. | drop_reason dropped type |
chrono_usagestats_count_sampling | Counter | Emitted only when the number of unique usage statistics values exceeds the configured maximum allowed tags. | node type |
The following additional usage statistics metrics track data points by metric name and by label.
Metric name | Metric type | Description |
---|---|---|
chrono_datapoints_by_metric_per_second | Gauge | Contains the metric_name label. Emits the average data points per second over the last two minutes by metric name. |
chrono_datapoints_by_label_per_second | Gauge | Contains the label_name label. Emits the average data points per second over the last two minutes by label name. |
chrono_unique_label_values_count | Gauge | Contains the label_name label. Emits the number of unique values seen over the last two minutes, by label name. |
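The per-second metrics in this table are averages over a two-minute window. The following sketch shows what "average data points per second over the last two minutes by metric name" means arithmetically: count the points per metric in the trailing 120 seconds and divide by 120. It is a conceptual model, not the platform's implementation. The function name and input shape are hypothetical.

```python
from typing import Iterable

DPPS_WINDOW_SECONDS = 120  # "last two minutes", per the table above


def avg_dpps_by_metric(points: Iterable[tuple[float, str]], now: float) -> dict[str, float]:
    """Average data points per second, per metric name, over the trailing window.

    points: one (timestamp_seconds, metric_name) pair per ingested data point.
    """
    counts: dict[str, int] = {}
    for ts, name in points:
        # Keep only points in the half-open window (now - 120s, now].
        if now - DPPS_WINDOW_SECONDS < ts <= now:
            counts[name] = counts.get(name, 0) + 1
    return {name: n / DPPS_WINDOW_SECONDS for name, n in counts.items()}
```

A metric written once per second for the whole window averages 1.0 data points per second.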
Service token usage
The following metric applies to service account tokens:
Metric name | Metric type | Description |
---|---|---|
chrono_api_token_requests_total | Counter | Monitors the number of requests made with a service account token. Can be inaccurate if more than 1000 service accounts are in use. |
The email label for this metric corresponds to the email field queryable in the service accounts API.
Google Cloud integration
The following metrics apply to Observability Platform's Google Cloud integration:
Metric name | Metric type | Description |
---|---|---|
chrono_gcp_integration_shards_total | Gauge | The number of Google Cloud metric shards successfully ingested. |
chrono_gcp_integration_active_shards_total | Gauge | The number of active Google Cloud metric shards successfully ingested. |
chrono_gcp_integration_data_points_total | Counter | The total number of data points ingested. |
chrono_gcp_integration_metric_descriptors | Gauge | List of all ingested metric descriptors, each indicated by a value of 1. |