Using custom PromQL functions in Observability Platform

In addition to PromQL’s standard functions, Chronosphere Observability Platform also supports the following custom functions.

⚠️

All panel types except Markdown and Service topology use queries.

Queries that return an extremely large number of data points or invalid results can result in panel errors. For example, a query might return an error for exceeding server resource limits.

Observability Platform reports these errors with an icon that appears in the corner of the Preview pane of the Add panel or Edit panel interfaces, or on the panel when viewing it on the dashboard. Hold the pointer over the icon to view the error message.

cardinality_estimate

The cardinality_estimate function returns an estimated count of the elements in the given instant vector. For example, cardinality_estimate(vec{}) returns the estimated cardinality of the vec metric.

Use the cardinality_estimate function in the following ways:

  • To help approximate cardinality for specific metrics, labels, or label-value pairs over time that can’t be correlated using the Persisted Cardinality Quotas dashboard.
  • To track the general trend of your cardinality growth over time, because this function can return results for millions of time series (see the example after this list).
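
For example, graphing the following query over a multi-day range charts the cardinality trend of a single metric. This is a minimal sketch; http_requests_total is an assumed metric name, so substitute one of your own:

cardinality_estimate(http_requests_total{})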

Don’t use this function to help understand the relative cardinality impact of a particular series on your license. Instead, use the Persisted Cardinality Quotas dashboard to understand cardinality costs across specific teams, services, and pools, and to help pinpoint specific sources of cardinality growth, such as a particular pool or priority group.

The cardinality_estimate function doesn’t measure cardinality in the same 150-minute rolling time window used by license metrics.

Instead, this function approximates relative cardinality using disjoint 120-minute blocks, which can create drift. When querying historical time ranges, the cardinality_estimate function uses even longer blocks.

This function supports grouping time series by labels, and returns an estimated cardinality for each unique value of the label when you use the by clause in a query. For example, the following query returns the cardinality estimate of all time series that match the metric name with a value for the device label equal to eth0, grouped by unique values for the k8s_cluster label:

cardinality_estimate(node_network_receive_bytes_total{device="eth0"}) by (k8s_cluster)
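
Grouping by multiple labels follows standard by clause semantics. As a sketch, the following query assumes a namespace label also exists on the series and returns a separate estimate for each cluster and namespace combination:

cardinality_estimate(node_network_receive_bytes_total{device="eth0"}) by (k8s_cluster, namespace)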

You can’t group by derived telemetry with this function.

Counting and downsampling

The cardinality_estimate function isn’t a direct alternative to the count function. Because it’s mostly optimized for performance and low-latency use cases, results might not be exact. This function also returns results with much lower resolution than the count function. The resolution aligns with the index block size.

This function provides an alternative to the Prometheus count_over_time function, which isn’t performant when viewing time series with high cardinality.
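
For example, a common Prometheus pattern for counting distinct series over a window is count(count_over_time(...)). The following sketch shows that pattern and its lower-cost approximation; http_requests_total is an assumed metric name, and because of block alignment the two results won't match exactly:

count(count_over_time(http_requests_total{}[2h]))

cardinality_estimate(http_requests_total{})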

The cardinality_estimate function is affected by long-term downsampling of the data it's based on, and results might change based on the query window's time range. When querying the raw namespace, this function returns the count of time series over a two-hour period. However, when querying the downsampled namespace, this function returns the count of time series over a period between 24 hours and four days, which makes the volume look much larger than it actually is.

head_{agg}

head_{agg}(q, n) sorts the time series by the value of the specified aggregation function and returns the top n series.

The available aggregation functions are:

  • avg
  • min
  • max
  • sum
  • count

For example, head_avg(MY_METRIC{}, 10) returns the 10 time series with the largest average value.

In most cases, head_{agg}() is appropriate. However, if you have time series with a high churn rate, such as metrics that track Kubernetes pod-level data, use topk(). The head_{agg} family of functions aggregates across all time series in the graph, so with a high-churn metric you can miss outliers, depending on their values. In contrast, topk() selects the top n time series based on their value at each timestamp.
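
For example, the following sketch uses the standard topk() operator to select the 10 series with the highest value at each timestamp; container_memory_usage_bytes is an assumed high-churn, pod-level metric:

topk(10, container_memory_usage_bytes{})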

tail_{agg}

tail_{agg}(q, n) sorts the time series by the value of the specified aggregation function and returns the bottom n series.

The available aggregation functions are:

  • avg
  • min
  • max
  • sum
  • count

For example, tail_avg(MY_METRIC{}, 10) returns the 10 time series with the smallest average value.

In most cases, tail_{agg}() is appropriate. However, if you have time series with a high churn rate, such as metrics that track Kubernetes pod-level data, use bottomk(). The tail_{agg} family of functions aggregates across all time series in the graph, so with a high-churn metric you can miss outliers, depending on their values. In contrast, bottomk() selects the bottom n time series based on their value at each timestamp.
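
As with topk(), the following sketch uses the standard bottomk() operator to select the 10 series with the lowest value at each timestamp, again assuming container_memory_usage_bytes as a high-churn, pod-level metric:

bottomk(10, container_memory_usage_bytes{})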

sum_per_second()

sum_per_second() calculates the per-second rate for a delta counter or delta histogram time series. It's equivalent to dividing the result of sum_over_time() by the sliding time window's duration in seconds.

Assuming a sliding time window of 5m, these PromQL queries return the same result:

sum_per_second(http_request_count{}[5m])

sum_over_time(http_request_count{}[5m]) / 300

To ensure the chart value at each step represents the sum of observations for each step’s start and end time, you must set the query’s step size to be equal to the sliding time window value. For more guidance, see Best practices for adding dashboard charts.
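
As another sketch, with a 1m sliding window and a matching 1m step, each charted point represents the per-second rate over that one-minute window; the divisor is 60 because the window is 60 seconds:

sum_per_second(http_request_count{}[1m])

sum_over_time(http_request_count{}[1m]) / 60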

Other querying features

Observability Platform also provides querying features beyond using query languages in its user interface:

  • Delta queries: Query metrics that employ delta temporality, as opposed to cumulative temporality.
  • Prometheus API access: Interact directly with Prometheus API endpoints for programmatic workflows.