# Using custom PromQL functions in Observability Platform

In addition to
[PromQL's standard functions](/investigate/querying/promql/apply-functions),
Chronosphere Observability Platform also supports the following custom functions.

<Warning>
  All panel types use queries, except Markdown, Service topology, and External panels.

  Queries that return an extremely large number of data points or invalid results
  can result in panel errors. For example, a query might return an error for exceeding
  server resource limits.

  Observability Platform reports these errors with an icon that appears in the corner
  of the **Preview** pane of the **Add panel** or **Edit panel** interfaces, or on the
  panel when viewing it on the dashboard. Hold the pointer over the icon to view the
  error message.
</Warning>

## `__default_over_time()`

The `__default_over_time(v range-vector, defaultValue scalar, lookback scalar)`
function returns the most recent value from a range vector if it exists within the
specified lookback window. Otherwise, the function returns a default value. Use this
function when you have metrics that report intermittently and you want to distinguish
between "no recent data" and "data shows zero." The function only processes float
values and ignores any histogram samples in the input.

The lookback parameter can be specified as either a duration (such as `1m`, `5m`) or
in seconds. The lookback window must be shorter than the range selector to ensure
default values get inserted when data is missing. For instance, the following example
returns the last value if that value occurred within the past minute, and otherwise
returns `0`:

```text theme={null}
__default_over_time(metric[5m], 0, 1m)
```

In the following example, the lookback window equals the range selector. The range
vector only ever contains data points from the past five minutes, so whenever the
function has input, the most recent value already falls within the lookback window.
As a result, the function never inserts the default value.

```text theme={null}
__default_over_time(metric[5m], 0, 5m)
```

To make the default value appear immediately at the next step, use the `$__interval`
variable. Use this variable when building dashboards for sporadic metrics where you
want to show a sensible default, rather than a gap when data hasn't arrived recently.
For example, use `$__interval` when retrieving batch job results or completing
periodic health checks.

```text theme={null}
__default_over_time(metric[5m], 0, $__interval)
```

For long range queries, `$__interval` might exceed the range selector, which could
prevent default values from being inserted.
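
The relationship between the range selector and the lookback window can be modeled in a few lines of Python. This is a minimal sketch of the behavior described in this section, not Chronosphere's implementation. The function name, the `(timestamp, value)` sample layout, and the left-open window boundary are assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, value); hypothetical layout


def default_over_time(
    samples: Sequence[Sample],
    step_time: float,
    range_seconds: float,
    default_value: float,
    lookback_seconds: float,
) -> Optional[float]:
    """Model one evaluation step of the documented behavior."""
    # Assumption: the range selector covers (step_time - range, step_time].
    in_range = [s for s in samples if step_time - range_seconds < s[0] <= step_time]
    if not in_range:
        # An empty range vector produces no output, so no default is inserted.
        return None
    last_time, last_value = max(in_range, key=lambda s: s[0])
    if step_time - last_time <= lookback_seconds:
        return last_value  # Recent enough: report the real value.
    return default_value  # Data exists in the range, but it's too old.


samples = [(0.0, 7.0)]
recent = default_over_time(samples, 30.0, 300.0, 0.0, 60.0)      # real value
stale = default_over_time(samples, 120.0, 300.0, 0.0, 60.0)      # default value
same_size = default_over_time(samples, 120.0, 300.0, 0.0, 300.0) # never defaults
```

In this model, a lookback equal to the range can never produce the default, because any sample that's still inside the range is also inside the lookback window.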

## `__histogram_observations()`

The `__histogram_observations(lower scalar, upper scalar, v instant-vector)` function
returns the number of observations that fall within a specified value range from
histogram metrics. This function works with both native Prometheus histograms and
[classic histograms](/control/shaping/shape-metrics/types#histogram), which helps to answer
questions like "How many requests took between 100 ms and 500 ms?"

For example, the following query calculates the number of HTTP requests over the last
hour (`1h`) with a duration between 0 and 200 ms (`0.2`):

```text theme={null}
__histogram_observations(0, 0.2, rate(http_request_duration_seconds[1h]))
```

The `lower` and `upper` arguments define the boundaries of your range (inclusive).
The function interpolates within histogram buckets to estimate the count, so the
bounds don't need to align exactly with your histogram bucket boundaries. The metric
name is dropped from the result.

Use these capabilities when you need to analyze specific segments of your histogram
distribution, such as counting requests in your SLO target range or identifying
requests in problematic latency bands.

The function is similar to the PromQL
[`histogram_fraction()` function](https://prometheus.io/docs/prometheus/latest/querying/functions/#histogram_fraction),
which returns an estimated fraction of observations between the given bounds. For
instance, the following query is equivalent to the previous
`__histogram_observations()` query but less efficient, because it multiplies the
fraction by the total count of observations:

```text wrap theme={null}
histogram_fraction(0, 0.2, rate(http_request_duration_seconds[1h])) * histogram_count(rate(http_request_duration_seconds[1h]))
```
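
To illustrate how interpolating within buckets lets the bounds fall between bucket boundaries, the following Python sketch estimates a count from classic cumulative buckets. It assumes linear interpolation inside each bucket and a first bucket that starts at zero. The function names and bucket data are hypothetical, and the platform's actual interpolation might differ.

```python
from typing import Sequence, Tuple

Bucket = Tuple[float, float]  # (upper bound "le", cumulative count)


def cumulative_count_at(buckets: Sequence[Bucket], x: float) -> float:
    """Estimate how many observations are <= x by linear interpolation."""
    prev_bound, prev_count = 0.0, 0.0  # Assumption: the first bucket starts at 0.
    for bound, count in buckets:
        if x <= prev_bound:
            return prev_count
        if x <= bound:
            fraction = (x - prev_bound) / (bound - prev_bound)
            return prev_count + fraction * (count - prev_count)
        prev_bound, prev_count = bound, count
    return prev_count  # x lies above the highest bound.


def histogram_observations(buckets: Sequence[Bucket], lower: float, upper: float) -> float:
    """Estimated number of observations between lower and upper."""
    return cumulative_count_at(buckets, upper) - cumulative_count_at(buckets, lower)


# Buckets sorted by upper bound: 40 observations <= 0.1, 100 <= 0.5, 120 <= 1.0.
buckets = [(0.1, 40.0), (0.5, 100.0), (1.0, 120.0)]
estimate = histogram_observations(buckets, 0.0, 0.2)  # 0.2 falls inside the second bucket
fraction = estimate / buckets[-1][1]                  # what a fraction-based query would return
```

Multiplying `fraction` by the total count recovers `estimate`, which mirrors the `histogram_fraction()` comparison in this section.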

## `__histogram_quantiles()`

<Note>
  The `__histogram_quantiles()` function is deprecated, and is only maintained for
  backwards compatibility. For new queries, use
  [`histogram_quantiles()`](/investigate/querying/promql/experimental-functions#histogram_quantiles)
  instead, which provides the flexibility to choose your own label name.
</Note>

The experimental `__histogram_quantiles()` function calculates and returns multiple
quantiles from Chronosphere histograms (native or exponential histogram) or classic
Prometheus histograms in a single query, eliminating the need to write multiple queries
to plot multiple quantiles.

For example, a dashboard panel for API latency might need to visualize `p50`, `p90`,
`p95`, and `p99`. With `__histogram_quantiles()`, a single query returns all four:

```text wrap theme={null}
# Chronosphere histogram (native or exponential histogram)
__histogram_quantiles(sum(rate(my_histogram{foo="bar"}[5m])), .5, .9, .95, .99)

# Classic Prometheus histogram
__histogram_quantiles(sum by(le) (rate(my_histogram_seconds_bucket{foo="bar"}[5m])), .5, .9, .95, .99)
```

The `__histogram_quantiles()` function returns a result for each quantile differentiated
by `__hist_quantile__`, a synthetic label whose value is the quantile argument used
to compute the given result. For example, the previous queries might return:

| Time                | `__hist_quantile__` | Value                 |
| ------------------- | ------------------- | --------------------- |
| 2025-08-06 11:32:50 | 0.500               | 0.0025                |
| 2025-08-06 11:32:50 | 0.900               | 0.0045000000000000005 |
| 2025-08-06 11:32:50 | 0.950               | 0.004749999999999999  |
| 2025-08-06 11:32:50 | 0.990               | 0.00495               |

## `cardinality_estimate()`

The `cardinality_estimate()` function returns an estimated count of the elements in
the given instant vector. For example, `cardinality_estimate(vec{})` returns the
estimated cardinality of the `vec` metric.

Use the `cardinality_estimate` function in the following ways:

* To help approximate cardinality for specific metrics, labels, or label-value pairs
  over time that can't be correlated using the Persisted Cardinality Quotas
  dashboard.
* To view the general trend of your cardinality growth over time, because this
  function can return results for millions of time series.

Don't use this function to help understand the relative cardinality impact of a
particular series on your license. Instead, use the
[Persisted Cardinality Quotas dashboard](/observe/dashboards/managed-dashboards#persisted-cardinality-quotas)
to understand cardinality costs across specific teams, services, and pools, and to
help pinpoint specific sources of cardinality growth, such as a particular pool or
priority group.

<Note>
  The `cardinality_estimate` function doesn't measure cardinality in the same
  150-minute rolling time window used by license metrics.

  Instead, this function approximates relative cardinality using disjoint 120-minute
  blocks, which can create drift. When looking over historical periods of time, the
  `cardinality_estimate` function uses even longer blocks.
</Note>

This function supports grouping time series by labels. When you use the `by` clause
in a query, the function returns an estimated cardinality for each unique value of
the label. For example, the following query returns the cardinality estimate of all
time series that match the metric name and have a `device` label equal to `eth0`,
grouped by unique values of the `k8s_cluster` label:

```text theme={null}
cardinality_estimate(node_network_receive_bytes_total{device="eth0"}) by (k8s_cluster)
```

You can't group by [derived telemetry](/investigate/querying/metrics/derived-telemetry) with
this function.

### Counting and downsampling

The `cardinality_estimate` function isn't a direct alternative to the `count`
function. Because it's mostly optimized for performance and low-latency use cases,
results might not be exact. This function also returns results with much lower
resolution than the `count` function. The resolution aligns with the index block
size.

<Note>
  This function provides an alternative to the Prometheus `count_over_time` function,
  which isn't performant when viewing time series with high cardinality.
</Note>

The `cardinality_estimate` function is affected by
[long-term downsampling](/control/shaping/shape-metrics/downsampling) of the data it's based on,
and results might change based on the querying window's time range. When querying the
raw namespace, this function returns the count of time series over a two-hour period.
However, when querying the downsampled namespace, this function returns the count of
time series over a period between 24 hours and four days, which makes the volume look
much larger than it actually is.

## `cumsum()`

The `cumsum(v instant-vector)` function returns the cumulative sum of values over
time for each series. Use this function to return a running total of a metric across
your query time range, rather than point-in-time values.

For example, if you have a metric tracking errors and want to see how the total error
count accumulates over a full day:

```text theme={null}
cumsum(sum_over_time(request_error_count{}[$__interval]))
```

This query transforms a series like `1, 2, 1, 3` into `1, 3, 4, 7`. You might use
this to visualize cumulative counts for delta counter metrics, like request counts,
accumulated bytes transferred, or total events processed since the start of your
query window. The function only processes float values, and ignores histograms.
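
As a quick illustration of the running total, this Python sketch reproduces the series transformation described above. The function name mirrors the PromQL function for readability only, and the input is a plain list of per-step values rather than a real query result.

```python
from itertools import accumulate
from typing import List, Sequence


def cumsum(values: Sequence[float]) -> List[float]:
    """Return the running total of the values at each step."""
    return list(accumulate(values))


print(cumsum([1, 2, 1, 3]))  # [1, 3, 4, 7]
```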

## `ewma()`

The `ewma(v range-vector, span scalar)` function computes an exponentially weighted
moving average, which smooths noisy time series data by giving more weight to recent
observations while still incorporating historical values. Use this function to filter
noise in volatile metrics and expose the underlying trend.

The `span` parameter controls how quickly the average adapts to new values. A smaller
span reacts faster to changes, while a larger span provides more smoothing. The span
must be greater than `1`. For example, to apply a 10-period EWMA to smooth a noisy
memory usage metric:

```text theme={null}
ewma(container_memory_usage_bytes{}[5m], 10)
```

A shorter span like `ewma(container_memory_usage_bytes[10m], 5)` tracks changes more
closely, which helps with metrics where you want to detect shifts quickly but still
reduce noise. The smoothing factor is computed as `2 / (span + 1)`, so a span of 10
gives approximately 18% weight to each new value.
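
The following Python sketch shows how the smoothing factor drives the average for the samples in one window. It assumes the common recursive form that seeds the average with the first sample. The seeding choice and the function names are assumptions, because this page only specifies the `2 / (span + 1)` factor.

```python
from typing import Sequence


def smoothing_factor(span: float) -> float:
    """Weight given to each new value: 2 / (span + 1)."""
    if span <= 1:
        raise ValueError("span must be greater than 1")
    return 2.0 / (span + 1.0)


def ewma(values: Sequence[float], span: float) -> float:
    """Exponentially weighted moving average of the samples in one window."""
    if not values:
        raise ValueError("at least one sample is required")
    alpha = smoothing_factor(span)
    average = float(values[0])  # Assumption: seed with the first sample.
    for value in values[1:]:
        average = alpha * value + (1.0 - alpha) * average
    return average


print(round(smoothing_factor(10), 4))  # 0.1818, or approximately 18%
print(ewma([0, 4, 8], 3))              # 5.0
```

With a span of `3`, each new value gets a weight of `0.5`, so the average moves halfway toward every new sample. A span of `10` moves it only about 18% of the way.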

## `head_{agg}`

`head_{agg}(q, n)` ranks the time series by the specified aggregation of their
values, largest first, and returns the top `n` series.

The available `head_{agg}` functions are:

* `head_avg`
* `head_min`
* `head_max`
* `head_sum`
* `head_count`

For example, `head_avg(MY_METRIC{}, 10)` returns the top 10 time series sorted by the
largest average of their values.

In most cases, `head_{agg}()` is appropriate. However, if you have time series with a
high churn rate, such as metrics that track Kubernetes pod-level data, use `topk()`.
The `head_{agg}` family of functions aggregates each time series across the entire
graph, so for a metric with high churn, you can miss outliers (depending on their
values). In contrast, `topk()` takes the top `x` time series based on their value at
each timestamp.
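
The difference between the two approaches can be seen in a small Python model. This is an illustrative sketch only: the series names and values are made up, `None` stands for a step where a series has no sample, and skipping missing samples when averaging is an assumption.

```python
from statistics import mean
from typing import Dict, List, Optional

Series = Dict[str, List[Optional[float]]]  # name -> value per step (None = no sample)


def head_avg(series: Series, n: int) -> List[str]:
    """Rank each series by its average over the whole graph and keep the top n."""
    averages = {
        name: mean(v for v in values if v is not None)
        for name, values in series.items()
        if any(v is not None for v in values)
    }
    return sorted(averages, key=averages.get, reverse=True)[:n]


def topk_per_step(series: Series, k: int) -> List[List[str]]:
    """Pick the top k series independently at every step."""
    steps = len(next(iter(series.values())))
    winners = []
    for i in range(steps):
        present = {name: v[i] for name, v in series.items() if v[i] is not None}
        winners.append(sorted(present, key=present.get, reverse=True)[:k])
    return winners


SERIES: Series = {
    "pod-a": [5, 5, 5, 5],
    "pod-b": [6, 6, 6, 6],
    "pod-c": [None, 1, 1, 15],  # Short-lived pod that spikes at the last step.
}
print(head_avg(SERIES, 1))       # ['pod-b']
print(topk_per_step(SERIES, 1))  # [['pod-b'], ['pod-b'], ['pod-b'], ['pod-c']]
```

Here the spike from `pod-c` is averaged away in the whole-graph ranking, but the per-step selection surfaces it at the final timestamp.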

## `piecewise_constant()`

The `piecewise_constant(v instant-vector)` function approximates your time series as
a step function with constant-valued segments, effectively identifying when your
metric shifts from one level to another. Use this function to detect capacity
changes, configuration updates, or other events that cause a metric to move between
stable states.

For example, if you want to identify when your connection count changes levels:

```text theme={null}
piecewise_constant(active_database_connections)
```

The function uses a bottom-up greedy merging algorithm that starts with small
segments and combines adjacent ones when they have similar values. This approach
automatically detects how many distinct levels exist in your data and where
transitions occur. A metric that oscillates around a stable value is represented as a
flat line, while a metric that shifts between states (like 100 connections, then 200
connections, then back to 100) shows clear steps.

Use this function to identify when someone scaled your app (causing a step change in
resource usage) or to detect when batch processing jobs complete (causing drops in
queue depth).
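
To make the idea of bottom-up greedy merging concrete, here is a simplified Python sketch. It starts with one segment per point and repeatedly merges the adjacent pair whose means are closest. The explicit `tolerance` stopping rule is a hypothetical stand-in: the function described on this page chooses the number of levels automatically, and its actual merge criterion isn't documented here.

```python
from typing import List, Sequence


def piecewise_constant(values: Sequence[float], tolerance: float) -> List[float]:
    """Approximate values as flat segments by greedily merging similar neighbors."""
    # Each segment is [sum of values, number of points]; start with one per point.
    segments = [[float(v), 1] for v in values]
    while len(segments) > 1:
        means = [total / count for total, count in segments]
        gaps = [abs(means[i + 1] - means[i]) for i in range(len(means) - 1)]
        best = min(range(len(gaps)), key=gaps.__getitem__)
        if gaps[best] > tolerance:  # Hypothetical stopping rule.
            break
        # Merge the most similar adjacent pair into one segment.
        segments[best][0] += segments[best + 1][0]
        segments[best][1] += segments[best + 1][1]
        del segments[best + 1]
    fitted: List[float] = []
    for total, count in segments:
        fitted.extend([total / count] * count)  # Every point takes its segment mean.
    return fitted


connections = [100, 101, 99, 200, 201, 199, 100, 100]
print(piecewise_constant(connections, tolerance=10))
# [100.0, 100.0, 100.0, 200.0, 200.0, 200.0, 100.0, 100.0]
```

The small oscillations around 100 and 200 collapse into flat levels, while the jumps between levels remain as steps.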

## `robust_trend()`

The `robust_trend(v instant-vector)` function is similar to
[`trend_line`](#trend_line), but uses a robust regression technique (Huber loss with
Iteratively Reweighted Least Squares) that resists the influence of outliers. This
function is ideal when your data contains occasional spikes or anomalies that
shouldn't affect the overall trend calculation. When data is perfectly linear or has
no outliers, it produces results similar to `trend_line`.

For example, if your error rate has occasional large spikes that don't represent the
true trend:

```text theme={null}
robust_trend(error_rate)
```

This usage is particularly valuable for metrics like network latency that might have
occasional dramatic spikes due to transient issues, or for error rates that have
periodic anomalous bursts.
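
For readers unfamiliar with the technique, the following Python sketch shows one way to fit a line using Huber weights and Iteratively Reweighted Least Squares. It isn't the platform's implementation. The tuning constant `1.345`, the MAD-based scale estimate, the fixed iteration count, and returning short series unchanged are all assumptions chosen for illustration.

```python
from statistics import median
from typing import List, Sequence, Tuple


def weighted_line_fit(xs: Sequence[float], ys: Sequence[float],
                      weights: Sequence[float]) -> Tuple[float, float]:
    """Weighted least squares fit of y = slope * x + intercept."""
    total = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def robust_trend(xs: Sequence[float], ys: Sequence[float],
                 delta: float = 1.345, iterations: int = 20) -> List[float]:
    """Fit a trend line with Huber weights so outliers have limited influence."""
    if len(xs) < 2:
        return list(ys)  # Assumption: too few points to fit a line.
    slope, intercept = weighted_line_fit(xs, ys, [1.0] * len(xs))  # Plain OLS start.
    for _ in range(iterations):
        residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
        scale = median(abs(r) for r in residuals) / 0.6745  # MAD-based scale.
        if scale == 0:
            break
        limit = delta * scale
        # Huber weights: full weight for small residuals, reduced for large ones.
        weights = [1.0 if abs(r) <= limit else limit / abs(r) for r in residuals]
        slope, intercept = weighted_line_fit(xs, ys, weights)
    return [slope * x + intercept for x in xs]


xs = list(range(10))
ys = [2.0 * x + 1.0 for x in xs]
ys[5] = 100.0  # One dramatic spike.
fitted = robust_trend(xs, ys)
print(round(fitted[1] - fitted[0], 3))  # Slope stays close to the underlying 2.0.
```

An ordinary least squares fit of the same data has a slope of roughly `2.5`, because the single spike pulls the line toward it.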

## `tail_{agg}`

`tail_{agg}(q, n)` ranks the time series by the specified aggregation of their
values and returns the bottom `n` series.

The available `tail_{agg}` functions are:

* `tail_avg`
* `tail_min`
* `tail_max`
* `tail_sum`
* `tail_count`

For example, `tail_avg(MY_METRIC{}, 10)` returns the bottom 10 time series ranked by
the average of their values.

In most cases, `tail_{agg}()` is appropriate. However, if you have time series with
a high churn rate, such as metrics that track Kubernetes pod-level data, use
`bottomk()`. The `tail_{agg}` family of functions aggregates each time series across
the entire graph, so for a metric with high churn, you can miss outliers (depending
on their values). In contrast, `bottomk()` takes the bottom `x` time series based on
their value at each timestamp.

## `trend_line()`

The `trend_line(v instant-vector)` function fits an Ordinary Least Squares (OLS)
regression through your time series data and returns the fitted trend line values.
Use this function to help identify whether a metric is generally increasing,
decreasing, or stable, even when the raw data is sporadic.

For instance, to see the linear trend of memory usage over time:

```text theme={null}
trend_line(container_memory_usage_bytes{})
```

Use this function to compare actual values against the trend, which can help detect
when a metric deviates from its expected trajectory. For example, the following
query compares current request rates to the linear trend. Values greater than one
exceed the trend, and values less than one fall below it.

```text theme={null}
rate(requests[5m]) / trend_line(rate(requests[5m]))
```

This use of the function helps with capacity planning, understanding long-term metric
behavior, or creating baselines. The function requires at least two data points.
Single-point series are returned unchanged.
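
The fit itself is a standard least squares line, which the following Python sketch computes for a single series. The function name mirrors the PromQL function for readability, the timestamps and values are made up, and using the sample timestamps as the regression's x-axis is an assumption.

```python
from typing import List, Sequence


def trend_line(timestamps: Sequence[float], values: Sequence[float]) -> List[float]:
    """Return the OLS fitted value at each timestamp."""
    n = len(values)
    if n < 2:
        return list(values)  # Single-point series are returned unchanged.
    mean_t = sum(timestamps) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in timestamps)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(timestamps, values))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_v - slope * mean_t
    return [slope * t + intercept for t in timestamps]


timestamps = [0, 1, 2, 3]
values = [1.0, 4.0, 4.0, 7.0]
fitted = trend_line(timestamps, values)
# Ratio of actual to trend: above 1 exceeds the trend, below 1 falls under it.
ratios = [v / f for v, f in zip(values, fitted)]
```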

## `sum_per_second()`

`sum_per_second()` calculates the per-second rate for a delta counter or delta histogram
time series. It's equivalent to dividing the result of `sum_over_time()` by the
sliding time window duration.

Assuming a step value of `5m`, these PromQL queries return the same result:

```text theme={null}
sum_per_second(http_request_count{}[5m])

sum_over_time(http_request_count{}[5m]) / 300
```

<Note>
  To ensure the chart value at each step represents the sum of observations for each
  step's start and end time, you **must** set the query's step size to be equal to
  the sliding time window value. For more guidance, see
  [Best practices for adding dashboard charts](/investigate/querying/metrics/delta-queries#best-practices-for-adding-dashboard-charts).
</Note>
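
The equivalence is simple arithmetic, sketched here in Python for one window of delta samples. The function name mirrors the PromQL function, and the sample values are hypothetical.

```python
from typing import Sequence


def sum_per_second(deltas: Sequence[float], window_seconds: float) -> float:
    """Sum the delta samples in the window and divide by the window length."""
    return sum(deltas) / window_seconds


# Three delta samples reported within a 5-minute (300-second) window.
print(sum_per_second([120, 90, 90], 300))  # 1.0 requests per second
```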

## `zscore()`

The `zscore(v instant-vector) [by|without (labels)]` function computes the standard
score (z-score) for each series in a group, telling you how many standard deviations
each value is from the group mean. Use this function to identify which members of a
group are outliers, like when services are behaving differently from the rest.

The z-score is calculated as `(value - mean) / stddev` across all series in a group
at each timestamp. Interpret the resulting values as follows:

| Value | Description                          |
| ----- | ------------------------------------ |
| `0`   | The value is exactly average         |
| `+1`  | One standard deviation above average |
| `-1`  | One standard deviation below average |

Values beyond `±2` or `±3` are typically considered outliers.

For example, to find which nodes have unusual CPU usage:

```text theme={null}
zscore(node_cpu_seconds_total)
```

This query compares all nodes at each point in time and shows which ones deviate
from the group average. You can use grouping to compute z-scores within subgroups:

```text theme={null}
zscore(http_request_duration_seconds) by (datacenter)
```

This query compares services within each `datacenter` separately, so you can
identify outliers per region rather than globally. This is helpful when different
groups have different normal ranges.

To create alerts for outliers, you might use this query:

```text theme={null}
abs(zscore(response_time) by (service)) > 2
```

This alert triggers when any service's response time is more than two standard
deviations from the mean for its group, helping detect services that are performing
unusually poorly or well.

The function returns not a number (`NaN`) when the standard deviation is zero (all
values in the group are identical), or when a group contains only a single series.

Using `zscore()` is equivalent to the following manual calculation, but more concise:

```text theme={null}
(metric - on() group_left avg(metric)) / on() group_left stddev(metric)
```
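
The same calculation for one group at one timestamp can be sketched in Python. This example assumes the population standard deviation, which matches how PromQL's `stddev` aggregation is defined. The function name mirrors the PromQL function, and the node names and values are hypothetical.

```python
import math
from statistics import fmean, pstdev
from typing import Dict


def zscore(group: Dict[str, float]) -> Dict[str, float]:
    """Z-score of every series in one group at a single timestamp."""
    values = list(group.values())
    if len(values) < 2:
        return {name: math.nan for name in group}  # Single-series group.
    mean = fmean(values)
    stddev = pstdev(values)  # Population standard deviation (assumption).
    if stddev == 0:
        return {name: math.nan for name in group}  # All values identical.
    return {name: (value - mean) / stddev for name, value in group.items()}


cpu_by_node = {"node-1": 0.40, "node-2": 0.42, "node-3": 0.41, "node-4": 0.95}
scores = zscore(cpu_by_node)
outliers = [name for name, score in scores.items() if abs(score) > 1.5]
print(outliers)  # ['node-4']
```

With only a handful of series in a group, z-scores can't grow very large, which is why this small example uses a lower threshold than the `±2` guideline.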

## Other querying features

Observability Platform also provides querying features beyond those covered by using
query languages in its user interface.

* **[Delta queries](/investigate/querying/metrics/delta-queries):** Query metrics that employ
  delta temporality, as opposed to cumulative temporality.
* **[Alert metrics](/investigate/querying/metrics/alert-metrics):** Query the `ALERTS`
  metric for triggered alert series and metadata.
* **[Prometheus API access](/tooling/prometheus-api):**
  Interact directly with Prometheus API endpoints for programmatic workflows.
