Using PromQL in Observability Platform
Prometheus Query Language (PromQL) lets you select and aggregate time series data in real time.
This documentation provides an introduction to PromQL and the features and syntax most relevant to Chronosphere Observability Platform. For full details about PromQL, read the Prometheus documentation.
Observability Platform is fully compatible with PromQL. If you need labels to differentiate between clusters, namespaces, and other items, the PromQL queries in existing dashboards, alerts, and recording rules need to reflect this.
All panel types except Markdown and Service topology use queries.
Queries that return an extremely large number of data points or invalid results can result in panel errors. For example, a query might return an error for exceeding server resource limits.
Observability Platform reports these errors with an icon that appears in the corner of the Preview pane of the Add panel or Edit panel interfaces, or on the panel when viewing it on the dashboard. Hold the pointer over the icon to view the error message.
Basic querying
Most PromQL expressions are built on a selector, which is typically the name of the metric followed by a set of labels, values, and operators. A query returns one or more time series that match the criteria. PromQL refers to this result as an instant vector because it represents a set of time series where each series contributes a single data point, and all of those data points share the same timestamp.
    |<----------metric name--------->|<---labels--->|
rate(node_network_receive_bytes_total{device=~"eth1"} [5m]) * 8
    |<-------------------selector------------------>| |<->|
|<-----------------------function------------------------>| ^ operator
For example, an app updates a metric to keep a total of the bytes received by a particular node. The app names the metric node_network_receive_bytes_total.
The following query returns all time series that match the metric name:
node_network_receive_bytes_total{}
To return only time series with a specific label and value, include them in curly brackets ({}) after the metric name.
The following query returns all time series that match the metric name, and which also have a device label whose value is equal to eth1:
node_network_receive_bytes_total{device="eth1"}
PromQL supports dozens of arithmetic, comparison, logical, and aggregation operators. The Prometheus documentation provides a full list of operators.
You can combine multiple labels and values with commas.
The following query returns all time series that match the metric name, and which also have both a device label whose value is equal to eth1 and an instance label whose value is equal to production:
node_network_receive_bytes_total{device="eth1", instance="production"}
PromQL supports regular expression matching and negation in label queries:
- =: Select labels equal to the provided string.
- !=: Select labels not equal to the provided string.
- =~: Select labels that match the provided string based on a regular expression.
- !~: Select labels that don’t match the provided string based on a regular expression.
The following query returns all time series that match the metric name, and which also have a device label whose value matches eth followed by one or more other characters:
node_network_receive_bytes_total{device=~"eth.+"}
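The matchers can be combined in one selector. The following queries are a minimal sketch: the device names lo and veth are assumptions used only for illustration. PromQL regular expressions are fully anchored, so a pattern must match the entire label value. Run each query separately.

```promql
# Match eth0 or eth1 exactly. Because the pattern is anchored,
# a value such as eth10 does not match.
node_network_receive_bytes_total{device=~"eth0|eth1"}

# Exclude devices named lo or starting with veth (assumed names), and
# exclude any series whose instance label is equal to production.
node_network_receive_bytes_total{device!~"lo|veth.*", instance!="production"}
```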
Filter values to a range
Query results often must be shifted to an earlier point in time or widened to a time window. The offset modifier shifts the evaluation time of a selector into the past. A range selector, such as [10m], returns what PromQL calls a range vector: a set of time series where each series contains every data point within the window, counting back from the evaluation time.
The following query returns the values from one hour (1h) ago of all time series that match the metric name, and which also have a device label whose value is equal to eth0:
node_network_receive_bytes_total{device="eth0"} offset 1h
For details about the offset modifier, see the PromQL documentation.
You can use PromQL to return a range vector, then use a function to perform calculations on the resulting range of time series.
The following query returns all data points from the last ten minutes for time series that match the metric name, and which also have a device label whose value is equal to eth0:
node_network_receive_bytes_total{device="eth0"}[10m]
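A range selector and the offset modifier can also be combined, with offset written after the range. The following sketch reuses the metric and label from the previous examples. The first query returns a range vector, which a function such as rate() can then consume, as the second query shows.

```promql
# The ten-minute window that ended one hour ago.
node_network_receive_bytes_total{device="eth0"}[10m] offset 1h

# Per-second average rate of increase over that same window.
rate(node_network_receive_bytes_total{device="eth0"}[10m] offset 1h)
```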
Applying functions
PromQL functions perform calculations with and on your metrics data, letting you complete complex processing with pre-built operations.
Each function takes different arguments, but typically requires at least an instant vector or a range vector. You can use standard arithmetic and binary comparison operators inside and outside the function, such as addition, subtraction, greater than, or less than. Using operators is one of the main methods to perform calculations on a combination of different time series. However, if you apply an operator to two instant vectors, it applies only to series with matching labels.
You can find more information about matching series and vectors in the PromQL documentation.
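As a rough illustration of operators outside and between functions, consider the following queries. The 1000000 threshold is arbitrary, and node_network_transmit_bytes_total is assumed to exist with the same labels as the receive metric. Run each query separately.

```promql
# Convert the per-second receive rate from bytes to bits, then keep only
# the series whose rate is above 1,000,000 bits per second.
rate(node_network_receive_bytes_total{device="eth0"}[5m]) * 8 > 1000000

# Ratio of received to transmitted bytes. The operator pairs series from
# each side that have identical label sets. Series without a partner on
# the other side are dropped from the result.
  rate(node_network_receive_bytes_total[5m])
/
  rate(node_network_transmit_bytes_total[5m])
```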
PromQL has dozens of functions, but a popular one is rate(), which calculates the per-second average rate of increase of each time series in a range vector.
The following query calculates the per-second average rate of increase over the last 10 minutes for the matching metric name with a device label whose value is equal to eth0:
rate(node_network_receive_bytes_total{device="eth0"}[10m])
Aggregating time series
Use PromQL aggregation functions to reduce the elements in a vector returned by a query.
For example, a popular aggregation function is sum(), which totals the values of resulting time series from a query and returns one element.
The following query returns the sum, as of five minutes ago, of the values of all time series that match the metric name and have a device label whose value is equal to eth0:
sum(node_network_receive_bytes_total{device="eth0"} offset 5m)
Another aggregation function is avg(), which averages the values of resulting time series from a query and returns one element.
You can group time series by labels, returning an element for each unique value of the label, using the by or without clause in a query.
- by: Groups time series by the labels you specify.
- without: Groups time series by every label except the ones you specify.
The following query returns the average values of all time series that match the metric name with a value for the device label equal to eth0, grouped by unique values for the k8s_cluster label:
avg(node_network_receive_bytes_total{device="eth0"}) by (k8s_cluster)
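To compare the two grouping clauses, the following sketch aggregates the same rate in both ways. It assumes the series carry device and k8s_cluster labels, as in the earlier examples. The clause can be written before or after the aggregated expression. Run each query separately.

```promql
# Total receive rate per cluster: the result keeps only the
# k8s_cluster label.
sum by (k8s_cluster) (rate(node_network_receive_bytes_total[5m]))

# Total receive rate with the device label removed: the result keeps
# every other label, such as k8s_cluster.
sum without (device) (rate(node_network_receive_bytes_total[5m]))
```

Both queries apply rate() to the individual series first and aggregate afterward, which keeps counter resets in individual series from distorting the total.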
The interval you define in functions such as rate() and increase() must be greater than or equal to the scrape interval of the metrics you apply the function to. The recommendation is to use at least twice the scrape interval, because these functions need at least two data points in the window to calculate a result.
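As a sketch of this guidance, assume a hypothetical scrape interval of 30 seconds. The actual interval depends on how your metrics are collected. A window of [1m] is then the recommended minimum, and a wider window produces a smoother result. Run each query separately.

```promql
# Assumes a 30-second scrape interval, so a one-minute window spans
# about two data points.
rate(node_network_receive_bytes_total{device="eth0"}[1m])

# Total increase in received bytes over a wider ten-minute window.
increase(node_network_receive_bytes_total{device="eth0"}[10m])
```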
Querying histograms
The Observability Platform histogram metric type persists a histogram as one data point and one time series. Query methods depend on the type of histogram you’re querying.
Querying histogram metric types
A histogram of the histogram metric type is a single structured value that contains all of the information about the histogram. The Observability Platform histogram metric type supports Prometheus native histograms and OpenTelemetry exponential histograms.
To query histograms in Observability Platform, use PromQL histogram functions.
The following querying examples use a histogram metric named http_request_duration_seconds.
If the metric being queried instead uses delta temporality, replace uses of the rate() function in these examples with sum_per_second() and ensure that the step value equals the sliding time window’s value. For more information, see Querying delta temporality metrics.
Rate of HTTP requests
Use the histogram_count() function to calculate the rate of HTTP requests:
histogram_count(sum(rate(http_request_duration_seconds{}[5m])))
Average HTTP request duration
Use the histogram_avg() function to query the average HTTP request duration:
histogram_avg(sum(rate(http_request_duration_seconds[5m])))
90th percentile HTTP request duration
Use the histogram_quantile() function to query the 90th percentile HTTP request duration across all matching series:
histogram_quantile(0.9, sum(rate(http_request_duration_seconds[5m])))
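To break the percentile down by labels, add a by clause to the aggregation. The following is a minimal sketch: method and path are hypothetical label names, so substitute the labels your histogram actually carries.

```promql
# 90th percentile request duration for each combination of the
# hypothetical method and path labels.
histogram_quantile(0.9, sum by (method, path) (rate(http_request_duration_seconds[5m])))
```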
Percentage of HTTP requests under given latency
Service level objectives are commonly defined as latency targets by percentile, such as delivering 90% of requests in less than 200 ms and 99% of requests in less than 500 ms. Use the histogram_fraction() function to calculate the fraction of requests, as a value between 0 and 1, with responses in 200 ms or less:
histogram_fraction(0, 0.2, sum(rate(http_request_duration_seconds[5m])))
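The same function can cover the second objective mentioned earlier, 99% of requests in 500 ms or less. The following queries are a sketch of two common shapes. Run each query separately.

```promql
# Percentage of requests served in 500 ms or less over the last
# five minutes. Multiplying by 100 converts the fraction to a percentage.
histogram_fraction(0, 0.5, sum(rate(http_request_duration_seconds[5m]))) * 100

# Returns a result only when fewer than 99% of requests finished within
# 500 ms, which is the shape an alert condition might take.
histogram_fraction(0, 0.5, sum(rate(http_request_duration_seconds[5m]))) < 0.99
```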
Querying legacy Prometheus histograms
If you’re querying a histogram with a metric name ending in _bucket, you’re querying a legacy Prometheus histogram.
A legacy Prometheus histogram is composed of individual counters that are stored as separate time series. For example, if your histogram aggregates HTTP request observations and is named http_request_duration_seconds, the resulting time series are:
- http_request_duration_seconds_bucket, with a time series for each unique bucket. Each time series has a label named le whose value represents the bucket’s upper boundary.
- http_request_duration_seconds_sum, the sum of all observed values.
- http_request_duration_seconds_count, the total count of all observed values.
The scrape endpoint exposes:
http_request_duration_seconds_bucket{le="0.1"} 2764
http_request_duration_seconds_bucket{le="0.25"} 3653
http_request_duration_seconds_bucket{le="0.5"} 8735
http_request_duration_seconds_bucket{le="0.75"} 12763
http_request_duration_seconds_bucket{le="1"} 13172
http_request_duration_seconds_bucket{le="+Inf"} 13865
http_request_duration_seconds_sum 7732
http_request_duration_seconds_count 13865
Using http_request_duration_seconds as an example, you can write the following PromQL queries:
Rate of HTTP requests (legacy)
Use the rate() function and the _count time series to calculate the rate of HTTP requests:
sum(rate(http_request_duration_seconds_count[5m]))
Average HTTP request duration (legacy)
Query the average HTTP request duration by dividing the sum of observations by the count of observations:
sum(rate(http_request_duration_seconds_sum[5m])) / sum(rate(http_request_duration_seconds_count[5m]))
90th percentile HTTP request duration (legacy)
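Use the histogram_quantile() function to get the 90th percentile HTTP request duration across all matching series. The aggregation must keep the le label, because the function needs the bucket boundaries to calculate the quantile: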
histogram_quantile(0.9, sum by(le)(rate(http_request_duration_seconds_bucket[5m])))
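Legacy histograms have no equivalent of histogram_fraction(), but a similar result can be sketched by dividing one bucket by the total count. The le value must match an existing bucket boundary exactly. This sketch uses the 0.25 bucket from the scrape output shown earlier, where the raw counters give 3653 of 13865 observations, or roughly 26%. The method label in the second query is hypothetical. Run each query separately.

```promql
# Fraction of requests completed in 0.25 seconds or less over the
# last five minutes.
  sum(rate(http_request_duration_seconds_bucket{le="0.25"}[5m]))
/
  sum(rate(http_request_duration_seconds_count[5m]))

# 90th percentile for each value of a hypothetical method label.
# The le label must stay in the by clause.
histogram_quantile(0.9, sum by (le, method) (rate(http_request_duration_seconds_bucket[5m])))
```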
Experimental PromQL functions and operators
Prometheus makes experimental functions and operators available behind a feature flag. The Prometheus project might change the name, syntax, or semantics of experimental functions or remove them entirely.
Observability Platform accepts a subset of PromQL experimental functions that are relatively stable, provide significant value, and are at low risk of being removed by the Prometheus project.
When the upstream Prometheus project makes breaking changes, Observability Platform will either preserve backward compatibility if possible or notify users and provide a deprecation timeline.
The following experimental functions are available in Observability Platform:
double_exponential_smoothing()
Use the double_exponential_smoothing() experimental function to smooth a time series based on the importance you assign to older data and possible trends. See the PromQL documentation.
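A minimal sketch of a call follows. The function takes a range vector, a smoothing factor, and a trend factor, where both factors must be between 0 and 1, and it is intended for gauges. The metric node_memory_MemAvailable_bytes and the factor values are assumptions chosen for illustration.

```promql
# Smooth a gauge over a ten-minute window.
# Second argument: smoothing factor. Lower values give more importance
# to older data.
# Third argument: trend factor. Higher values give more weight to trends.
double_exponential_smoothing(node_memory_MemAvailable_bytes[10m], 0.3, 0.1)
```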
holt_winters() (deprecated)
As of Prometheus 3.0, this function is named double_exponential_smoothing().
While Observability Platform aliases holt_winters() to provide backward compatibility, you should replace it with the new name to avoid issues when Observability Platform removes the holt_winters() alias.
histogram_*() (native histogram functions)
All Prometheus native histogram functions remain behind a feature flag in the Prometheus project. Observability Platform has made the native histogram functions generally available for users to query OpenTelemetry exponential histograms and Prometheus native histograms.
For details on using these functions in Observability Platform and examples, as well as other supported histogram types, see Querying histograms.
Other querying features
Observability Platform also provides querying features beyond using query languages in its user interface:
- Delta queries: Query metrics that employ delta temporality, as opposed to cumulative temporality.
- Prometheus API access: Interact directly with Prometheus API endpoints for programmatic workflows.