Querying

Overview of querying

When you create dashboards, monitors, or use the metrics explorer, you use a query to select and filter the metrics in which you're interested, and then apply functions to them.

Chronosphere supports these languages for querying metrics data:

  • Prometheus Query Language (PromQL) - Used with Prometheus-compatible metrics. Default.
  • Graphite - Used with StatsD metrics data. Enabled on request.

Contact Chronosphere Support to set up the data sources you want to use.

How you switch the query language depends on where you're working:

  • Dashboards: Use the language toggle in the Query panel of a dashboard.
  • Monitors: Use the language toggle by the Query box.
  • Metrics Explorer: Use the language toggle to the right of the Explore page title.

Although Chronosphere supports both PromQL and Graphite, this document covers only PromQL. For more details about Graphite, read the Graphite documentation.

PromQL overview

PromQL lets you select and aggregate time series data in real time.

This documentation introduces PromQL and covers the features and syntax most relevant to Chronosphere. For full details about PromQL, read the Prometheus documentation.

Chronosphere is 100% compatible with PromQL. If you rely on labels to differentiate between clusters, namespaces, and other items, make sure the PromQL queries in your existing dashboards, alerts, and recording rules filter on those labels.

Basic querying

Most PromQL queries are built around a selector, which is typically the name of a metric followed by a set of label matchers in curly brackets. Each matcher combines a label name, an operator, and a value. A selector returns one or more time series that match the criteria. PromQL refers to the result as an instant vector: a set of time series that contains a single data point for each series, all at the same timestamp.

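rate(node_network_receive_bytes_total{device=~"eth1"}[5m]) * 8

The parts of this query are:

  • Name: node_network_receive_bytes_total, the metric name.
  • Labels: {device=~"eth1"}, the label matchers.
  • Selector: node_network_receive_bytes_total{device=~"eth1"}, the metric name and its labels together.
  • Range: [5m], which makes the selector return a range vector.
  • Function: rate(...), which wraps the selector.
  • Operator: *, which multiplies the result of the function by 8 (converting bytes to bits).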

For example, an app updates a metric to keep a total of the bytes received by a particular node. The app names the metric node_network_receive_bytes_total.

The following query returns all time series that match the metric name:

node_network_receive_bytes_total{}

To return only time series with a specific label and value, include them in curly brackets ({}) after the metric name.

The following query returns all time series that match the metric name and have a device label equal to eth1:

node_network_receive_bytes_total{device="eth1"}

PromQL supports dozens of arithmetic, comparison, logical, and aggregation operators. You can find a full list in the documentation.

You can combine multiple labels and values with commas.

The following query returns all time series that match the metric name with a value for the device label equal to eth1 and a value for the instance label equal to production:

node_network_receive_bytes_total{device="eth1", instance="production"}

PromQL supports regular expression matching and negation in label queries:

  • = : Select labels equal to the provided string.
  • != : Select labels not equal to the provided string.
  • =~ : Select labels that match the provided string based on a regular expression.
  • !~ : Select labels that don't match the provided string based on a regular expression.

PromQL anchors regular expressions to the entire label value. The following query returns all time series that match the metric name and have a device label value of eth followed by one or more characters, such as eth0 or eth1:

node_network_receive_bytes_total{device=~"eth.+"}

Filter values to a range

Often you need to look at data over a window of time, or at data from an earlier point in time. Adding a duration in square brackets after a selector, such as [10m], returns a range vector: for each matching time series, the set of data points within that duration, ending at the query's evaluation time. The offset modifier shifts the evaluation time of a selector back by the duration you specify.

The following query uses offset to return the values from one hour ago of all time series that match the metric name and have a device label equal to eth0:

node_network_receive_bytes_total{device="eth0"} offset 1h

Find more details about the offset modifier in the PromQL documentation.

Typically, you select a range vector to pass it to a function that performs a calculation over each series' range of data points.

The following query returns, for each time series that matches the metric name and has a device label equal to eth0, all data points from the last 10 minutes:

node_network_receive_bytes_total{device="eth0"}[10m]

Applying functions

PromQL functions let you perform calculations on your metrics data, using pre-built operations for complex processing.

Each function takes different arguments, but most take at least an instant vector or a range vector. You can use standard arithmetic and comparison operators, such as addition, subtraction, greater than, and less than, both inside and outside a function. Operators are one of the main ways to perform calculations that combine different time series. However, when you apply an operator between two instant vectors, it applies only to matching series: series whose label sets are identical on both sides, ignoring the metric name. Series without a match are dropped from the result.

You can find more information about matching series and vectors in the PromQL documentation.
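Here's a minimal Python sketch of default one-to-one vector matching for a binary operator such as `/`. For each series on the left, it looks for a series on the right with exactly the same label set and drops series without a partner. The byte counts are hypothetical, and the sketch ignores the `on`, `ignoring`, `group_left`, and `group_right` modifiers.

```python
# Hypothetical instant vectors of received and transmitted byte counts.
# Keys are label sets without the metric name, because default one-to-one
# matching ignores the metric name when pairing series.
received = {
    frozenset({("device", "eth0"), ("instance", "a")}): 800.0,
    frozenset({("device", "eth1"), ("instance", "a")}): 300.0,
    frozenset({("device", "eth0"), ("instance", "b")}): 50.0,
}
transmitted = {
    frozenset({("device", "eth0"), ("instance", "a")}): 200.0,
    frozenset({("device", "eth0"), ("instance", "b")}): 25.0,
}

def binary_op(left, right, op):
    """Model default one-to-one matching: pair series with identical label sets.

    Series on either side without a partner are dropped from the result.
    """
    return {labels: op(value, right[labels]) for labels, value in left.items() if labels in right}

ratio = binary_op(received, transmitted, lambda a, b: a / b)
print(len(ratio))  # prints 2: the eth1 series has no partner, so it's dropped
```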

PromQL has dozens of functions. One of the most popular is rate(), which calculates the per-second average rate of increase of each time series in a range vector.

The following query calculates the per-second average rate of increase over the last 10 minutes for each time series that matches the metric name and has a device label equal to eth0:

rate(node_network_receive_bytes_total{device="eth0"}[10m])
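As a rough illustration of what rate() computes, the following Python sketch takes the samples of one counter series in a range, accounts for counter resets, and divides the total increase by the time between the first and last sample. It's a simplified model with hypothetical data: PromQL's rate() also extrapolates toward the edges of the range window, so real results differ slightly.

```python
def simple_rate(samples):
    """Per-second increase of a counter over samples [(timestamp_s, value), ...].

    Simplified: handles counter resets but doesn't extrapolate to the window
    edges the way PromQL's rate() does. Returns None with fewer than 2 samples.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A decrease means the counter reset (for example, the process restarted),
        # so the new value itself is the increase since the reset.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed

# Hypothetical byte counter for device="eth0", scraped every 60 seconds,
# with a reset between t=120 and t=180.
window = [(0, 1000.0), (60, 7000.0), (120, 13000.0), (180, 2000.0), (240, 8000.0)]
print(simple_rate(window))  # roughly 83.3: 20000 bytes over 240 seconds
```

Without the reset handling, the drop from 13000 to 2000 would count as a large negative increase, which is why rate() is meant for counters.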

Aggregating time series

Use PromQL aggregation functions to reduce the elements in a vector returned by a query.

For example, the popular sum() aggregation function adds up the values of all time series returned by a query and returns a single element.

The following query returns the sum of the values, as of five minutes ago, of all time series that match the metric name and have a device label equal to eth0:

sum(node_network_receive_bytes_total{device="eth0"} offset 5m)

Another aggregation function is avg(), which averages the values of all time series returned by a query and returns a single element.

To return one element for each unique combination of label values instead of a single element, group the time series with a by or without clause:

  • by: Groups time series by the labels you specify, dropping all other labels.
  • without: Groups time series by every label except the ones you specify, which are removed from the result.

The following query returns the average values of all time series that match the metric name with a value for the device label equal to eth0 grouped by unique values for the k8s_cluster label:

avg(node_network_receive_bytes_total{device="eth0"}) by (k8s_cluster)
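The grouping clauses can be modeled in a few lines of Python. This sketch uses a hypothetical instant vector: it computes a grouping key for each series (only the by labels, or every label except the without labels and the metric name), then applies the aggregation function within each group. It's an illustration of the semantics, not Chronosphere's implementation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical instant vector: (labels, value) pairs.
VECTOR = [
    ({"device": "eth0", "k8s_cluster": "east", "instance": "a"}, 10.0),
    ({"device": "eth0", "k8s_cluster": "east", "instance": "b"}, 30.0),
    ({"device": "eth0", "k8s_cluster": "west", "instance": "c"}, 5.0),
]

def aggregate(vector, func, by=None, without=None):
    """Group series like PromQL's `by`/`without` clauses, then apply func per group."""
    groups = defaultdict(list)
    for labels, value in vector:
        if by is not None:
            key = tuple(sorted((k, v) for k, v in labels.items() if k in by))
        elif without is not None:
            key = tuple(sorted((k, v) for k, v in labels.items()
                               if k not in without and k != "__name__"))
        else:
            key = ()  # no clause: everything aggregates into one element
        groups[key].append(value)
    return {key: func(values) for key, values in groups.items()}

# avg(...) by (k8s_cluster): one element per cluster
print(aggregate(VECTOR, mean, by={"k8s_cluster"}))
# sum(...) without (instance): one element per remaining label combination
print(aggregate(VECTOR, sum, without={"instance"}))
```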

The range you define in functions such as rate() and increase() must be greater than or equal to the scrape interval of the metrics you apply the function to. Because these functions need at least two data points within the range to return a result, the recommendation is to use a range of at least twice the scrape interval.
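To see why a range of at least twice the scrape interval is recommended, recall that rate() and increase() need at least two samples in the range to produce a value. The following Python sketch counts how many samples from a regularly scraped series fall into a range window. It assumes perfectly regular scrapes and a half-open window (t - range, t]; real scrapes jitter, so the count can vary by one. The scrape interval and ranges are hypothetical.

```python
def samples_in_window(scrape_interval, rng, t, first_scrape=0):
    """Count scrapes at first_scrape + k * scrape_interval inside (t - rng, t]."""
    count = 0
    ts = first_scrape
    while ts <= t:
        if ts > t - rng:
            count += 1
        ts += scrape_interval
    return count

scrape = 30  # seconds
for rng in (30, 60, 120):
    # rate() needs at least 2 samples in the window to return a value.
    n = samples_in_window(scrape, rng, t=600)
    print(f"range={rng}s samples={n} rate_possible={n >= 2}")
```

With a range equal to the scrape interval, the window holds only one sample, so rate() returns nothing for that series.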

Custom functions

In addition to the standard PromQL functions, Chronosphere supports the following custom functions:

  • head_{agg}

    head_{agg}(q, n) aggregates the values of each time series returned by query q with the specified aggregation function, sorts the series by the result from largest to smallest, and returns the top n series.

    The available aggregation functions are:

    • avg
    • min
    • max
    • sum
    • count

    For example, head_avg(foo{}, 10) returns the top 10 time series sorted by the largest average of their values.

    In most cases, head_{agg}() is appropriate. However, if you have time series with a high churn rate, such as metrics that track Kubernetes pod-level data, use topk() instead. The head_{agg} functions aggregate each series across the entire time range of the graph, so with high-churn metrics you can miss outliers, depending on their values. In contrast, topk() selects the top k time series based on their value at each timestamp.

  • tail_{agg}

    tail_{agg}(q, n) aggregates the values of each time series returned by query q with the specified aggregation function, sorts the series by the result, and returns the bottom n series: the n series with the smallest results.

    The available aggregation functions are:

    • avg
    • min
    • max
    • sum
    • count

    For example, tail_avg(foo{}, 10) returns the 10 time series with the smallest average values.

    In most cases, tail_{agg}() is appropriate. However, if you have time series with a high churn rate, such as metrics that track Kubernetes pod-level data, use bottomk() instead. The tail_{agg} functions aggregate each series across the entire time range of the graph, so with high-churn metrics you can miss outliers, depending on their values. In contrast, bottomk() selects the bottom k time series based on their value at each timestamp.

  • cardinality_estimate

    cardinality_estimate returns an estimate of the number of elements in the given instant vector. For example, cardinality_estimate(vec{}) returns the estimated cardinality of the vec metric.

    cardinality_estimate supports grouping time series by labels with the by clause, returning an estimated cardinality for each unique value of the label.

    • by: Groups time series by the labels you specify.

    Grouping with without isn't supported.

    The following query returns the cardinality estimate of all time series that match the metric name with a value for the device label equal to eth0 grouped by unique values for the k8s_cluster label:

    cardinality_estimate(node_network_receive_bytes_total{device="eth0"}) by (k8s_cluster)

    cardinality_estimate isn't a direct replacement for the count function. It's optimized for performance and low-latency use cases, so don't treat its results as exact. It also returns results at a much lower resolution than count; the resolution aligns with the index block size. cardinality_estimate is affected by long-term downsampling of its underlying data, and its results can change depending on the time range of the query window.

    The cardinality_estimate function is useful for seeing the general trend of your cardinality growth over time, because it can quickly return results for millions of time series.
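The difference between head_{agg} and topk() described above can be sketched in Python. This is a hypothetical model based on those descriptions, not Chronosphere's implementation; in particular, how gaps in short-lived series are handled is an assumption. `head_agg` ranks each series once by an aggregate of all its values over the graph's time range (pass `max`, `min`, `sum`, or `len` for count instead of `mean`), while `topk_per_step` ranks series independently at each timestamp. A short-lived series that spikes at one timestamp shows up in the topk() model but not in the head_avg model.

```python
from statistics import mean

# Hypothetical values of three series at four timestamps. None means the series
# didn't exist yet, as with a short-lived Kubernetes pod.
SERIES = {
    "pod-a": [50, 50, 50, 50],
    "pod-b": [40, 40, 40, 40],
    "pod-c": [None, None, 2, 70],  # appears late, then spikes
}

def head_agg(series, n, agg=mean, reverse=True):
    """Rank whole series once by agg() over all their values; keep the first n.

    reverse=True models head_{agg}; reverse=False models tail_{agg}.
    """
    scores = {name: agg([v for v in vals if v is not None]) for name, vals in series.items()}
    return sorted(scores, key=scores.get, reverse=reverse)[:n]

def topk_per_step(series, k):
    """Model topk(): rank the series independently at every timestamp."""
    steps = len(next(iter(series.values())))
    result = []
    for i in range(steps):
        present = {name: vals[i] for name, vals in series.items() if vals[i] is not None}
        result.append(sorted(present, key=present.get, reverse=True)[:k])
    return result

print(head_agg(SERIES, 1))       # ['pod-a']: pod-c's average of 36 ranks last
print(topk_per_step(SERIES, 1))  # [['pod-a'], ['pod-a'], ['pod-a'], ['pod-c']]
```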

Access the Prometheus APIs

Chronosphere exposes the Prometheus API endpoints. Read the API overview documentation for more details.
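Because these endpoints follow the Prometheus HTTP API, you can run an instant query by sending a PromQL expression to /api/v1/query with the query parameter and an optional time parameter. The following Python sketch only builds the request URL with the standard library. The base URL is a hypothetical placeholder, and how you authenticate to your tenant (for example, which header carries an API token) is an assumption to confirm in the API overview documentation.

```python
from urllib.parse import urlencode

# Hypothetical base URL; replace with your tenant's Prometheus API endpoint.
BASE_URL = "https://example-tenant.example.com/api/v1"

def instant_query_url(base_url, promql, eval_time=None):
    """Build a Prometheus-style instant query URL (GET /api/v1/query)."""
    params = {"query": promql}
    if eval_time is not None:
        params["time"] = str(eval_time)  # Unix timestamp in seconds or RFC 3339
    return f"{base_url}/query?{urlencode(params)}"

url = instant_query_url(
    BASE_URL,
    'rate(node_network_receive_bytes_total{device="eth0"}[10m])',
    eval_time=1700000000,
)
print(url)
# To send the request, you could pass a urllib.request.Request for this URL,
# carrying your authentication header, to urllib.request.urlopen.
```

URL-encoding the expression matters because PromQL uses characters such as braces, quotes, and brackets that aren't safe in a query string.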