Metrics Explorer

The Metrics Explorer helps you investigate query requests, responses, and statistics. Use it to debug queries before you add them to a dashboard or monitor.

To use the Metrics Explorer tool, you must have administrative privileges. In the navigation menu, click Go to Admin and then select Explorers > Metrics Explorer.

Metrics sources

The Metrics Explorer defaults to Prometheus metrics. To switch your metrics source, select an option from the dropdown next to Explore. For example, if you're ingesting Graphite metrics, switch the data source to Chronosphere Graphite. The display changes slightly depending on your source.

Query field

The Query field supports autocomplete for metric names and functions. Click an autocomplete result to update a query with a suggestion. To execute a query, press Shift+Enter (Shift+Return on macOS).

The Query field suggests improvements to a query based on the type of metric:

  • For counters (monotonically increasing metrics), a rate function is suggested.
  • For buckets, a histogram function is suggested.
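
As a rough illustration of what these suggestions lead to, the following queries wrap a counter in rate() and a bucket metric in histogram_quantile(). The metric names http_requests_total and http_request_duration_seconds_bucket are hypothetical, and the 5m window and 0.95 quantile are arbitrary choices. The exact suggestion the Query field offers might differ.

```promql
# Counter: per-second request rate averaged over the last 5 minutes
rate(http_requests_total[5m])

# Buckets: estimated 95th percentile latency across all series
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
```

Run each query separately. The sum by (le) aggregation keeps the bucket boundary label that histogram_quantile() needs.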

Chronosphere provides internal metrics for troubleshooting queries.

⚠️

When typing in the query field with a label filter, some special characters such as a plus sign ( + ) or open parenthesis ( ( ) might cause Chronosphere to display an error message, like:

An unexpected error happened
Details
SyntaxError: Invalid regular expression:

To work around this issue, use the dropdown to select a value, or copy and paste the value from another location.

To run an additional query, click Add query.

Click Remove query to remove a secondary query, or click Clear all to remove all query conditions.

Explore metrics

For Prometheus metrics, the Metrics menu on the left side of the Query field opens a hierarchical menu with metrics grouped by their prefix. This is a good starting point for exploring which metrics are available.

Selecting metrics from the Metrics menu adds them to the Query field.

Click Edit in Query Builder to open queries in the Query Builder.

Chronosphere includes a Query Builder you can use to construct, optimize, and debug queries before saving and using them, and for sharing queries with your team.

For details, see Query Builder.

For Graphite, click the text box next to Series and select a metric. To add functions, click the + next to Functions and edit the selected function to include your values.

Query type

To update the type of query, select from the following options:

  • Range
  • Instant
  • Both

Add a Step size in the text field to change the query step size.

Split and compare queries

The split view provides a way to compare queries and their graphs and tables side-by-side or to review related data together on one page.

To open the split view, click Split to duplicate the current query and split the page into two side-by-side queries.

In split view, you can link the time pickers for both panels by clicking one of the time-sync buttons attached to the time pickers.

To close the newly created query, click Close Split.

Share queries

Complicated queries can generate long URLs that are difficult to share. Click Copy URL to clipboard to copy a short URL that you can share with other users. When a query matches one that was previously shortened, the existing short URL is reused.

The Chronosphere app permanently stores short URLs in your tenant so that they don't expire.

Query history

Click Query history for a list of queries that you have run in Explore. This history is stored locally in your browser.

For each individual query, you can:

  • Click Run query to re-run a query.
  • Click the text bubble to create or edit a comment.
  • Copy a query to the clipboard.
  • Click the trash can to remove a query from the list.
  • Click the Star to save a query. Starred queries show in the Starred tab.

By default, query history lists the newest queries first. To change the order, select a different option from the Sort queries by dropdown menu.

Search previous queries using the search box at the top of the list.

Use the slider to the left of the query list to filter queries over time.

Query history settings

You can change the following settings for query history from the Settings tab:

  • The period of time to save query history (default: 1 week)
  • The default active tab (default: Query history tab)
  • Only show queries for active data source (default: true)
  • Clear query history

Inspector

The Inspector helps you understand and troubleshoot queries. Available options are:

  • An overview of Stats for the query, including:
    • Total request time
    • Data processing time
    • Number of queries
    • Total number of rows
  • The Query inspector, which lets you inspect the raw data.
  • The JSON tab, to export the query as JSON.
  • The Data tab, which shows raw data. Click Download CSV to export the data as a comma-separated values (CSV) file.

Available metrics for troubleshooting

Additional metrics in your environment track the overall health of alerting and recording rules that you've configured. The following examples are based on Prometheus queries and troubleshooting.

Each metric has multiple labels you can use for slicing and monitoring, in the following format:

  • metric_name: metric_description
  • label_name: label_description + use

Visit Prometheus metric naming recommendations for more details about naming metrics.

Chronosphere provides the following metrics for troubleshooting:

  • prometheus_rule_group_last_duration_seconds: a gauge metric that holds the total time the group took to complete its last iteration, in seconds
    • rule_group: the group that this rule belongs to
  • prometheus_rule_evaluation_duration_seconds: a summary metric to track the average time an individual rule takes to evaluate
  • prometheus_rule_evaluations_total: the total number of individual rule evaluations that occur
    • rule_group: the group that this rule belongs to
  • prometheus_rule_group_iterations_missed_total: the total number of rule group evaluations missed due to slow rule group evaluation
    • rule_group: the group that this rule belongs to
  • prometheus_rule_group_iterations_total: the total number of scheduled rule group evaluations, whether executed or missed
    • rule_group: the group that this rule belongs to
  • prometheus_rule_eval_failures_total: the total number of individual rule evaluation failures
    • rule_group: the group that this rule belongs to
    • type: alerting or recording depending on the type of rule
    • identifier: the slug for the given rule
    • status_code: the status code associated with the given evaluation failure
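
As a sketch of how these metrics can be combined, the following queries use only the metric and label names listed above. The 5m window and the 30 second threshold are assumptions chosen for illustration, so adjust them to your own evaluation intervals.

```promql
# Fraction of scheduled iterations that each rule group missed
sum by (rule_group) (rate(prometheus_rule_group_iterations_missed_total[5m]))
/
sum by (rule_group) (rate(prometheus_rule_group_iterations_total[5m]))

# Fraction of individual rule evaluations that failed, per rule group
sum by (rule_group) (rate(prometheus_rule_eval_failures_total[5m]))
/
sum by (rule_group) (rate(prometheus_rule_evaluations_total[5m]))

# Rule groups whose last iteration took longer than 30 seconds
prometheus_rule_group_last_duration_seconds > 30
```

Run each query separately. Because prometheus_rule_group_iterations_total counts scheduled iterations whether executed or missed, the first ratio stays between 0 and 1.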

Examples

The following examples explain how to create alerts for particular situations:

  • Consistent rule failures

    To receive an alert whenever an individual rule consistently fails for five minutes, create a monitor with the following query:

    sum by (identifier) (rate(prometheus_rule_eval_failures_total[1m]))

    with a Sustain of 5m.

    You can also create an alert for monitoring individual rule failures:

    sum by (identifier) (rate(prometheus_rule_eval_failures_total{type="<alerting|recording>"}[1m]))

  • Create alerts by rule type

    If you want to alert on certain types of rules, you can do something like:

    sum by (identifier) (rate(prometheus_rule_eval_failures_total{identifier=~"<your-regex-here>"}[1m]))

    For example, to create an alert only on a specific category of rules, you can do something like:

    sum by (identifier) (rate(prometheus_rule_eval_failures_total{type="<alerting|recording>"}[1m]))
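
For reference, here is how the placeholders in these queries might be filled in. The type value alerting comes from the label description earlier on this page. The checkout- prefix is a hypothetical naming scheme for rule slugs, so substitute a pattern that matches your own rules.

```promql
# Failures of alerting rules only, grouped by rule
sum by (identifier) (rate(prometheus_rule_eval_failures_total{type="alerting"}[1m]))

# Failures of rules whose slug starts with "checkout-" (hypothetical prefix)
sum by (identifier) (rate(prometheus_rule_eval_failures_total{identifier=~"checkout-.*"}[1m]))
```

Run each query separately. Pairing either query with a Sustain of 5m, as in the first example, alerts only when failures persist.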