Trace Metrics

You can use Trace Explorer to identify the root cause of an issue. As you define your query and determine what caused the problem you're investigating, you can create a trace metric from your query to display that metric in a dashboard for faster discovery. You can then configure monitors and alerts on that metric to notify other on-call engineers if that metric identifies anomalies in your trace data.

Because each trace metric links back to a specific query in Trace Explorer, other on-call engineers can address problems faster by starting from a predefined query instead of building one from scratch.

Use the following metric types to help track information for collected traces:

  • Counter metric: Increments any time a trace that matches a filter gets collected. This metric also includes a label describing whether or not the trace contains an error.
  • Histogram metric: Collects information about the distribution of request durations for traces that match a filter.

Chronosphere provides a default Trace Metrics dashboard with panels that display the requests, errors, and durations of requests associated with a created trace metric. The dashboard also includes a service map of the requests.

To explore tracing data in Chronosphere, you must install and configure either the Chronosphere-provided Collector or the OpenTelemetry Collector to receive trace data from your services.

View existing trace metrics

You can view trace metrics in the Chronosphere app and open the related query in Trace Explorer.

To return a list of defined trace metrics without additional information, use Chronoctl.

Chronosphere takes the contextual information in metrics from a dashboard and uses it to build links that search for traces. Clicking a link opens Trace Explorer and replaces the variables with the matching criteria defined in the link.

To view your existing trace metrics:

  1. In the navigation menu, select Exploring > Trace Metrics.
  2. In the Rule Name column, click the name of a trace metric to view the dashboard for that metric.
  3. From the dashboard, click anywhere in one of the graph panels and then click Query Traces to navigate to the related query in Trace Explorer.

Create a trace metric

Chronosphere uses trace queries as the basis for trace metrics. You define a search query in Trace Explorer, and create a trace metric based on the matching criteria.

You can search for any labels you add to your trace metric in Metrics Explorer when investigating query requests and responses.

To create a new trace metric:

  1. In the navigation menu, select Exploring > Trace Explorer.

  2. Apply one or more filters to define a query. The Trace Explorer applies the filters represented in the query before updating metrics for the created trace metric.

    You can group your trace query search results by up to two dimensions that can include service, operation, and any other span tag. Dimensions to group by are inherited from the defined search query.

    Trace metrics don't support regular expression operators, such as the match (=~) or doesn't match (!~) operators.

  3. Click Create Trace Metric.

  4. In the Create Trace Metric dialog, enter a display name (the equivalent of Prometheus metric __name__) and system name (the trace metric Rule Name) for your trace metric.

  5. In the Group by Labels section, enter a metric label to display the aggregated results of your query grouped by that attribute.

    For each dimension in the Group by Labels field, you must define a metric label key. Use the same key as the span tag, or a key that aligns with your existing metrics data. Chronosphere adds metric label keys to your trace metric, and associates the label value with the span tag.

    For example, if you group your query by the span tag Service and enter container_service as the label key, Chronosphere adds a label to your trace metric where the label values are the same as the span tag Service values. You can then query by that label anywhere you search for metric labels, such as in Metrics Explorer or Telemetry Usage Analyzer.

    ⚠️ Avoid including high cardinality dimensions in your trace metric. Trace metrics capture data for only the first 1,000 unique series observed. Grouping by a high cardinality dimension can exceed this 1,000 series limit and cause incorrect data to display in your trace metric dashboard.

  6. Optional: Expand the Add Labels section and enter key:value pairs in the Static Metric Labels section to add static labels to your trace metric. Static labels are like metadata you add to your trace metric.

  7. Click Create.

    After creating the trace metric, the Trace Metric Created dialog displays.

    It can take several minutes for the dashboard to display data about the trace metric.

  8. Click Go to Trace Metrics to view the list of available trace metrics. When your trace metric is available, you can select it from this list to view a dashboard for the trace metric that includes requests, errors, durations, and a topology map.

  9. When viewing your trace metric, click a data point on any of the provided graphs and then click Query Traces to open Trace Explorer with the query you defined for your trace metric.

Update a trace metric

To edit or update a trace metric:

  1. In the navigation menu, select Exploring > Trace Metrics.
  2. To the right of the date of the trace metric, click the icon, and then click Edit.
  3. In the Update Trace Metric dialog, edit the query that generates the trace metric or make any other needed changes.
  4. To preview your updates to the query, click the View Query in Trace Explorer link.
  5. Click Save to save your changes.

Delete a trace metric

To delete a trace metric:

  1. In the navigation menu, select Exploring > Trace Metrics.
  2. To the right of the date of the trace metric, click the icon, and then click Delete.

Chronosphere removes the metric from the Trace Metrics page. Deleting a trace metric also removes access to the metric in the Trace Metrics dashboard.

Trace metric rule fields

A trace metric rule consists of the following fields, which are properties of the trace_metrics_rule object. Each field is required unless otherwise noted. See the CreateTraceMetricsRule API for a complete list of supported fields.

  • group_by: Labels for grouping and narrowing search results to specific attributes. See Group and narrow results for more information about grouping related attributes.

    • key: The key to group by.
      • named_key: The name of the key to group by.
      • type: The type of key to group by.
    • label: The dimension for displaying the aggregated results of your query grouped by that attribute in resulting trace metrics. Defaults to the selected service.
  • histogram_buckets_seconds: Optional: An array of custom buckets measured in seconds for duration histogram metrics. You can set these if the default buckets aren't appropriate for your data. The default buckets are [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10].

  • metric_labels: Optional: Static labels applied to the metrics, defined as an object of one or more label name/value pairs.

  • metric_name: The metric's name as stored in the metrics database. Use this name to refer to the metric in PromQL queries, such as in dashboards, monitors, and the Metrics Explorer. The metric_name must follow Prometheus metric naming rules.

    Although you can modify the metric_name after creation, any data points generated before the name change aren't renamed. The new metric name is instead used to generate new data points starting from that point in time. The old metric name doesn't add new data, but you can still query it for past data points.

  • name: The name of the trace metric rule. You can modify this name after creating the trace metric rule.

  • slug: Optional: The slug for the trace metric rule. After you create the rule, the slug is immutable. If omitted, Chronosphere generates a slug at creation time that's based on the name.

  • trace_filter: A filter object that evaluates traces against several criteria and emits metrics for only those that match. The filter's capabilities are similar to the filter in the Trace Explorer, except it supports only exact matches on string fields instead of regular expressions. This object supports additional optional object fields.
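
As a rough illustration of how these fields fit together, here is a minimal sketch of a complete rule in Terraform. It is not taken from the provider documentation: the resource type chronosphere_trace_metrics_rule, the "SERVICE" key type, and every name, label, and bucket value are assumptions chosen for illustration, so verify them against the provider and API references before use.

```hcl
# Hypothetical rule. The resource type name is an assumption; all names,
# labels, and bucket values are illustrative only.
resource "chronosphere_trace_metrics_rule" "checkout_errors" {
  name        = "Checkout errors"   # can be changed after creation
  slug        = "checkout-errors"   # optional; immutable after creation
  metric_name = "checkout_errors"   # must follow Prometheus naming rules

  # Optional static labels applied to the emitted metrics.
  metric_labels = {
    team = "payments"
  }

  # Optional custom duration buckets, in seconds.
  histogram_buckets_seconds = [0.1, 0.5, 1, 5, 10]

  # Emit metrics only for error traces that include a "checkout" span.
  trace_filter {
    trace {
      error {
        value = true
      }
    }
    span {
      service {
        value = "checkout"
      }
    }
  }

  # Group results by service and expose them under a metric label.
  # The "SERVICE" type value is an assumption.
  group_by {
    label = "container_service"
    key {
      type = "SERVICE"
    }
  }
}
```

The nested value objects in trace_filter follow the structure described in the Assign values with value section.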

Optional trace filter object fields

Unlike most Terraform resources, several child fields of trace and span are objects with a single field named value.

The required trace_filter object supports the following optional fields, and you can use any combination of them:

  • trace: Applies trace-level filtering. You can specify only one trace filter.

    • duration: Matches traces with a specified duration. You can specify only one of min_seconds and max_seconds.
      • min_seconds: Matches traces with a duration greater than or equal to this value.
      • max_seconds: Matches traces with a duration less than or equal to this value.
    • error: Object with a single field named value. If specified, matches traces with an error flag equal to this Boolean value. Refer to assign values with value for more information.
  • span: Applies span-level filtering. You can specify multiple span filters, but Chronosphere evaluates a trace filter match only if all span filters match.

    • service: Object with a single field named value. Matches spans with a service field equal to this value. Refer to assign values with value for more information.
    • operation: Object with a single field named value. Matches spans with an operation field equal to this value. Refer to assign values with value for more information.
    • parent_service: Object with a single field named value. Matches spans whose parent span's service field is equal to this value. Refer to assign values with value for more information.
    • parent_operation: Object with a single field named value. Matches spans whose parent span's operation field is equal to this value. Refer to assign values with value for more information.
    • duration: Matches spans with a specified duration. You can specify only one of min_seconds and max_seconds.
      • min_seconds: Matches spans with a duration greater than or equal to this value.
      • max_seconds: Matches spans with a duration less than or equal to this value.
    • error: Object with a single field named value. If specified, matches spans with an error flag equal to this Boolean value. Refer to assign values with value for more information.
    • tag: Matches on a span's tag keys and values. You must specify the key, and you can specify multiple tag filters, but Chronosphere evaluates a span filter match only if all tag filters match.
      • key: Matches spans whose tags match the specified key.
      • value: Object with a single field named value. Matches spans when the tag with a matching key also matches the specified value. If omitted, Chronosphere evaluates all spans matching the key as matches. Refer to assign values with value for more information.
    • span_count: Specifies how many spans can match. You can set only one of min and max. By default, Chronosphere evaluates a span filter as a match only if at least one span matches.
      • min: If set, at least this number of spans must match the parent span filter's conditions for Chronosphere to evaluate the entire trace filter as a match.
      • max: If set, Chronosphere evaluates a trace filter match only if the number of spans matching the parent span filter's conditions is equal to or less than the max value.
    • match_type: Specifies the span filter's match type. Valid values are "include" and "exclude". The "include" match type is the default, and evaluates a trace filter match if all its fields match any span, or multiple spans if you specify a span_count. An "exclude" match type evaluates a trace filter match if no spans within that trace match all of the span filter's conditions.
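
To make the interaction between multiple span filters and match_type more concrete, the following sketch combines an "include" span filter with an "exclude" span filter. Based on the field descriptions above, it should match traces that contain at least one span from the service and no span flagged as an error. The service name "checkout" is hypothetical.

```hcl
trace_filter {
  # Default "include" match type: at least one span must come from
  # the (hypothetical) "checkout" service.
  span {
    service {
      value = "checkout"
    }
  }

  # "exclude" match type: the trace matches only if no span in it
  # has its error flag set to true.
  span {
    match_type = "exclude"
    error {
      value = true
    }
  }
}
```

Because Chronosphere evaluates a trace filter match only if all span filters match, a trace must satisfy both blocks to emit metrics.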

Assign values with value

If a child field of a trace or span is an object with a single field named value, assign the value to that object's required value field instead of assigning it directly to the child field.

For example, to set the error field's value in a trace filter, use:

trace {
  error {
    value = true
  }
}

Unlike the key, the value of a tag filter is also an object with a value field:

span {
  tag {
    key = "region"
    value = {
      value = "us-east"
    }
  }
}

Although these objects have only the one value field, this structure allows Chronosphere to plan future features for these criteria. For more examples of this structure, see the filter examples.

Filter examples

Matches traces marked as error that also took at least five seconds:

trace_filter {
  trace {
    duration {
      min_seconds = 5
    }
    error {
      value = true
    }
  }
}

Matches traces with at least one span from the service named "cupcake-factory":

trace_filter {
  span {
    service {
      value = "cupcake-factory"
    }
  }
}

Matches traces with at least one span tagged region:us-east and at least one span tagged region:us-west:

trace_filter {
  span {
    tag {
      key = "region"
      value = {
        value = "us-east"
      }
    }
  }
  span {
    tag {
      key = "region"
      value = {
        value = "us-west"
      }
    }
  }
}

Matches traces with at least one span tagged both region:us-east and stack:production:

trace_filter {
  span {
    tag {
      key = "region"
      value = {
        value = "us-east"
      }
    }
    tag {
      key = "stack"
      value = {
        value = "production"
      }
    }
  }
}

Matches traces with at least 10 spans containing the "db-query" operation:

trace_filter {
  span {
    operation {
      value = "db-query"
    }
    span_count {
      min = 10
    }
  }
}
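
As one more sketch, this time using the parent_service and span duration fields that the earlier examples don't cover, the following filter should match traces with at least one span from a service named "frosting-mixer" whose parent span belongs to "cupcake-factory" and that lasted at least two seconds. The "frosting-mixer" service name is hypothetical, and the behavior is inferred from the field descriptions rather than verified:

```hcl
trace_filter {
  span {
    # The span itself comes from the hypothetical "frosting-mixer" service.
    service {
      value = "frosting-mixer"
    }
    # Its parent span belongs to "cupcake-factory".
    parent_service {
      value = "cupcake-factory"
    }
    # The span lasted two seconds or longer.
    duration {
      min_seconds = 2
    }
  }
}
```

All three conditions sit in the same span block, so a single span must satisfy all of them for the trace to match.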