Trace Explorer

Use the Trace Explorer to search for traces and spans so that you can identify, triage, and understand the root cause of problems. You can view all trace data that's relevant to a particular issue, and compare that trace against a previous time period to better understand where errors are occurring.

To explore traces, in the navigation menu select Exploring > Trace Explorer.

Prerequisites

To explore tracing data in Chronosphere, you must install and configure either the Chronosphere-provided Collector or the OpenTelemetry Collector to receive trace data from your services.

Search and filter trace data

Use the Span characteristics to search traces by their unique ID, or by the details of the individual spans in a given trace. You can add one or more span filters to additionally refine the trace results. These filters match spans within a trace based on the criteria that you select. You can choose to include or exclude operations, services, or tags from your search:

  • An include statement narrows trace results to only include those that contain at least one span that matches the query.
  • An exclude statement narrows trace results to only those with no spans that match the span query.

Including multiple span criteria constitutes an AND evaluation. If you enter multiple criteria in the same query, Chronosphere evaluates them in order. If you specify multiple queries, Chronosphere evaluates all criteria in the first query and then evaluates criteria in subsequent queries.

⚠️

If any criteria or combination of selected criteria don't match at least one span, the query returns zero results.

For example, you might want to search for any traces that include a span with an operation named /paymentstore.PaymentStore/Capture, which captures all payment information to your app. You can then add a span filter to exclude a tag like geo=fr to filter out any payments made in France.
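
The include and exclude semantics described in this section can be sketched in a few lines of Python. This is only an illustrative model, not Chronosphere's implementation: the span dictionaries and the `matches` and `search` functions are hypothetical, and only exact-match criteria are modeled.

```python
# Illustrative model of include/exclude span filters (not Chronosphere's code).
# A trace is modeled as a list of span dicts; all names here are hypothetical.

def matches(span, criteria):
    """Return True if the span satisfies every criterion (AND evaluation)."""
    return all(span.get(key) == value for key, value in criteria.items())

def search(traces, include=None, exclude=None):
    """Keep traces with at least one span matching `include`
    and no span matching `exclude`."""
    results = []
    for trace in traces:
        if include and not any(matches(s, include) for s in trace):
            continue
        if exclude and any(matches(s, exclude) for s in trace):
            continue
        results.append(trace)
    return results

capture = "/paymentstore.PaymentStore/Capture"
traces = [
    [{"operation": capture, "geo": "us"}],
    [{"operation": capture, "geo": "fr"}],
    [{"operation": "/other", "geo": "us"}],
]

kept = search(traces, include={"operation": capture}, exclude={"geo": "fr"})
print(len(kept))  # 1
```

Only the first trace survives: the second contains a span tagged geo=fr, and the third has no span with the Capture operation.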

To search traces:

  1. Define the criteria of your search:

    • Choose the time period to display results for, which defaults to the last 15 minutes.

      When choosing a time window, selecting a period longer than one day greatly impacts load time for search results. For optimal results, scope search queries to shorter time periods.

    • Select the duration of traces (in seconds) to include in your search.

      Trace duration is calculated as the time elapsed from the earliest to the latest observed timestamps across all spans for a given trace.

    • Choose whether to include or exclude error states.

  2. In the Span characteristics bar, use the search query syntax to enter span criteria to match on, which can be a combination of values for service, operation, and tag.

    For example, the following query returns all traces where at least one span includes a service called billing-svc, an operation that starts with execute, and a tag named build.version where the value equals v1.54:

    service="billing-svc" operation=~"^execute.*" tag:build.version="v1.54"

    You can also enter a unique trace ID to search for a specific trace, such as 19323ae9283b8fea000f63bff84524a0.

  3. To refine your search, click + Add Span Filter and specify additional span properties to include or exclude. Each row is a separate span filter that Chronosphere evaluates against individual spans. When you include multiple span filter rows, Chronosphere combines the filter rows as AND operations and evaluates the filters sequentially.

    As you modify your search, the search timeline graph updates in real time, along with the statistics, traces, and topology views. The timeline shows an overview of the current search, with the error and success counts plotted over time.

    A screenshot of a graph of a traces search

  4. To share your search with other users, click Copy URL to clipboard to copy a short URL. Use this capability to share complicated queries that would otherwise generate long URLs.

    The Chronosphere app permanently stores short URLs in your tenant so that they don't expire.

  5. To learn more about the spans in a trace, click the Traces tab. Click the name of a trace in the Trace column to open the trace details page, which displays information for each of the spans that comprise the trace.

    In some instances, a trace might contain a missing span. Chronosphere identifies these spans for the selected trace as Missing in the Stats panel. To search for traces containing a missing span, in the Span characteristics field in Trace Explorer, enter parent_missing: "true".

  6. On the trace details page, filter the displayed data with the following options:

    • Use the Service and Operation menus to scope spans to specific services or operations.
    • Use the Only Errors toggle to display only segments of each trace that contain errors.
    • Use the Only Critical Path toggle to highlight segments of each span that impact the total duration of a trace. These segments help to identify latency issues within the trace. Resolving these issues can lead to the greatest improvement in latency of the entire trace.
  7. Use the quick copy button to copy the service name, operation name, or span ID, which you can enter in the Span characteristics of the Trace Explorer page.

    You can also hold the pointer over different attributes in the Span Details pane, click the more icon, and select Add to Filter to automatically navigate to the Trace Explorer page with that attribute included in your search criteria.

  8. Use the following filters in Span characteristics to limit search results to specific criteria:

    • To filter search results to the critical path, click Trace Explorer. In the Span characteristics, select Include and then enter critical_path="true". The search results update to display spans that can help to identify latency issues within the trace. Resolving these issues can lead to the greatest improvement in latency of the entire trace.

    • To filter search results to leaf errors, in the Span characteristics, enter leaf_error="true". The search results update to display error spans that have no failing child spans. Leaf errors help filter out propagated errors, and provide clearer signals about the source of an error that might be causing the entire trace to fail.
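
Two of the definitions used in the preceding steps, trace duration and leaf errors, can be expressed compactly in code. The following Python sketch is illustrative only: the span fields (`id`, `parent_id`, `start`, `end`, `error`) are hypothetical and don't reflect Chronosphere's data model.

```python
# Illustrative only: the span fields used here are hypothetical.

def trace_duration(spans):
    """Elapsed time from the earliest to the latest timestamp across all spans."""
    return max(s["end"] for s in spans) - min(s["start"] for s in spans)

def leaf_errors(spans):
    """IDs of error spans that have no failing child spans."""
    # Every failing span marks its parent as having a failing child.
    failing_parents = {s["parent_id"] for s in spans if s["error"] and s["parent_id"]}
    return [s["id"] for s in spans if s["error"] and s["id"] not in failing_parents]

spans = [
    {"id": "a", "parent_id": None, "start": 0.0, "end": 3.2, "error": True},
    {"id": "b", "parent_id": "a", "start": 0.5, "end": 3.0, "error": True},
    {"id": "c", "parent_id": "a", "start": 0.6, "end": 1.0, "error": False},
]

print(trace_duration(spans))  # 3.2
print(leaf_errors(spans))     # ['b']
```

Span a reports an error, but only because its child b failed, so b is the leaf error.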

Search query syntax

To define the properties you want to search for, you must use a particular search syntax. This syntax is a key followed by an operator and a value, where the value is surrounded by double quotes.

KEY =|=~ "VALUE"

Use the following syntax to match specific properties:

Type                        Operator                      Example
Exact matches               =                             operation="operation123"
Regular expressions         =~                            service=~"production-.*"
Duration                    > | <                         duration>"3.2"
Span count                  > | >= | < | <=               span_count>="100"
Boolean values              true | false                  error="true"
Tags                        tag:                          tag:environment="production"
Tags with a numeric value   tag: > | >= | < | <= | !=     tag:sale_price>"10"
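
If you generate queries programmatically, for example to paste into the Span characteristics field, a small formatter can help keep the quoting consistent. The following Python sketch is a hypothetical helper, not part of any Chronosphere tooling. It only formats text according to the KEY OPERATOR "VALUE" pattern, doesn't escape quotes inside values, and doesn't check which operators are valid for which keys.

```python
# Hypothetical helper for composing Span characteristics queries as text.

ALLOWED_OPERATORS = {"=", "=~", "!=", ">", ">=", "<", "<="}

def criterion(key, operator, value, tag=False):
    """Format one criterion; tags get the tag: prefix."""
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    prefix = "tag:" if tag else ""
    return f'{prefix}{key}{operator}"{value}"'

# Multiple criteria in one query are separated by spaces.
query = " ".join([
    criterion("service", "=~", "production-.*"),
    criterion("duration", ">", "3.2"),
    criterion("environment", "=", "production", tag=True),
])
print(query)
# service=~"production-.*" duration>"3.2" tag:environment="production"
```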

Access recent searches

When investigating issues, you might run the same search frequently. Rather than redefining the search query, use Recent Searches to access previous searches. You can run a search from a previous time period and inspect the parameters of a previous search by clicking the search from the list.

For example, you can run a fully defined search query from one day ago for a particular time period. Queries run based on the relative time parameters from the recent search. If the selected time window of the query is the Last 15 minutes, the query pulls data from the last 15 minutes relative to the current time, and not the last 15 minutes before the search initially ran.
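
The behavior of relative time windows can be pictured as storing only the length of the window and resolving it each time the search runs. The following Python sketch illustrates that idea; the `resolve_window` function is hypothetical and doesn't describe how Chronosphere stores recent searches.

```python
from datetime import datetime, timedelta, timezone

def resolve_window(relative, now):
    """Resolve a relative window (a timedelta) against the time the search runs."""
    return now - relative, now

last_15_minutes = timedelta(minutes=15)

first_run = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
rerun = first_run + timedelta(days=1)

# The same recent search yields a different absolute window on each run.
print(resolve_window(last_15_minutes, first_run))
print(resolve_window(last_15_minutes, rerun))
```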

To access recent searches:

  1. In the navigation menu select Exploring > Trace Explorer.

  2. In the Search for Traces pane, click View searches.

  3. Click the Recent tab, locate the search that you want to run, and click it.

    The parameters in the recent search override any trace or span information in the current Span characteristics.

Save frequent searches

If you find that you're accessing a recent search frequently, you can save the search so that it's always available to you. Saved searches are like bookmarks that you can reference whenever you need them.

To save a search:

  1. In the navigation menu select Exploring > Trace Explorer.
  2. Define the criteria of your search.
  3. Click Save.
  4. In the Save Search Criteria panel, enter a name for your search.
  5. Click Save.

To view your saved search, click View searches. Your saved search displays in the Saved tab.

To share your search, click the copy icon to copy a short URL to your clipboard that you can share with other users.

Group and narrow results

After an initial search returns results, you can also group and narrow your results to specific attributes.

Use the Group By field to retrieve and display the aggregated results grouped by up to two different attributes, such as service and operation. This ability to group related attributes helps to highlight and better understand existing issues such as anomalous error rates, based on specific properties.

You can additionally narrow your results with the Narrow Scope by Service Tag field. Enter a specific attribute to narrow your results to only that attribute. You can use this field independently or in conjunction with the Group By field. This field works with the Statistics table, Top Tags, and Topology View.

For example, you might have a payment app that includes a service named payment-gateway-service, and you want to explore issues with that service.

To group and narrow results:

  1. In the Span characteristics, enter service="payment-gateway-service" to include that service.

  2. In the Group By menu, select Operation to group results by operation.

    In the Statistics table, you notice that the /paymentstore.PaymentStore/PaymentWasAuthorized operation has a high request rate. This operation tracks all authorized payments to your app.

  3. In the Statistics table, click the /paymentstore.PaymentStore/PaymentWasAuthorized operation and then click Include in Span Filter to add that attribute to your query.

  4. In the Narrow Scope by Service Tag field, enter payment-gateway-service to focus your results to only that attribute.

    You want to view all credit card brands that users made payments with through the /paymentstore.PaymentStore/PaymentWasAuthorized operation.

  5. In the Group By field, clear the Service option, and then enter card.brand to additionally group your results.

You identify the credit card brand that has the highest request rate amongst all cards that users make payments with, and can investigate and remediate related issues.

Create links to related information

When viewing span details for a selected trace, you can add links to related information, such as dashboards within Chronosphere, external services such as related tracing logs stored in your cloud provider, or links to other observability tools.

You can dynamically generate links to external services using templated variables, such as {{ trace_id }}, {{ service }}, and {{ operation }}. When you click one of these links that contains a variable, Chronosphere interpolates the variables with information from the selected span. For example, Chronosphere replaces the {{ service }} variable with the name of the service from the selected span.
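
To make the interpolation concrete, the following Python sketch replaces templated variables in a URL with values from a span. It's a simplified illustration under stated assumptions: the `interpolate` function and the span dictionary are hypothetical, and URL-encoding the substituted values is an assumption rather than documented Chronosphere behavior.

```python
import re
from urllib.parse import quote

def interpolate(template, span):
    """Replace {{ name }} placeholders with URL-encoded values from the span."""
    def substitute(match):
        name = match.group(1)
        # Leave unknown variables untouched rather than guessing a value.
        if name not in span:
            return match.group(0)
        return quote(str(span[name]), safe="")
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)

template = (
    "https://example.chronosphere.io/v3/dashboards/services-overview/"
    "services-overview?orgId=1&var-svc={{ service }}&var-root_svc={{ service }}"
)
span = {"service": "billing-svc", "operation": "execute"}

print(interpolate(template, span))
```

Both occurrences of {{ service }} resolve to billing-svc, the service of the selected span in this example.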

These links persist across all traces in your Chronosphere app.

To create a link:

  1. In Trace Explorer, define the criteria for your search.

  2. To view a specific trace related to your search, click the Traces tab and then select an individual trace to open the trace details page.

  3. On the trace details page, under Span links, click + Add link.

  4. Enter a display name for the link, and then define a URL for your link.

    For example, the following link opens the Services Overview dashboard scoped to the service in the selected span. When you click the link, Chronosphere replaces the {{ service }} variable with the name of the service from the selected span.

    https://example.chronosphere.io/v3/dashboards/services-overview/services-overview?orgId=1&var-svc={{ service }}&var-root_svc={{ service }}
  5. Click Save to save your link.

Link to tracing data from a dashboard

You can create data links from a Grafana dashboard to relevant traces. These data links use Grafana variables and Chronosphere parameters to refer to series fields, labels, and values. You define a URL in a data link to express a search in Trace Explorer. Chronosphere takes the contextual information in dashboard metrics and uses it to build links to search for traces. Clicking the link navigates you to Trace Explorer and replaces the variables with matching criteria defined in the search.

Data links operate on both native metrics and metrics derived from traces within Chronosphere.

The Trace Metrics dashboard that Chronosphere provides by default contains a data link for each of the included panels. Use this URL as the basis for creating a data link:

${chrono_domain}/traces/?d_closeto=${__value.time}&d_metricname=${metric}

To create a data link:

  1. In any dashboard panel, hold the pointer over the panel, click the dropdown arrow, and click Edit.

  2. In the Edit Panel screen, click the Field tab.

  3. Under Data links, click Add link.

  4. In the Edit link window, define the Title and URL for your data link, which can link to traces and other external data sources that use the context in the graph as inputs.

    For example, the following URL uses the value of the to_svc and to_op labels as inputs, based on the specific series in the chart (if the chart shows multiple series). This example also includes the value-specific Grafana variables __value.time, which is the point in time on the graph, and __value.numeric, which is the value of the metric at that point.

    ${chrono_domain}/traces/explorer?d_closeto=${__value.time}&d_minduration=${__value.numeric}&d_service=${__field.labels.to_svc}&d_operation=${__field.labels.to_op}
  5. Click Save to save your changes.

  6. On the main dashboard screen, click Save to apply and save your changes.

  7. In the Save dashboard window, add a note about your changes and then click Save.

After saving your data link, you can click the time series in any chart, and then click Query Traces to view the contextual link to traces.
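
If you maintain many data links, you might assemble the URL from a set of parameters instead of editing it by hand. The following Python sketch shows one way to do that. The `data_link` function is hypothetical and only formats text. It deliberately leaves the Grafana variables unencoded so that Grafana can substitute them in the link.

```python
def data_link(path, params):
    """Join d_* parameters into a data link URL, leaving Grafana variables intact."""
    query = "&".join(f"{name}={value}" for name, value in params.items())
    # Doubled braces produce the literal ${chrono_domain} variable.
    return f"${{chrono_domain}}{path}?{query}"

url = data_link("/traces/explorer", {
    "d_closeto": "${__value.time}",
    "d_minduration": "${__value.numeric}",
    "d_service": "${__field.labels.to_svc}",
    "d_operation": "${__field.labels.to_op}",
})
print(url)
```

The output matches the example URL in step 4 of the preceding procedure.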

Data link parameters

Use any of the following parameters when creating a data link from a dashboard to tracing data. You can specify parameters in conjunction with Grafana data link variables to refer to series fields, labels, and values.

Any parameters in the Trace Explorer search that you define when creating the trace metric override parameters you specify in the data link URL.

  • d_closeto: Identifies a time window that spans ten minutes before and after the point that you select in the graph. For example, d_closeto=${__value.time} searches from ten minutes before to ten minutes after the selected time.

  • d_error: Specifies whether to include the error count in the data link. For example, d_error=true.

  • d_metricname: The trace metric name to pull from the dashboard panel. For example, d_metricname=${metric}.

  • d_minduration: The minimum duration for the selected metric. Use this parameter for charts that measure duration or latency, such as p99 or p50 duration. For example, d_minduration=${__value.numeric}.

  • d_operation: The operation to include in the data link. For example, d_operation=${__field.labels.to_op} identifies the value of the to_op label for the selected time series on the dashboard panel.

  • d_service: The service to include in the data link. For example, d_service=${__field.labels.to_svc}.

  • d_tagname: The name of a tag. For example, d_tagname=environment.

  • d_tagvalue: The value of a tag. For example, d_tagvalue=production.
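
As an illustration of the d_closeto window, the following Python sketch computes the range that starts ten minutes before and ends ten minutes after a selected point. It assumes that ${__value.time} resolves to a Unix timestamp in milliseconds; the `closeto_window` function is hypothetical.

```python
from datetime import datetime, timedelta, timezone

def closeto_window(value_time_ms, margin=timedelta(minutes=10)):
    """Return the (start, end) window around a selected point in time."""
    # Assumption: the selected time is a Unix timestamp in milliseconds.
    center = datetime.fromtimestamp(value_time_ms / 1000, tz=timezone.utc)
    return center - margin, center + margin

start, end = closeto_window(1_700_000_000_000)
print(start.isoformat(), end.isoformat())
print(end - start)  # 0:20:00
```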

Features overview

Trace Explorer includes the following features.

Statistics

The charts on the Statistics tab provide summaries of key information about spans within the trace set that match the current search criteria.

This section aggregates and groups the top service values by their requests, error count, and duration (or latency). The default display groups statistics by service. You can group and narrow results by up to two different attributes, which can include service, operation, and tags. For example, you can select a service like frontend and also include a tag like deployment.environment to display your spans with these attributes grouped together.

Select the Only critical path toggle to only analyze spans that most impact the total duration of a trace, and display them grouped by the selected property. These attributes help to identify latency issues within the trace.

Click any item in one of the provided lists to include or exclude that grouping in the current results. The statistics described in this section update dynamically based on your choices.

The Sparklines view visualizes how trace statistics change over time. Use this view to better understand the state of your services and how requests and errors change in a specified time period.

The default setting compares the current requests against trace data from one hour prior. You can compare the current requests or errors against a defined time in the past to answer questions like, "How do requests to the frontend service in my production environments differ between now and one hour ago?" To change the comparison, select a time period in the Compare against field, which updates the sparkline graphs.

The Immediately before option compares the period that you set in the Time Window to a period of the same length that immediately precedes it. For example, you might choose a time period of the past 30 minutes, which begins at 11:52:30 AM. Choosing Immediately before compares the 30-minute window starting at 11:52:30 AM to a 30-minute period starting at 11:22:30 AM (exactly 30 minutes prior).
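
The arithmetic behind this comparison is small enough to show directly. The following Python sketch derives the comparison window from the selected window; the `immediately_before` function is a hypothetical illustration, not a Chronosphere API.

```python
from datetime import datetime, timedelta

def immediately_before(start, end):
    """Return the window of equal length that ends where the selected window starts."""
    length = end - start
    return start - length, start

start = datetime(2024, 1, 1, 11, 52, 30)
end = start + timedelta(minutes=30)

previous_start, previous_end = immediately_before(start, end)
print(previous_start.time(), previous_end.time())  # 11:22:30 11:52:30
```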

To modify the displayed data, select one of the following items from the Metric menu. The sparkline graphs update for each row based on the selected metric and the data groupings you select in the Group By menu.

You must select at least one attribute in the Group By menu to display data.

  • Requests: Counts of all spans within the selected group, divided by the number of seconds in the time range. Ranked in descending order.

  • Errors: The number of spans that indicate an error outcome. Ranked in descending order.

  • Leaf Errors: Error spans that have no failing child spans. These spans are often the potential cause of a trace's failure. Navigating directly to leaf errors helps filter out propagated errors, and provides clearer signals about the source of an error that might be causing the entire trace to fail. Ranked in descending order.

  • Median Duration P50: Ranks the spans of each group in order of duration, and selects the duration of the span in the middle of the list (fiftieth percentile). Ranks groups in descending order of this duration.

  • Tail Duration P99: Tail refers to the statistical notion of the upper tail of a distribution. This statistic ranks the spans of each group in order of duration, and selects the duration of the span that's 99% of the way through the list, meaning, a span that typically has a high duration. Ranks service and operation in descending order of this duration.
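
The metric definitions in this list can be approximated with a short aggregation. The following Python sketch groups spans by one or more attributes and computes the request rate, error count, and P50 and P99 durations for each group. It's a simplified model: the span fields are hypothetical, and the nearest-rank percentile method is an assumption, because the exact method that Chronosphere uses isn't stated here.

```python
import math
from collections import defaultdict

def percentile(durations, pct):
    """Nearest-rank percentile: the value pct% of the way through the sorted list."""
    ordered = sorted(durations)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def group_stats(spans, group_by, window_seconds):
    """Aggregate spans by the given attributes (for example, ["service"])."""
    groups = defaultdict(list)
    for span in spans:
        groups[tuple(span[k] for k in group_by)].append(span)
    return {
        key: {
            "requests_per_second": len(members) / window_seconds,
            "errors": sum(1 for s in members if s["error"]),
            "p50": percentile([s["duration"] for s in members], 50),
            "p99": percentile([s["duration"] for s in members], 99),
        }
        for key, members in groups.items()
    }

spans = [
    {"service": "frontend", "duration": 0.1, "error": False},
    {"service": "frontend", "duration": 0.3, "error": True},
    {"service": "frontend", "duration": 0.2, "error": False},
    {"service": "billing-svc", "duration": 1.5, "error": False},
]
stats = group_stats(spans, ["service"], window_seconds=60)
frontend = stats[("frontend",)]
print(frontend["p50"], frontend["p99"])  # 0.2 0.3
```

Passing two attributes in `group_by`, such as a service and a tag, mirrors grouping by up to two attributes in the Group By menu.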

Top tags

The Top Tags tab displays a tabulated view of the top tags related to the current search results. By default, results are sorted by the Requests column, but you can sort the results by any column in ascending or descending order by clicking the arrow that appears when holding the pointer over the text of the column heading.

Click any item in one of these lists to include or exclude that grouping in the current results.

Each of the following statistics updates dynamically based on your choices:

  • Requests: The number of spans out of all the matched traces that contain this tag key/value pair.

  • Error percentage: The percentage of spans containing this tag key/value pair that resulted in an error.

  • Median duration (P50): Groups the top tag key/value pairs included in any spans matching the search criteria, and then sorts the spans in those groups by the middle duration value (the fiftieth percentile).

  • Tail duration (P99): Groups the top tag key/value pairs included in any spans matching the search criteria, and then sorts the spans in those groups by the duration value that's 99% of the way through the list. These tags typically indicate a span with a high duration.
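
The Requests and Error percentage columns can be modeled as simple counts per tag key/value pair. The following Python sketch is illustrative only; the span structure and the `top_tags` function are hypothetical.

```python
from collections import Counter

def top_tags(spans):
    """Count spans and errors per tag key/value pair, sorted by request count."""
    requests, errors = Counter(), Counter()
    for span in spans:
        for pair in span["tags"].items():
            requests[pair] += 1
            if span["error"]:
                errors[pair] += 1
    rows = [
        {"tag": f"{k}={v}", "requests": n, "error_pct": 100 * errors[(k, v)] / n}
        for (k, v), n in requests.items()
    ]
    return sorted(rows, key=lambda r: r["requests"], reverse=True)

spans = [
    {"tags": {"geo": "fr"}, "error": True},
    {"tags": {"geo": "fr"}, "error": False},
    {"tags": {"geo": "us"}, "error": False},
]
print(top_tags(spans)[0])
# {'tag': 'geo=fr', 'requests': 2, 'error_pct': 50.0}
```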

Traces

After defining your search, the Traces tab displays a list of the most relevant traces for the search along with the duration, spans, and error states of spans.

View trace details

On the Traces tab, select an individual trace to open the trace details page. The header displays details at both the root level and span level. The root level details encompass all spans contained in the trace. The span level details show data that's scoped to the selected span only. Selecting a different span in the list won't change the root level details, but the span level details update to reflect information about the selected span.

The trace details include the service name, operation name, trace ID or span ID, start time, duration, and additional statistics for the trace at the root and span level. Use the quick copy button to copy the service name, operation name, and span ID. You can take that data to the Trace Explorer page to include in your overall search.

You can narrow the displayed data with the following options:

  • Use the Service and Operation menus to display specific services or operations.
  • Use the Only Errors toggle to display only segments of each trace that contain errors.
  • Use the Only Critical Path toggle to highlight spans that impact the total duration of a trace. These segments help to identify latency issues within the trace.

Select an individual span from the list to display its Span Details, which include the following information about the span:

  • Links to external services, such as related tracing logs stored in your cloud provider, or links to other observability tools. You can dynamically generate links to external services using templated variables, such as {{ trace_id }}. Click + Add Link to add a link.

  • Stats for the span, including the span ID, start time, and duration. Child spans have a Parent span ID field that indicates parent spans. Hold the pointer over the parent span ID, click the more icon, and then click Go to span to navigate directly to the parent span.

    A child span can have multiple parent spans, such as when a batch operation runs multiple jobs that it receives from other operations. If a span in the selected trace has more than one linked parent, all parent span IDs display. You can navigate directly to the parent span you want to view. Because linked operations might complete asynchronously, the linked trace process might not be immediately available and can take several minutes to display.

    In some instances, a trace might contain a missing span. Chronosphere identifies these spans for the selected trace as Missing in the Stats panel. To search for traces containing a missing span, in the Span characteristics field in Trace Explorer, enter parent_missing: "true". Identifying missing spans can help to fix instrumentation issues or drop non-critical traces containing missing spans.

  • Tags attached to a trace. Point to any tag and select the more icon to add or exclude that tag from your span filter.

  • Process tags that are common to a set of traces. Point to any tag and select the more icon to add or exclude that tag from your span filter.

  • Span logs are unique to a trace, and indicate events, process status within a span, or other instrumentation data.
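
One way to reason about missing spans is to look for spans whose parent ID doesn't correspond to any span in the trace. The following Python sketch illustrates that check. It assumes that a missing span is a referenced parent that was never reported, which this page doesn't confirm, and the span fields are hypothetical.

```python
def spans_with_missing_parent(spans):
    """IDs of spans that reference a parent not present in the trace."""
    known_ids = {s["span_id"] for s in spans}
    return [
        s["span_id"]
        for s in spans
        if s["parent_id"] is not None and s["parent_id"] not in known_ids
    ]

spans = [
    {"span_id": "root", "parent_id": None},
    {"span_id": "child", "parent_id": "root"},
    {"span_id": "orphan", "parent_id": "never-reported"},
]
print(spans_with_missing_parent(spans))  # ['orphan']
```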

Select the name of a trace to view the included services, operations, and any errors, which display with a red error symbol. When you choose a service and operation, the span details update to reflect your selection.

Sorting results

By default, the list sorts results by Start time in descending order. You can sort the results by any column in ascending or descending order by clicking the arrow that appears when holding the pointer over the text of the column heading.

Hold the pointer over the far end of each column heading to display the three vertical dots icon. This menu provides access to advanced sorting and filtering options and lets you reverse the sorting order, filter values, and hide and show columns.

Topology view

Use the Topology View to visualize how traces from services and operations in the current search results cascade from each other.

To narrow the scope of the topology map, select Service level, Operation level, or Error focus from the Showing list. Use the Metric list to change which metric the topology view displays. To search for specific services, use the Search services field.

Select the Only Critical Path toggle to highlight segments of each span that impact the total duration of a trace. These segments help to identify latency issues within the trace.

Hold the pointer over one of the service or operation nodes to highlight the other nodes directly connected to it. Hold the pointer over a segment connecting one or more nodes to highlight the nodes directly connected to the segment.

Click any node to display node details, such as incoming and outgoing requests, and the median and tail duration of related spans. With a node selected, click Include or Exclude to include or exclude the selected service or operation in your Span characteristics.

Edges, which are lines between services, provide details about connected services. Click an edge to view the requests and trace duration between two services.
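
Conceptually, a service-level topology can be derived from parent-child relationships between spans. The following Python sketch counts requests along each edge between two services. It's a simplified illustration, not how Chronosphere builds the Topology View, and the span fields are hypothetical.

```python
from collections import Counter

def service_edges(spans):
    """Count parent-to-child calls that cross a service boundary."""
    by_id = {s["span_id"]: s for s in spans}
    edges = Counter()
    for span in spans:
        parent = by_id.get(span["parent_id"])
        # Calls within the same service don't produce an edge.
        if parent and parent["service"] != span["service"]:
            edges[(parent["service"], span["service"])] += 1
    return edges

spans = [
    {"span_id": "1", "parent_id": None, "service": "frontend"},
    {"span_id": "2", "parent_id": "1", "service": "billing-svc"},
    {"span_id": "3", "parent_id": "1", "service": "billing-svc"},
    {"span_id": "4", "parent_id": "2", "service": "billing-svc"},
]
print(service_edges(spans))
# Counter({('frontend', 'billing-svc'): 2})
```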

Topology View with the eventingester service selected