Trace Explorer features overview

Trace Explorer includes the following features.

Statistics

The charts on the Statistics tab provide summaries of key information about spans within the trace set that match the current search criteria.

This section aggregates and groups the top service values by their requests, error count, and duration (or latency). The default display groups statistics by service. You can group and narrow results by up to two different attributes, which can include service, operation, and tags. For example, you can select a service like frontend and also include a tag like deployment.environment to display your spans with these attributes grouped together.

Select the Only critical path toggle to analyze only the spans that most impact the total duration of a trace, grouped by the selected property. These spans help to identify latency issues within the trace.
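The following Python sketch shows the general idea behind a critical path. It starts at the end of the root span, walks backward through the trace, and at each step follows the child span that finished last before the current point. This is a simplified version of a common backward-walk approach, not Observability Platform's implementation. The `Span` class and its fields are hypothetical, and the sketch assumes that every child starts and ends within its parent's time range.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    """Hypothetical span model: times are in milliseconds relative to the trace start."""
    span_id: str
    start: float
    end: float
    children: List["Span"] = field(default_factory=list)


def critical_path(span: Span) -> List[str]:
    """Return the IDs of spans on the critical path below (and including) `span`.

    Assumes every child starts and ends within its parent's time range.
    """
    path = [span.span_id]
    cursor = span.end  # walk backward from the end of this span
    # Visit children from latest-finishing to earliest-finishing.
    for child in sorted(span.children, key=lambda c: c.end, reverse=True):
        # A child can be on the critical path only if it finished at or before the
        # cursor, meaning the parent could have been waiting on it at that point.
        if child.end <= cursor:
            path.extend(critical_path(child))
            cursor = child.start  # continue from the moment this child began
    return path


# Usage: "log" runs in parallel with "db" and doesn't extend the trace's duration.
root = Span("root", 0, 100, [
    Span("cache", 0, 5),
    Span("db", 10, 60),
    Span("log", 20, 30),
    Span("render", 60, 95),
])
print(critical_path(root))  # ['root', 'render', 'db', 'cache']
```

Spans such as `log` that overlap a longer-running sibling drop out of the result, which is why filtering to the critical path helps isolate the work that actually determines end-to-end latency.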

Click any item in one of the provided lists to include or exclude that grouping in the current results. The statistics update dynamically based on your choices.

The Sparklines view visualizes how trace statistics change over time. Use this view to better understand the state of your services and how requests and errors change in a specified time period.

The default setting compares the current requests against trace data from one hour prior. You can compare the current requests or errors against a defined time in the past to answer questions like, "How do requests to the frontend service in my production environments differ between now and one hour ago?" To change the comparison, select a time period in the Compare against field, which updates the sparkline graphs.

The Immediately before option compares the period you set in the Time Window to a period of the same length that ends where the current period begins. For example, you might choose a time period within the past 30 minutes that begins at 11:52:30 AM. Choosing Immediately before compares the 30-minute window starting at 11:52:30 AM to the 30-minute window starting at 11:22:30 AM (exactly 30 minutes prior).
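The window arithmetic for Immediately before fits in a few lines of Python. This is a minimal sketch, assuming the comparison window has the same length as the current window and ends exactly where the current window begins; the `immediately_before` function is hypothetical and not part of any Observability Platform API.

```python
from datetime import datetime, timedelta
from typing import Tuple


def immediately_before(window_start: datetime, window_length: timedelta) -> Tuple[datetime, datetime]:
    """Return the (start, end) of a comparison window that has the same length
    as the current window and ends exactly where the current window begins."""
    return window_start - window_length, window_start


# The date is arbitrary; only the time of day matters for this example.
current_start = datetime(2024, 1, 1, 11, 52, 30)
start, end = immediately_before(current_start, timedelta(minutes=30))
print(start.strftime("%I:%M:%S %p"), "to", end.strftime("%I:%M:%S %p"))
# 11:22:30 AM to 11:52:30 AM
```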

To modify the displayed data, select one of the following items from the Metric menu. The sparkline graphs update for each row based on the selected metric and the data groupings you select in the Group By menu.

You must select at least one attribute in the Group By menu to display data.

  • Requests: The number of spans within the selected group, divided by the number of seconds in the time range. Ranked in descending order.

  • Errors: The number of spans that indicate an error outcome. Ranked in descending order.

  • Leaf errors: Error spans that have no failing child spans. These spans are often the potential cause of a trace's failure. Navigating directly to leaf errors helps filter out propagated errors, and provides clearer signals about the source of an error that might be causing the entire trace to fail. Ranked in descending order.

  • Median duration P50: Ranks the spans of each group in order of duration, and selects the duration of the span in the middle of the list (fiftieth percentile). Ranks groups in descending order of this duration.

  • Tail duration P99: Tail refers to the statistical notion of the upper tail of a distribution. This statistic ranks the spans of each group in order of duration, and selects the duration of the span that's 99% of the way through the list, which is typically a span with a high duration. Ranks groups in descending order of this duration.
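To make these metric definitions concrete, the following Python sketch computes each one for spans grouped by service. It's a sketch under stated assumptions, not Observability Platform's implementation: the `Span` fields and helper names are hypothetical, percentiles use the nearest-rank method (the platform might interpolate or estimate percentiles differently), and a leaf error is taken to be an error span that no other error span names as its parent.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    """Hypothetical span model used only for this example."""
    span_id: str
    parent_id: Optional[str]
    service: str
    duration_ms: float
    error: bool


def percentile(values: List[float], p: float) -> float:
    """Nearest-rank percentile: sort the values and take the one p% of the way through."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def group_stats(spans: List[Span], window_seconds: float) -> Dict[str, Dict[str, float]]:
    # IDs of spans with at least one failing child; their own errors may be propagated.
    has_failing_child = {s.parent_id for s in spans if s.error and s.parent_id}
    groups: Dict[str, List[Span]] = defaultdict(list)
    for s in spans:
        groups[s.service].append(s)
    stats = {}
    for service, members in groups.items():
        durations = [s.duration_ms for s in members]
        stats[service] = {
            "requests": len(members) / window_seconds,  # spans per second
            "errors": sum(s.error for s in members),
            "leaf_errors": sum(s.error and s.span_id not in has_failing_child for s in members),
            "p50": percentile(durations, 50),
            "p99": percentile(durations, 99),
        }
    return stats


# Usage: payment fails, and the error propagates up through checkout to frontend.
spans = [
    Span("a", None, "frontend", 120.0, True),
    Span("b", "a", "checkout", 80.0, True),
    Span("c", "b", "payment", 60.0, True),
    Span("d", "a", "frontend", 10.0, False),
]
stats = group_stats(spans, window_seconds=60)
# Rank groups in descending order of the chosen metric.
for service, row in sorted(stats.items(), key=lambda kv: kv[1]["leaf_errors"], reverse=True):
    print(service, row)
```

In this example all three services report one error, but only `payment` reports a leaf error, which is the behavior the Leaf errors metric relies on to point at the likely source of a failure.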

Top tags

The Top tags tab displays a tabulated view of the top tags related to the current search results. By default, results are sorted by the Requests column, but you can sort the results by any column in ascending or descending order by clicking the arrow that appears when holding the pointer over the text of the column heading.

Click any item in one of these lists to include or exclude that grouping in the current results.

Each of the following statistics updates dynamically based on your choices:

  • Requests: The number of spans out of all the matched traces that contain this tag key/value pair.

  • Error percentage: The percentage of spans containing this tag key/value pair that resulted in an error.

  • Median duration (P50): Groups the spans that match the search criteria by the top tag key/value pairs they include, and then reports the middle duration value of the spans in each group (the fiftieth percentile).

  • Tail duration (P99): Groups the spans that match the search criteria by the top tag key/value pairs they include, and then reports the duration value that's 99% of the way through each group's list of spans, sorted by duration. Tags with a high value in this column typically appear on spans with high durations.
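The Requests and Error percentage columns can be approximated with a short Python sketch that counts every tag key/value pair across the matching spans. The dictionary-based span shape and the `top_tags` helper are hypothetical, and the result is sorted by request count to mirror the tab's default sort. The duration columns would apply a percentile calculation to each pair's span durations instead of counting.

```python
from collections import Counter
from typing import Dict, List, Tuple

TagPair = Tuple[str, str]


def top_tags(spans: List[Dict], limit: int = 10) -> List[Tuple[TagPair, int, float]]:
    """Return (tag pair, request count, error percentage), highest request count first."""
    requests: Counter = Counter()
    errors: Counter = Counter()
    for span in spans:
        for pair in span["tags"].items():  # each pair is a (key, value) tuple
            requests[pair] += 1
            if span["error"]:
                errors[pair] += 1
    return [
        (pair, count, 100.0 * errors[pair] / count)
        for pair, count in requests.most_common(limit)
    ]


spans = [
    {"tags": {"deployment.environment": "production", "http.status_code": "500"}, "error": True},
    {"tags": {"deployment.environment": "production", "http.status_code": "200"}, "error": False},
    {"tags": {"deployment.environment": "staging", "http.status_code": "200"}, "error": False},
]
for pair, count, error_pct in top_tags(spans):
    print(pair, count, f"{error_pct:.0f}%")
```

Here `deployment.environment=production` appears on two spans, one of which failed, so it reports two requests and a 50% error percentage.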

Traces

After defining your search, the Traces tab displays a list of the most relevant traces for the search along with the duration, spans, and error states of spans.

View trace details

On the Traces tab, select an individual trace to open the trace details page. The header displays details at both the root level and the span level. Root-level details encompass all spans contained in the trace. Span-level details show data that's scoped to the selected span only. Selecting a different span in the list doesn't change the root-level details, but the span-level details update to reflect information about the selected span.

The trace details include the service name, operation name, trace ID or span ID, start time, duration, and additional statistics for the trace at the root and span level. Use the quick copy button to copy the service name, operation name, and span ID, which you can then add to your overall search on the Trace Explorer page.

You can narrow the displayed data with the following options:

  • Use the Service and Operation menus to display specific services or operations.
  • Use the Only errors toggle to display only segments of each trace that contain errors.
  • Use the Only critical path toggle to highlight spans that impact the total duration of a trace. These segments help to identify latency issues within the trace.

Select an individual span from the list to display its Span details, which include the following information about the span:

  • Links to external services, such as related tracing logs stored in your cloud provider, or links to other observability tools. You can dynamically generate links to external services using templated variables, such as {{ trace_id }}. Click + Add Link to add a link.

  • Stats for the span, including the span ID, start time, and duration. Child spans have a Parent span ID field that identifies the parent span. Hold the pointer over the parent span ID, click the more icon, and then click Go to span to navigate directly to the parent span.

    A child span can have multiple parent spans, such as when a batch operation runs multiple jobs that it receives from other operations. If a span in the selected trace has more than one linked parent, each parent span ID displays. You can navigate directly to the parent span you want to view. Because linked operations might complete asynchronously, the linked trace data might not be immediately available and can take several minutes to display.

    In some instances, a trace might contain a missing span. Chronosphere Observability Platform identifies these spans for the selected trace as Missing in the Stats panel. To search for traces containing a missing span, in the Span characteristics field in Trace Explorer, enter parent_missing: "true". Identifying missing spans can help to fix instrumentation issues or drop non-critical traces containing missing spans.

  • Tags attached to a trace. Point to any tag and select the more icon to add that tag to, or exclude it from, your span filter.

  • Process tags that are common to a set of traces. Point to any tag and select the more icon to add that tag to, or exclude it from, your span filter.

  • Span logs are unique to a trace, and indicate events, process status within a span, or other instrumentation data.
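Conceptually, a missing span shows up as a parent span ID that a span references but that never arrives in the trace. The Python sketch below flags such spans and also handles spans with more than one parent. The `parent_ids` field and the `find_missing_parents` helper are hypothetical, and this isn't necessarily how Observability Platform evaluates `parent_missing`.

```python
from typing import Dict, List


def find_missing_parents(spans: List[Dict]) -> Dict[str, List[str]]:
    """Map each span ID to the parent span IDs it references that aren't in the trace."""
    known_ids = {span["span_id"] for span in spans}
    missing: Dict[str, List[str]] = {}
    for span in spans:
        # A span can reference several parents, for example a batch job that
        # processes work received from more than one upstream operation.
        absent = [pid for pid in span.get("parent_ids", []) if pid not in known_ids]
        if absent:
            missing[span["span_id"]] = absent
    return missing


trace = [
    {"span_id": "root"},
    {"span_id": "batch", "parent_ids": ["root", "job-7"]},  # "job-7" never arrived
    {"span_id": "db", "parent_ids": ["batch"]},
]
print(find_missing_parents(trace))  # {'batch': ['job-7']}
```

Because linked operations can complete asynchronously, a parent that looks missing shortly after a trace arrives might still appear a few minutes later, so a check like this is most reliable on traces that have had time to complete.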

When you select a trace, you can view the included services, operations, and any errors, which display with a red error symbol. When you choose a service and operation, the span details update to show information for that selection.

Sorting results

By default, the list sorts results by Start time in descending order. You can sort the results by any column in ascending or descending order by clicking the arrow that appears when holding the pointer over the text of the column heading.

Hold the pointer over the far end of each column heading to display the three vertical dots icon. This menu provides access to advanced sorting and filtering options and lets you reverse the sorting order, filter values, and hide and show columns.

Topology view

Use the topology view to visualize how traces from services and operations in the current search results cascade from each other.

You can access the topology view from either Trace Explorer or from the dependency map of an individual service page. The scope of the dependency map in a service page is specific to the selected service. Click View full map to open the topology view in Trace Explorer, scoped to the selected service.

To narrow the scope of the topology map, select Service level, Operation level, or Error focus from the Showing list. Use the Metric list to change which metric the topology view displays. To search for specific services, use the Search services field.

Select the Only critical path toggle to highlight segments of each span that impact the total duration of a trace. These segments help to identify latency issues within the trace.

Hold the pointer over one of the service or operation nodes to highlight the other nodes directly connected to it. Hold the pointer over a segment connecting one or more nodes to highlight the nodes directly connected to the segment.

Click any node to display node details, such as incoming and outgoing requests, and the median and tail duration of related spans. With a node selected, click Include or Exclude to include or exclude the selected service or operation in your Span characteristics.

Edges, which are lines between services, provide details about connected services. Click an edge to view the requests and trace duration between two services.

Topology View with a service selected
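One way to picture how a service-level topology derives its edges and node details is to build a graph from parent/child span relationships. In this Python sketch, which illustrates the general technique rather than Observability Platform's implementation, every span whose parent belongs to a different service counts as one request on the edge between those services. The span dictionary shape, the helper names, and the use of the called spans' median duration as the edge duration are all assumptions.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (calling service, called service)


def build_edges(spans: List[Dict]) -> Dict[Edge, Dict]:
    """Aggregate cross-service parent/child span pairs into edges."""
    by_id = {span["span_id"]: span for span in spans}
    durations: Dict[Edge, List[float]] = defaultdict(list)
    for span in spans:
        parent = by_id.get(span.get("parent_id"))
        # Calls within a single service don't create a service-level edge.
        if parent is not None and parent["service"] != span["service"]:
            durations[(parent["service"], span["service"])].append(span["duration_ms"])
    return {
        edge: {"requests": len(values), "median_duration_ms": median(values)}
        for edge, values in durations.items()
    }


def node_details(edges: Dict[Edge, Dict], service: str) -> Dict[str, int]:
    """Count incoming and outgoing requests for one service node."""
    incoming = sum(e["requests"] for (src, dst), e in edges.items() if dst == service)
    outgoing = sum(e["requests"] for (src, dst), e in edges.items() if src == service)
    return {"incoming": incoming, "outgoing": outgoing}


spans = [
    {"span_id": "f1", "parent_id": None, "service": "frontend", "duration_ms": 150.0},
    {"span_id": "f2", "parent_id": "f1", "service": "frontend", "duration_ms": 5.0},
    {"span_id": "c1", "parent_id": "f1", "service": "checkout", "duration_ms": 80.0},
    {"span_id": "c2", "parent_id": "f1", "service": "checkout", "duration_ms": 100.0},
    {"span_id": "p1", "parent_id": "c1", "service": "payment", "duration_ms": 40.0},
]
edges = build_edges(spans)
print(edges[("frontend", "checkout")])  # {'requests': 2, 'median_duration_ms': 90.0}
print(node_details(edges, "checkout"))  # {'incoming': 2, 'outgoing': 1}
```

An operation-level view would follow the same pattern, keying edges on (service, operation) pairs instead of service names alone.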