Examples for tracing
Use Chronosphere Observability Platform tracing when you want to locate a service operation that causes latency issues for other services that rely on it. The following examples use data from the OpenTelemetry Astronomy Shop Demo, an open source, microservice-based distributed system that illustrates the implementation of OpenTelemetry in a near real-world environment.
On-call triage
The following example highlights annotations, which you can use to link to tracing data from a dashboard. The example assumes that you received a notification from an alert that triggered for a monitor.
-
You click a link in the notification, which directs you to the Order Service Latency monitor. This monitor tracks requests and errors for the ordering-svc service.
On the Order Service Latency monitor, in the Query Results chart, you notice a continual spike in queries to the ordering-svc service.
-
In the Annotations section of the monitor, you click a link to a dashboard.
-
In the Order Service Overview dashboard, you notice a wave of spikes in requests to the /ordering.Ordering/Checkout operation of the ordering-svc service.
-
On the Requests chart, click any point and then click Query Traces.
The link opens Trace Explorer with a predefined search query that includes the service and operation you want to explore.
-
On the Trace Explorer page, click the Topology View tab to view a mapping of affected upstream and downstream services.
-
In the Search services box, enter ordering-svc to scope the view to that service.
-
Click the ordering-svc node to display details.
In the Node Details panel, you see 176 errors incoming and 119 errors outgoing connected to the ordering-svc service. As you zoom in on the topology view, you notice that the edge connecting to the billing-svc service is thicker than the others.
-
Click the billing-svc node.
In the Node Details panel for billing-svc, you notice that the number of outgoing requests to the payment-gateway-svc service is high.
-
In the Node Details panel, click Include to include the billing-svc service in your search query. Your search query now includes:
operation:/ordering.Ordering/Checkout
service:ordering-svc
service:billing-svc
You determined that the billing-svc service is generating the most errors, which is also impacting the payment-gateway-svc service.
-
On the Trace Explorer page, click the three vertical dots icon, and then select Create Metric to create a trace metric for detecting future issues with the billing-svc service.
Other on-call engineers can use this trace metric to open a predefined query in Trace Explorer and help reduce the time to identify and fix issues with this service.
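The search query assembled in the preceding steps is a list of key:value terms. As a rough illustration only, the following Python sketch formats the same three terms. The query_terms helper is hypothetical and isn't part of Observability Platform or any Chronosphere API; in practice, Trace Explorer builds the query for you when you click Include.

```python
from typing import Iterable, List, Tuple


def query_terms(filters: Iterable[Tuple[str, str]]) -> List[str]:
    """Format (key, value) pairs as key:value terms.

    Hypothetical helper for illustration; exact duplicates are dropped
    so that including the same service twice doesn't repeat the term.
    """
    terms: List[str] = []
    for key, value in filters:
        term = f"{key}:{value}"
        if term not in terms:
            terms.append(term)
    return terms


# The operation and services come from the on-call triage example.
terms = query_terms([
    ("operation", "/ordering.Ordering/Checkout"),
    ("service", "ordering-svc"),
    ("service", "billing-svc"),
])

for term in terms:
    print(term)
```

Keeping the terms as a list, rather than joining them into one string, avoids assuming how the search bar separates terms.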
Start with trace data
The following example begins in Trace Explorer. Maybe you navigated here from Trace Metrics, a dashboard, or a monitor, and now you're exploring trace data to identify where issues are occurring.
-
In the navigation menu, select Explorers > Trace Explorer.
-
In the time picker dropdown, select Last 30 minutes.
-
Select the Failed traces only radio button and then click Run.
This search returns too many traces to narrow down the issue. You think the issue relates to the frontend service, but don't know which related operation is the culprit. Modify the search criteria to narrow your search.
-
In the Query builder search bar, enter frontend, click that service in the search results to add it to your query, and then click Run.
Your search narrows the results and scope to only spans that include the frontend service. On the Statistics tab, you notice that the loadgenerator service has a high error rate.
-
On the Statistics tab, click loadgenerator, and then click Include in Span Filter in the resulting dialog to add the loadgenerator service to your search query.
You know that the loadgenerator service is contributing to your trace latency, but you still aren't sure what the main issue is.
-
Click the Traces tab to view a list of the most relevant traces for your search.
-
In the Trace column, click loadgenerator > HTTP GET to display the trace details for that service and operation combination.
You notice errors in operations for two additional services related to the loadgenerator service. The GET operation on both the loadgenerator and frontend services has high latency.
-
Click the frontend service, which updates the Span details panel with information specific to that service and operation combination.
You now have detailed information about the specific services and operations causing latency issues. Choose Formatted (default) to display a tabular view, or Raw to view span details in JSON format.
In the Links section, click + Add Link to add a link based on a template to your external logging service, which provides other users access to the logs related to this span.
In the Process section, you identify k8s.pod.name, which is the Kubernetes pod where the GET request originates. You can begin investigating that specific operation to remediate the issue.
-
To the right of the value for k8s.pod.name, click the three vertical dots icon and then click Add to Filter to add the value of that process to your search query.
-
On the Trace Explorer page, you can click the more icon and then select Create Metric to create a trace metric based on your updated search. You can use trace metrics to create dashboards and monitors for key metrics that you want to track and get alerts for.
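The Statistics tab step in this example surfaces the service with a high error rate. To make that idea concrete, the following Python sketch computes a per-service error rate from a list of span records. Everything here is an assumption for illustration: the span records are plain dictionaries with hypothetical service and error fields, the sample values are made up, and none of this reflects the data model or an API of Observability Platform.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def error_rate_by_service(spans: Iterable[Mapping[str, object]]) -> Dict[str, float]:
    """Return the fraction of spans marked as errors for each service."""
    totals: Dict[str, int] = defaultdict(int)
    errors: Dict[str, int] = defaultdict(int)
    for span in spans:
        service = str(span["service"])
        totals[service] += 1
        if span.get("error"):
            errors[service] += 1
    return {service: errors[service] / totals[service] for service in totals}


# Made-up sample spans; the field names are hypothetical.
sample_spans = [
    {"service": "frontend", "error": False},
    {"service": "frontend", "error": True},
    {"service": "loadgenerator", "error": True},
    {"service": "loadgenerator", "error": True},
]

rates = error_rate_by_service(sample_spans)
# Pick the service with the highest error rate, as you would by eye on the tab.
worst = max(rates, key=rates.get)
print(worst, rates[worst])
```

A ranking like this only suggests where to look next. The trace details and span attributes, such as k8s.pod.name, are still what identify the specific operation to investigate.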