On-call triage
The following example highlights annotations, which you can use to link to tracing data from a dashboard. The example assumes that you received a notification from an alert that triggered for a monitor.
- You click a link in the notification, which directs you to the Order Service Latency monitor. This monitor tracks requests and errors for the ordering-svc service. On the Order Service Latency monitor, in the Query Results chart, you notice a continual spike in queries to the ordering-svc service.
- In the Annotations section of the monitor, you click a link to a dashboard.
- In the Order Service Overview dashboard, you notice a wave of spikes in requests to the /ordering.Ordering/Checkout operation of the ordering-svc service.
- On the Requests chart, click any point and then click Query Traces. The link opens Trace Explorer with a predefined search query that includes the service and operation you want to explore.
- On the Trace Explorer page, click the Topology View tab to view a mapping of affected upstream and downstream services.
- In the Search services box, enter ordering-svc to scope the view to that service.
- Click the ordering-svc node to display details. In the Node Details panel, you see 176 errors incoming and 119 errors outgoing connected to the ordering-svc service. As you zoom in on the topology view, you notice that the edge connecting to the billing-svc service is thicker than the others.
- Click the billing-svc node. In the Node Details panel for the billing-svc service, you notice that outgoing requests to the payment-gateway-svc service are high. For a sketch of how these service-to-service edges arise from instrumented spans, see the example after this list.
- In the Node Details panel, click Include to include the billing-svc service in your search query. Your search query now includes:
  operation:/ordering.Ordering/Checkout
  service:ordering-svc
  service:billing-svc
  The billing-svc service is generating the most errors, which is also impacting the payment-gateway-svc service.
- On the Trace Explorer page, click the three vertical dots icon, and then select Create Metric to create a trace metric for detecting future issues with the billing-svc service. Other on-call engineers can use this trace metric to open a predefined query in Trace Explorer and help reduce the time to identify and fix issues with this service.
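The topology view and its error counts come from the spans your services already emit; no extra code is required for this walkthrough. Purely as an illustration of where an edge from billing-svc to payment-gateway-svc comes from, the following sketch uses the OpenTelemetry Python SDK to record a client span for an outgoing payment call and mark it as an error when the call fails. The operation name, endpoint, attributes, and failure condition are hypothetical; only the service names come from the example above.

```python
# Illustrative sketch only: how billing-svc might record an outgoing call to
# payment-gateway-svc. Failed client spans like this are the kind of trace
# data that surfaces as errors along the edge between the two services.
import requests

from opentelemetry import trace
from opentelemetry.propagate import inject
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter
from opentelemetry.trace import SpanKind, Status, StatusCode

# Identify this process as billing-svc so its spans group under that node.
provider = TracerProvider(resource=Resource.create({"service.name": "billing-svc"}))
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("billing-svc")

def charge_order(order_id: str) -> None:
    # A CLIENT span; the matching SERVER span recorded by payment-gateway-svc
    # becomes its child, which is what links the two services in the trace.
    with tracer.start_as_current_span("ChargeOrder", kind=SpanKind.CLIENT) as span:
        span.set_attribute("order.id", order_id)  # hypothetical attribute
        headers = {}
        inject(headers)  # propagate trace context to the downstream service
        resp = requests.post(
            "http://payment-gateway-svc/charge",  # hypothetical endpoint
            json={"order_id": order_id},
            headers=headers,
            timeout=2,
        )
        if resp.status_code >= 500:
            # Record the failure on the span so it shows up as an error
            # in the trace data.
            span.set_status(Status(StatusCode.ERROR, "payment gateway returned 5xx"))
```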
Start with trace data
The following example begins in Trace Explorer. Maybe you navigated here from Trace Metrics, a dashboard, or a monitor, and now you're exploring trace data to identify where issues are occurring.
- In the navigation menu, select Explorers > Trace Explorer.
- In the time range selector, select Last 30 minutes.
- Select the Failed traces only radio button and then click Run. This search returns too many traces to narrow down the issue. You think the issue relates to the frontend service, but don't know which related operation is the culprit. Modify the search criteria to narrow your search.
- In the Query builder search bar, enter frontend, click that service from the search results to add it to your query, and then click Run. Your search narrows the results and scope to only spans that include the frontend service. On the Span statistics tab, you notice that the loadgenerator service has a high error rate.
- On the Span statistics tab, click loadgenerator, and then click Include in Span Filter in the resulting dialog to add the loadgenerator service to your search query. You know that the loadgenerator service is contributing to your trace latency, but still aren't sure what the main issue is.
- Click the Trace list tab to view a list of the most relevant traces for your search.
- In the Trace column, click loadgenerator > HTTP GET to display the trace details for that service and operation combination. You notice errors in operations for two additional services related to the loadgenerator service. The GET operation on both the loadgenerator and frontend services has high latency.
- Click the frontend service, which updates the Span details panel with information specific to that service and operation combination. You now have detailed information about the specific services and operations causing latency issues. Choose Formatted (default) to display a tabular view, or Raw to view span details in JSON format. In the Links section, click + Add Link to add a link based on a template to your external logging service, which provides other users access to the logs related to this span.
- In the Process section, you identify k8s.pod.name, which is the Kubernetes pod that the GET request originates from. You can begin investigating that specific operation to remediate the issue. For a sketch of how attributes like k8s.pod.name end up on spans, see the example after this list.
- To the right of the value for k8s.pod.name, click the three vertical dots icon and then click Add to Filter to add the value of that process to your search query.
- On the Trace Explorer page, you can click the more icon and then select Create Metric to create a trace metric based on your updated search. You can use trace metrics to create dashboards and monitors for key metrics that you want to track and get alerts for.
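Attributes such as k8s.pod.name in the Process section come from resource metadata attached to every span the service emits; you don't set them anywhere in this walkthrough, and in Kubernetes they are often added automatically by a collector or agent. As a rough sketch only, assuming OpenTelemetry Python instrumentation and a hypothetical pod name, a service could attach them like this:

```python
# Illustrative sketch only: resource attributes are set once per process and
# stamped onto every span it emits, which is why the Process section of the
# span details can show which Kubernetes pod a request originated from.
import os

from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

resource = Resource.create({
    "service.name": "frontend",
    # Hypothetical pod name; in a real cluster this is typically injected
    # through the Downward API or added by an OpenTelemetry Collector processor.
    "k8s.pod.name": os.environ.get("POD_NAME", "frontend-abc123"),
})

provider = TracerProvider(resource=resource)
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("frontend")
with tracer.start_as_current_span("HTTP GET") as span:
    # Every span from this process now carries service.name and k8s.pod.name,
    # so a filter on the pod name matches all of its spans.
    span.set_attribute("http.method", "GET")
```

Because the pod name is a resource attribute rather than something added per request, filtering on it in your search narrows the results to traffic from that one pod without any changes to the instrumentation.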