Troubleshooting

Troubleshooting ingestion

If some or all of your data isn't displaying in Chronosphere, use the following information to verify that your Collector is ingesting metrics or traces.

Verifying incoming data

Metrics and traces have different verification processes.

Verifying traces

Trace Analyzer provides a real-time view of incoming traces grouped by tag and their relative frequency. Use Trace Analyzer to display the stream of incoming spans from your trace data.

  1. In the navigation menu, select Exploring > Trace Analyzer.
  2. Click Run to display the stream of incoming spans. The default grouping shows spans by service (__svc__), but you can group by additional tags, such as __trace_id__ and __span_id__.

Verifying metrics

You can use the Live Telemetry Analyzer to inspect the incoming stream of metrics ingested by Chronosphere.

  1. In the navigation menu, select Exploring > Live Telemetry Analyzer.
  2. Click Live to display streaming metrics.
  3. In the Keys list, select the __name__ and instance label keys.
  4. In the Values filter, enter a key:value pair for a metric the OpenTelemetry exporter is sending, or any specific value the Chronosphere Collector is sending.

The OpenTelemetry Ingestion & Health dashboard provides two charts that show the rate of metric data points received and rejected by the Chronosphere OTLP API:

  • Metric Data Points Received
  • Metric Data Points Rejected

No data displays in Chronosphere

The following are common reasons for data not appearing in Chronosphere, along with the corresponding log messages.

Missing API token

If your API token is missing, the following message displays in the OpenTelemetry Collector logs:

Exporting failed. Try enabling retry_on_failure config option to retry on retryable errors	{"kind": "exporter", "data_type": "traces", "name": "otlp/chronosphere", "error": "Permanent error: rpc error: code = Unauthenticated desc = missing auth header: API-Token"}

Ensure you've added the API-Token header to the otlp/chronosphere exporter configuration and that you're correctly setting and passing the API token value.
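As a minimal sketch of what that configuration might look like, the following fragment sets the API-Token header on the otlp/chronosphere exporter. The environment variable names CHRONOSPHERE_OTLP_ENDPOINT and CHRONOSPHERE_API_TOKEN are hypothetical placeholders chosen for illustration; substitute whatever endpoint and secret-passing mechanism your deployment uses.

```yaml
exporters:
  otlp/chronosphere:
    # Hypothetical variable: set it to the OTLP endpoint for your tenant.
    endpoint: ${env:CHRONOSPHERE_OTLP_ENDPOINT}
    headers:
      # The header name matches the one reported in the error message.
      # Hypothetical variable: it must hold the service account's API token.
      API-Token: ${env:CHRONOSPHERE_API_TOKEN}
```

Reading the token from the environment keeps the secret out of the configuration file, but it also means the variable must be defined in the environment where the Collector process runs.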

Invalid API token

If your API token is invalid, the following message displays in the OpenTelemetry Collector logs:

Exporting failed. Try enabling retry_on_failure config option to retry on retryable errors	{"kind": "exporter", "data_type": "traces", "name": "otlp/chronosphere", "error": "Permanent error: rpc error: code = Unauthenticated desc = invalid auth token"}

The exporter is sending an API-Token header, but the token is invalid. Ensure you've copied the API token value correctly and that you're correctly setting and passing that value to the exporter.

Incorrect permission

If your service account doesn't have write permission, the following message displays in the OpenTelemetry Collector logs:

Exporting failed. Try enabling retry_on_failure config option to retry on retryable errors	{"kind": "exporter", "data_type": "metrics", "name": "otlp/chronosphere", "error": "Permanent error: rpc error: code = PermissionDenied desc = Permission denied"}

The exporter requires an API token with write access. The exporter doesn't require read access, so you can resolve this issue by creating a new service account with a write-only scope.

Some metrics are missing in Chronosphere

Use the following sections when Chronosphere is ingesting some of your metrics, but other metrics you expect are missing.

Required labels to create resource attributes are missing

When using the OpenTelemetry Collector, Chronosphere requires service.name and service.instance.id as labels on all metrics to construct the Prometheus resource attributes. Metrics without these labels are rejected, and the following error message displays in your logs:

Exporting failed. Try enabling retry_on_failure config option to retry on retryable errors	{"kind": "exporter", "data_type": "metrics", "name": "otlp/chronosphere", "error": "Permanent error: rpc error: code = InvalidArgument desc = resource for service \"frontend-proxy\" is missing attribute service.instance.id (category=INVALID_REQUEST_ERROR code=BAD_REQUEST)"}

Chronosphere sends the first error condition as the error response. Other metric data points might fail for the same reason. To resolve this, add an attributes processor to your metrics pipeline to map an existing label that can be used as the unique instance label for your environment. See Mapping resource attributes for additional information.

To troubleshoot this issue, investigate metric data points and their attributes to determine what attributes the metric includes.

Add the logging exporter with detailed verbosity to the exporters section of your OpenTelemetry Collector configuration, such as in the Collector's ConfigMap. Then, add the logging exporter to your metrics pipeline.

For example:

exporters:
  # adds the logging exporter with detailed verbosity to output data points to log output
  logging:
    verbosity: detailed
# ... snippet
service:
  # ... snippet
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [transform, batch, resourcedetection, resourceattributes/instance]
      exporters: [logging, otlp/chronosphere]
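In more recent OpenTelemetry Collector releases, the logging exporter has been deprecated in favor of the debug exporter. If your Collector version doesn't recognize the logging exporter, an equivalent setup might look like the following sketch. It assumes your build includes the debug exporter and reuses the receiver and exporter names from the previous example.

```yaml
exporters:
  # The debug exporter writes received telemetry to the Collector's log output.
  debug:
    verbosity: detailed
# ... snippet
service:
  # ... snippet
  pipelines:
    metrics:
      receivers: [otlp]
      # Keep your existing processors list here; it's omitted for brevity.
      exporters: [debug, otlp/chronosphere]
```

Remove the extra exporter from the pipeline after you finish troubleshooting, because detailed verbosity logs every data point.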

The following is sample output for a metric data point logged using the logging exporter. When you review the resource attributes list, you can see the required service.name is available, but service.instance.id is missing. The host.name attribute is a unique attribute that can be used as the service.instance.id. See Mapping resource attributes to learn how to add attribute mapping to copy the host.name attribute to service.instance.id.

2023-10-18T16:38:46.172Z	info	ResourceMetrics #0
Resource SchemaURL: https://opentelemetry.io/schemas/1.6.1
Resource attributes:
     -> service.namespace: Str(opentelemetry-demo)
     -> service.name: Str(currencyservice)
     -> telemetry.sdk.version: Str(1.10.0)
     -> telemetry.sdk.name: Str(opentelemetry)
     -> telemetry.sdk.language: Str(cpp)
     -> host.name: Str(3d181cdaa016)
     -> os.type: Str(linux)
ScopeMetrics #0
ScopeMetrics SchemaURL:
InstrumentationScope app_currency 1.3.0
Metric #0
Descriptor:
     -> Name: app_currency_counter
     -> Description:
     -> Unit:
     -> DataType: Sum
     -> IsMonotonic: true
     -> AggregationTemporality: Delta
NumberDataPoints #0
Data point attributes:
     -> currency_code: Str(USD)
StartTimestamp: 2023-10-18 16:38:45.103115256 +0000 UTC
Timestamp: 2023-10-18 16:38:46.106640465 +0000 UTC
Value: 2
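
Based on the sample output above, one way to sketch the mapping is with the resource processor from the OpenTelemetry Collector contrib distribution, because the error refers to a missing resource attribute. This is an illustration under stated assumptions rather than the configuration from Mapping resource attributes: the processor name resource/instance is hypothetical, and the sketch assumes host.name is unique per instance in your environment.

```yaml
processors:
  # Hypothetical processor name; use the name your pipeline references.
  resource/instance:
    attributes:
      # Copy host.name into service.instance.id.
      # The insert action only adds the key when it isn't already set,
      # so resources that already report an instance ID are left unchanged.
      - key: service.instance.id
        from_attribute: host.name
        action: insert
```

For the processor to take effect, list it in the processors array of your metrics pipeline.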