Chronosphere-managed dashboards

Chronosphere Observability Platform includes several dashboards to visualize information about itself and its usage within your organization.

Chronosphere-managed dashboards use native dashboard tools built for Observability Platform, which gives them a slightly different appearance and new features not present in older Grafana-based dashboards.

For example, Observability Platform dashboards let you pin tooltips and view nearby series when holding the pointer over a tooltip.

To pin a tooltip, click while a tooltip is visible in a dashboard's panel. This causes the tooltip to persist in place, rather than following the cursor as you move it across the visualization or disappearing after you move the cursor out of the panel.

To unpin a tooltip, click the tooltip again.

Chronosphere-managed dashboards are read-only. You can't edit or delete these dashboards.

Ingest dashboards

Use the following dashboards to monitor metric ingestion and health in your Observability Platform tenant.

Chronosphere Health Check

The Chronosphere Health Check dashboard monitors the stability and efficiency of metric ingestion in your Observability Platform environment.

The Uptime Percentages (30d) panel depicts the trailing 30-day product uptime of ingest, query, and console services as a percentage, which provides a snapshot of the reliability of Observability Platform.

The Historical Uptime panel depicts the product uptime averages over time.

The Ingest Latency panel depicts the latency from the moment a metric is scraped to the moment it's persisted, with lines for those latencies at notable percentiles. This data helps you understand the efficiency and speed of metric ingestion, and helps your organization identify and address potential bottlenecks by correlating ingestion changes with latency spikes.

CloudWatch Metrics Ingestion & Health

The CloudWatch Metrics Ingestion & Health dashboard displays operational information about the health of your CloudWatch Metrics Streams integration with Observability Platform. Use this dashboard to ensure that CloudWatch metrics are streaming to Observability Platform.

Collectors

The Collectors dashboard provides a high-level visualization of metrics and resources used by Collector instances.

OpenTelemetry Ingestion & Health

The OpenTelemetry Ingestion & Health dashboard provides details about how much data the OpenTelemetry Collector is ingesting and how many system resources it's using. For metrics, this dashboard includes statistics on ingested data points by metric type. The Collector Health section of the dashboard includes graphs for statistics like memory and CPU core usage. When using Traces, data on trace spans might be available.

Google Cloud Integration

The Google Cloud Platform integration dashboard provides details about how much Google Cloud data Observability Platform is ingesting. This includes the Top 5 Metric Descriptors, service status, and meta-information about the ingested metrics.

Query dashboards

Use the following dashboards to monitor queries, identify resource-intensive alert or recording groups, and visualize metrics created for traces.

Chronosphere Query Overview

The Chronosphere Query Overview dashboard measures resource-intensive or long-running alert, dashboard, and recording rule queries.

Query Accelerator

The Query Accelerator dashboard visualizes query usage within Observability Platform. The query throttling panel helps you identify users whose queries are being rate-limited. To improve the performance of such queries, specify a shorter time window or use aggregation rules.

Trace Metrics

The Trace Metrics dashboard visualizes metrics created for traces.

Cardinality dashboards

Use the following dashboards to visualize cardinality produced by jobs in a given namespace, and identify information you can use to help reduce cardinality in your Observability Platform tenant.

Persisted Cardinality Quotas

The Persisted Cardinality Quotas dashboard displays a breakdown of cardinality consumption by individual metric pool and priority. Persisted cardinality is a cumulative measure: the number of unique time series among the persisted writes that Observability Platform stores, counted over the last 2.5 hours only.
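
As a rough illustration of the windowed count described above, the following Python sketch counts distinct series seen in the trailing 2.5 hours. The function name, the `(timestamp, series_id)` input shape, and the sample series are assumptions made for this example, not part of Observability Platform.

```python
from datetime import datetime, timedelta

# Window length taken from the text above: 2.5 hours.
WINDOW = timedelta(hours=2.5)


def persisted_cardinality(writes, now):
    """Count unique series among persisted writes seen within WINDOW of `now`.

    `writes` is an iterable of (timestamp, series_id) pairs. This shape is a
    hypothetical stand-in for persisted write records.
    """
    cutoff = now - WINDOW
    return len({series_id for ts, series_id in writes if cutoff <= ts <= now})


now = datetime(2024, 1, 1, 12, 0)
writes = [
    (now - timedelta(minutes=10), 'http_requests{pod="a"}'),
    (now - timedelta(minutes=20), 'http_requests{pod="a"}'),  # same series, counted once
    (now - timedelta(hours=2), 'http_requests{pod="b"}'),
    (now - timedelta(hours=3), 'http_requests{pod="c"}'),     # outside the window
]
cardinality = persisted_cardinality(writes, now)
print(cardinality)
```

Repeated writes to the same series don't raise the count, and series that stop reporting age out of the count once they fall outside the window.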

Use this dashboard to understand cardinality costs across specific teams, services, and pools, and to help pinpoint specific sources of cardinality growth, such as a particular pool or priority group. This dashboard includes the following panels, which are backed by specific persisted cardinality metrics:

  • Current Total Consumption displays the persisted cardinality consumed across all metric pools divided by your persisted license capacity, expressed as a percentage.
  • Total Consumption displays the same data as Current Total Consumption, plotted on a line graph over the last week.
  • Total Dropped displays the total metrics dropped during a penalty period if your organization exceeds 100% of its Persisted Cardinality Capacity Limit.
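
The percentage shown in Current Total Consumption can be sketched as a simple division. The following Python example assumes hypothetical pool names and a hypothetical capacity value; it illustrates the arithmetic only and isn't how the dashboard is implemented.

```python
def consumption_percent(consumed_by_pool, capacity):
    """Return total consumption across pools as a percentage of capacity.

    `consumed_by_pool` maps a pool name to its persisted cardinality.
    Both arguments are illustrative inputs, not real API fields.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * sum(consumed_by_pool.values()) / capacity


# Hypothetical pools and capacity.
pools = {"default": 400_000, "payments": 250_000, "search": 150_000}
pct = consumption_percent(pools, capacity=1_000_000)
over_capacity = pct > 100  # above 100%, data is at risk of being dropped
print(f"{pct:.1f}% consumed, over capacity: {over_capacity}")
```

A result above 100 corresponds to the condition under which the Total Dropped panel starts reporting dropped metrics.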

In addition to these metrics, you can configure metric pools and priorities to display Consumption by Pool and Consumption by Priority panels in this dashboard, which use the same pools and priorities configured in your Matched Writes Quotas dashboard. If you didn't configure that dashboard, the Persisted Cardinality Quotas dashboard displays persisted cardinality consumption only.

To further identify sources of cardinality cost increases, the Persisted Cardinality Quotas dashboard includes panels that display top metrics and values for your most critical usage tags defined in the Usage Dashboard. For each of your usage tags, the Persisted Cardinality Quotas dashboard includes two panels:

  • A line graph that displays the top label values.
  • A bar chart with a percentage breakdown of those values in descending order.

Use this information to identify the top metrics and label values for your usage statistics that are contributing to your persisted cardinality license, in addition to the metrics for your predefined metrics pools.

To modify the usage tags that Observability Platform uses to generate the top metrics and labels that display in the Persisted Cardinality Quotas dashboard, contact Chronosphere Support.

Cardinality Overview

The Cardinality Overview dashboard provides a visualization of cardinality produced by jobs in a given namespace. Use this dashboard to understand what jobs are causing high cardinality, and how the metrics in a job contribute to cardinality.

Metric Growth

Chronosphere provides the Metric Growth dashboard to help you identify the potential sources of metric growth in your system by highlighting the following panels:

  • Overview displays the top metrics and labels by volume in DPPS.
  • Change Over Comparison Period Using Averages
    • The top 10 metrics with the highest rate of growth over a selected comparison period by data points per second.
    • The top 10 labels with the highest rate of growth over a selected comparison period, by data points per second and unique values.
  • Drilldown
    • This panel highlights additional information about the selected metrics or labels, including cardinality and DPPS growth over time.
    • Data won't load if there are no metrics or labels selected.
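
To make the comparison-period idea concrete, the following Python sketch ranks metrics by the change in their average DPPS between a baseline period and the current period. The function name, the input shape, and the choice of absolute difference as the growth measure are all assumptions for illustration; the dashboard's actual calculation might differ.

```python
from statistics import mean


def top_growth(current, baseline, n=10):
    """Rank metrics by growth in average DPPS between two periods.

    `current` and `baseline` map a metric name to a list of DPPS samples.
    A metric missing from `baseline` is treated as new, with a baseline of 0.
    """
    growth = {}
    for name, samples in current.items():
        if not samples:
            continue  # nothing to average for this metric
        before = mean(baseline[name]) if baseline.get(name) else 0.0
        growth[name] = mean(samples) - before
    return sorted(growth.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Hypothetical DPPS samples for two periods.
current = {
    "http_requests_total": [120, 130, 140],
    "queue_depth": [50, 50, 50],
    "new_histogram_bucket": [300, 300],  # not present in the baseline
}
baseline = {
    "http_requests_total": [100, 100, 100],
    "queue_depth": [55, 55, 55],
}
ranked = top_growth(current, baseline, n=2)
print(ranked)
```

In this sample, the newly added metric ranks first because it has no baseline, which mirrors how a new high-cardinality metric stands out when compared against an older time range.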

Use this dashboard to understand when:

  • A newly added high-cardinality metric or label appears with high DPPS. You can confirm that it's new by comparing it to an older time range.
  • An existing metric or label with more unique values appears to have grown in DPPS. For example, a histogram metric whose number of buckets increased significantly.

You can then use this information to reduce cardinality.

Licensing Information

Chronosphere provides dashboards to monitor data usage against your licensing quotas. Use these dashboards to help identify usage trends across data types. You can then proactively manage data usage to avoid exceeding your organization's licensing limits.

For more information about the licensing terms used on these dashboards, and details about how Chronosphere calculates these values, see Licensing concepts.

Metrics License Consumption

The Metrics License Consumption dashboard monitors your persisted writes, matched writes, and persisted cardinality consumption against your Standard Metrics License and Histogram Metrics License quotas over a given time span, defaulting to the last seven days.

The dashboard's sections for each license type provide these graphs and usage percentages for each corresponding quota:

  • The Current Persisted Writes Quota displays the percentage of your persisted writes of metrics against your license limit. To limit how many metrics your organization writes, use metric quotas and pools.
  • The Current Matched Writes Quota displays the percentage of your matched writes against your license limit. Learn more about matched writes.
  • The Current Persisted Cardinality Count Quota displays the percentage of the total cardinality, or the number of unique time series persisted to the Observability Platform database at a given time, against your license limit. Aggregation rules can help you reduce your cardinality.

Your limits and usage are both indicated on the quota line graphs with each percentage, and the panels include tables showing your average usage and limits over the selected time span.

You risk dropping data when you exceed 100% of your capacity limit, which is indicated by the green percentage numbers on each graph. The data dropped depends on which license limit you exceed.

Metrics license capacity

License capacity compares your current data usage against your maximum allowed data usage.

The limits displayed for each dashboard include:

  • The Actual data, which is the average and maximum number of actual values recorded.
  • The License Limit defined by your contract with Chronosphere.
  • A Capacity Limit, which is your current limit as defined by Chronosphere. This can differ from your license limit due to a temporary capacity increase by Chronosphere.

Matched Writes Quotas

The Matched Writes Quotas dashboard lets you view consumption per pool against allocations, matched writes versus the capacity limit, and drops by pool and priority. If you exceed your matched writes quota, use this dashboard to help understand which pools breached the set limits and are at risk of dropping data. For each pool in a penalty state, you can view how much data is being dropped, separated by low, medium, and high priority data.

In addition to the default panels, you can configure panels that include custom allocations for your pools by using the pool_name attribute. These custom allocations let you focus on important pools so you can quickly view consumption details across your organization.

This dashboard includes a Summary panel group containing panels for the following statistics:

  • Current Total Consumption displays the current total consumption rate as a percentage, which is calculated as the total matched writes consumed divided by the license capacity.
  • Total Consumption displays the consumption rate of matched writes versus your total capacity limit for matched writes.
  • Total Dropped displays the total amount of dropped data, separated by low, medium, and high priority data.
  • Pool Quota Breakdown displays the consumption rate in DPPS of your matched writes license by pool name.

This dashboard also includes panel groups for each of your defined metric pools. Each panel group contains panels for the following statistics:

  • Current Consumption displays the current total consumption rate of the selected pool as a percentage, which is calculated as the total matched writes consumed divided by the license capacity.
  • Consumption displays the consumption rate of matched writes for the selected pool versus your total capacity limit for matched writes.
  • Consumption by Priority displays the matched writes consumed in DPPS by priority within the pool, separated by low, medium, and high priority data.
  • Dropped by Priority displays the matched writes dropped in DPPS by priority within the pool, separated by low, medium, and high priority data.

See matched writes in the metrics dictionary for more information about the metrics that Observability Platform uses to create the statistics displayed in this dashboard.

Tracing licensing information

The Tracing License Consumption panels monitor your processed data and persisted data against your account limits for a given time span, defaulting to the last seven days.

The dashboard provides these consumption percentages for each limit:

  • The Processed License Limit displays the percentage of processed tracing bytes against your license limit. This percentage includes all bytes of trace data sent to and processed by Chronosphere. To limit the amount of tracing data your organization sends to Chronosphere, use head sampling.
  • The Persisted License Limit displays the percentage of your persisted trace bytes against your license limit. To limit the amount of tracing data your organization persists, use tail sampling or trace behaviors to apply a set of fine-grained rules after any head sampling decisions.
  • The Processed to Persisted Ratio displays a decimal ratio of processed to persisted data against your license limit. Use this information to understand the relationship between your processed and persisted data consumption.

This dashboard also includes two line graphs that display your daily data consumption breakdown and your cumulative data consumption breakdown. The daily data consumption graph clears each day to accurately represent your consumption for that period. Use these graphs to identify any spikes in data consumption to help identify where you can reduce the amount of ingested or persisted tracing data.

Events License Consumption

The Events License Consumption dashboard monitors your event data against both capacity and license limits, defaulting to the last seven days.

The dashboard provides these consumption percentages for each limit:

  • The Persisted capacity limit displays the number of events that can persist per minute in your tenant. This limit is enforced and can incur penalties if exceeded. In certain circumstances, this limit can exceed the license limit temporarily.
  • The Persisted license limit displays the number of events that can persist per minute, as defined by the license in your contract with Chronosphere.
  • The Latest consumption percentage displays a decimal ratio of persisted data against your per-minute license limit for the selected time period. This value is calculated by dividing the persisted events per minute by the number of events that can persist per minute, defined by the license limit. Use this information to understand the relationship between your persisted data consumption and the defined license limit.
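
The Latest consumption percentage calculation described above can be sketched as follows. The function name and the sample values are hypothetical and chosen only to show the division.

```python
def events_consumption_ratio(persisted_per_minute, license_limit_per_minute):
    """Return persisted events per minute as a decimal ratio of the license limit.

    Both arguments are illustrative inputs, not real API fields.
    """
    if license_limit_per_minute <= 0:
        raise ValueError("license limit must be positive")
    return persisted_per_minute / license_limit_per_minute


# Hypothetical values: 450 events persisted per minute against a limit of 600.
ratio = events_consumption_ratio(450, 600)
print(ratio)
```

A ratio above 1.0 means persisted events exceed the per-minute license limit for the selected period.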

This dashboard also includes a line graph that displays your daily data consumption of events per minute compared against your license and capacity limits. Use this graph to identify any spikes in data consumption to help identify where you can reduce the amount of persisted events data.

Usage Dashboard

The Usage Dashboard visualizes ingested, persisted, and dropped metrics by user source across your organization.