Licensing concepts

Chronosphere uses the following terms when describing licensing concepts in Chronosphere Observability Platform.

To track your telemetry usage against your licensing quotas, use the Chronosphere-provided managed dashboards. For more information about each of these dashboards, see Licensing information.

Metric license types

Observability Platform defines two types of metric licenses: the Standard Metrics License and Histogram Metrics License.

Standard Metrics License

The Standard Metrics License measures aggregations, persisted writes, and persisted cardinality license consumption for the following Observability Platform metric types:

  • Cumulative counter
  • Delta counter
  • Gauge

Because Observability Platform aggregates and persists legacy Prometheus histograms and OpenTelemetry explicit bucket layout histograms as cumulative or delta counters, these metrics consume Standard Metrics License capacity.

Histogram Metrics License

The Observability Platform histogram metric type supports both OpenTelemetry exponential histograms and Prometheus native histograms.

The Histogram Metrics License measures aggregations, persisted writes, and persisted cardinality license consumption for the following Observability Platform metric types:

  • Cumulative exponential histogram
  • Delta exponential histogram

Aggregations

Your license usage is determined by your database writes.

Matched writes are the number of writes per second being matched for transformation and reshaping by the Observability Platform aggregation tier.

The aggregator counts the number of data points matched into each aggregator rule, whether rollup or downsampling. If a data point matches one rule, that's one matched write. If a data point matches two rules, that's two matched writes. The sum of the matched data points per rule equals the total matched writes for the aggregator.

A high-level formula for this limit is:

Total matched writes = Sum (data points matched per rule)
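
For illustration only (the rule names and per-rule rates below are hypothetical), the calculation can be sketched in Python as:

    # Hypothetical matched data point rates per aggregation rule (data points per second).
    # A data point that matches two rules counts once for each rule it matches.
    matched_per_rule = {
        "rollup:http_requests_by_service": 40_000,
        "downsample:node_cpu_seconds": 25_000,
        "rollup:grpc_latency_by_route": 15_000,
    }

    # The total matched writes for the aggregator is the sum across all rules.
    total_matched_writes = sum(matched_per_rule.values())
    print(total_matched_writes)  # 80000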

Writes also depend on your Collector scrape interval. Increasing the scrape interval (scraping less often) produces fewer writes, but can reduce visibility.
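
As a rough, hypothetical example of that relationship, a fixed number of active series produces a write rate that scales inversely with the scrape interval:

    # Hypothetical fleet of 10,000 active series scraped by the Collector.
    active_series = 10_000

    # Approximate writes per second at two different scrape intervals.
    writes_per_second_15s = active_series / 15  # ~667 writes/s
    writes_per_second_60s = active_series / 60  # ~167 writes/s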

If your organization exceeds 100% of its Matched Writes Capacity Limit, data that would otherwise match aggregation rules is dropped. The data that drops depends on your metric pool allocations and the priority set for each pool.

Persisted writes

The number of persisted writes to the Observability Platform database consists of the following:

(Number of unaggregated, raw data points written to the database)
+ (Number of aggregated data points written to the database)
+ (Number of non-Prometheus non-rolled up aggregated data points written to the database)
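
With hypothetical rates plugged in, the same calculation looks like this:

    # Hypothetical persisted write rates (data points per second).
    raw_unaggregated_writes = 50_000        # unaggregated, raw data points
    aggregated_writes = 20_000              # output of aggregation rules
    non_prom_unrolled_aggregated = 5_000    # non-Prometheus, non-rolled-up aggregated data

    persisted_writes = (
        raw_unaggregated_writes
        + aggregated_writes
        + non_prom_unrolled_aggregated
    )
    print(persisted_writes)  # 75000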

If you exceed 100% of your Persisted Writes Capacity limit, data points might be dropped.

Quotas determine the data which drops first. You can split the total system-persisted writes per second into quota allocations on a per-pool basis. Pools generally align with groups or teams, depending on your internal organization. Read more about configuring quotas. If you set up per-pool quotas, you can review the quotas in a dedicated dashboard.
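
As a minimal sketch of the idea (the pool names and percentages are hypothetical, and real allocations are configured through quotas rather than code), splitting a persisted writes limit across pools might look like:

    # Hypothetical total persisted writes limit (data points per second).
    persisted_writes_limit = 100_000

    # Hypothetical per-pool allocations, as fractions of the total.
    pool_allocations = {
        "platform-team": 0.50,
        "payments-team": 0.30,
        "default-pool": 0.20,
    }

    # Each pool's share of the persisted writes limit.
    per_pool_quota = {
        pool: int(persisted_writes_limit * share)
        for pool, share in pool_allocations.items()
    }
    # {'platform-team': 50000, 'payments-team': 30000, 'default-pool': 20000}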

To improve performance, stability, and features, Observability Platform adds time series to your database. These data points aren't counted against your license quota.

Persisted cardinality

Matched writes and persisted writes measure data point rates at a given moment in time. Persisted cardinality operates differently: it's a cumulative measure that counts the unique time series among the persisted writes that Observability Platform has stored over the last 2.5 hours.

Because this measure is cumulative, changes you make to reduce persisted cardinality aren't immediately reflected as a decrease in the persisted cardinality consumption percentage. For example, you can use rollup rules to downsample and aggregate metrics before they're stored. However, inactive time series continue to count towards the rolling window until they expire, so changes aren't reflected until 2.5 hours after Observability Platform last saw the series.
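
Conceptually (this is an illustrative sketch, not how Observability Platform implements the measurement), persisted cardinality behaves like a count of the unique series seen among persisted writes inside a rolling 2.5 hour window:

    from datetime import datetime, timedelta

    WINDOW = timedelta(hours=2.5)

    def persisted_cardinality(writes, now):
        """Count unique series among persisted writes seen in the last 2.5 hours.

        `writes` is a list of (timestamp, series_id) pairs for persisted data points.
        """
        cutoff = now - WINDOW
        return len({series_id for ts, series_id in writes if ts >= cutoff})

    # Illustrative usage: a series last seen 3 hours ago no longer counts.
    now = datetime(2025, 1, 1, 12, 0)
    writes = [
        (now - timedelta(hours=3), 'http_requests{pod="a"}'),     # outside the window
        (now - timedelta(minutes=10), 'http_requests{pod="b"}'),
    ]
    print(persisted_cardinality(writes, now))  # 1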

Read more about how persisted cardinality limits work, then learn how to manage persisted cardinality limits and avoid persisted cardinality limits.

How persisted cardinality limits work

Persisted cardinality is comparable to a leaky bucket. Over time, new series can be added until the bucket is full. When the bucket is at maximum capacity, there's no space for new time series, so they're rejected. When existing time series expire, they make room for new series.

In the following example, the persisted cardinality capacity is five unique time series. The animated image shows the lifecycle of six unique time series (A, B, C, D, E, and F) as new data points are added, and as other data points expire.

Animated image showing data points being introduced. When the persisted cardinality limit is reached, no more time series are accepted.

As data points are introduced, they're either accepted or rejected based on whether the persisted cardinality bucket is full (reached maximum capacity), and whether the related time series already exists in the bucket:

  • If the bucket isn't at maximum capacity, the data point is accepted, and the series is added to the bucket if it's new.
  • If the bucket is at maximum capacity and the series already exists, the data point is accepted.
  • If the bucket is at maximum capacity and the series doesn't exist, the series is rejected.

The following table shows how data points A3, E1, and F1 are processed, based on the bucket status:

Data point | Status   | Description
A3         | Accepted | Time series A is in the bucket, so data point A3 is accepted.
E1         | Accepted | Time series E isn't in the bucket, but the bucket has space for one more time series, so data point E1 is accepted in time series E.
F1         | Rejected | Time series F isn't in the bucket, and the bucket is at capacity, so data point F1 is rejected.
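
The acceptance rules in this walkthrough can be sketched as a simple admission check (a conceptual model only, using the example's capacity of five series):

    def admit(series_id, bucket, capacity=5):
        """Return True if a data point for `series_id` is accepted."""
        if series_id in bucket:        # existing series: always accepted
            return True
        if len(bucket) < capacity:     # new series and the bucket has room
            bucket.add(series_id)
            return True
        return False                   # new series and the bucket is full: rejected

    # The table's example: A, B, C, and D are already in the bucket.
    bucket = {"A", "B", "C", "D"}
    print(admit("A", bucket))  # True  (A3: series A already exists)
    print(admit("E", bucket))  # True  (E1: new series, one slot free)
    print(admit("F", bucket))  # False (F1: new series, bucket is now at capacity)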

Over time, data points expire based on when they entered the bucket. When data points fall outside the 2.5 hour window, they're excluded from the persisted cardinality bucket.

In the example, data points A1 and D1 expired, so they're excluded from the bucket. When data point C1 expires, it's also excluded. Because data point C1 is the last data point in time series C, the entire series is removed, making space for a new time series in the bucket.

Data point | Status   | Description
A1         | Expired  | Data point A1 expired, so it's excluded from the bucket.
D1         | Expired  | Data point D1 expired, so it's excluded from the bucket.
C1         | Expiring | Data point C1 is expiring, so it's excluded from the bucket.

Manage persisted cardinality limits

If your organization exceeds 100% of its Persisted Cardinality Capacity Limit, data points for any new time series not seen in the last 2.5 hours are dropped until you're below this limit. Data points for existing time series continue to be persisted.

Series that are more stable or regularly emitted aren't at risk of being dropped, because they're always in the system and aren't categorized as new series. For example, series whose labels don't change are considered more stable.

To fully resolve a penalty period, the rate of new series must be less than the rate of expiring series. The higher the differential between these rates, the faster the penalty resolves.

The 2.5 hour expiration window is a rolling window: the constant rate of expiring series makes room for an equal rate of new series to be added. Because of this, the penalty period you experience can be much shorter than 2.5 hours.
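
For example, with hypothetical rates, if about 1,000 series expire from the window each minute while only 400 new series arrive, capacity is recovered at roughly 600 series per minute:

    # Hypothetical rates of series leaving and entering the rolling window.
    expiring_series_per_minute = 1_000
    new_series_per_minute = 400

    # Net capacity recovered each minute while the penalty resolves.
    net_recovery_per_minute = expiring_series_per_minute - new_series_per_minute  # 600

    # Approximate time to recover a hypothetical 30,000-series overage.
    minutes_to_resolve = 30_000 / net_recovery_per_minute  # 50.0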

Avoid persisted cardinality limits

Use the following tools and techniques to avoid hitting persisted cardinality limits:

  • Review the Usage Dashboard and Metric Growth dashboard to understand the source of cardinality growth.
  • Learn about different methods to reduce cardinality.
  • Proactively create drop rules and aggregation rules, such as mapping rules and rollup rules, ahead of potential overages to evict older time series and make room for new ones.
  • If your organization knows which new metrics its services are generating, control the rate at which new series are introduced by using smaller, more incremental deploys.

Capacity limits

Capacity limits indicate your maximum license capacity for metrics data in Observability Platform. Exceeding your capacity limits incurs penalties, which can result in dropped metrics. Dropped metrics can affect dashboards, alerts, and other reports.

The license limit indicates your contractual system license with Chronosphere.

The capacity and license limits display in the Metrics License Consumption dashboard.

Tracing licenses

View tracing license information in the Tracing License Consumption dashboard.

Retention policies

Retention policies define the amount of time Observability Platform retains telemetry data. Contact Chronosphere Support to configure the intervals used for your system. These policies might be based on your contract or license.

Metrics retention policies

To view metric retention policies, in the navigation menu, click Go to Admin and then select Control > Ingest Configuration.

For example, your system's retention policies might look like:

  • Five days for raw data, and resolutions of 15, 30, and 60 seconds.
  • 120 days for one-minute data.
  • 180 days for one-hour data.
  • 1825 days for 24-hour data.
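
Expressed as data (these are the example values above, not defaults for your system), that policy maps each resolution to a retention period:

    # Example retention policy from above: resolution -> retention, in days.
    retention_policy_days = {
        "raw (15s, 30s, 60s)": 5,
        "1m": 120,
        "1h": 180,
        "24h": 1825,
    }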

Metrics rules use the existing, configured set of intervals in rule definitions.

Change events policies

Change events have a default retention policy of 90 days. This value is fixed and can't be altered. For information about ingest limits for change events, see Change event limits.