Observability Platform licensing
Chronosphere uses the following terms when describing licensing concepts and usage in Chronosphere Observability Platform.
To track your telemetry usage against your licensing quotas, use the License Overview. Its charts help you identify usage trends across data types and proactively manage data usage so that you don't exceed your organization's licensing limits.
License overview
The License Overview is available in your Observability Platform tenant. In the navigation menu, click Go to Admin and then select License Overview.
The License Overview page consists of these sections, selectable by tab:
The Consumption tab tracks your consumption for each telemetry type, compared to your license limits. These statistics display across two selectable tabs for Snapshot and overall Trends:
- The Snapshot is an overview of your current license usage. When selected, the page displays each licensing statistic as a percentage of your contract limit, along with a graph for that statistic over a pre-selected time period.
- The Trends page explains how your license usage has changed over the selected time period, grouped and graphed by telemetry type.
License dimensions currently exceeding capacity highlight in red, and licenses close to exceeding capacity are orange.
Hold the pointer over a graph to display the three vertical dots icon. Select this icon to display a menu where you can select a relevant Observability Platform tool to use for detailed analysis.
Metric license types
Observability Platform defines two types of metric licenses: the Standard Metrics License and Histogram Metrics License.
Standard Metrics License
The Standard Metrics License measures aggregations, persisted writes, and persisted cardinality license consumption for the following Observability Platform metric types:
- Cumulative counter
- Delta counter
- Gauge
Because Observability Platform aggregates and persists legacy Prometheus histograms and OpenTelemetry explicit bucket layout histograms as cumulative or delta counters, these metrics consume Standard Metrics License capacity.
Histogram Metrics License
The Observability Platform histogram metric type supports both OpenTelemetry exponential histograms and Prometheus native histograms.
The Histogram Metrics License measures aggregations, persisted writes, and persisted cardinality license consumption for the following Observability Platform metric types:
- Cumulative exponential histogram
- Delta exponential histogram
Use the License Overview Trends page to observe histogram persisted writes, matched writes, and persisted cardinality in the Metrics consumption trends graph.
Aggregations
Your license usage is determined by your database writes.
Matched writes are the number of writes per second being matched for transformation and reshaping by the Observability Platform aggregation tier.
The aggregator counts the number of data points matched into each aggregator rule, whether rollup or downsampling. If a data point matches one rule, that's one matched write. If a data point matches two rules, that's two matched writes. The sum of the matched data points per rule equals the total matched writes for the aggregator.
A high-level formula for this limit is:
Sum (number of data points matched per-rule)
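As a minimal sketch of that sum, the following Python snippet counts matched writes for a few data points. The rule names, the data point shape, and the predicate-based matching are hypothetical stand-ins for illustration, not the Observability Platform rule syntax.

```python
from typing import Callable, Dict, List

# Hypothetical data point shape: a metric name plus labels (not a Chronosphere type).
DataPoint = Dict[str, str]


def matched_writes(points: List[DataPoint],
                   rules: Dict[str, Callable[[DataPoint], bool]]) -> Dict[str, int]:
    """Count, per rule, how many data points the rule matches."""
    counts = {name: 0 for name in rules}
    for point in points:
        for name, matches in rules.items():
            if matches(point):
                counts[name] += 1
    return counts


# Illustrative rules only; real rollup and downsampling rules are configured differently.
rules = {
    "rollup_http": lambda p: p["name"].startswith("http_"),
    "downsample_prod": lambda p: p.get("env") == "prod",
}
points = [
    {"name": "http_requests_total", "env": "prod"},  # matches both rules
    {"name": "http_requests_total", "env": "dev"},   # matches one rule
    {"name": "cpu_seconds_total", "env": "dev"},     # matches no rule
]

per_rule = matched_writes(points, rules)
total = sum(per_rule.values())
print(per_rule, total)  # {'rollup_http': 2, 'downsample_prod': 1} 3
```

The first data point counts twice because it matches two rules, so three data points produce three matched writes here even though one of them matches nothing.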
Writes also depend on your Collector scrape interval. Scraping less frequently (a longer scrape interval) produces fewer writes, but can reduce visibility.
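As a rough illustration of how the scrape interval affects the write rate, the sketch below assumes every series emits exactly one data point per scrape. That is a simplification, and the function name is hypothetical.

```python
def writes_per_second(series_count: int, scrape_interval_seconds: float) -> float:
    """Approximate write rate, assuming one data point per series per scrape."""
    if scrape_interval_seconds <= 0:
        raise ValueError("scrape interval must be positive")
    return series_count / scrape_interval_seconds


print(writes_per_second(60_000, 15))  # 4000.0
print(writes_per_second(60_000, 60))  # 1000.0
```

Under this assumption, scraping the same 60,000 series every 60 seconds instead of every 15 seconds cuts the write rate to a quarter, at the cost of coarser data.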
See your current Matched Writes level in the License Overview Snapshot in the Metrics Consumption section. On the Trends page, review usage over time in the Metrics Consumption Trends graph.
If your organization exceeds 100% of their Matched Writes Capacity Limit, data that would otherwise match aggregation rules is dropped. The data that drops depends on your metric pool allocations, and the priority set for each pool.
Persisted writes
The number of persisted writes to the Observability Platform database consists of the following:
(Number of unaggregated, raw data points written to the database)
+ (Number of aggregated data points written to the database)
+ (Number of non-Prometheus non-rolled up aggregated data points written to the database)
If you exceed 100% of your Persisted Writes Capacity limit, data points might be dropped.
Quotas determine the data which drops first. You can split the total system-persisted writes per second into quota allocations on a per-pool basis. Pools generally align with groups or teams, depending on your internal organization. Read more about configuring quotas. If you set up per-pool quotas, you can review the quotas in a dedicated dashboard.
To improve and enhance performance, stability, and features, Observability Platform adds time series to your database. These data points aren't counted against your license quota.
You can review your current usage in the Persisted writes graph on the License Overview Snapshot page, in the Metrics consumption section. To see changes over time, select Trends, and review the Metrics consumption trends graph.
Persisted cardinality
Matched writes and persisted writes measure the count of data points at any given moment in time. Persisted cardinality operates differently, because it's a cumulative measure that calculates the sum of the unique time series of the persisted writes that Observability Platform stores, seen over the last 2.5 hours only.
Because this measure is cumulative, changes you make to reduce persisted cardinality won't be immediately reflected as a decrease in the persisted cardinality consumption percentage. For example, you can use rollup rules to downsample and aggregate metrics before they're stored. However, inactive time series don't expire until they stop counting towards the rolling window, so changes aren't reflected until 2.5 hours after the series was last seen by Observability Platform.
Read more about how persisted cardinality limits work, and then learn about how you can manage persisted cardinality limits and avoid persisted cardinality limits.
To see persisted cardinality license usage changes over time, in the License overview select Trends, and review the Metrics consumption trends graph.
How persisted cardinality limits work
Persisted cardinality is comparable to a leaky bucket. Over time, new series can be added until the bucket is full. When the bucket is at maximum capacity, there's no space for new time series, so they're rejected. When existing time series expire, they make room for new series.
In the following example, the persisted cardinality capacity is five unique time series. The animated image shows the lifecycle of six unique time series (A, B, C, D, E, and F) as new data points are added, and as other data points expire.
As data points are introduced, they're either accepted or rejected based on whether the persisted cardinality bucket is full (reached maximum capacity), and whether the related time series already exists in the bucket:
- If the bucket is at maximum capacity and the series already exists, the data point is accepted.
- If the bucket is at maximum capacity and the series doesn't exist, the series is rejected.
The following table shows how data points A3, E1, and F1 are processed, based on the bucket status:
Data point | Status | Description |
---|---|---|
A3 | Accepted | Time series A is in the bucket, so data point A3 is accepted. |
E1 | Accepted | Time series E isn't in the bucket, but the bucket has space for one more time series, so data point E1 is accepted in time series E. |
F1 | Rejected | Time series F isn't in the bucket, and the bucket is at capacity, so data point F1 is rejected. |
Over time, data points expire based on when they entered the bucket. When data points fall outside the 2.5-hour window, they're excluded from the persisted cardinality bucket.
In the example, data points A1 and D1 expired, so they're excluded from the bucket. When data point C1 expires, it's also excluded. Because data point C1 is the last data point in time series C, the entire series is removed, making space for a new time series in the bucket.
Data point | Status | Description |
---|---|---|
A1 | Expired | Data point A1 expired, so it's excluded from the bucket. |
D1 | Expired | Data point D1 expired, so it's excluded from the bucket. |
C1 | Expiring | Data point C1 is expiring, so it's excluded from the bucket. |
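The accept, reject, and expire behavior described in this section can be approximated with a small simulation. This is a toy model under stated assumptions, not the platform's implementation: it tracks only when each series was last seen (rather than individual data points), and the `CardinalityBucket` class and its method names are hypothetical.

```python
from typing import Dict

WINDOW_SECONDS = 2.5 * 60 * 60  # the 2.5-hour rolling window


class CardinalityBucket:
    """Toy model of a persisted cardinality bucket with a fixed series capacity."""

    def __init__(self, capacity: int, window: float = WINDOW_SECONDS) -> None:
        self.capacity = capacity
        self.window = window
        self.last_seen: Dict[str, float] = {}  # series name -> time of newest data point

    def _expire(self, now: float) -> None:
        # Remove series whose newest data point has aged out of the window.
        self.last_seen = {s: t for s, t in self.last_seen.items()
                          if now - t < self.window}

    def offer(self, series: str, now: float) -> bool:
        """Return True if the data point is accepted, False if it is rejected."""
        self._expire(now)
        if series in self.last_seen or len(self.last_seen) < self.capacity:
            self.last_seen[series] = now
            return True
        return False


bucket = CardinalityBucket(capacity=5)
for name in "ABCD":
    bucket.offer(name, now=0)

results = [
    bucket.offer("A", now=60),   # existing series: accepted
    bucket.offer("E", now=120),  # new series, one slot left: accepted
    bucket.offer("F", now=180),  # new series, bucket full: rejected
]
# Once B, C, and D age out of the window, a new series fits again.
later = bucket.offer("F", now=WINDOW_SECONDS + 1)
print(results, later)  # [True, True, False] True
```

The timestamps are arbitrary seconds chosen to loosely mirror the A3, E1, and F1 cases in the tables; they don't reproduce the animated example exactly.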
Manage persisted cardinality limits
If your organization exceeds 100% of their Persisted Cardinality Capacity Limit, data points for any new time series not seen in the last 2.5 hours will be dropped until you're below this limit. Data points for existing time series will continue to be persisted.
Series that are more stable or regularly emitted aren't at risk of being dropped because they're always in the system, and are not categorized as new series. For example, series that don't change any labels are considered more stable.
To fully resolve a penalty period, the rate of new series must be less than the rate of expiring series. The higher the differential between these rates, the faster the penalty resolves.
To manage persisted cardinality limits:
- Review the Persisted Cardinality Quotas dashboard, the Usage Dashboard and the Metric Growth dashboard to understand the source of cardinality growth.
- Create drop rules and aggregation rules like mapping rules and rollup rules to roll away sources of growth. Old series remain in the cardinality window for 2.5 hours after they were last seen.
The 2.5-hour expiration window is a rolling window, which means the constant rate of expiring series makes room for an equal rate of new series to be added. This behavior means the penalty period you experience can be much shorter than 2.5 hours.
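To make the relationship between the two rates concrete, the following back-of-the-envelope sketch estimates how long a penalty period might last. It assumes both rates stay constant, which real workloads rarely do, and the function name and all numbers are hypothetical.

```python
from typing import Optional


def seconds_to_recover(current_series: int, limit: int,
                       new_per_second: float,
                       expiring_per_second: float) -> Optional[float]:
    """Estimate seconds until cardinality falls back to the limit.

    Returns 0.0 if already at or below the limit, and None if new series
    arrive at least as fast as old series expire (no recovery).
    """
    excess = current_series - limit
    if excess <= 0:
        return 0.0
    net_drain = expiring_per_second - new_per_second
    if net_drain <= 0:
        return None
    return excess / net_drain


# 50,000 series over the limit, draining at a net 100 series per second.
print(seconds_to_recover(1_050_000, 1_000_000, 20, 120))  # 500.0
```

In this made-up case the overage clears in a little over eight minutes, far less than 2.5 hours, because the differential between the two rates is large relative to the excess.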
Avoid persisted cardinality limits
Use the following tools and techniques to avoid hitting persisted cardinality limits:
- Review the Persisted Cardinality Quotas dashboard, the Usage Dashboard and the Metric Growth dashboard to understand the source of cardinality growth.
- Learn about different methods to reduce cardinality.
- Proactively create drop rules and aggregation rules like mapping rules and rollup rules ahead of potential overages to evict older time series and make room for new ones.
- If your organization knows which new metrics services are generating, try to control the rate that new series are introduced through smaller, more incremental deploys.
Capacity limits
Capacity limits indicate your maximum license capacity for metrics data in Observability Platform. Exceeding your capacity limits incurs penalties, which can result in dropped metrics. Dropped metrics can affect dashboards, alerts, and other reports.
The license limit indicates your contractual system license with Chronosphere.
The capacity and license limits display on the License Overview Contract page. These limits are broken down into individual limit graphs:
- Persisted writes
- Matched writes
- Persisted cardinality
- Histogram persisted writes
- Histogram matched writes
- Histogram persisted cardinality
Tracing licenses
View tracing license information in the Tracing License Consumption dashboard.
The Traces consumption section of the License Overview displays aspects of Processed GB and Persisted GB for the current month.
These include:
- Daily average rates per second
- Month to date cumulative trends
On the Trends page, you can view trace consumption trends in higher resolution.
View your Processed and Persisted license limits on the Contract page.
Events
The Events consumption graph for Persisted Capacity displays the percentage of consumed events versus your events license.
On the Trends page, you can view event consumption trends in higher resolution.
The Persisted capacity limit displays the number of events that can persist per minute in your tenant.
This limit is enforced and can incur penalties if exceeded. In certain circumstances, this limit can exceed the license limit temporarily. The Capacity Limit displays the number of events that can persist per minute, as defined by the license in your contract with Chronosphere.
The Latest consumption percentage displays a decimal ratio of persisted data against your per-minute license limit for the selected time period. This value is calculated by dividing the persisted events per minute by the number of events that can persist per minute, defined by the license limit. Use this information to understand the relationship between your persisted data consumption and the defined license limit.
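The calculation described above can be sketched as a single division. The function name and the sample numbers are illustrative assumptions, not values from any real contract.

```python
def consumption_ratio(persisted_events_per_minute: float,
                      license_limit_per_minute: float) -> float:
    """Persisted events per minute as a decimal ratio of the license limit."""
    if license_limit_per_minute <= 0:
        raise ValueError("license limit must be positive")
    return persisted_events_per_minute / license_limit_per_minute


ratio = consumption_ratio(4_500, 6_000)
print(f"{ratio:.2f} ({ratio:.0%})")  # 0.75 (75%)
```

A ratio above 1.0 would mean that persisted events exceed the per-minute license limit for the selected period.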
Ingestion limits and retention policies
Retention policies define the amount of time Observability Platform retains telemetry data. Contact Chronosphere Support to configure the intervals used for your system. These policies might be based on your contract or license.
View retention policies in your License Overview Contract page.
Ingestion limits define the amount of raw data Observability Platform can ingest.
Metrics
Your system's Metrics Ingest Retention Policies might look like:
- Five days for raw data, and resolutions of 15, 30, and 60 seconds.
- 120 days for one-minute data.
- 180 days for one-hour data.
- 1825 days for 24-hour data.
Metrics rules use the existing, configured set of intervals in rule definitions.
Change events policies
Change events have a default retention policy of 90 days. This value is fixed and can't be altered. For information about ingest limits for change events, see Change event limits.
The Contract page also displays the Persisted Capacity per minute for events.
Traces license limits
The Contract page displays your license limits for Persisted Writes and Matched Writes, and your Traces Ingest Retention Policies.
Traces have a default retention policy of 30 days of raw data.
Logs license limits
The Contract page displays your Persisted Logs Limit and Logs Ingest Retention Policies.
Logs have a default retention policy of 30 days of raw data.