Manage metric pools

In the Metrics Quotas page of Chronosphere Observability Platform, you can review your current quota allocations by pool and preview the potential impact of changes to them. Use this feature to shape and plan your pool sizes and future usage.

To get started with pools, learn how to define a pool.

Define a metric pool

Before adding a metric pool, review these concepts and how you apply them to help shape how you define the pool:

  • Pool allocations define how much of your total persisted writes license you want allocated to each pool. You can define allocations for each of your licenses individually, or have the same allocations across all pools.
  • Pool priorities let you selectively decide which metrics within a given pool to drop first during a penalty scenario. You can set priorities at a global level (for all pools), or individually.
  • Pool thresholds are an optional capability that provides stricter control over persisted cardinality on individual pools. Configure thresholds as a proactive measure to strictly enforce dropping data, even when the overall capacity limit hasn’t been exceeded.

Any applied quota configuration displays in the Metrics Quotas page, which shows how the pool’s traffic interacts with the pool’s quota, without penalizing that pool if the system goes over its limit. For some quotas, such as Matched Writes Quotas and Persisted Cardinality Quotas, see the License Overview.

Pool allocations

You can define pool allocations either as a percentage (percent_of_license) that applies to all pools for all licenses, or as a fixed value (fixed_values) in data points per second (DPPS) for individual licenses. Any remaining capacity within each license is allocated to the default pool, after subtracting the sum of allocations across pools for that license.
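The remainder rule above can be illustrated with a small calculation. This is a minimal sketch, not Observability Platform code: the function name and the 100,000 DPPS license capacity are hypothetical values chosen for illustration, and the pool percentages are borrowed from the Terraform example later on this page.

```python
def default_pool_allocation(license_capacity_dpps, pool_percents):
    """Split a license among named pools by percentage.

    Returns the DPPS for each named pool and the DPPS left for the
    default pool, which receives whatever the named pools don't claim.
    """
    allocated = {
        name: license_capacity_dpps * pct / 100
        for name, pct in pool_percents.items()
    }
    remaining = license_capacity_dpps - sum(allocated.values())
    if remaining < 0:
        raise ValueError("pool allocations exceed the license capacity")
    return allocated, remaining


# Hypothetical license of 100,000 DPPS with three named pools.
pools, default_dpps = default_pool_allocation(
    100_000,
    {"Tracing Services": 10, "M3 Services": 25, "Gateway Services": 4},
)
print(pools)
print(default_dpps)
```

With these inputs the named pools claim 39% of the license, so the default pool is left with the remaining 61%.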

You can specify any combination of percent_of_license and fixed_values for each license dimension. However, all pools within a license dimension must use the same units. For example, if matched writes uses percent_of_license, all pools must use that unit for matched writes. Similarly, if persisted writes uses fixed_values, all pools must use fixed_values for persisted writes.

  • percent_of_license: Specify the percentage of the license to allocate to a pool. This value applies to any license dimensions without fixed_values defined.

  • fixed_values: Specify a fixed value for a license dimension in DPPS. You can set a single fixed value per license dimension. Any fixed_values take precedence over percent_of_license for a given license.

    If you set any fixed_values, you can specify allocations for both the matched writes and persisted writes licenses. These allocations are available for both standard and histogram metrics. See the CreateResourcePools endpoint for more information.

    ⚠️

    The sum of fixed values across all defined pools must be less than or equal to the total allotted capacity, defined by the capacity limit. If your organization exceeds the capacity limit, where the sum of fixed values exceeds total capacity, a penalty is applied to all pools proportional to fixed allocations. A validation in Terraform penalizes any pool that exceeds its allotted quota.

    In this penalty state, the default pool receives no allocation, and other pools are adjusted down proportionally so that the sum of fixed values is equal to the capacity limit.
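The proportional adjustment described in this warning can be worked through numerically. The following is a hedged sketch of the arithmetic only, assuming a simple linear scale factor; the function name and the DPPS figures are hypothetical, and the platform's actual enforcement logic may differ in detail.

```python
def apply_fixed_value_penalty(capacity_limit, fixed_values):
    """Return (adjusted pool allocations, default pool allocation).

    If the fixed values fit within the capacity limit, pools keep their
    values and the default pool gets the remainder. Otherwise every pool
    is scaled down by the same factor so the sum equals the capacity
    limit, and the default pool receives nothing.
    """
    total = sum(fixed_values.values())
    if total <= capacity_limit:
        return dict(fixed_values), capacity_limit - total
    scale = capacity_limit / total
    adjusted = {name: value * scale for name, value in fixed_values.items()}
    return adjusted, 0.0


# Hypothetical: 12,000 DPPS of fixed values against a 9,000 DPPS limit.
adjusted, default_dpps = apply_fixed_value_penalty(
    9_000, {"Tracing Services": 8_000, "M3 Services": 4_000}
)
print(adjusted, default_dpps)
```

Here the scale factor is 9,000 / 12,000 = 0.75, so each pool keeps three quarters of its fixed value and the relative sizes of the pools are preserved.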

Configure priority

If you configured metrics quotas and your system exceeds its license limit, Observability Platform drops metrics from pools that exceed their respective quotas until all pools meet their quotas. Observability Platform penalizes only pools that exceed their persisted writes quota.

To more selectively decide which metrics within a given pool to drop first during a penalty scenario, specify priorities for each pool:

  • High: Metrics dropped last.
  • Low: Metrics dropped first.
  • Default: Metrics dropped after low priority metrics, but before high priority metrics.

For persisted writes and matched writes, Observability Platform uses these priorities to determine the order of drops if your organization exceeds its capacity limit. These priorities are also used in conjunction with setting thresholds.

Priority values support glob syntax.

Chronosphere recommends assigning all low-priority traffic to the same pool. When low-priority data is split between pools, higher-priority traffic can drop from a penalized pool, even though there is lower-priority traffic in a different, non-penalized pool.
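To make the drop order concrete, the sketch below classifies a label value by priority and then works out how much traffic a penalized pool would shed from each priority to get back under its quota. It is an illustration under stated assumptions: Python's fnmatch module stands in for the platform's glob matching, checking high-priority patterns before low-priority ones is an arbitrary choice for this sketch, and the function names and numbers are hypothetical.

```python
from fnmatch import fnmatchcase

# Low priority drops first, high priority drops last.
DROP_ORDER = ["low", "default", "high"]


def classify_priority(label_value, high_globs, low_globs):
    """Return "high", "low", or "default" for a prioritization label value."""
    if any(fnmatchcase(label_value, g) for g in high_globs):
        return "high"
    if any(fnmatchcase(label_value, g) for g in low_globs):
        return "low"
    return "default"


def plan_drops(usage_by_priority, quota):
    """Return how much to drop per priority so total usage fits the quota."""
    excess = sum(usage_by_priority.values()) - quota
    drops = {}
    for priority in DROP_ORDER:
        if excess <= 0:
            break
        dropped = min(usage_by_priority.get(priority, 0), excess)
        if dropped > 0:
            drops[priority] = dropped
        excess -= dropped
    return drops


print(classify_priority("production-us-east", ["production*"], ["test*"]))
print(plan_drops({"high": 500, "default": 300, "low": 200}, quota=700))
```

In the second call the pool is 300 over quota, so all 200 of the low-priority traffic is dropped first, then 100 of the default-priority traffic, and high-priority traffic is untouched.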

Configure global priority

You can change global pool quota configurations by metric label. Any changes to quota configuration labels require updates to all pools.

  1. On the Metrics Quotas page, click Configure Quotas.

  2. Click Edit Global Settings.

  3. In the Edit Global Pool Settings dialog, select a label from the Quota Configuration Label dropdown. This label is the label key that defines which keys can be used to create a pool.

  4. Select Configure Globally to apply the pool filtering globally, and complete the following fields:

    • Prioritization label: Select a label to change its priority.
    • High priority values: Add a label value, such as production*, to ensure metrics with that label value are retained.
    • Low priority values: Add a label value, such as test*, to drop metrics of lower importance first.
  5. Click Done when finished.

  6. Click the Code Config tab.

  7. Click Copy to copy the file, or Download to download the file to your computer.

  8. Add the definition to a Terraform file, or create a new Terraform file.

  9. Run this command to apply the resource:

    terraform apply

Configure priority per pool

You can configure priority for each pool instead of configuring priority globally. Complete the following steps to configure priority for each pool individually.

  1. On the Metrics Quotas page, click Configure Quotas.
  2. Click Edit Global Settings.
  3. In the Edit Global Pool Settings dialog, select a label from the Quota Configuration Label dropdown. This label is the label key that defines which keys can be used to create a pool.
  4. Select Configure per pool to set priority independently for each pool.
  5. Click Done when finished.
  6. Edit each pool to set priorities.

Pool thresholds

This feature is available only to specific Chronosphere Observability Platform users, and has not been announced or officially released. Do not share or discuss this feature, or information about it, with anyone outside of your organization.

After configuring pool priorities, administrative users can optionally configure thresholds on individual pools to better manage persisted cardinality.

Thresholds let you strictly enforce certain pools when they exceed their defined allocation, even if the overall capacity limit hasn’t been exceeded. By proactively limiting series in strictly enforced pools, thresholds prevent those series from consuming portions of the overall cardinality budget and inadvertently affecting pools that haven’t exceeded their allocation.

Administrative users can configure the following thresholds to take a more proactive approach to budget optimization, enabling them to implement incremental steps to prevent an overage before it occurs.

To help solve the “noisy neighbor” problem and isolate the impact of changes to individual teams, strictly enforce the all priorities threshold:

  • All priorities threshold: Stop accepting data of any priority at the specified threshold, if consumption for the pool exceeds the threshold value. This threshold limits churn in a pool from exceeding the defined threshold and inadvertently affecting other pools.

To proactively limit churn for low and medium-priority series, configure low and medium priority thresholds to ensure there’s room in your license for high-priority series:

  • Low priority threshold: Stop accepting low priority data only at the specified threshold, if low priority data exceeds the threshold value. This threshold limits churn in low-priority series that exceed the threshold.
  • Low and medium priority threshold: Stop accepting low and medium priority data at the specified threshold, if low and medium priority data combine to exceed the threshold value. This threshold limits churn in low and medium-priority series that exceed the threshold.

To determine the threshold values for each pool, use the Persisted Cardinality Quotas dashboard to identify usage trends. The data in this dashboard can help inform recommended thresholds for low, medium, and high priority series in each pool. For example:

  • If the control team consistently uses approximately 95% of its pool, and you want to prevent an overage from resulting in drops in other pools, set a strict threshold (all_priorities) to 100%.
  • If the control team contains data for your most important pool, and you’d rather drop data from all other pools except this one, set strict thresholds on all other pools except the control team pool.
  • If the control team often experiments with new series in its development and staging environments, set proactive thresholds to 10% for low priority and 20% for combined low and medium priority data to preserve space for high-priority series.
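The three thresholds can be read as three independent checks against a pool's allocation. The sketch below encodes that reading; it is an interpretation of the descriptions above rather than the platform's implementation. The function name, the series counts, and the 10,000-series allocation are hypothetical, "default" stands for medium priority, and the threshold keys mirror the all_priorities, default_and_low_priority, and low_priority names used in the configuration.

```python
def blocked_priorities(allocation, usage, thresholds):
    """Return the set of priorities a pool would stop accepting.

    allocation: the pool's persisted cardinality allocation (series).
    usage: series counts keyed by "low", "default", and "high".
    thresholds: percent-of-pool-allocation values; any key may be absent.
    """
    low = usage.get("low", 0)
    low_and_default = low + usage.get("default", 0)
    total = low_and_default + usage.get("high", 0)

    checks = [
        ("low_priority", low, {"low"}),
        ("default_and_low_priority", low_and_default, {"low", "default"}),
        ("all_priorities", total, {"low", "default", "high"}),
    ]
    blocked = set()
    for key, consumed, affected in checks:
        percent = thresholds.get(key)
        if percent is not None and consumed > allocation * percent / 100:
            blocked |= affected
    return blocked


# Proactive thresholds from the last bullet: 10% low, 20% low and medium.
thresholds = {"low_priority": 10, "default_and_low_priority": 20, "all_priorities": 100}
print(blocked_priorities(10_000, {"low": 1_200, "default": 500, "high": 6_000}, thresholds))
```

With a 10,000-series allocation, the low priority limit is 1,000 series and the combined low and medium limit is 2,000. The pool above exceeds only the first limit, so only low priority data stops being accepted.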

Configure pool thresholds

Configuring pool thresholds is supported only in Terraform and the CreateResourcePools endpoint.

Complete the following steps to set pool thresholds. See the example for how to configure thresholds within a pool.

  1. Add the priority_thresholds object to your existing metric pools definition with either Terraform or the CreateResourcePools endpoint.

  2. Define the license you want the threshold to operate on. Thresholds support these licenses:

    • PERSISTED_CARDINALITY_STANDARD: Refers to the standard metric license, which measures the current consumption rates across persisted writes, matched writes, and persisted cardinality license dimensions measured against the capacity limit.
    • PERSISTED_CARDINALITY_HISTOGRAM: Refers to the histogram metrics license, which measures the current consumption rate across all histogram metrics license dimensions measured against the capacity limit.
  3. Define the thresholds you want to configure, which can be one of the following values:

    • all_priorities: Stop accepting any data (low, medium, and high) at the specified threshold if consumption for the pool exceeds the threshold value.
    • default_and_low_priority: Stop accepting low and medium priority data at the specified threshold if low and medium priority data combined exceed the threshold value.
    • low_priority: Stop accepting only low priority data at the specified threshold if low priority data exceeds the threshold value.
  4. Save and apply your metric pools definition.

After making changes, use the Persisted Cardinality Quotas dashboard to track which pools are approaching or exceeding defined thresholds, identify where drops are occurring, and view which priority levels are affected.

After updating definitions for priorities or pools, only new inbound time series adhere to the new rules immediately. Any existing, inactive series that were already attributed to a changed pool might continue to count towards your cardinality limit until they naturally expire in the 150-minute rolling window. This means that it might take up to 150 minutes for cardinality per pool and per priority to accurately reflect counts.

Pool thresholds example

In the following example, priority thresholds are set for the Control Services pool on both the standard and histogram persisted cardinality licenses:

  pool {
    name = "Control Services"
    allocation {
      percent_of_license = 16
      priority_thresholds {
        license = "PERSISTED_CARDINALITY_STANDARD"
        all_priorities {
          percent_of_pool_allocation = 100
        }
        default_and_low_priority {
          percent_of_pool_allocation = 50
        }
        low_priority {
          percent_of_pool_allocation = 25
        }
      }
      priority_thresholds {
        license = "PERSISTED_CARDINALITY_HISTOGRAM"
        all_priorities {
          percent_of_pool_allocation = 100
        }
        low_priority {
          percent_of_pool_allocation = 25
        }
      }
    }
    match_rules = ["service:{${join(",", control_services)}}"]
    priorities {
      high_priority_match_rules = ["cluster:production*"]
      low_priority_match_rules  = ["cluster:test*"]
    }
  }

Add a metric pool

You can define a metric pool in Observability Platform, but you must use Terraform to apply the changes.

You can have a maximum of 20 metric pools.

Although actual management of pools is handled using Terraform, this interface helps you understand what changes to make to reduce guesswork and repeated updates to your system.

To create quota pools, you must have administrative privileges:

  1. In the navigation menu, click Go to Admin and then select Control > Metrics Quotas.

  2. Click Configure Quotas.

  3. Click + Add Pool.

  4. The following fields display on the page. Update editable fields to modify your pool configuration:

    • Pool name: (Editable) Change the pool name.

    • Data matching: The values the selected pool uses to match data.

      • Quota configuration label: The label matching this pool.
      • Data matching values: (Editable) The specific values for the label, which match this pool. Add a value and press Enter to view a list. This value supports glob syntax.
    • Observed label consumption: Displays the total average DPPS for this pool, broken down by label value.

    • Quota allocation: (Editable) Set a quota percentage or DPPS for the selected pool, similar to the Preview quota allocations section. The value you enter, whether percentage or DPPS, applies to both Standard and Histogram Metrics.

    • Prioritization: (Conditionally editable) Add the Priority Label and high or low priority values.

      Pools using a global priority setting can’t change their priorities on an individual pool page.

  5. In the Quota Allocation chart, select or clear the checkboxes next to each pool name to display or remove that pool from the bar chart.

    This panel includes a chart that displays a graph for total consumption, and a bar chart for consumption by pool.

  6. Click Done after completing your changes.

  7. Click the Code Config tab.

  8. Click Copy to copy the file, or Download to download the file to your computer.

  9. Add the definition to a Terraform file, or create a new Terraform file.

  10. Run this command to apply the resource:

    terraform apply

Terraform pool example

The following code is an example of a Terraform file used to create quotas and priorities:

resource "chronosphere_resource_pools_config" "resource_pools" {
  default_pool {
 
    priorities {
      high_priority_match_rules = ["chronosphere_k8s_cluster:production*"]
      low_priority_match_rules  = ["chronosphere_k8s_cluster:rc*"]
    }
  }
 
  pool {
    name = "Tracing Services"
 
    allocation {
      percent_of_license = 10
 
      fixed_value {
        license = "PERSISTED_WRITES_STANDARD"
        value = 6500
      }
 
      fixed_value {
        license = "PERSISTED_WRITES_HISTOGRAM"
        value = 2500
      }
    }
 
    match_rules = ["service:{spanhandler,traceingester}"]
 
    priorities {
      high_priority_match_rules = ["chronosphere_k8s_cluster:production*"]
      low_priority_match_rules  = ["chronosphere_k8s_cluster:rc*"]
    }
  }
 
  pool {
    name = "M3 Services"
 
    allocation {
      percent_of_license = 25
    }
 
    match_rules = ["service:m3*"]
 
    priorities {
      high_priority_match_rules = ["chronosphere_k8s_cluster:production*"]
      low_priority_match_rules  = ["chronosphere_k8s_cluster:rc*"]
    }
  }
 
  pool {
    name = "Gateway Services"
 
    allocation {
      percent_of_license = 4
    }
 
    match_rules = ["service:gateway*"]
 
    priorities {
      high_priority_match_rules = ["chronosphere_k8s_cluster:production*"]
      low_priority_match_rules  = ["chronosphere_k8s_cluster:rc*"]
    }
  }
}

Edit a pool

Select from the following methods to edit pools. You can also configure global priorities to change global pool quota configurations by metric label.

To edit an existing pool:

  1. In the navigation menu, click Go to Admin and then select Control > Metrics Quotas.

  2. Click Configure Quotas.

  3. Click any row in the Pools table to display the Edit Pool page.

    The Edit Pool page contains information specific to the selected pool. These fields match the Add Pool screen, and some values can be edited.

  4. Make any necessary changes, and then click Done after completing your changes.

  5. Click the Code Config tab.

  6. Click Copy to copy the file, or Download to download the file to your computer.

  7. Add the definition to a Terraform file, or create a new Terraform file.

  8. Run this command to apply the resource:

    terraform apply

Understanding pool usage

The Pools section of the Configure Quotas page describes what pools you have, and how they’re configured. This includes:

  • Pool name: The display name of the pool.
  • Data matching: The label values the pool matches.
  • Allocation: The percentage or Data Points Per Second (DPPS) of total traffic guaranteed to the pool before it might be penalized.
  • Consumption: The percentage or DPPS of total traffic the pool is consuming over the selected time range.

Allocation and Consumption DPPS describe Standard Metrics License allocation and consumption. Histogram Metrics License allocation and consumption aren’t included in the pool’s reported DPPS.

Quota allocations and consumption

The Quota Allocations vs Current Consumption graph is a bar chart used to visualize the current quota allocation in Data Points Per Second (DPPS) for each pool, and what the pool is actually consuming. The top bar for each pool is the Allocation. The second bar is the Current Consumption for the pools. Using these two bars, you can determine consumption in relation to the allocated quota. Point to any group of bars to display a dialog with exact values.

Quota consumption by pools (per second)

The Quota Consumption by pools graph separates the pools so you can see each pool’s Average consumption and its Current quota limit.

Understand quota consumption trends

The Quota Consumption graph is a running line of data points and quota limits, where the previous graph displays only a single value. Point to any point on the graph to see exact data. Drag in a graph to focus in on the selected time period.

Preview quota allocations

The Pools section includes a group of text fields, one for each created pool. Each field contains the assigned percentage (%) or DPPS for that pool. Use these fields to set your general pool allocations.

To preview a new quota allocation, change a number in the box for the pool to be updated. Click outside the boxes to update the total.

Quota settings must meet the following criteria:

  • Quotas must add up to 100%. Changing any pool’s quota causes a related change in the default pool to ensure total quota is 100%.
  • The UI supports values greater than or equal to 0.01%.
  • The API supports values greater than or equal to 0.001%.
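The criteria above amount to a small validation routine. The following sketch shows one way to express them, assuming the default pool simply absorbs whatever percentage the named pools don't claim; the function name and the "default" key are hypothetical, and the minimum parameter models the 0.01% UI floor or the 0.001% API floor.

```python
def preview_allocations(pool_percents, minimum=0.01):
    """Validate named pool percentages and derive the default pool's share.

    Raises ValueError if a pool is below the supported minimum or if the
    named pools together claim more than 100% of the license.
    """
    for name, percent in pool_percents.items():
        if percent < minimum:
            raise ValueError(f"{name}: {percent}% is below the {minimum}% minimum")
    named_total = sum(pool_percents.values())
    if named_total > 100:
        raise ValueError(f"named pools total {named_total}%, which exceeds 100%")
    # The default pool takes the remainder so the total is always 100%.
    return {**pool_percents, "default": round(100 - named_total, 3)}


current = preview_allocations(
    {"Tracing Services": 10, "M3 Services": 25, "Gateway Services": 4}
)
changed = preview_allocations(
    {"Tracing Services": 10, "M3 Services": 30, "Gateway Services": 4}
)
print(current["default"], changed["default"])
```

Raising the M3 Services pool from 25% to 30% lowers the default pool from 61% to 56%, which matches the related change the preview makes to keep the total at 100%.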

Changing an assigned quota displays a third bar in the Quota consumption by pools chart. Use the new bar to determine if new quota assignments meet the needs of each of your pools.

If one pool is consistently over quota and the other pools aren’t, use the preview to adjust assigned quotas to better meet the needs of each pool.

Click the Reset quotas icon to return to the existing configuration.

Delete a pool

Select from the following methods to delete pools.

To delete an existing pool:

  1. In the navigation menu, click Go to Admin and then select Control > Metrics Quotas.

  2. Click Configure Quotas.

  3. Click any row in the Pools table to display the Edit Pool page.

  4. At the bottom of the Edit Pool page, click Remove Pool.

    The pool is removed from the Configure Quotas page, and no longer displays in the resource definition of the Code Config tab.

Best practices

To keep penalty behavior and cost accounting transparent and predictable, pools should be hard partitions of your system, with no one time series matching more than one pool. The following processes help ensure pools have the correct data:

  • Chronosphere recommends selecting a single usage tag as the pool assignment mechanism. Picking a single tag reduces the chance that one pool matches on serviceX and a second pool matches on environmentY, in which case a time series might match either or both definitions.
  • Use exact match values for the selected label to decrease the chances of other tags or a regular expression match allowing a time series to fit into more than one pool.
  • Chronosphere Observability Platform uses match-ordering in pools. If a time series matches more than one pool, it becomes part of the first pool in the list that it matches. A time series might match more than one pool’s criteria, but a first-match policy ensures that a time series is accounted for consistently in a single pool.
  • If you see a pool that doesn’t match the expected penalty behavior, open the pool in the profiler and compare it with the Terraform configuration file. A match rule value might be incorrect.
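The first-match policy, and the reason a single assignment label is recommended, can be seen in a short sketch. It is an illustration only: Python's fnmatch module approximates glob matching, and the pool definitions, label names, and function name are hypothetical.

```python
from fnmatch import fnmatchcase


def assign_pool(labels, pools):
    """Return the first pool whose rule matches the series, else "default".

    pools is an ordered list of (pool name, label key, glob patterns).
    """
    for name, label_key, globs in pools:
        value = labels.get(label_key)
        if value is not None and any(fnmatchcase(value, g) for g in globs):
            return name
    return "default"


# Two pools keyed on different labels, so one series can match both.
pools = [
    ("Gateway Services", "service", ["gateway*"]),
    ("Production", "environment", ["production*"]),
]
series = {"service": "gateway-api", "environment": "production"}

print(assign_pool(series, pools))                  # first match in list order
print(assign_pool(series, list(reversed(pools))))  # same series, reordered pools
print(assign_pool({"service": "billing"}, pools))  # no rule matches
```

The same series lands in a different pool when the list order changes, which is the ambiguity that a single assignment label and exact match values avoid.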

Create an alert for a pool in penalty

When a pool is in a penalty state, it might drop metrics to reduce usage. For higher priority pools, this can result in the loss of important data. To reduce or prevent data loss, create a monitor to alert on pool usage and notify the appropriate team to take preventative action.

  1. Create a notification policy with criteria like warn if value is > 90% for 5 minutes.

  2. Create a notifier and align with your internal alerting policies to route the alerts to the right team.

  3. Create a monitor.

  4. In the monitor query, add the query to return each pool’s percent utilization. You can set up alerts using the chrono_poolstats_count metric. A query that returns each pool’s percent utilization of its quota might look like this:

    (sum by (metrics_class) (rate(chrono_poolstats_count{type="persisted", dropped="no"}[2m]))) /
    ((sum by(metrics_class) (coordinator_scheduler_metrics_class_weight{})) * on() group_left() limit_service_licensed_persist_limit{})
  5. In the Signals section, select Per time series (many alerts) as the signal.

  6. Define any additional fields, such as a condition and sustain period.

  7. Click Save to save the monitor.
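The arithmetic behind the monitor query in step 4 can be checked by hand. The sketch below mirrors the query's structure, dividing each pool's persisted rate by its weight multiplied by the licensed persist limit. It assumes the weight is a fraction of the license between 0 and 1, which this page doesn't state, and the function name, pool names, and sample values are hypothetical.

```python
def pool_utilization(persisted_rate, class_weight, licensed_limit):
    """Return each pool's utilization of its quota as a fraction.

    persisted_rate: persisted data points per second, keyed by pool.
    class_weight: assumed fraction of the license assigned to each pool.
    licensed_limit: the licensed persist limit in data points per second.
    """
    return {
        pool: rate / (class_weight[pool] * licensed_limit)
        for pool, rate in persisted_rate.items()
    }


utilization = pool_utilization(
    persisted_rate={"tracing": 950, "m3": 1_200},
    class_weight={"tracing": 0.10, "m3": 0.25},
    licensed_limit=10_000,
)
# Pools above the 90% warning level used in the notification policy example.
over_warning = sorted(pool for pool, used in utilization.items() if used > 0.9)
print(utilization, over_warning)
```

In this example the tracing pool has a quota of 1,000 DPPS and uses 950 of it, so it crosses the 90% warning level, while the m3 pool uses less than half of its 2,500 DPPS quota.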