Manage metric pools

In the Metrics Quotas page of the Chronosphere Observability Platform web app, you can review your current quota allocations by pool and preview the impact of potential changes. Use this feature to shape and plan your pool sizes and future usage.

Pool allocations

You can define pool allocations either as a percentage (percent_of_license) or as a fixed value (fixed_values) in Data Points Per Second (DPPS). Any remaining capacity within each license is allocated to the default pool, after subtracting the sum of allocations across pools for that license.
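The remainder calculation can be sketched in a few lines of Python. This is an illustration rather than Observability Platform's implementation: the function name and the 100,000 DPPS license size are assumptions, and the pool percentages come from the Terraform example on this page.

```python
def default_pool_percent(pool_percents):
    """Return the percentage of a license left over for the default pool."""
    allocated = sum(pool_percents.values())
    if allocated > 100:
        raise ValueError(f"pools allocate {allocated}% of the license, which exceeds 100%")
    return 100 - allocated


# Percentages taken from the Terraform example on this page.
pools = {"Tracing Services": 10, "M3 Services": 25, "Gateway Services": 4}

LICENSE_DPPS = 100_000  # hypothetical license capacity, for illustration only

remaining = default_pool_percent(pools)
print(f"default pool: {remaining}% = {LICENSE_DPPS * remaining // 100} DPPS")
# default pool: 61% = 61000 DPPS
```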

You can specify any combination of percent_of_license and fixed_values for each license dimension. However, all pools within a license dimension must use the same units. For example, if matched writes uses percent_of_license, all pools must use that unit for matched writes. Similarly, if persisted writes uses fixed_values, all pools must use fixed_values for persisted writes.

  • percent_of_license: Specify the percentage of the license to allocate to a pool. This value applies to any license dimensions without fixed_values defined.

  • fixed_values: Specify a fixed value for a license dimension in DPPS. You can set a single fixed value per license dimension. Any fixed_values take precedence over percent_of_license for a given license.

    If you set any fixed_values, you can specify allocations for both the matched writes and persisted writes licenses. These allocations are available for both standard and histogram metrics. See the CreateResourcePools endpoint for more information.

    ⚠️

    The sum of fixed values across all defined pools must be less than or equal to the capacity limit, which defines the total allotted capacity. If the sum of fixed values exceeds the capacity limit, a penalty is applied to all pools in proportion to their fixed allocations. A validation in Terraform penalizes any pool that exceeds its allotted quota.

    In this penalty state, the default pool receives no allocation, and other pools are adjusted down proportionally so that the sum of fixed values is equal to the capacity limit.
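The proportional adjustment described in the warning can be worked through with a small Python sketch. The function name, pool names, and DPPS figures are hypothetical, and the platform's exact rounding behavior isn't documented here, so treat the numbers as an approximation of the rule rather than its implementation.

```python
def apply_capacity_limit(fixed_dpps, capacity_limit):
    """Return (per-pool allocations, default pool allocation) in DPPS."""
    total = sum(fixed_dpps.values())
    if total <= capacity_limit:
        # Within capacity: pools keep their fixed values and the
        # default pool receives whatever is left.
        return dict(fixed_dpps), capacity_limit - total
    # Penalty state: scale every pool by the same factor so the sum
    # equals the capacity limit, and leave nothing for the default pool.
    scale = capacity_limit / total
    return {name: value * scale for name, value in fixed_dpps.items()}, 0


pools = {"pool-a": 6000, "pool-b": 4000, "pool-c": 2000}  # hypothetical values
adjusted, default = apply_capacity_limit(pools, capacity_limit=9000)
print(adjusted)  # {'pool-a': 4500.0, 'pool-b': 3000.0, 'pool-c': 1500.0}
print(default)   # 0
```

In this example the fixed values total 12,000 DPPS against a 9,000 DPPS limit, so each pool is scaled to 75% of its fixed value.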

Create a pool


When creating a pool with Terraform, Observability Platform uses dry run mode by default. Any applied quota configuration appears in the Metrics Quotas Dashboard, which shows how the pool's traffic interacts with the pool's quota, without penalizing that pool if the system goes over its limit. Use dry run mode to experiment with different pool configurations and quota levels before you enable enforcing mode.

You can have a maximum of 20 metric pools.

Although you manage pools with Terraform, this interface helps you understand which changes to make, which reduces guesswork and repeated updates to your system.

To create quota pools, you must have administrative privileges:

  1. In the navigation menu, click Go to Admin and then select Control > Metrics Quotas.

  2. Click Configure Quotas.

  3. Click + Add Pool.

  4. The following fields display on the page. Update editable fields to modify your pool configuration:

    • Pool name: (Editable) Change the pool name.

    • Data matching: The values the selected pool uses to match data.

      • Quota configuration label: The label matching this pool.
      • Data matching values: (Editable) The specific values for the label, which match this pool. Add a value and press Enter to view a list. This value supports glob syntax.
    • Observed label consumption: Displays the total average DPPS for this pool, broken down by label value.

    • Quota allocation: (Editable) Set a quota percentage or DPPS for the selected pool, similar to the Preview quota allocations section. The value you enter, whether percentage or DPPS, applies to both Standard and Histogram Metrics.

    • Prioritization: (Conditionally editable) Add the Priority Label and high or low priority values.

      Pools using a global priority setting can't change their priorities on an individual pool page.

  5. In the Quota Allocation chart, select or clear the checkboxes next to each pool name to display or remove that pool from the bar chart.

    This panel includes a chart that displays a graph for total consumption, and a bar chart for consumption by pool.

  6. Click Done after completing your changes.

  7. Click the Code Config tab.

  8. Click Copy to copy the file, or Download to download the file to your computer.

  9. Add the definition to a Terraform file, or create a new Terraform file.

  10. Run this command to apply the resource:

    terraform apply

Terraform pool example

The following code is an example of a Terraform file used to create quotas and priorities:

resource "chronosphere_resource_pools_config" "resource_pools" {
  default_pool {
    # The default pool receives any capacity that the named pools leave unallocated.
    priorities {
      high_priority_match_rules = ["chronosphere_k8s_cluster:production*"]
      low_priority_match_rules  = ["chronosphere_k8s_cluster:rc*"]
    }
  }
 
  pool {
    name = "Tracing Services"
 
    allocation {
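      # Applies to every license dimension without a fixed_value below.
      percent_of_license = 10

      # Each fixed_value (in DPPS) takes precedence over percent_of_license for its license.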
      fixed_value {
        license = "PERSISTED_WRITES_STANDARD"
        value = 6500
      }
 
      fixed_value {
        license = "PERSISTED_WRITES_HISTOGRAM"
        value = 2500
      }
    }
 
    match_rules = ["service:{spanhandler,traceingester}"]
 
    priorities {
      high_priority_match_rules = ["chronosphere_k8s_cluster:production*"]
      low_priority_match_rules  = ["chronosphere_k8s_cluster:rc*"]
    }
  }
 
  pool {
    name = "M3 Services"
 
    allocation {
      percent_of_license = 25
    }
 
    match_rules = ["service:m3*"]
 
    priorities {
      high_priority_match_rules = ["chronosphere_k8s_cluster:production*"]
      low_priority_match_rules  = ["chronosphere_k8s_cluster:rc*"]
    }
  }
 
  pool {
    name = "Gateway Services"
 
    allocation {
      percent_of_license = 4
    }
 
    match_rules = ["service:gateway*"]
 
    priorities {
      high_priority_match_rules = ["chronosphere_k8s_cluster:production*"]
      low_priority_match_rules  = ["chronosphere_k8s_cluster:rc*"]
    }
  }
}

Edit a pool

You can also edit global settings to change global pool quota configurations by metric label.

To edit an existing pool:

  1. In the navigation menu, click Go to Admin and then select Control > Metrics Quotas.

  2. Click Configure Quotas.

  3. Click any row in the Pools table to display the Edit Pool page.

    The Edit Pool page contains information specific to the selected pool. These fields match the Add Pool screen, and some values can be edited.

  4. Make any necessary changes, and then click Done after completing your changes.

  5. Click the Code Config tab.

  6. Click Copy to copy the file, or Download to download the file to your computer.

  7. Add the definition to a Terraform file, or create a new Terraform file.

  8. Run this command to apply the resource:

    terraform apply

Understanding pool usage

The Pools section of the Configure Quotas page describes what pools you have, and how they're configured. This includes:

  • Pool name: The display name of the pool.
  • Data matching: The label values the pool matches.
  • Allocation: The percentage or Data Points Per Second (DPPS) of total traffic guaranteed to the pool before it might be penalized.
  • Consumption: The percentage or DPPS of total traffic the pool is consuming over the selected time range.

Allocation and Consumption DPPS describe Standard Metrics License allocation and consumption. Histogram Metrics License allocation and consumption aren't included in the pool's reported DPPS.

Quota allocations and consumption

The Quota Allocations vs Current Consumption graph is a bar chart used to visualize the current quota allocation in Data Points Per Second (DPPS) for each pool, and what the pool is actually consuming. The top bar for each pool is the Allocation. The second bar is the Current Consumption for the pools. Using these two bars, you can determine consumption in relation to the allocated quota. Point to any group of bars to display a dialog with exact values.

Quota consumption by pools (per second)

The Quota Consumption by pools graph separates the pools so you can see each pool's Average consumption and its Current quota limit.

Understand quota consumption trends

The Quota Consumption graph is a running line of data points and quota limits, whereas the previous graph displays only a single value. Point to any point on the graph to see exact data. Drag across the graph to zoom in on the selected time period.

Preview quota allocations

The Pools section includes a group of text fields, one for each created pool. These boxes contain the assigned percentage (%) or DPPS for each pool. Use these to set your general pool allocations.

To preview a new quota allocation, change a number in the box for the pool to be updated. Click outside the boxes to update the total.

Quota settings must meet the following criteria:

  • Quotas must add up to 100%. Changing any pool's quota causes a related change in the default pool to ensure total quota is 100%.
  • The UI supports values greater than or equal to 0.01%.
  • The API supports values greater than or equal to 0.001%.
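As a rough sketch of how these criteria interact, the following Python function previews a change to one pool and rebalances the default pool so the total stays at 100%. The function name and the pool name "default" are assumptions made for illustration. The 0.01% minimum and the three-decimal rounding mirror the UI and API limits listed above.

```python
UI_MIN_PERCENT = 0.01  # smallest value the UI accepts, per the list above


def preview_allocation(percents, pool, new_percent, default_pool="default"):
    """Return a copy of `percents` with one pool changed and the default pool rebalanced."""
    if pool == default_pool:
        raise ValueError("the default pool is derived from the other pools")
    if new_percent < UI_MIN_PERCENT:
        raise ValueError(f"{new_percent}% is below the UI minimum of {UI_MIN_PERCENT}%")
    updated = dict(percents)
    updated[pool] = new_percent
    named_total = sum(v for name, v in updated.items() if name != default_pool)
    if named_total > 100:
        raise ValueError(f"named pools total {named_total}%, which exceeds 100%")
    # Round to three decimals, the finest granularity mentioned (0.001%).
    updated[default_pool] = round(100 - named_total, 3)
    return updated


current = {"default": 61, "Tracing Services": 10, "M3 Services": 25, "Gateway Services": 4}
preview = preview_allocation(current, "M3 Services", 30)
print(preview["default"])  # 56
```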

Changing an assigned quota displays a third bar in the Quota consumption by pools chart. Use the new bar to determine if new quota assignments meet the needs of each of your pools.

If one pool is consistently over quota and the other pools aren't, use the preview to adjust assigned quotas to better meet the needs of each pool.

Click the Reset quotas icon to return to the existing configuration.

Edit global settings

Click Edit Global Settings to change global pool quota configurations by metric label. Any changes to quota configuration labels require updates to all pools.

  1. Add a Quota Configuration Label. Select an existing configured pool to update.

  2. Add a Prioritization. The following settings affect which traffic is considered most or least important in an overage situation:

  3. Select Configure Globally or Configure per pool to apply the pool filtering globally, or to a single pool.

    When selecting Configure per pool, edit each pool to set priorities.

    For global configurations, select:

    • Prioritization label: Select a label to change its priority.
    • High priority values: Add a label value, such as production*, to ensure metrics with that label value are retained.
    • Low priority values: Add a label value, such as test*, to drop metrics of lower importance first.
  4. Click Done when finished.

  5. Click the Code Config tab.

  6. Click Copy to copy the file, or Download to download the file to your computer.

  7. Add the definition to a Terraform file, or create a new Terraform file.

  8. Run this command to apply the resource:

    terraform apply

Priorities

If you have set up metrics quotas and your system goes over its persisted writes license, Observability Platform drops metrics from pools that exceed their respective quotas until all pools meet their quotas. Observability Platform penalizes only pools that exceed their persisted writes quota. To more selectively decide which metrics within a given pool to drop first during a penalty scenario, specify priorities for each pool.

  • High: Metrics dropped last.
  • Low: Metrics dropped first.
  • Default: Metrics dropped after Low priority metrics, but before High priority metrics.

Priority values support glob syntax.
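The drop order can be illustrated with a short Python sketch that classifies label values by glob and sorts them from first-dropped to last-dropped. The function names and cluster names are hypothetical, and the choice to let a high-priority match win over a low-priority match is an assumption, not documented behavior. Python's fnmatch handles `*`, `?`, and `[seq]` patterns but not brace alternation, so it only approximates the platform's glob syntax.

```python
from fnmatch import fnmatchcase

# Lower numbers are dropped first.
DROP_ORDER = {"low": 0, "default": 1, "high": 2}


def priority_of(label_value, high_globs, low_globs):
    """Classify a label value as 'high', 'low', or 'default' priority."""
    # Assumption: high-priority rules win if a value matches both lists.
    if any(fnmatchcase(label_value, glob) for glob in high_globs):
        return "high"
    if any(fnmatchcase(label_value, glob) for glob in low_globs):
        return "low"
    return "default"


def drop_order(label_values, high_globs, low_globs):
    """Sort label values from first-dropped to last-dropped."""
    return sorted(
        label_values,
        key=lambda value: DROP_ORDER[priority_of(value, high_globs, low_globs)],
    )


clusters = ["production-east", "staging", "rc-42"]  # hypothetical cluster names
print(drop_order(clusters, high_globs=["production*"], low_globs=["rc*"]))
# ['rc-42', 'staging', 'production-east']
```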

Chronosphere recommends assigning all low-priority traffic to the same pool. When low-priority data is split between pools, higher-priority traffic can drop from a penalized pool, even though there is lower-priority traffic in a different, non-penalized pool.

Delete a pool


To delete an existing pool:

  1. In the navigation menu, click Go to Admin and then select Control > Metrics Quotas.

  2. Click Configure Quotas.

  3. Click any row in the Pools table to display the Edit Pool page.

  4. At the bottom of the Edit Pool page, click Remove Pool.

    The pool is removed from the Configure Quotas page, and no longer displays in the resource definition of the Code Config tab.

Best practices

To keep penalty behavior and cost accounting transparent and predictable, pools should be hard partitions of your system, with no single time series matching more than one pool. The following processes help ensure pools have the correct data:

  • Chronosphere recommends selecting a single usage tag as the pool assignment mechanism. Picking a single tag avoids situations where one pool matches serviceX and a second pool matches environmentY, so that a time series might match either or both definitions.
  • Use exact match values for the selected label to decrease the chances of other tags or a regular expression match allowing a time series to fit into more than one pool.
  • Chronosphere Observability Platform uses match-ordering in pools. If a time series matches more than one pool, it becomes part of the first pool in the list that it matches. A time series might match more than one pool's criteria, but a first-match policy ensures that a time series is accounted for consistently in a single pool.
  • If you see a pool that doesn't match the expected penalty behavior, open the pool in the profiler and compare it with the Terraform configuration file. A match rule value might be incorrect.
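To see why ordering and exact match values matter, the first-match policy can be approximated in Python. This is a sketch under assumptions: the function name, the "M3 Aggregator" pool, the service names, and the pool name "default" are hypothetical. The rules use the label:glob form from the Terraform example, and fnmatch stands in for the platform's glob matching.

```python
from fnmatch import fnmatchcase


def assign_pool(labels, ordered_pools, default_pool="default"):
    """Return the name of the first pool whose match rules fit the series labels."""
    for pool_name, match_rules in ordered_pools:
        for rule in match_rules:
            # Rules use the "label:glob" form from the Terraform example.
            label, _, pattern = rule.partition(":")
            if label in labels and fnmatchcase(labels[label], pattern):
                return pool_name
    return default_pool


pools = [
    ("M3 Services", ["service:m3*"]),
    ("Gateway Services", ["service:gateway*"]),
    ("M3 Aggregator", ["service:m3aggregator"]),  # hypothetical, shadowed by "M3 Services"
]

print(assign_pool({"service": "m3aggregator"}, pools))  # M3 Services
print(assign_pool({"service": "gateway-edge"}, pools))  # Gateway Services
print(assign_pool({"service": "billing"}, pools))       # default
```

In this sketch the "M3 Aggregator" pool never receives traffic, because the broader m3* rule earlier in the list matches first. A pool that behaves unexpectedly might be shadowed in the same way.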

Create an alert for a pool in penalty

When a pool is in a penalty state, it might drop metrics to reduce usage. For higher priority pools, this can result in the loss of important data. To reduce or prevent data loss, create a monitor to alert on pool usage and notify the appropriate team to take preventative action.

  1. Create a notification policy with criteria like warn if value is > 90% for 5 minutes.

  2. Create a notifier that aligns with your internal alerting policies and routes alerts to the right team.

  3. Create a monitor.

  4. In the monitor query, add a query that returns each pool's percent utilization of its quota. You can build this query from the chrono_poolstats_count metric, for example:

    (sum by (metrics_class) (rate(chrono_poolstats_count{type="persisted", dropped="no"}[2m]))) /
    ((sum by(metrics_class) (coordinator_scheduler_metrics_class_weight{})) * on() group_left() limit_service_licensed_persist_limit{})

  5. In the Signals section, select Per time series (many alerts) as the signal.

  6. Define any additional fields, such as a condition and sustain period.

  7. Click Save to save the monitor.
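The arithmetic behind the monitor query and the example notification criteria can be checked with a small Python sketch. It assumes the pool weight is a fraction of the license between 0 and 1, which this page doesn't state, and all function names and figures are hypothetical.

```python
def pool_utilization(persisted_dpps, pool_weight, licensed_limit_dpps):
    """Return a pool's persisted writes as a fraction of its quota."""
    # Mirrors the query: persisted rate / (pool weight * licensed persist limit).
    quota_dpps = pool_weight * licensed_limit_dpps
    if quota_dpps <= 0:
        raise ValueError("pool quota must be positive")
    return persisted_dpps / quota_dpps


def should_warn(samples, threshold=0.9):
    """True when every sample in the sustain window is above the threshold."""
    return bool(samples) and all(sample > threshold for sample in samples)


# Hypothetical pool: 25% of a 100,000 DPPS license, writing 24,000 DPPS.
ratio = pool_utilization(24_000, pool_weight=0.25, licensed_limit_dpps=100_000)
print(f"{ratio:.0%}")            # 96%
print(should_warn([ratio] * 5))  # True
```

A pool that stays above 90% of its quota for the whole sustain window triggers the warning. A single sample at or below the threshold resets it.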