Manage metric pools

In the Metrics Quotas section of the Chronosphere app, you can review your current quota allocations by pool and preview the potential impact of changes. Use this information to plan your pool sizes and future usage.

Understanding pool usage

The Pools section of the page describes what pools you have, and how they're configured. This includes:

  • Pool name: The display name of the pool.
  • Data matching: The label values the pool matches.
  • Allocation: The percentage or DPPS of total traffic guaranteed to the pool before it might be penalized.
  • Consumption: The percentage or DPPS of total traffic the pool is consuming over the selected time range.

Quota allocations and consumption

The Quota Allocations vs consumption graph is a bar chart that shows each pool's current quota allocation in Data Points Per Second (DPPS) alongside what the pool is actually consuming. The top bar for each pool is the Allocation. The second bar is that pool's current consumption. Compare the two bars to determine consumption in relation to the allocated quota. Point to any group of bars to display a dialog with exact values.

Quota consumption by pools (per second)

The Quota consumption by pools graph separates the pools so you can see each pool's Average consumption and its Current quota limit.

Understand quota consumption trends

The Quota consumption trend graphs plot data points and quota limits as running lines over time, whereas the previous graph displays only a single value. Point to any point on a graph to see exact data. Drag in a graph to zoom in on the selected time period.

Preview quota allocations

The Pools section includes a group of text boxes, one for each created pool. These boxes contain the assigned percentage (%) or DPPS for each pool. Use these to set your general pool allocations.

To preview a new quota allocation, change a number in the box for the pool to be updated. Click outside the boxes to update the total.

Quota settings must meet the following criteria:

  • Quotas must add up to 100%. Changing any pool's quota causes a related change in the default pool to ensure total quota is 100%.
  • The UI supports values greater than or equal to 0.01%.
  • The API supports values greater than or equal to 0.001%.
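
The criteria above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Chronosphere's implementation: the function name `rebalance_default` is hypothetical, and the sketch assumes the default pool must also meet the minimum value.

```python
from decimal import Decimal

UI_MIN = Decimal("0.01")    # minimum percentage the UI supports, per the criteria above
API_MIN = Decimal("0.001")  # minimum percentage the API supports


def rebalance_default(pools, minimum=UI_MIN):
    """Return the default pool's percentage so that all quotas total 100%.

    `pools` maps pool names to their percentage of the license.
    Assumption: the default pool must also satisfy `minimum`.
    """
    total = Decimal("0")
    for name, percent in pools.items():
        value = Decimal(str(percent))
        if value < minimum:
            raise ValueError(f"{name}: {value}% is below the minimum of {minimum}%")
        total += value
    default = Decimal("100") - total
    if default < minimum:
        raise ValueError(f"named pools use {total}%, leaving {default}% for the default pool")
    return default


pools = {"Tracing Services": 10, "M3 Services": 25, "Gateway Services": 4}
print(rebalance_default(pools))  # 61
```

Raising any named pool's value lowers the returned default by the same amount, which mirrors the related change the page makes to the default pool.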

Changing an assigned quota displays a third bar in the Quota consumption by pools chart. Use the new bar to determine if new quota assignments meet the needs of each of your pools.

If one pool is consistently over quota and the other pools aren't, use the preview to adjust assigned quotas to better meet the needs of each pool.

Click the Reset quotas icon to return to the existing configuration.

Create or change a pool

Although pools are managed using Terraform, this interface helps you understand what changes to make, which reduces guesswork and repeated updates to your system. Chronosphere uses dry run mode by default. Any applied quota configuration appears in the Metrics Quotas Dashboard and shows you how the pool's traffic interacts with the pool's quota, without penalizing that pool if the system goes over its limit. Use dry run mode to experiment with different pool configurations and quota levels before you enable enforcing mode.

To create or edit quota pools, you must have administrative privileges:

  1. Click Go to Admin.
  2. Select Control > Metrics Quotas and then click Configure quotas.
  3. Click + Add Pool.
  4. The following fields display on the page. Update editable fields to modify your pool configuration.
    • Pool name: (Editable) Change the pool name.
    • Data matching: The values the selected pool uses to match data.
      • Quota configuration label: The label matching this pool.
      • Data matching values: (Editable) The specific values for the label, which match this pool. Add a value and press Enter to view a list. This value supports glob syntax.
    • Observed label consumption: Displays the total average DPPS for this pool, broken down by label value.
    • Quota allocation: (Editable) Set a quota percentage for the selected pool, similar to the Preview quota allocations section.
    • Prioritization: (Conditionally editable) Add the Priority Label and high or low priority values.

      Pools using a global priority setting can't change their priorities on an individual pool page.

  5. Click Done after completing your changes.
  6. Use Terraform to apply your changes.

You can also view the graph for total consumption, and a bar chart for consumption by pool. Select or clear the checkboxes next to each name to display or remove that pool from the bar chart.

When you have completed your changes, click Done.

To edit an existing pool:

  1. Click any row in the Pools table to edit that pool.
  2. The Edit Pool page contains information specific to the selected pool. These fields match the Add Pool screen, and some values can be edited.
  3. Click Done after completing your changes.
  4. Use Terraform to apply your changes.

Priorities

If you have set up metrics quotas and your system goes over its persisted writes license, Chronosphere drops metrics from pools that exceed their respective quotas until all pools meet their quotas. Chronosphere penalizes only pools that exceed their persisted writes quota. To more selectively decide which metrics within a given pool to drop first during a penalty scenario, specify priorities for each pool.

  • High: Metrics dropped last.
  • Low: Metrics dropped first.
  • Default: Metrics dropped after Low priority metrics, but before High priority metrics.

Priority values support glob syntax.
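
The drop order described above can be sketched as follows. This is an illustration under stated assumptions, not Chronosphere's implementation: the function names are hypothetical, Python's `fnmatch` only approximates the supported glob syntax (it has no brace expansion), and the sketch assumes a value matching both lists is treated as high priority.

```python
from fnmatch import fnmatchcase

# Lower rank is dropped earlier: Low first, then Default, then High.
DROP_RANK = {"low": 0, "default": 1, "high": 2}


def priority_of(label_value, high_patterns, low_patterns):
    """Classify one priority-label value as high, low, or default."""
    # Assumption: high priority wins when a value matches both lists.
    if any(fnmatchcase(label_value, p) for p in high_patterns):
        return "high"
    if any(fnmatchcase(label_value, p) for p in low_patterns):
        return "low"
    return "default"


def drop_order(label_values, high_patterns, low_patterns):
    """Return the label values in the order their metrics would be dropped."""
    return sorted(
        label_values,
        key=lambda v: DROP_RANK[priority_of(v, high_patterns, low_patterns)],
    )


clusters = ["production-a", "staging", "rc-1"]
print(drop_order(clusters, ["production*"], ["rc*"]))
# ['rc-1', 'staging', 'production-a']
```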

Edit global settings

Click Edit Global Settings to change global pool quota configurations by metric label. Any changes to quota configuration labels require updates to all pools.

  1. Add a Quota Configuration Label: Select an existing configured pool to update.

  2. Add a Prioritization: The following settings affect which traffic is considered most or least important in an overage situation:

  3. Select Configure Globally or Configure per pool to apply the pool filtering globally, or to a single pool.

    When selecting Configure per pool, edit each pool to set priorities.

    For global configurations, select:

    • Prioritization label: Select a label to change its priority.
    • High priority values: Add a label value, such as production* to ensure metrics with that label value are retained.
    • Low priority values: Add a label value, such as test* to drop metrics of lower importance first.
  4. Click Done when finished.

  5. Use Terraform to apply your changes.

Best practices

To keep penalty behavior and cost accounting transparent and predictable, pools should be hard partitions of your system, with no single time series matching more than one pool. The following practices help ensure pools have the correct data:

  • Chronosphere recommends selecting a single usage tag as the pool assignment mechanism. Picking a single tag avoids situations where one pool matches serviceX and a second pool matches environmentY, so that a time series might match either or both definitions.
  • Use exact match values for the selected label to decrease the chances of other tags or a regular expression match allowing a time series to fit into more than one pool.
  • Chronosphere uses match-ordering in pools. If a time series matches more than one pool, it becomes part of the first pool in the list that it matches. A time series might match more than one pool's criteria, but a first-match policy ensures that a time series is accounted for consistently in a single pool.
  • If you see a pool that doesn't match the expected penalty behavior, open the pool in the profiler and compare it with the Terraform configuration file. A match rule value might be incorrect.
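
The first-match policy described above can be sketched as follows. This is a simplified illustration, not Chronosphere's matcher: the function name `assign_pool`, the "All Services" pool, and the "default" fallback name are hypothetical, and Python's `fnmatch` only approximates the supported glob syntax (it has no brace expansion).

```python
from fnmatch import fnmatchcase


def assign_pool(labels, pools, default="default"):
    """Return the name of the first pool whose match rules fit the series labels.

    `pools` is an ordered list of (name, match_rules) pairs, where each rule
    has the form "label:glob", as in the Terraform match_rules field.
    """
    for name, rules in pools:
        for rule in rules:
            label, _, pattern = rule.partition(":")
            value = labels.get(label)
            if value is not None and fnmatchcase(value, pattern):
                return name  # first match wins; later pools are not checked
    return default


pools = [
    ("M3 Services", ["service:m3*"]),
    ("Gateway Services", ["service:gateway*"]),
    ("All Services", ["service:*"]),  # overlaps with both pools above
]

print(assign_pool({"service": "m3db"}, pools))     # M3 Services
print(assign_pool({"service": "billing"}, pools))  # All Services
print(assign_pool({"job": "node"}, pools))         # default
```

Because "All Services" overlaps with the earlier pools, moving it to the top of the list would capture every series with a service label, which is why exact, non-overlapping match values are easier to reason about.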

Apply changes with Terraform

After you have configured your changes in the app:

  1. Click Code config.

  2. Copy or download the file to your computer.

  3. Add the definition to a Terraform file.

  4. Run this command to create the resource:

    terraform apply

The following code is an example of a Terraform file used to create quotas and priorities:

resource "chronosphere_resource_pools_config" "resource_pools" {
  default_pool {
    allocation {
      percent_of_license = 61
    }
 
    priorities {
      high_priority_match_rules = ["chronosphere_k8s_cluster:production*"]
      low_priority_match_rules  = ["chronosphere_k8s_cluster:rc*"]
    }
  }
 
  pool {
    name = "Tracing Services"
 
    allocation {
      percent_of_license = 10
    }
 
    match_rules = ["service:{spanhandler,traceingester}"]
 
    priorities {
      high_priority_match_rules = ["chronosphere_k8s_cluster:production*"]
      low_priority_match_rules  = ["chronosphere_k8s_cluster:rc*"]
    }
  }
 
  pool {
    name = "M3 Services"
 
    allocation {
      percent_of_license = 25
    }
 
    match_rules = ["service:m3*"]
 
    priorities {
      high_priority_match_rules = ["chronosphere_k8s_cluster:production*"]
      low_priority_match_rules  = ["chronosphere_k8s_cluster:rc*"]
    }
  }
 
  pool {
    name = "Gateway Services"
 
    allocation {
      percent_of_license = 4
    }
 
    match_rules = ["service:gateway*"]
 
    priorities {
      high_priority_match_rules = ["chronosphere_k8s_cluster:production*"]
      low_priority_match_rules  = ["chronosphere_k8s_cluster:rc*"]
    }
  }
}

Create an alert for a pool in penalty

When a pool is in a penalty state, it might drop metrics to reduce usage. For higher priority pools, this can result in the loss of important data. To reduce or prevent data loss, create a monitor to alert on pool usage and notify the appropriate team to take preventative action.

  1. Create a notification policy with criteria like warn if value is > 90% for 5 minutes.

  2. Create a notifier and align with your internal alerting policies to route the alerts to the right team.

  3. Create a monitor.

    1. In the monitor query, add a query that returns each pool's percent utilization of its quota. You can build this query from the chrono_poolstats_count metric. For example:

      (sum by (metrics_class) (rate(chrono_poolstats_count{type="persisted", dropped="no"}[2m]))) / ((sum by(metrics_class) (coordinator_scheduler_metrics_class_weight{})) * on() group_left() limit_service_licensed_persist_limit{})
    2. In the Signals section, select Per time series (many alerts) as the signal.
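
The arithmetic behind the query above can be sketched as follows. This only illustrates the division the query performs. The function name and the sample numbers are hypothetical, and the sketch assumes the class weight metric is a fraction of the license between 0 and 1, which this page does not state.

```python
def pool_utilization(persisted_dpps, class_weight, licensed_limit_dpps):
    """Return a pool's utilization of its quota as a ratio.

    persisted_dpps:       rate of persisted, non-dropped data points for the pool
    class_weight:         the pool's share of the license (assumed to be 0..1)
    licensed_limit_dpps:  the licensed persisted-writes limit
    """
    quota_dpps = class_weight * licensed_limit_dpps
    if quota_dpps <= 0:
        raise ValueError("the pool's quota must be greater than zero")
    return persisted_dpps / quota_dpps


# Hypothetical numbers: a pool with 10% of a 100,000 DPPS license
# that is persisting 9,500 DPPS.
utilization = pool_utilization(9_500, 0.10, 100_000)
print(f"{utilization:.0%}")  # 95%
print(utilization > 0.90)    # True, so a "> 90%" warning would fire
```

Under this assumption the query returns a ratio rather than a percentage, so a 90% warning corresponds to a threshold of 0.9. Confirm the scale against your own query results before setting the threshold.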