Signal groups

When a monitor alert triggers or is resolved, Chronosphere sends a notification.

Signal groups use a unique set of labels to create groups of notifications. The groups display information from your monitor queries that aggregation may otherwise remove. Use signal groups to reduce the number of notifications sent, or to improve visibility by showing each time series alert.

All signal groups use the following defaults:

  • Alerts are sent for the initial and any subsequent triggers for all signal grouping types.
  • Monitors default to Per monitor when created, unless you select another option.
  • Chronosphere sends an additional notification every five minutes when a new time series alert triggers.
  • If an alert remains active but no new time series triggers, Chronosphere sends a notification every hour.
  • Signal groups use the notification policies set in Chronosphere.

Use the following options to group alerts; your choice determines how many notifications Chronosphere sends (a query sketch after this list illustrates the difference):

  • Per monitor (one alert): Sends one notification containing all time series that meet the conditions. Useful if you want only one notification that contains all time series.
  • Per signal group (multiple alerts): Sends one notification for each group in your monitor. Helpful for logically grouping time series into the same alert.
  • Per time series (many alerts): Sends one notification for every time series returned by the monitor query. This option sends the most alerts, but is useful if you want to be notified for every time series that triggers your query.
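
For illustration, consider a hypothetical monitor query that returns one alerting series per service and region (the metric and label names here are placeholders, not part of your data):

    sum by (service, region) (rate(http_requests_errors_total[5m])) > 0

If this query returns six alerting series spread across two regions, Per monitor sends one notification, Per signal group with region as the label key sends two, and Per time series sends six.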

Configure a signal group by editing an existing monitor or creating a new monitor.

Chronosphere reserves specific Prometheus labels such as alertname and severity. Chronosphere also uses the severity label to group alerts, except when a monitor uses Per time series signal grouping.

Refer to the Prometheus metric naming recommendations for additional information.
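
As an illustration of the severity grouping behavior described above, the following hypothetical series differ only in their severity label, so they fall into separate groups (with Per time series grouping, each series is its own alert regardless):

    # hypothetical series, for illustration only
    resource_errors{service="checkout", severity="warn"} 1
    resource_errors{service="checkout", severity="critical"} 1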

View signal groups

To view a signal group:

  1. In the navigation menu, select Alerting > Monitors & Signals, then select the monitor to view.
  2. Click the subdirectory icon to view the signal group settings for the monitor.

Edit signal groups

  1. In the navigation menu, select Alerting > Monitors & Signals, then select the monitor to edit.

  2. Click the three vertical dots icon and select Edit Monitor.

  3. Scroll to the Signal Grouping section and select one of the following options:

    • Per monitor (one alert): Chronosphere sends a notification using your selected notification policy. It includes all time series triggered by the monitor. This alert can send additional notifications if new time series trigger.

    • Per signal group (multiple alerts): In the Label Key field, choose the label to group alerts from this monitor. To add more label keys, click the add icon.

      You can use query aggregation to include or exclude specific labels; an exclusion sketch follows this list.

      For example, create a query to group results by only the namespace and instance labels:

       count by (namespace, instance) (up)
    • Per time series (many alerts): The monitor sends a notification for every time series as it triggers. You can change the alert behavior or channel by changing the policy for the monitor or editing the notification policy.
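
As a complement to the count by example under the Per signal group option, you can also exclude labels from grouping with without. This is a generic PromQL sketch using the standard up metric, not a step you must follow:

    count without (instance) (up)

This query keeps every label except instance, so grouping ignores which instance reported the value.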

Signal grouping examples

The following examples, organized by grouping type, show how to configure signal grouping.

Per monitor

This example monitor generates a single notification that includes all alerting time series.

  1. Enter the following query in the Query field, and select 15s as the check interval:

    count by (namespace, job, instance) ({instance!="", namespace!=""})
  2. In the Signal Grouping section, select Per monitor (one alert).

  3. In the Conditions section, select Critical, choose is > as the operator, and enter 5s in the Sustain field.

When the critical condition matches any time series, Chronosphere sends a single notification that includes all alerting time series.
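
For illustration, the example query might return series such as the following, where all label values are hypothetical; with Per monitor selected, every alerting series arrives in the same notification:

    # hypothetical query result
    {namespace="payments", job="api", instance="10.0.0.1:9090"} 4
    {namespace="payments", job="worker", instance="10.0.0.2:9090"} 2
    {namespace="search", job="api", instance="10.0.0.3:9090"} 7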

Per signal group

In this example, you configure a query to track outages and use signal groups to track multiple time series.

Four teams (frontend, backend, database, and search) are working on different components of a project. Each component has a set of services and resources its team monitors for performance and availability.

Chronosphere ingests these metrics:

resource_status{component="frontend", resource_type="availability", service_name="web_app"} 0
resource_status{component="frontend", resource_type="performance", service_name="web_app"} 0
resource_status{component="backend", resource_type="availability", service_name="api"} 0
resource_status{component="database", resource_type="availability", service_name="db_cluster_1"} 0
resource_status{component="database", resource_type="availability", service_name="db_cluster_2"} 0
resource_status{component="database", resource_type="availability", service_name="db_cluster_3"} 0
resource_status{component="search", resource_type="performance", service_name="search_engine"} 92
resource_status{component="search", resource_type="availability", service_name="search_engine"} 100

Each monitored service sends a metric to Chronosphere called resource_status. This metric has the following labels:

  • component: The component and team name.
  • resource_type: The type of tracked resource, either availability or performance.
  • service_name: The name of the service.

To define the signal groups for this data:

  1. Enter the following query in the Query field to alert your teams when a resource isn't working as expected, and select 15s as the check interval:

    resource_status == 0

    This query looks for a status of 0, which indicates a service failure.

  2. In the Signal grouping field, select Per signal group (multiple alerts) and enter the following labels in the Label key field to send the component and resource_type information when an alert triggers:

    component,resource_type
  3. In the Conditions section, select Critical, choose is > as the operator, and enter 5s in the Sustain field.

When a matching alert triggers:

  • The frontend team receives a notification with these metrics:

    resource_status{component="frontend", resource_type="availability", service_name="web_app"} 0
    resource_status{component="frontend", resource_type="performance", service_name="web_app"} 0
  • The backend team receives a notification with this metric:

    resource_status{component="backend", resource_type="availability", service_name="api"} 0
  • The database team receives a notification with these metrics:

    resource_status{component="database", resource_type="availability", service_name="db_cluster_1"} 0
    resource_status{component="database", resource_type="availability", service_name="db_cluster_2"} 0
    resource_status{component="database", resource_type="availability", service_name="db_cluster_3"} 0

The search team doesn't receive any alerts because their services are functioning.
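
To preview how many notifications this configuration produces, you can run an ad hoc query that mirrors the signal grouping. This is an optional sketch, not a required step:

    count by (component, resource_type) (resource_status == 0)

Each series in the result corresponds to one signal group, and therefore to one potential notification.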

Per time series

This example uses the same query, check interval, and condition as the per monitor example. The only difference is that you select Per time series (many alerts) in the Signal grouping field.

  1. Enter the following query in the Query field, and select 15s as the check interval:

    count by (namespace, job, instance) ({instance!="", namespace!=""})
  2. In the Signal grouping field, select Per time series (many alerts). This option doesn't use label keys, because every alerting time series sends its own notification.
  3. In the Conditions section, select Critical, choose is > as the operator, and enter 5s in the Sustain field.

When the critical condition matches, Chronosphere sends a separate notification for each alerting time series in the monitor query, using your notification policy.
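
To estimate how many individual notifications this option can generate, you can wrap the monitor query in count. This is an optional sketch, not a required step:

    count(count by (namespace, job, instance) ({instance!="", namespace!=""}))

The result is the number of series the monitor query returns; each series that meets the critical condition triggers its own notification.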