View available monitors
Select from the following methods to view and filter monitors. To query and get detailed information about monitors, see monitor actions. To understand patterns across alerts, analyze related alerts.
- Web
- Chronoctl
- Terraform
- API
To display a list of defined monitors, in the navigation menu select Alerts > Monitors. The list of monitors displays the status for each monitor next to its title:

| Status | Description |
|---|---|
| Critical | Currently alerting monitor that exceeds the defined critical conditions. |
| Warning | Currently alerting monitor that exceeds the defined warning conditions. |
| Muted | Monitor that’s currently muted by an active muting rule. |
| Passing | Passing monitor that’s not generating alerts. |

Use the following methods to filter your monitors:
- Using the Search monitors search box (an OR filter).
- By team, using the Select a team dropdown.
- By owner, using the Select an owner dropdown. Icons in the dropdown indicate whether the monitor is part of a collection or part of a service.
- By notification policy, using the Select a notification policy dropdown.
- By error status:
  - All: Default, displays all monitors.
  - Alerting: Monitors currently in alert status.
  - Critical: Monitors in a critical alert status.
  - Muted: Displays only muted monitors.
- To filter the table to display only your favorite monitors, enable the View only my favorites toggle.
- Include connected monitors: If you filter monitors by owner with the toggle disabled, only monitors owned by that owner are returned. When the toggle is enabled, your filter includes monitors that are connected to that owner, even if they aren’t owned by that owner. Connections are based on collections.
To search for a specific monitor:
- Click the search bar to focus on it, or use the keyboard shortcut Control+K (Command+K on macOS).
- Begin typing any part of the monitor’s name.
- Optional: Click the filters for all other listed resource types at the top of the search results to remove them and display only monitors.
- Click the search result you’re interested in, or use the arrow keys to select it and press Enter, to go to that monitor.
Series legend
The series legend displays labels for all metrics displayed in the graph as either a list or table view. Both views display the label keys and values for each series and include the current alert status. The available status values are passing, warning, and critical. You can filter the list of the resulting time series with the Search Series search box. Click an item in the list to isolate the related line on the graph. To clear the selection, click the item again. You can Control+click (Command+click on macOS) to choose multiple items.
Annotations
The annotations defined for the monitor, such as runbooks, links to related dashboards, data links to related traces, and documentation links. See Annotations for more information.
Alert history
The Alert History tab next to Monitor Info displays a history of alerts generated by the monitor. To change the order of the history, click the Timestamp (UTC) column header. To toggle display of the JSON payload for an alert, click the chevron. The page also lets you filter the history by event type, or toggle the scope of the alert history between the currently selected signal and all signals.
There can be up to a five-minute delay between the time an alert for a monitor resolves (Alert resolved) and the time when a notification is sent indicating the alert resolved (Notification sent (alert resolved)).
Alert event payload
The Alert History tab displays the values captured when the alert fired. The payload fields are primarily defined in the monitor data model.
- monitorSlug: The monitor’s unique slug.
- eventType: The type of event triggered. Valid alert types are:
  - Alert triggered
  - Alert resolved
  - Notification sent (alert triggered)
  - Notification failed (alert triggered)
  - Notification sent (alert resolved)
  - Notification failed (alert resolved)
  - Alert muted
  - Alert unmuted
- createdAt: The time the alert fired.
- signal: See signals. Contains one or more name and value entries.
- details: Contains severity as defined by the notification policy and notifier. notifier values contain the notifier name and slug.
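If you copy a payload out of the Alert History tab, a short script can condense it into one line for an incident note. The following Python sketch assumes a particular JSON shape: the field names come from the list above, but the exact nesting, the sample values, and the `summarize_event` helper are hypothetical and might not match the real payload.

```python
import json

# Hypothetical payload. Field names follow the documented list, but the
# nesting and every value here are assumptions for illustration only.
RAW_EVENT = """
{
  "monitorSlug": "disk-getting-full",
  "eventType": "Alert triggered",
  "createdAt": "2024-01-15T07:05:00Z",
  "signal": [{"name": "source", "value": "host-1"}],
  "details": {
    "severity": "critical",
    "notifier": {"name": "On-call email", "slug": "on-call-email"}
  }
}
"""


def summarize_event(raw: str) -> str:
    """Return a one-line summary of an alert event payload."""
    event = json.loads(raw)
    # Join every signal entry as name=value so grouped alerts stay readable.
    signal = ", ".join(f"{s['name']}={s['value']}" for s in event.get("signal", []))
    details = event.get("details", {})
    notifier = details.get("notifier", {})
    return (
        f"{event['createdAt']} {event['monitorSlug']}: {event['eventType']} "
        f"[{details.get('severity', 'n/a')}] signal({signal}) "
        f"via {notifier.get('slug', 'n/a')}"
    )


print(summarize_event(RAW_EVENT))
```

Optional fields fall back to `n/a` so that events without notifier details, such as a mute event, still produce a summary under these assumptions.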
Create a monitor
Most monitors alert when a value matches a specific condition, such as when an error condition defined by the query lasts longer than one minute. You can also choose to alert when a value doesn’t exist, such as when a host stops sending metrics and is likely unavailable. This condition triggers only if the entire monitor query returns no results. For example, to alert on missing or no data, add a NOT_EXISTS series condition in the series_conditions section of the monitor definition.
Prerequisites
Before creating a monitor, complete the following tasks:
- Create a notifier to define where to deliver alerts and who to notify.
- Create a notification policy to determine how to route notifications to notifiers based on signals that trigger from your monitor. You select the notifier you created for the critical or warning conditions on the notification policy.
Create monitors
After completing the prerequisite tasks, use any of the following methods to create a new monitor. When creating or editing a monitor in Observability Platform, you can simulate and test alerts against historical data. Use backtesting to review how your alert would have performed if it had been defined in the past. Chronosphere recommends a minimum query interval of 15 seconds. There can be a ten-second delay between an alert trigger and the notifier activation. You can create a monitor using one of the following procedures, or you can duplicate an existing monitor.
- Web
- Chronoctl
- Terraform
- API
To add a new monitor:
- In the navigation menu select one of these locations:
  - Alerts > Monitors.
  - Platform > Collections, and then select the collection you want to create a monitor for. This can be a standard collection or a service.
- Create the monitor:
  - From the Monitors page, click Create monitor. You can also choose Duplicate monitor to copy an existing monitor.
  - From the Collections page, in the Monitors panel, click + Add.
- Enter the information for the monitor based on its data model.
- Select an Owner to organize and filter your monitor. You can select a collection or a service.
- Enter a Monitor Name, which you can change after creating the monitor. Monitor names are static strings and don’t accept label variables, such as $labels.LABEL_NAME.
- Choose a Notification Policy to determine which notification policy to use at a particular alert severity.
- Enter Labels as key/value pairs to categorize and filter monitors.
- In the Query section, choose the type of query you want to enter:
  - Prometheus: Enter a valid Prometheus query. Click Edit in Query Builder to open your query in the Query Builder, where you can construct, optimize, and debug your query before saving it. After modifying your query, click Done to return to the Add Monitor page.
  - Graphite: Enter a valid Graphite query.
  - Logs: Enter a valid log query, which must include the make-series operator with a specified step size to return data. This operator uses the count() function by default, but you can specify different operators instead. For example, a query can create a time chart that includes the average for latencyInSeconds. The step parameter defines the time step for each bucket in Prometheus time duration format.
    If the log query includes a field that contains a period in its name and you want to use signals to group notifications, use an alias for that field name. Otherwise, periods are converted to underscores in the generated visualization.
- Use these options to validate and update your query:
  - Click Check Query to validate your query and preview query results. In the query preview, use the following options to understand your query:
    - Toggle Show thresholds to display the monitor’s defined thresholds.
    - Select a time range up to the present in the time range selector. If your selected time period has too many alerts, or the entire graph appears to display in alerted status, reduce the selected time period. If multiple alerts would have fired simultaneously, only one threshold marker displays. The banner shows the correct number of alerts. For example, if a critical and a warning would fire at the same time, only one alert displays on the graph. The banner shows two alerts would have fired.
    - Click Open in Explorer to open your query in Metrics Explorer, where you can review your query for syntax errors and make necessary changes.
  - For Prometheus queries, test monitor conditions by reviewing when a monitor would have triggered, based on historical data. The preview reflects existing monitor schedules, signal grouping, and overrides:
    - Use the Show alert durations toggle to display the time period over which the alert would have been active.
    - Toggle Simulate alerts to backtest your condition against existing data. You must define at least one condition for alert simulations to work.
      If your selected query returns too much data, the graph displays an error. Chronosphere recommends selecting shorter time periods for testing, when possible. Alert simulation isn’t available outside the raw data retention period.
      Alert simulations use existing data, and can’t predict future alerts.
- Optional: Group alerts based on the results returned from the query by choosing an option in the Signals section. Signals use a unique set of labels to create groups of notifications when a monitor alert triggers or resolves.
  If you select per signal (multiple alerts) to generate multiple alerts, enter a label key that differs in name and casing from the label you enter in the Key field in the Labels section. For example, if you enter environment in the Key field, you might use Environments as the Label Key to match on. Pinned scopes can be used as a Label Key.
- Define a condition and sustain period (duration of time) in the Conditions section, and assign the resulting alert a severity (warning or critical). In the Sustain field, enter a value followed by an abbreviated unit such as 60s. Valid units are s (seconds), m (minutes), h (hours), or d (days). The dialog also displays the notifiers associated with the monitor for reference.
  To alert on missing or no data, select not exists in the Alert when value dropdown.
- In the Resolve field, enter a time period for the resolve window as a value followed by an abbreviated unit such as 30s. Valid units are s (seconds), m (minutes), h (hours), or d (days).
- Add notes for the monitor in the Annotations section, such as runbooks, links to related dashboards, data links to related traces, and documentation links.
- Click Save.
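The Sustain and Resolve fields both take a value followed by one of the units s, m, h, or d. If you generate monitor definitions from a script, it can help to validate those strings before submitting them. The following Python sketch is one way to do that. The `parse_duration` helper is hypothetical, and it assumes a single whole-number value with exactly one unit, which might be stricter than what Observability Platform accepts.

```python
import re
from datetime import timedelta

# Map each documented unit suffix to the matching timedelta keyword.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}

# Assumption: a whole number followed by exactly one unit, such as "60s".
_PATTERN = re.compile(r"^(\d+)([smhd])$")


def parse_duration(text: str) -> timedelta:
    """Convert a string such as '60s' or '2h' into a timedelta."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"invalid duration {text!r}: expected a value plus s, m, h, or d")
    value, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(value)})


print(parse_duration("60s").total_seconds())
print(parse_duration("2h").total_seconds())
```

Rejecting anything outside the four documented units keeps a typo such as `60sec` from reaching the monitor definition.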
Chronoctl examples
Use one of the following examples to understand the monitor structure for a Chronoctl definition.
- Chronoctl (Prometheus)
- Chronoctl (logs)
The following YAML definition consists of one monitor named Disk Getting Full. The series_conditions trigger a warning notification when the disk is 80% full for more than 300 seconds, and a critical notification when 90% full for more than 300 seconds. It groups series into signals based on the source and service_environment label keys.
The schedule section indicates that this monitor runs each week on Mondays from 7:00 to 10:10 and 15:00 to 22:30, and Thursdays from 21:15 through the end of the day. All times are in UTC.
If you define label_names in the signal_grouping section, enter a label name that differs in name and casing from the label you enter in the labels section. For example, if you enter environment as a key in the labels section, you might use Environments in the label_names section.
Terraform examples
Use one of the following examples to understand the monitor structure for a Terraform resource.
- Terraform (Prometheus)
- Terraform (logs)
The following Terraform resource creates a monitor that Terraform refers to as infra, with a human-readable name of Infra Example monitor.
The schedule section runs this monitor each week on Mondays from 7:00 to 10:10 and 15:00 to 22:30, and Thursdays from 21:15 through the end of the day. All times are UTC, and Observability Platform won’t run this monitor during the rest of the week.
If you define label_names in the signal_grouping section, enter a label name that differs in name and casing from the label you enter in the labels section. For example, if you enter environment as a key in the labels section, you might use Environments in the label_names section.
Edit a monitor
Select from the following methods to edit monitors.
Users can modify Terraform-managed resources only by using Terraform.
Learn more.
- Web
- Chronoctl
- Terraform
- API
To edit a monitor:
- In the navigation menu select Alerts > Monitors.
- Click the name of the monitor you want to edit.
- In the action menu, click the three vertical dots icon and select Edit monitor. This opens a sidebar where you can edit the monitor’s properties.
- Make your edits, and then click Save. Refer to the monitor data model for specific definitions.
Use the Code Config tool
When adding or editing a monitor, click the Code Config tab to view code representations of a monitor for Terraform, Chronoctl, and the Chronosphere API. The displayed code also responds to changes you make in the Visual Editor tab. For details, see Use the Code Config tool.
Override a monitor alert
You can override the default conditions that define when an alert triggers for a monitor. This override is similar to overriding a notification policy that routes a notification to a notifier other than the specified default. On a monitor, you can specify a condition override to use a separate threshold for certain series. For example, a monitor might have a default threshold of >100, but you specify an override threshold of >50 where the label key/value pair is cluster=production.
You can specify any label as a matcher for a monitor condition override. If no
override matches the defined conditions, Observability Platform applies the default
conditions. Additionally:
- Overrides must specify at least one matcher, and meet every matcher condition to apply the override.
- Observability Platform evaluates overrides in the listed order. When an override matches, the remaining overrides and defaults are ignored.
- Overrides don’t inherit any properties from the default conditions. For example, if the default policy route specifies warn and critical notifiers but the override specifies only critical notifiers, the notifier doesn’t send warn notifications.
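The evaluation rules above can be sketched in a few lines of Python. This is an illustrative model only, not the platform’s implementation: the `Matcher`, `Override`, and `select_conditions` names are hypothetical, conditions are reduced to a severity-to-threshold mapping, and treating a regex matcher as a full-string match is an assumption.

```python
import re
from dataclasses import dataclass


@dataclass
class Matcher:
    name: str
    value: str
    regex: bool = False  # False means exact match, True means regular expression

    def matches(self, labels: dict[str, str]) -> bool:
        actual = labels.get(self.name)
        if actual is None:
            return False
        if self.regex:
            # Assumption: the pattern must match the whole label value.
            return re.fullmatch(self.value, actual) is not None
        return actual == self.value


@dataclass
class Override:
    matchers: list[Matcher]
    conditions: dict[str, float]  # severity -> threshold


def select_conditions(labels, defaults, overrides):
    """Return the conditions that apply to a series with the given labels."""
    for override in overrides:
        # An override needs at least one matcher, and every matcher must match.
        if override.matchers and all(m.matches(labels) for m in override.matchers):
            # First match wins. Nothing is inherited from the defaults.
            return override.conditions
    return defaults


defaults = {"warn": 80.0, "critical": 100.0}
overrides = [Override([Matcher("cluster", "production")], {"critical": 50.0})]

print(select_conditions({"cluster": "production"}, defaults, overrides))
print(select_conditions({"cluster": "staging"}, defaults, overrides))
```

In this model, the production series gets only the critical threshold of 50 and loses the default warn threshold, which mirrors the no-inheritance rule. The staging series matches no override and falls back to the defaults.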
Users can modify Terraform-managed resources only by using Terraform.
Learn more.
- Web
- Terraform
- In the navigation menu select Alerts > Monitors.
- Click the name of the monitor you want to specify an override for.
- In the action menu, click the three vertical dots icon and select Edit monitor. This opens a sidebar where you can edit the monitor’s properties.
- In the Condition Override section, click the plus icon to display the override fields.
- Select Exact or Regex as the matcher type, and enter the key/value pair to match on for the override.
- Select Critical or Warn as the override severity.
- Define the match condition, and enter a value and sustain duration.
- Click Save to apply the override changes.
Delete a monitor
Select from the following methods to delete monitors.
Users can modify Terraform-managed resources only by using Terraform.
Learn more.
- Web
- Chronoctl
- Terraform
- API
To delete a monitor:
- In the navigation menu select Alerts > Monitors.
- Click the name of the monitor you want to delete.
- In the action menu, click the three vertical dots icon and select Edit monitor.
- In the Edit Monitor dialog, click the three vertical dots icon and select Delete.
Use annotations with monitors
Create annotations for monitors that link to dashboards, runbooks, related documents, and trace metrics, which lets you provide direct links for your on-call engineers to help diagnose issues. You can reference Prometheus Alertmanager variables in annotations with the {{ .VARIABLE_NAME }} syntax. Annotations can access monitor labels by using variables with the {{ .CommonLabels.LABEL }} pattern, and labels from the alerting metric with the {{ .Labels.LABEL }} pattern. In both patterns, replace LABEL with the label’s name.
To reference labels in Alertmanager variables, you must include those labels in the
alerting time series. Otherwise, the resulting notifier won’t display any information
for the variables you specify.
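To see why a label missing from the alerting time series leaves a gap in the notification, it can help to model the substitution. Alertmanager renders annotations with Go templates. The following Python sketch is only a rough approximation that uses a regular expression instead, and the `render_annotation` helper is hypothetical. It assumes a missing label renders as an empty string.

```python
import re

# Matches {{ .Labels.NAME }} and {{ .CommonLabels.NAME }} with optional spaces.
_VARIABLE = re.compile(r"\{\{\s*\.(Labels|CommonLabels)\.(\w+)\s*\}\}")


def render_annotation(template: str, labels: dict, common_labels: dict) -> str:
    """Substitute label variables in an annotation value."""
    sources = {"Labels": labels, "CommonLabels": common_labels}

    def substitute(match: re.Match) -> str:
        scope, label = match.groups()
        # Assumption: a label that isn't present renders as an empty string.
        return sources[scope].get(label, "")

    return _VARIABLE.sub(substitute, template)


template = "Instance {{ .Labels.instance }} is down"
print(render_annotation(template, {"instance": "host-1"}, {}))
print(render_annotation(template, {}, {}))
```

The second call shows the failure mode: without an `instance` label on the series, the rendered text keeps the surrounding words but drops the value.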
- Web
- Chronoctl
- Terraform
To add annotations to a monitor:
- Create a monitor.
- In the Annotations section, add a description for your annotation in the Key field, and text or links in the Value field. For example, you might add the following key/value pairs as annotations:

  | Key | Value |
  |---|---|
  | summary | Instance {{ $labels.instance }} is down |
  | description | Container {{ $labels.namespace }}/{{ $labels.pod }}/{{ $labels.container }} terminated with {{ $labels.reason }}. |
  | runbook | http://default-runbook |