Convert Datadog monitors

Datadog monitors actively check infrastructure metrics and manage alerts on alerting platforms. Chronosphere uses monitors and alerts for the same purposes.

Datadog defines monitoring and notification together in one long file, while Chronosphere separates monitors and notifications into smaller, logical configuration files. These smaller files let users make targeted changes without risking the entire configuration.

Compare configurations

These are examples of matching configurations for Datadog and Chronosphere.

This is an example of a Datadog monitor definition.

{
    "id": 1234567,
    "org_id": 12345,
    "type": "metric alert",
    "name": "IOWAIT is high ({{value}})",
    "message": "{{#is_alert}}\nLoad is too high, check and lower load immediately (use AWS console for {{pod.name}} to scale tasks to 1 and investigate)\n@slack-ops-bots \n{{/is_alert}} \n\n{{#is_alert_recovery}}\n@slack-ops-bots \n@pagerduty-resolve \n{{/is_alert_recovery}}{{#is_warning}}\nLoad is reaching the limit.\n@slack-ops-warning-bots\n@pagerduty{{/is_warning}}",
    "tags": [
        "high-load",
        "team:platform"
    ],
    "query": "min(last_30m):max:system.cpu.iowait{function:cassandraevents} by {pod,name} > 20",
    "options": {
        "notify_audit": false,
        "locked": false,
        "timeout_h": 0,
        "include_tags": true,
        "no_data_timeframe": 30,
        "require_full_window": true,
        "notify_by": ["pod"],
        "notify_no_data": true,
        "new_group_delay": 60,
        "renotify_interval": 30,
        "renotify_occurrences": 1,
        "renotify_statuses": [
            "alert",
            "no data"
        ],
        "scheduling_options": {
            "evaluation_window": {
                "hour_starts": 30
            }
        },
        "thresholds": {
            "critical": 20,
            "critical_recovery": 10,
            "warning": 15
        },
        "timeout_h": 12,
        "escalation_message": "{{#is_alert}}\nEscalated to pagerduty - \nLoad is too high, check and lower load immediately (use AWS console for {{pod.name}} to scale tasks to 1 and investigate)\n@slack-ops-bots \n@pagerduty \n{{/is_alert}}",
        "evaluation_delay": 300,
        "min_failure_duration": 120,
        "silenced": {}
    },
    "multi": true,
    "created_at": 1479858941000,
    "created": "2016-11-22T15:55:41.80188-08:00",
    "modified": "2021-10-14T09:23:36.750186-07:00",
    "deleted": null,
    "restricted_roles": null,
    "priority": 1,
    "overall_state_modified": "2022-07-05T06:13:14-07:00",
    "overall_state": "OK",
    "creator": {
        "name": "Jane Smith",
        "handle": "janesmith@example.com",
        "email": "janesmith@example.com",
        "id": 18219
    },
    "matching_downtimes": []
}

Field mapping

Chronosphere and Datadog fields have many equivalent functions. Use the following tables to map fields between these apps.

Names of Chronosphere equivalents are subject to change as the conversion process improves.

Configuration mapping

This table matches Datadog fields to their Chronosphere equivalents for monitor specification.

Datadog field	Chronosphere equivalent
`created`	N/A
`creator`	N/A
`id`	Add to `Monitor.labels`.
`message`	Add to `Monitor.annotations` and create `Notify` routes. See Message and route for details.
`modified`	N/A
`multi`	`Monitor.spec.signal_grouping.signal_per_series`
`name`	`Monitor.name` - This can also contain variables.
`options`	Monitor options
`threshold_windows`	N/A - Used only for `anomalies`.
`thresholds`	`Monitor.spec.series_conditions.severity_conditions.conditions`
`timeout_h`	N/A
`overall_state`	For monitors with an `Ignored / Skipped / Unknown` state, still create the monitor but have it either go to a black hole route or create it as muted.
`priority`	Can support as a message annotation.
`query`	`Monitor.spec.query.expr`
`restricted_roles`	N/A
`state`	N/A
`matching_downtimes`	Equivalent to schedules.
`tags`	An arbitrary list of strings that fits the tag format (which can be single-word tags). Chronosphere can support this using `Monitor.labels`. If the field requires a key/value format, tags are used as label names with the value set to `true`.
`type`	The type of monitor. Chronosphere supports query alerts and metric alerts.
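The `id` and `tags` rows above can be illustrated with a small Python sketch. It is not part of any Chronosphere tooling: the function name `tags_to_labels` is hypothetical, splitting `key:value` tags into a label name and value is an assumption, and hyphens are replaced with underscores only because Prometheus-style label names don't allow hyphens. The `datadog_id` label name follows the example later on this page.

```python
def tags_to_labels(tags, monitor_id=None):
    """Convert Datadog tags (and optionally the monitor id) to a label dict.

    Hypothetical helper: single-word tags become labels with the value
    "true"; "key:value" tags are split into name and value (an assumption).
    """
    labels = {}
    if monitor_id is not None:
        # Label name taken from the example on this page.
        labels["datadog_id"] = str(monitor_id)
    for tag in tags:
        key, sep, value = tag.partition(":")
        # Hyphens aren't valid in Prometheus-style label names.
        name = key.replace("-", "_")
        labels[name] = value if sep else "true"
    return labels


labels = tags_to_labels(["high-load", "team:platform"], monitor_id=1234567)
```

With the tags from the Datadog example, this produces the labels `datadog_id`, `high_load`, and `team`.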

Monitor options

Use these values in the specification's options field.

Datadog field	How to map
`aggregation`	N/A - For log alerts only.
`enable_logs_sample`	N/A - For log alerts only.
`enable_samples`	N/A - Per Datadog docs, this is used only by CI Test and Pipeline monitors.
`escalation_message`	No separate message for renotify notifications; can append this to the generic alert message.
`evaluation_delay`	Can support by using offset in the query.
`group_retention_duration`	N/A - Not for metrics monitors.
`groupby_simple_monitor`	N/A - For log alerts only.
`include_tags`	Use Prometheus `{{ $value }}` template.
`min_failure_duration`	`Monitor.spec.series_conditions.severity_conditions.conditions.sustain`
`min_location_failed`	Can support by adding thresholds to the PromQL expression.
`new_group_delay`	N/A
`new_host_delay`	N/A - Deprecated, use `new_group_delay` instead.
`no_data_timeframe`	Threshold for a `no data` alert. See severity section for details.
`notification_preset_name`	N/A - See the Datadog docs.
`notify_audit`	N/A
`notify_by`	Equivalent to `Monitor.spec.signal_grouping`, except the inverse. Note: This can be set to `*`, which is the same as setting `Monitor.spec.signal_grouping.signal_per_series`.
`notify_no_data`	Add a `NOT_EXISTS` series condition in the MonitorSpec. Review severity for details.
`on_missing_data`	N/A - Not for metrics alerts.
`renotify_interval`	`NotificationPolicy.routes.overrides.notifiers.repeat_interval`
`renotify_occurrences`	N/A
`renotify_statuses`	Only renotify on status X. Create overrides using `NotificationPolicy.routes.overrides.notifiers.repeat_interval` for each severity listed here.
`require_full_window`	Only evaluate if there's a full window of data. Datadog recommends setting this to `false`. Supportable using the `count_over_time` function.
`scheduling_evaluation_window`	Cumulative time windows. For example, "evaluate this alert every hour on the :00 mark".
`silenced`	Dictionary of muted tags to end timestamp. Create MutingRule objects for each tag.
`thresholds`	Thresholds for severity. Can map to `MonitorSpec.series_conditions.severity_conditions` for warning and critical. No support for separate thresholds for recovery.
`variables`	N/A
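As a rough illustration of two rows in this table, the following Python sketch maps `evaluation_delay` to a PromQL `offset` modifier and `min_failure_duration` to a `sustain` duration. The function name `map_timing_options`, the returned dictionary shape, and the metric name in the usage line are hypothetical. It assumes both Datadog values are in seconds and that the query passed in is a plain selector, because `offset` must directly follow a selector.

```python
def map_timing_options(options, selector):
    """Map Datadog timing options to a PromQL offset and a sustain value.

    Hypothetical helper. `selector` must be a bare PromQL selector such as
    metric{label="value"}; offset can't be appended to arbitrary expressions.
    """
    result = {"query": selector}
    delay = options.get("evaluation_delay")
    if delay:
        # evaluation_delay (seconds) becomes an offset on the selector.
        result["query"] = f"{selector} offset {delay}s"
    sustain = options.get("min_failure_duration")
    if sustain:
        # min_failure_duration (seconds) becomes the condition's sustain.
        result["sustain"] = f"{sustain}s"
    return result


mapped = map_timing_options(
    {"evaluation_delay": 300, "min_failure_duration": 120},
    'system_cpu_iowait{function="cassandraevents"}',  # hypothetical metric name
)
```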

Severity

Chronosphere supports both critical and warning severities by applying different thresholds to the metric values. Datadog also supports alerting on no data for a particular metric as a distinct severity. Although this state isn't a true severity, it's configured the same way as critical and warning alerts.

Chronosphere supports alerting on no data conditions using a series condition in the MonitorSpec:

api_version: v1/config
kind: Monitor
spec:
  spec:
    prometheus_query: <promql query>
    series_conditions:
      defaults:
        critical:
          conditions:
            - op: NOT_EXISTS
              sustain: 60s
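The following Python sketch shows one way the Datadog `thresholds`, `notify_no_data`, and `no_data_timeframe` options might be turned into the `series_conditions` structure shown above. Treat it as a sketch under assumptions: the function name is hypothetical, the `op: GT` and `value` fields for threshold conditions are assumed rather than taken from this page, the comparison operator really depends on the comparator in the Datadog query, and using `no_data_timeframe` (minutes) as the `sustain` for `NOT_EXISTS` is a guess at a sensible mapping. Recovery thresholds are skipped because separate recovery thresholds aren't supported.

```python
def build_series_conditions(options, op="GT"):
    """Build a series_conditions dict from Datadog monitor options.

    Hypothetical helper. "op"/"value" for thresholds are assumed field
    names; NOT_EXISTS and sustain come from the example on this page.
    """
    thresholds = options.get("thresholds", {})
    defaults = {}
    for severity in ("critical", "warning"):
        if severity in thresholds:
            condition = {"op": op, "value": thresholds[severity]}
            defaults[severity] = {"conditions": [condition]}
    if options.get("notify_no_data"):
        no_data = {"op": "NOT_EXISTS"}
        if "no_data_timeframe" in options:
            # Datadog expresses no_data_timeframe in minutes.
            no_data["sustain"] = f"{options['no_data_timeframe'] * 60}s"
        critical = defaults.setdefault("critical", {"conditions": []})
        critical["conditions"].append(no_data)
    return {"defaults": defaults}


conditions = build_series_conditions({
    "thresholds": {"critical": 20, "critical_recovery": 10, "warning": 15},
    "notify_no_data": True,
    "no_data_timeframe": 30,
})
```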

Message and route

Datadog allows different messages and routing endpoints for the different severity levels (critical, warning, no data). Chronosphere can support different messages by using separate annotations:

 
api_version: v1/config
kind: Monitor
spec:
  annotations:
    message_critical: This is the critical threshold message
    message_warning: This is the warning threshold message
    message_no_data: This is the message for no data

To support different routes, users must use a separate monitor with different labels, set using notification policies.
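To separate per-severity messages and routes, a converter first has to pull them out of the Datadog `message` field. The sketch below is one possible approach in Python, not an official tool: `split_message` is a hypothetical name, and the regular expressions assume that conditional blocks look like `{{#is_alert}}...{{/is_alert}}` and that routes appear as `@handle` mentions. Email addresses inside a message would also match the handle pattern, so real messages might need stricter parsing.

```python
import re

# Matches {{#is_xxx}} ... {{/is_xxx}} blocks, including newlines in the body.
BLOCK_RE = re.compile(r"\{\{#(is_\w+)\}\}(.*?)\{\{/\1\}\}", re.DOTALL)
# Matches @-mentions such as @slack-ops-bots.
HANDLE_RE = re.compile(r"@([\w-]+)")


def split_message(message):
    """Return {condition: {"text": ..., "handles": [...]}} per block."""
    result = {}
    for condition, body in BLOCK_RE.findall(message):
        handles = HANDLE_RE.findall(body)
        # Remove the mentions, then collapse runs of whitespace.
        text = " ".join(HANDLE_RE.sub("", body).split())
        result[condition] = {"text": text, "handles": handles}
    return result


parsed = split_message(
    "{{#is_alert}}\nLoad is too high\n@slack-ops-bots \n{{/is_alert}}"
    "{{#is_warning}}\nLoad is reaching the limit.\n"
    "@slack-ops-warning-bots\n@pagerduty{{/is_warning}}"
)
```

The text for each condition can then be written to an annotation such as `message_critical` or `message_warning`, and the handles can drive route creation.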

Notification policy resources

Link a Monitor resource to a Notification resource by defining a notification policy. Each unique route in the Datadog message field maps to a Notification resource. The Monitor contains a label specifying the notification route it links to, and the default NotificationPolicy defines overrides that point to each Notification resource.

For example:

api_version: v1/config
kind: Monitor
spec:
  labels:
    datadog_id: 1234567
    route_slack_ops_bots_critical: true
    route_slack_ops_bots_warning: true
    route_pagerduty_critical: true
 
---
api_version: v1/config
kind: NotificationPolicy
spec:
    routes:
      overrides:
        - alert_label_matchers:
            name: route_slack_ops_bots_critical
            type: EXACT_MATCHER_TYPE
            value: critical
          notifiers:
            critical:
              name: slack-ops-bots
              notifier_slugs:
                - slack-ops-bots
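The route labels in the Monitor example follow a `route_<handle>_<severity>` pattern. A minimal Python sketch of generating them from Datadog handles, assuming that naming pattern and a hypothetical function name `route_labels`, could look like this:

```python
def route_labels(handles_by_severity):
    """Build route labels from {severity: [datadog_handle, ...]}.

    Hypothetical helper: label names follow the route_<handle>_<severity>
    pattern from the example above, with hyphens replaced by underscores.
    """
    labels = {}
    for severity, handles in handles_by_severity.items():
        for handle in handles:
            slug = handle.replace("-", "_")
            labels[f"route_{slug}_{severity}"] = "true"
    return labels


generated = route_labels({
    "critical": ["slack-ops-bots", "pagerduty"],
    "warning": ["slack-ops-bots"],
})
```

For the handles in the example Datadog monitor, this yields the same three route labels shown in the Monitor resource.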

Evaluation frequency

Datadog doesn't support different evaluation frequencies per monitor. Instead, it relies on a hard-coded interval that depends on the evaluation window. For windows of less than 24h, the evaluation frequency defaults to 1m. In Chronosphere, set the interval to a desired value with the MonitorSpec.interval field, or default to 15s to receive faster alerts.
