Collector service discovery for Prometheus metrics

The Collector supports several mechanisms for discovering which applications to scrape Prometheus metrics from: ServiceMonitors, Kubernetes annotations, and Prometheus service discovery.

Using ServiceMonitors or Kubernetes annotations (or a combination of both) is recommended for most deployments.

Use push-based collection mechanisms for use cases where jobs can't be scraped automatically, such as AWS Lambda, Google Cloud Functions, or ephemeral batch jobs.

ServiceMonitors

ServiceMonitors are custom resources, installed through a custom resource definition (CRD), that you can use to define scrape configurations and options in a separate Kubernetes resource.

Discovery is scoped to the targets on the local node by default, which requires you to deploy the Collector as a DaemonSet for this method of service discovery.

Prerequisites

Run the following command to install the ServiceMonitor CRD from the full Prometheus Operator, using the file in the kube-prometheus-stack Helm chart:

kubectl apply -f https://raw.githubusercontent.com/prometheus-community/helm-charts/e46dc6360b6733299452c8fd65d304004484de79/charts/kube-prometheus-stack/crds/crd-servicemonitors.yaml

Chronosphere supports only fields in version 0.44.1 of the Prometheus Operator.
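
To verify that the CRD installed successfully, query for it by name:

kubectl get crd servicemonitors.monitoring.coreos.com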

Enable ServiceMonitor discovery

To enable ServiceMonitor discovery in the Collector, make the following configuration changes:

  1. Add the following rules to the ClusterRole resource in the manifest, under the rules key:

    kind: ClusterRole
    rules:
      - apiGroups:
          - monitoring.coreos.com
        resources:
          - servicemonitors
        verbs:
          - get
          - list
          - watch
      - apiGroups:
          - discovery.k8s.io
        resources:
          - endpointslices
        verbs:
          - get
          - list
          - watch
  2. Enable the ServiceMonitors feature of the Collector by setting the following keys in the manifest ConfigMap under the discovery > kubernetes key:

    discovery:
      kubernetes:
        enabled: true
        serviceMonitorsEnabled: true
        endpointsDiscoveryEnabled: true
        useEndpointSlices: true
        podMatchingStrategy: VALUE
    • serviceMonitorsEnabled: Indicates whether to use ServiceMonitors to generate job configurations.

    • endpointsDiscoveryEnabled: Determines whether to discover Endpoints. Requires serviceMonitorsEnabled to be set to true.

    • useEndpointSlices: Use EndpointSlices instead of Endpoints. Requires serviceMonitorsEnabled and endpointsDiscoveryEnabled to be set to true. EndpointSlices use fewer resources than Endpoints.

      EndpointSlices are available with Collector v0.85.0 or later and Kubernetes 1.21 or later.

    • podMatchingStrategy: Determines how to use ServiceMonitors and annotations when discovering targets. Accepts the following settings for VALUE:

      • all: Allows any and all scrape jobs to be registered for a single pod.
      • annotations_first: Matches annotations first. If no annotations match, ServiceMonitor matching can then occur.
      • service_monitors_first: Matches ServiceMonitors first. If no ServiceMonitors match, annotation matching can then occur.
      • service_monitors_only: Matches ServiceMonitors only.
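
With these settings in place, the Collector can generate scrape jobs from ServiceMonitor resources. The following minimal sketch is illustrative; the name, namespace, labels, and port are assumptions, and the selector must match the labels of an existing Service in your cluster:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  namespace: monitoring
  labels:
    app.kubernetes.io/name: example-app
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: example-app
  endpoints:
    - port: http-metrics # named port on the matching Service; illustrative
      interval: 30s
      path: /metrics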

Pod-based ServiceMonitor discovery

If you use a version of Kubernetes that doesn't support endpoint slices, you can set endpointsDiscoveryEnabled to false to run the Collector in a mode that doesn't discover Kubernetes endpoint slices or service resources.

In this mode, the Collector can still discover scrape targets using ServiceMonitors under specific circumstances, depending on the Kubernetes resource configuration. The Collector uses the Pod's labels as the Service's labels. If the Pod's labels match the Service's labels, a ServiceMonitor that uses targetPort (container port) to indicate the port to scrape can still discover the target.

⚠️

Because this discovery method can be very resource intensive, avoid it for most deployments. Instead, contact Chronosphere Support for more information about pod-based ServiceMonitor discovery.

Run as a DaemonSet with ServiceMonitors

If you want to run the Collector as a DaemonSet and scrape kube-state-metrics through a Collector running as a Deployment, you need to update the manifests of both Collector instances so that each one matches only its intended ServiceMonitors. Otherwise, every Collector Pod in the DaemonSet would also try to scrape kube-state-metrics, producing duplicate samples.

In your DaemonSet, add the serviceMonitor > serviceMonitorSelector key to your manifest and define the following matchExpressions to ensure that your DaemonSet matches only ServiceMonitors that aren't labeled for kube-state-metrics:

serviceMonitor:
  serviceMonitorSelector:
    matchAll: false
    matchExpressions:
      - label: app.kubernetes.io/name
        operator: NotIn
        values:
          - kube-state-metrics

In your Deployment, add the same key and definitions to your manifest, but set the operator value of the matchExpressions attribute to In. This setting ensures that your Deployment matches only ServiceMonitors labeled for kube-state-metrics:

serviceMonitor:
  serviceMonitorSelector:
    matchAll: false
    matchExpressions:
      - label: app.kubernetes.io/name
        operator: In
        values:
          - kube-state-metrics

Match specific ServiceMonitors

By default, the Collector ingests metrics from all ServiceMonitor sources. To match specific instances, set matchAll to false under the serviceMonitor > serviceMonitorSelector key, and then define one or more match rules. Every rule you define must match (rules combine with AND), as shown in the combined example after the following list.

serviceMonitorSelector:
  matchAll: false

The available match rules are:

  • matchLabelsRegexp: Labels and a regular expression to match a value. For example:

    matchLabelsRegexp:
      labelone: '[a-z]+'
  • matchLabels: Labels and a matching value. For example:

    matchLabels:
      labelone: foo
  • matchExpressions: Matches labels that exist or don't exist, or that have or don't have specific values, depending on the operator. For example:

    • To match ServiceMonitors that have the examplelabel with values a or b, use the In operator:

      matchExpressions:
        - label: examplelabel
          operator: In
          values:
            - a
            - b
    • To match ServiceMonitors that have the examplelabel without values a or b, use the NotIn operator. The NotIn operator also matches any ServiceMonitors without the examplelabel present:

      matchExpressions:
        - label: examplelabel
          operator: NotIn
          values:
            - a
            - b
    • To match ServiceMonitors that have the examplelabel with any value, use the Exists operator:

      matchExpressions:
        - label: examplelabel
          operator: Exists
    • To match ServiceMonitors that don't have the examplelabel, use the DoesNotExist operator:

      matchExpressions:
        - label: examplelabel
          operator: DoesNotExist
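
You can combine these rules; a ServiceMonitor must satisfy every rule you define. The following sketch assumes hypothetical team and environment labels:

serviceMonitorSelector:
  matchAll: false
  matchLabels:
    team: platform # assumed label; must match exactly
  matchExpressions:
    - label: environment
      operator: In
      values:
        - production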

Match endpoints without pods using ServiceMonitors

The default Collector configuration isn't suitable if you want to discover endpoints but lack access to Pod information. For example, if you want to:

  • Monitor the Kubernetes API server, which doesn't run on the same node as Kubernetes workloads.

  • Monitor endpoints that can be running anywhere in the cluster, but without using a Collector running as a DaemonSet.

  • Discover and scrape kube-state-metrics, which listens to the Kubernetes API server and generates metrics about deployments, nodes, and pods.

    If you're monitoring endpoints but don't have access to Pod information, the ServiceMonitor can't use the targetPort attribute to target the endpoint and must instead use the port attribute, as shown in the example at the end of this section.

In these cases, run the Collector as a Kubernetes Deployment with a single instance, and set the allowSkipPodInfo attribute to true.

serviceMonitor:
  allowSkipPodInfo: true
⚠️

Use this attribute with caution. Setting allowSkipPodInfo to true on a DaemonSet can cause every Collector in the DaemonSet to attempt to scrape every endpoint in the cluster, or cause duplicate scrapes.
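
As a sketch of the port-based approach, a ServiceMonitor for kube-state-metrics might look like the following. The namespace, labels, and port name are assumptions; adjust them to match your kube-state-metrics Service:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kube-state-metrics
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
  endpoints:
    - port: http-metrics # named Service port; targetPort isn't usable without Pod information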

Kubernetes annotations

Discovery is scoped to the targets on the local node by default, which requires you to deploy the Collector as a DaemonSet for this method of service discovery.

For the Collector to start scraping Pods in a Kubernetes cluster, set the following annotations on each Pod you want to scrape:

spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/port: '{port_number}'

The following manifest is an example of using these two annotations for a basic Node Exporter deployment. Based on these annotations, the Collector starts scraping the /metrics endpoint on port 9100 by default.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    app.kubernetes.io/name: node-exporter
    app.kubernetes.io/version: v1.0.1
  name: node-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: node-exporter
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9100"
      labels:
        app.kubernetes.io/name: node-exporter
        app.kubernetes.io/version: v1.0.1
    spec:
      containers:
      - image: quay.io/prometheus/node-exporter:v1.0.1
        name: node-exporter
        ...

You can set additional annotations to control other scrape options. For a complete list of supported annotations, read the scrape configuration documentation.
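
For example, if the Collector follows the common Prometheus annotation convention, a metrics path override might look like the following sketch. The prometheus.io/path annotation is an assumption here; confirm the exact annotation names in the scrape configuration documentation:

annotations:
  prometheus.io/scrape: 'true'
  prometheus.io/port: '8080'
  prometheus.io/path: '/custom/metrics' # assumed annotation; verify before relying on it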

You can change the annotation prefix, which defaults to prometheus.io/, from the kubernetes > processor section of the Collector ConfigMap.

After any changes, send the updated manifest to the cluster with the following command:

kubectl apply -f path/to/manifest.yml

If you modify a Collector manifest, you must update it in the cluster and restart the Collector.
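
For example, assuming the Collector runs as a DaemonSet named collector in the monitoring namespace:

kubectl rollout restart daemonset/collector -n monitoring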

Prometheus service discovery

When using Prometheus service discovery within Kubernetes, deploy a single Collector as a Kubernetes Deployment per cluster. Otherwise, every Collector instance scrapes all endpoints defined in the Prometheus service discovery configuration, duplicating samples.

To enable Prometheus service discovery, set discovery.prometheus.enabled to true in the Collector config. Provide the list of scrape configs in the discovery.prometheus.scrape_configs section. The following example uses the kubernetes_sd_config.

discovery:
  prometheus:
    enabled: true
    scrape_configs:
      - job_name: kubernetes-pods
        honor_timestamps: true
        scrape_interval: 30s
        scrape_timeout: 30s
        metrics_path: /metrics
        scheme: http
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels:
              [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            separator: ;
            regex: 'true'
            replacement: $1
            action: keep
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            separator: ;
            regex: (.+)
            target_label: __metrics_path__
            replacement: $1
            action: replace
          - source_labels:
              [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            separator: ;
            regex: ([^:]+)(?::\d+)?;(\d+)
            target_label: __address__
            replacement: $1:$2
            action: replace
          - separator: ;
            regex: __meta_kubernetes_pod_label_(.+)
            replacement: $1
            action: labelmap
          - source_labels: [__meta_kubernetes_namespace]
            separator: ;
            regex: (.*)
            target_label: kubernetes_namespace
            replacement: $1
            action: replace
          - source_labels: [__meta_kubernetes_pod_name]
            separator: ;
            regex: (.*)
            target_label: kubernetes_pod_name
            replacement: $1
            action: replace

Find more details in the Prometheus scrape configuration documentation. For a complete list of examples, see the examples section in the Prometheus GitHub repository.

Set the Collector to scrape its own metrics

For the Collector to scrape its own metrics, add another job to the discovery.prometheus.scrape_configs key:

- job_name: 'collector'
  scrape_interval: 30s
  scrape_timeout: 30s
  static_configs:
    - targets: ['0.0.0.0:3030']
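
Assuming the Collector listens on port 3030, as in the targets list above, you can confirm that the endpoint serves metrics by port-forwarding to a Collector Pod and requesting the metrics path locally. The Pod name is a placeholder:

kubectl port-forward pod/<collector-pod> 3030:3030
curl http://localhost:3030/metrics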