Backpressure
When a telemetry pipeline ingests data, it’s possible for the volume of incoming data to exceed that pipeline’s throughput. This creates a condition known as backpressure, where data accumulates faster than the pipeline can process and route that data to its intended destination.
Although some amount of backpressure is normal and expected, unmanaged or excessive backpressure can result in data loss, or even cause your pipeline or the cluster it’s running in to crash.
Overview
Pipelines can buffer data by temporarily storing it somewhere until that pipeline is ready to process and route the data to its intended destination. Depending on your pipeline’s workload type, it can buffer this data by storing it in memory (for Deployment and DaemonSet pipelines) or in a file system (for StatefulSet pipelines).
If the volume of incoming data exceeds a pipeline’s throughput, the amount of data in temporary storage increases accordingly. This is what creates backpressure, but pipelines are designed to accommodate a certain amount of backpressure without issue. To draw a comparison with pipes that carry water, temporary storage is like the basin of a sink: if water flows in faster than it drains, the basin fills up and holds the extra water until it can drain.
However, if a pipeline continues buffering new data to temporary storage faster than it can remove old data, that storage will eventually reach capacity. This is the point at which backpressure becomes an urgent problem: the temporary storage is like a sink that’s about to overflow. If your pipeline stops buffering new data to temporary storage, this interruption might cause data loss. Conversely, if your pipeline continues to buffer new data, it will either:
- Run out of memory and crash, which can also result in data loss.
- Attempt to increase its storage capacity, which can add unexpected strain on resources and cause your entire cluster to crash.
Push-based sources versus pull-based sources
When a pipeline stops buffering new data, the potential for data loss partly depends on whether its associated source plugins are push-based or pull-based.
For push-based source plugins, which passively receive data, Chronosphere has no control over the behavior of that data source. Some sources might pause the flow of data if they detect an interruption, but other sources might continue attempting to send data to the unavailable pipeline, which can cause data loss.
For pull-based source plugins, which actively fetch data, Chronosphere can control communication between that source and Telemetry Pipeline. If a pipeline is unavailable, Telemetry Pipeline will pause the flow of data, then resume fetching data when the pipeline is ready to ingest it again. This behavior can add temporary latency if the source buffers a large amount of data during the pipeline’s downtime, but avoids major data loss.
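The pause-and-resume behavior for pull-based sources can be sketched as a simple loop. This is hypothetical illustration code, not Chronosphere’s implementation: `fetch`, `ingest`, and `is_ready` are illustrative stand-ins for the source, the pipeline, and its readiness signal.

```python
import time

def pull_with_backpressure(fetch, ingest, is_ready, max_iterations=10):
    """Pull loop sketch: fetch from the source only when the pipeline is
    ready, so unread data stays buffered at the source instead of being
    dropped. All names here are illustrative, not a real API."""
    delivered = []
    for _ in range(max_iterations):
        if not is_ready():
            # Pipeline is busy: pause fetching. The data waits at the
            # source, adding latency but avoiding loss.
            time.sleep(0)  # stand-in for a real backoff delay
            continue
        item = fetch()
        if item is None:  # source exhausted
            break
        ingest(item)
        delivered.append(item)
    return delivered
```

The key property is that a not-ready pipeline only delays fetching; nothing is discarded, which is why pull-based sources trade latency for durability.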
Manage backpressure
The available methods for managing backpressure vary depending on your pipeline’s workload type. Because of this, choosing the right workload type is also a key part of preventing and responding to different problems created by backpressure.
Some factors that create or contribute to backpressure are independent of Telemetry Pipeline, like the amount of data that your sources emit and your destinations’ capacity to receive data. To manage these factors, you must configure your sources and destinations directly.
Deployment pipelines
Use the following methods to manage backpressure for pipelines with the Deployment workload type:
- Set the `mem_buf_limit` configuration parameter to enforce a limit on how much data a source plugin can buffer in memory. When this limit is reached, the pipeline will stop buffering new data from that source plugin.
- Configure resource profiles to set thresholds for a pipeline’s resource usage.
- Scale pipelines to increase their throughput. However, keep in mind that increasing a pipeline’s replicas increases its resource usage, and that scaling won’t alleviate destination-level bottlenecks.
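As a reference point, `mem_buf_limit` is set per input plugin in Fluent Bit-style configuration, which Telemetry Pipeline’s processing engine is based on. A minimal sketch in classic Fluent Bit syntax; the `tcp` input and the `50MB` value are illustrative assumptions, not recommendations:

```ini
[INPUT]
    Name           tcp
    Listen         0.0.0.0
    Port           5170
    # Stop buffering new data from this input once roughly 50 MB of
    # records are held in memory (illustrative value; tune per workload).
    Mem_Buf_Limit  50MB
```

When the limit is hit, only this input pauses; other inputs in the same pipeline keep their own independent limits.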
StatefulSet pipelines
Use the following methods to manage backpressure for pipelines with the StatefulSet workload type:
- Configure resource profiles to set thresholds for a pipeline’s resource usage.
- Set the `resources.storage.maxChunksUp` resource profile parameter to increase or decrease the amount of storage available for buffered data.
- Scale pipelines to increase their throughput. However, keep in mind that increasing a pipeline’s replicas increases its resource usage, and that scaling won’t alleviate bottlenecks caused by destinations.
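A resource profile carrying this setting might look like the following. Only the parameter path `resources.storage.maxChunksUp` comes from the text above; the surrounding YAML structure and the value are assumptions, so check your resource profile reference for the exact shape:

```yaml
# Hypothetical resource profile fragment.
resources:
  storage:
    # Upper bound on how many buffered chunks the pipeline keeps
    # readily available ("up") at once; raising it makes more storage
    # available for buffered data, at the cost of higher memory use.
    maxChunksUp: 128
```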
DaemonSet pipelines
Use the following methods to manage backpressure for pipelines with the DaemonSet workload type:
- Set the `mem_buf_limit` configuration parameter to enforce a limit on how much data a source plugin can buffer in memory. When this limit is reached, the pipeline will stop buffering new data from that source plugin.
- Configure resource profiles to set thresholds for a pipeline’s resource usage.
DaemonSet pipelines use a static number of replicas, which means they can’t be scaled.