Long-term downsampling
After approximately five days, each persisted metric, including both raw and aggregated metrics, undergoes a process called long-term downsampling (LTD). This process temporally downsamples data into a more compact form, and then deletes the non-downsampled data permanently.
To maintain an accurate representation of the data, Chronosphere Observability Platform utilizes different downsampling methodologies, depending on the metric type and, if it's an aggregated metric, the method of aggregation used to produce its value.
These behaviors are important to note beforehand, since any unexpected results of LTD will be noticed only approximately five days after ingestion.
By default, Observability Platform downsamples long-term data at a five-minute granularity: all data points within each five-minute window compress into a single data point. This five-minute window is termed the downsample window.
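The windowing can be sketched as follows. This is an illustrative model, assuming timestamps in seconds; the function names and point layout are assumptions, not the platform's internal format:

```python
WINDOW = 300  # five-minute downsample window, in seconds

def window_start(ts: int) -> int:
    """Map a timestamp to the start of its five-minute downsample window."""
    return ts - (ts % WINDOW)

def group_by_window(points):
    """Group (timestamp, value) points into downsample windows."""
    windows = {}
    for ts, value in points:
        windows.setdefault(window_start(ts), []).append((ts, value))
    return windows

# Ten raw points at 30s resolution all land in the same 5m window,
# and downsampling later compresses each window to a single point.
raw = [(t, float(t)) for t in range(600, 900, 30)]
assert set(group_by_window(raw)) == {600}
```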
Effects on metric types
Downsampling effects differ based on the metric type.
- Cumulative counter: Cumulative counters downsample by preserving the overall increase (respecting resets) between the start and end of the downsample window. This reduces the temporal granularity by observing only one increase every five minutes, while keeping the running count accurate.
- Gauge: Downsampling of gauges differs, depending on how the gauge was ingested.

  By default, gauges downsample by preserving only the last data point in every downsample window. Any changes to the gauge prior to the end of the downsample window aren't retained.

  If the gauge is the output of a MIN/MAX aggregation, the gauge downsamples by preserving the MIN/MAX data point in every downsample window, respectively.

  Gauges ingested with StatsD downsample using the Largest-Triangle-Three-Buckets (LTTB) algorithm for consistency with the Graphite query engine. Graphite statistics use sum downsampling for counters, and LTTB for timers and gauges.
- Histogram: Cumulative exponential histograms downsample by preserving the overall increase (respecting resets) of each bucket count between the start and end of the downsample window. Delta exponential histograms downsample by summing the data points in the downsample window.

  Due to the reduced temporal granularity, you'll see changes no more frequently than every five minutes, while the running bucket counts remain accurate.

  If the histogram exceeds the 160-bucket limit, Observability Platform decreases its scale until the bucket count is within the limit. Downscaling reduces the histogram's resolution.

  Legacy Prometheus histograms are cumulative counters and have the same downsampling effects.
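The per-type behaviors above can be sketched as follows. This is an illustrative model of one downsample window, not the platform's implementation; the function names and sample layout are assumptions:

```python
def downsample_counter(points):
    """Keep the overall increase across the window, respecting resets.

    `points` is a list of (timestamp, value) samples of a cumulative
    counter within one downsample window. A drop in value is treated as
    a counter reset, and the value after the reset counts as new increase.
    """
    increase = 0.0
    for (_, prev), (_, curr) in zip(points, points[1:]):
        increase += curr - prev if curr >= prev else curr
    return increase

def downsample_gauge_last(points):
    """Default gauge behavior: keep only the last data point."""
    return points[-1][1]

def downsample_gauge_min(points):
    """Gauge produced by a MIN aggregation: keep the window minimum."""
    return min(v for _, v in points)

def downsample_gauge_max(points):
    """Gauge produced by a MAX aggregation: keep the window maximum."""
    return max(v for _, v in points)

# One five-minute window of samples; the counter resets at t=120.
window = [(0, 10.0), (60, 14.0), (120, 2.0), (180, 5.0)]
# Counter increase: (14-10) + 2 (after the reset) + (5-2) = 9
assert downsample_counter(window) == 9.0
assert downsample_gauge_last(window) == 5.0
```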
Querying downsampled data
Querying downsampled data can cause issues, in particular at the switch from higher-resolution data to lower-resolution data. Some queries are sensitive to data resolution: although a query might work well with raw data, it could return no data, or data that doesn't make sense, after downsampling.
For example, the query rate(some_metric[2m]) with a raw resolution of 30s returns results. If the user increases the query time range, the query requests downsampled data. The results are empty, because the query requested multiple data points in the 2m window, but downsampling reduces the available data to one data point every five minutes.
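A rough model of why the 2m selector comes back empty against downsampled data, assuming rate() needs at least two points in the range to compute a slope (as in Prometheus):

```python
def points_in_range(resolution_s: int, range_s: int) -> int:
    """Number of evenly spaced samples that fit in a range selector."""
    return range_s // resolution_s

# rate() needs at least two points in the window to compute a slope.
assert points_in_range(30, 120) >= 2    # raw 30s data, [2m]: works
assert points_in_range(300, 120) < 2    # downsampled 5m data, [2m]: empty
```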
To prevent empty graphs when changing the query time range, Observability Platform modifies the user query. If a query uses downsampled data at a 5m resolution, but it contains range selectors that are lower than that (for example, [2m]), Observability Platform rewrites the range selectors to be three times the resolution of the data. Based on the previous example, Observability Platform executes the modified query rate(some_metric[15m]) (15m = 3 x 5m) when querying downsampled data.
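A sketch of the rewrite rule, assuming it simply widens any selector narrower than the data resolution to three times that resolution (the exact conditions the platform applies may differ):

```python
def rewrite_range(range_s: int, resolution_s: int) -> int:
    """Widen a range selector that is too narrow for the data resolution.

    If the selector is smaller than the resolution (e.g. [2m] against
    5m downsampled data), rewrite it to three times the resolution.
    """
    if range_s < resolution_s:
        return 3 * resolution_s
    return range_s

# [2m] against 5m downsampled data becomes [15m] (15m = 3 x 5m):
assert rewrite_range(120, 300) == 900
# [10m] already covers the 5m resolution, so it is left alone:
assert rewrite_range(600, 300) == 600
```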
Querying lower-resolution data might also affect the output of some functions, including:
- The increase() function can return very different numbers after switching to downsampled data.
- The rate() function smooths out the peaks of graphs that have many peaks in the raw data.
- Any resets() are dropped when downsampling data.