Long-term downsampling

After approximately five days, each persisted metric, including both raw and aggregated metrics, undergoes a process called long-term downsampling (LTD). This process temporally downsamples data into a more compact form, and then deletes the non-downsampled data permanently.

To maintain an accurate representation of the data, Chronosphere uses different downsampling methods depending on the metric type and, for aggregated metrics, on the aggregation method used to produce the value.

It's important to understand these behaviors in advance, because any unexpected results of LTD become apparent only approximately five days after ingestion.

By default, Chronosphere downsamples long-term data at a five-minute granularity: all data points within each five-minute window compress into a single data point. This five-minute window is called the downsample window.
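
The following Python sketch illustrates this bucketing. The function and variable names are illustrative assumptions, not part of Chronosphere's implementation.

```python
# A minimal sketch of grouping data points into five-minute downsample
# windows. Names and data shapes are illustrative, not Chronosphere's
# actual implementation.
from collections import defaultdict

WINDOW_SECONDS = 5 * 60  # the five-minute downsample window

def bucket_by_window(points: list[tuple[int, float]]) -> dict[int, list[tuple[int, float]]]:
    """Group (timestamp, value) pairs by the window they fall into."""
    windows: dict[int, list[tuple[int, float]]] = defaultdict(list)
    for ts, value in points:
        window_start = ts - (ts % WINDOW_SECONDS)
        windows[window_start].append((ts, value))
    return windows

# Raw points every 30 seconds compress to one point per window.
raw = [(ts, float(ts)) for ts in range(0, 900, 30)]
for start, pts in bucket_by_window(raw).items():
    print(start, len(pts), "raw points -> 1 downsampled point")
```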

Effects on metric types

Downsampling effects differ based on the metric type.

  • Cumulative counter: Cumulative counters downsample by preserving the overall increase (respecting resets) between the start and end of the downsample window. This reduces the temporal granularity to one observed increase every five minutes, while keeping the running count accurate. See the sketch after this list.

  • Gauge: Downsampling of gauges differs, depending on how the gauge was ingested.

    By default, gauges downsample by preserving only the last data point in every downsample window. Any changes to the gauge prior to the end of the downsample window aren't retained.

    If the gauge is the output of a MIN/MAX aggregation, it downsamples by preserving the MIN or MAX data point, respectively, in every downsample window.

    Gauges ingested with StatsD downsample using a Largest-Triangle-Three-Buckets downsampling algorithm for consistency with the Graphite query engine.
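
The following Python sketch illustrates these per-window reductions for cumulative counters and non-StatsD gauges. The function names are illustrative assumptions; Chronosphere's internal implementation may differ.

```python
# Minimal sketches of the per-window reductions described above.
# Each function collapses one downsample window of values into the
# single data point that survives LTD.

def downsample_counter(points: list[float]) -> float:
    """Overall increase across the window, respecting resets: when the
    counter drops, the new value counts as growth from zero."""
    increase = 0.0
    for prev, curr in zip(points, points[1:]):
        increase += curr - prev if curr >= prev else curr
    return increase

def downsample_gauge_last(points: list[float]) -> float:
    """Default gauge behavior: keep only the last point in the window."""
    return points[-1]

def downsample_gauge_min(points: list[float]) -> float:
    """Gauge produced by a MIN aggregation: keep the window minimum."""
    return min(points)

def downsample_gauge_max(points: list[float]) -> float:
    """Gauge produced by a MAX aggregation: keep the window maximum."""
    return max(points)

# A counter that resets mid-window: 10 -> 40 -> 5 -> 25.
# Increase = (40 - 10) + 5 + (25 - 5) = 55, preserved despite the reset.
print(downsample_counter([10, 40, 5, 25]))  # 55.0
```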

Querying downsampled data

Users can encounter issues when querying downsampled data, particularly at the switch from higher-resolution data to lower. Some queries are sensitive to data resolution: a query that works well against raw data might return no data, or data that doesn't make sense, after downsampling.

For example, the query rate(some_metric[2m]) returns results against raw data with a 30s resolution. If the user increases the query time range enough that the query requests downsampled data, the results are empty: rate() needs at least two data points inside the 2m window, but downsampling leaves only one data point every five minutes.

To prevent empty graphs when a user changes the query time range, Chronosphere modifies the query. If a query runs against downsampled data at a 5m resolution but contains range selectors lower than that (for example, [2m]), Chronosphere rewrites the range selectors to three times the resolution of the data.

Based on the previous example, Chronosphere executes the modified query rate(some_metric[15m]) (15m = 3 x 5m) when querying downsampled data.
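
The following Python sketch shows one way such a rewrite could work, using a regular expression over the query string. The real rewrite presumably operates on the parsed query, so treat the names and approach here as illustrative assumptions.

```python
# A minimal sketch of the range-selector rewrite described above:
# selectors shorter than the data resolution widen to three times
# that resolution. Illustrative only, not Chronosphere's implementation.
import re

RESOLUTION_SECONDS = 5 * 60  # downsampled data resolution (5m)

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

def widen_range_selectors(query: str) -> str:
    """Rewrite any range selector shorter than the data resolution
    to three times that resolution (for example, [2m] -> [15m])."""
    def replace(match: re.Match) -> str:
        amount, unit = int(match.group(1)), match.group(2)
        if amount * _UNIT_SECONDS[unit] < RESOLUTION_SECONDS:
            return f"[{3 * RESOLUTION_SECONDS // 60}m]"
        return match.group(0)
    return re.sub(r"\[(\d+)([smhd])\]", replace, query)

print(widen_range_selectors("rate(some_metric[2m])"))   # rate(some_metric[15m])
print(widen_range_selectors("rate(some_metric[30m])"))  # unchanged
```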

Querying lower-resolution data can also affect the output of some functions (see the worked example after this list), including:

  • The increase() function can return very different numbers after switching to downsampled data.
  • For graphs with many peaks in the raw data, the rate() function smooths out those peaks after the switch to downsampled data.
  • Any counter resets are dropped when downsampling data, so resets() can't detect resets that occurred within a downsample window.
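
As a worked example of the first and last points, consider a counter that resets inside a single downsample window. The values here are invented for illustration.

```python
# A worked example (with invented values) of why intra-window resets
# disappear: the whole window collapses to a single increase value.
raw_window = [100, 150, 3, 60]  # a reset occurs inside the window (150 -> 3)

# Against raw data, resets() would count one reset in this window.
# After downsampling, only the overall increase survives:
increase = 0.0
for prev, curr in zip(raw_window, raw_window[1:]):
    # On a reset, the new value counts as growth from zero.
    increase += curr - prev if curr >= prev else curr

print(increase)  # 110.0: the growth is preserved, the reset is not
```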