Data control concepts
Chronosphere uses collectors to ingest data, supporting both push and pull models depending on the type of data collected and how it's ingested.
Chronosphere provides the following data control concepts to help you manage your data so that you keep only the data that's most important to your organization.
Datasets
Datasets are a data control mechanism for organizing your data and mapping it to named groups that are relevant to your organization. Each telemetry type can have any number of datasets. Datasets can overlap, or they can stand alone, assigned to a single business unit or service within your organization. You can use behaviors to change the sampling rates of one or more datasets.
Each dataset entity lets you define a filter that assigns specific chunks of data to a particular dataset. A default, system-defined dataset exists for budgeting. You can assign the default dataset to data that isn't explicitly assigned to any other dataset.
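The filtering idea can be sketched in a few lines of Python. This is an illustrative model only, not the Chronosphere API: the dataset names, label keys, and the `match_dataset` function are all hypothetical, and real dataset filters are more expressive than exact label matches.

```python
def match_dataset(labels, datasets, default="default"):
    """Return the name of the first dataset whose filter matches the
    data point's labels; fall back to the system-defined default dataset."""
    for name, dataset_filter in datasets.items():
        if all(labels.get(key) == value for key, value in dataset_filter.items()):
            return name
    return default

# Hypothetical datasets, each mapping a filter to a named group:
datasets = {
    "checkout-service": {"service": "checkout"},
    "payments-team": {"team": "payments"},
}

match_dataset({"service": "checkout", "env": "prod"}, datasets)
# matches the "checkout-service" dataset
```

Data that matches no filter, such as `{"service": "search"}` here, falls through to the default dataset.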
See trace datasets to learn about how to use datasets to track your processed and persisted trace data.
Behaviors
Behaviors are a data shaping mechanism you can use to sample your data and more effectively control your persisted data. Use behaviors to change the sampling rates of one or more datasets without needing to write fine-grained sampling or shaping rules.
Each behavior can have continuous shaping rules, such as sampling rules or rollup rules. Behaviors might also have responsive shaping rules. For example, when a dataset exceeds a defined budget, downsample the low-priority data by 50% for ten minutes.
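The responsive shaping example above can be sketched as a small Python function. This is a hypothetical model, not Chronosphere's implementation: the function name, parameters, and the budget units are illustrative assumptions.

```python
PENALTY = 0.5   # downsample low-priority data by 50%
WINDOW = 600    # keep the penalty in effect for ten minutes (seconds)

def effective_sample_rate(base_rate, usage, budget, last_breach, now):
    """Return the sampling rate to apply at time `now` (seconds).

    `last_breach` is when the dataset last exceeded its budget,
    or None if it never has."""
    over_budget = usage > budget
    in_penalty_window = last_breach is not None and now - last_breach < WINDOW
    if over_budget or in_penalty_window:
        return base_rate * PENALTY
    return base_rate
```

For example, a dataset at 120% of its budget drops from full sampling to half: `effective_sample_rate(1.0, usage=120, budget=100, last_breach=None, now=0)` returns `0.5`, and the penalty persists for ten minutes after the breach.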
See trace behaviors to learn about how to use behaviors to set sampling rates for your datasets without needing to write fine-grained sampling rules.
Late-arriving metrics
Prometheus, OpenTelemetry, and other metric formats include one or more timestamp values for each data point. These values indicate when a sample was observed, or represent the time range of the data point. Chronosphere Observability Platform can accept late-arriving data points within a time frame, depending on whether the data point matches an aggregation rule or is ingested without aggregation:
- Aggregation rules accept data points from two minutes to eight minutes past the current ingestion time.
- Raw data points can be written to the database up to two hours before the ingestion timestamp.
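The two acceptance windows above can be expressed as a simple check, here in seconds. This is an illustrative sketch, not Observability Platform code; because the window for aggregated points varies between two and eight minutes, it's taken as a parameter.

```python
RAW_WINDOW = 2 * 60 * 60  # raw points: up to two hours late

def accepts_late_point(lateness, aggregated, agg_window=8 * 60):
    """Return True if a point arriving `lateness` seconds after its
    sample timestamp would still be accepted."""
    if aggregated:
        return 0 <= lateness <= agg_window
    return 0 <= lateness <= RAW_WINDOW
```

A point matching an aggregation rule that arrives five minutes late is accepted, but one arriving ten minutes late is dropped; a raw point can be nearly two hours late and still be written.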
Downsampling
Collectors ingest metrics at specific intervals, based on system configuration. This granularity of metric data can be helpful in diagnostic efforts, but not every issue requires it. If you can diagnose production issues using coarser-grained metric data, downsampling reduces the amount of data persisted to the Observability Platform database.
Downsample incoming data in Observability Platform using these methods:
- Adjust the Chronosphere Collector configuration to change the rate at which the Collector publishes metrics to the server.
- Use mapping rules to downsample metrics that aren't aggregated.
- Use rollup rules to downsample aggregated metrics.
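The core idea behind these options is temporal downsampling: collapsing raw samples into one point per interval, keeping an aggregate such as the mean. The following Python sketch illustrates the concept; the function and names are illustrative, not Observability Platform APIs.

```python
from collections import defaultdict

def downsample(points, interval, agg=lambda vals: sum(vals) / len(vals)):
    """points: iterable of (timestamp_seconds, value) pairs.
    Returns one (bucket_start, aggregate) pair per `interval` seconds."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % interval].append(value)
    return sorted((start, agg(vals)) for start, vals in buckets.items())

# Four raw samples at roughly 10 s resolution become two 60 s points:
downsample([(0, 1.0), (10, 3.0), (30, 5.0), (70, 7.0)], 60)
```

Swapping the aggregate function (for example, `max` instead of the mean) models the choice a rollup rule makes about which statistic to retain.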
Over time, persisted data uses significant storage capacity. Observability Platform performs long-term downsampling to control data storage costs while retaining important statistics.