Trace sampling

Sample your traces

Distributed traces provide an additional layer of context for solving problems across complex systems that include hundreds or thousands of microservices. However, you want to ensure you're ingesting only tracing data that's relevant and valuable. To help control costs and maximize the usefulness of your tracing data, you can narrow your focus to only a representative sample of your data and drop everything else.

Head and tail sampling

The best-known strategies for sampling trace data are head sampling and tail sampling:

  • Head sampling is a blunter strategy that makes a sampling decision as early as possible. It keeps only a defined percentage of traces, which yields a representative sample of whole traces.

  • Tail sampling is more fine-grained, and evaluates every trace after assembling all spans. Tail sampling rules can consider request outcomes, such as whether a request succeeded or how long it took to complete, which isn't possible with head sampling.
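To make the difference concrete, the following is a minimal sketch of both strategies in Python. It is illustrative only and is not Chronosphere's implementation: the `Span` type, the function names, and the 500 ms threshold are all assumptions made for this example. The head sampler decides from the trace ID alone, while the tail sampler inspects the assembled spans.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass
class Span:
    """Hypothetical span shape used only for this sketch."""
    trace_id: str
    duration_ms: float
    error: bool = False


def head_sample(trace_id: str, rate: float) -> bool:
    """Decide from the trace ID alone, before any request outcome is known.

    Hashing the ID makes the decision deterministic, so every span of the
    same trace gets the same answer and whole traces are kept or dropped.
    """
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # uniform in [0, 2**64)
    return bucket < int(rate * 2**64)


def tail_sample(spans: List[Span], slow_ms: float = 500.0) -> bool:
    """Decide after all spans are assembled: keep failed or slow traces.

    The 500 ms default is an arbitrary placeholder, not a recommendation.
    """
    has_error = any(span.error for span in spans)
    longest = max((span.duration_ms for span in spans), default=0.0)
    return has_error or longest >= slow_ms
```

The head sampler can't tell a failed request from a successful one, because it runs before the outcome exists. The tail sampler can, at the cost of buffering every span until the trace is complete.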

Creating and managing head and tail sampling rules that keep the most impactful data and discard the rest can be challenging. To simplify this process and shorten the learning curve of sampling, Chronosphere developed two concepts to group, track, and apply sampling rules: datasets and behaviors.

Datasets

Create datasets to map sets of traces to named groups relevant to your organization so you can track processed and persisted bytes for those groups over time. Datasets don't impact your license consumption, so you can experiment with them to understand your license usage and make changes as needed. With datasets in place, you can then apply behaviors to them.
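As a rough mental model, a dataset can be pictured as a named predicate over traces plus two running byte counters. The sketch below assumes a hypothetical trace shape (`service`, `bytes`, and `persisted` keys) and hypothetical dataset names; it doesn't reflect how Chronosphere defines or stores datasets.

```python
from collections import defaultdict

# Hypothetical datasets: each name maps to a rule that matches traces.
DATASETS = {
    "checkout": lambda trace: trace["service"].startswith("checkout-"),
    "search": lambda trace: trace["service"] == "search-api",
}


def tally(traces):
    """Sum processed and persisted bytes for each matching dataset."""
    usage = defaultdict(lambda: {"processed": 0, "persisted": 0})
    for trace in traces:
        for name, matches in DATASETS.items():
            if matches(trace):
                usage[name]["processed"] += trace["bytes"]
                if trace["persisted"]:
                    usage[name]["persisted"] += trace["bytes"]
    return dict(usage)


SAMPLE = [
    {"service": "checkout-cart", "bytes": 1200, "persisted": True},
    {"service": "checkout-pay", "bytes": 800, "persisted": False},
    {"service": "search-api", "bytes": 500, "persisted": True},
]

print(tally(SAMPLE))
```

Tallying never changes which traces are kept, which mirrors why defining a dataset is safe to experiment with: it only measures.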

Behaviors

After creating datasets for individual business units, you can apply behaviors to your datasets to set sampling rates without needing to write and manage large sets of fine-grained sampling rules.

You set a baseline behavior that implements data-driven best practices with default parameters, and you can modify those parameters based on the needs of your organization. For example, from a single behavior you can modify the defined criteria so that one or more datasets drop low-value traces as quickly as possible and keep high-value traces at a specified rate.

You can also set a behavior to allow (sample at 100%) or deny (sample at 0%) all traces for a specific period. For example, set an allow behavior when you need more high-fidelity data during a deploy or while debugging an issue. Alternatively, set a deny behavior when you want to cut noisy or spam traces and keep your spend within budget.
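The interaction between a baseline behavior and a temporary allow or deny behavior can be sketched as a lookup with an expiring override. Everything in this example is hypothetical: the `Behavior` and `Override` types, the `effective_rate` function, and the 0.1 baseline rate are assumptions for illustration, and a real baseline behavior has more parameters than a single rate.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Behavior:
    name: str
    sample_rate: float  # fraction of traces kept, from 0.0 to 1.0


ALLOW = Behavior("allow", 1.0)        # sample at 100%
DENY = Behavior("deny", 0.0)          # sample at 0%
BASELINE = Behavior("baseline", 0.1)  # 0.1 is an arbitrary placeholder


@dataclass(frozen=True)
class Override:
    behavior: Behavior
    expires_at: float  # Unix timestamp when the override stops applying


def effective_rate(
    dataset: str,
    assigned: Dict[str, Behavior],
    overrides: Dict[str, Override],
    now: float,
) -> float:
    """Return the sampling rate for a dataset at time `now`.

    An unexpired allow or deny override wins; otherwise the dataset's
    assigned behavior applies, falling back to the baseline.
    """
    override = overrides.get(dataset)
    if override is not None and now < override.expires_at:
        return override.behavior.sample_rate
    return assigned.get(dataset, BASELINE).sample_rate
```

For example, an override of `Override(ALLOW, expires_at=deploy_end)` on a `checkout` dataset would keep every trace until `deploy_end`, after which the dataset returns to its assigned behavior without any further change.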