
# Random sampling

export const entity_0 = "random sampling processing rule"

The random sampling [processing rule](/ingest/pipeline/processing-rules) preserves a
percentage of the records that pass through your pipeline and discards the
rest. The preserved records are chosen at random from all of the records that
accumulate in your pipeline during each time window.

<Warning>
  While the random sampling rule waits for data to accumulate during the
  specified time window, the pipeline [buffers](/ingest/pipeline/v2/configure/backpressure)
  that data. Increasing the value of the **Time window in seconds** parameter
  therefore also increases the memory load on your pipeline. For example, if
  100,000 records of 1 kB each pass through your pipeline during the time
  window, the random sampling rule adds approximately 100 MB of memory
  load.
</Warning>
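
The arithmetic in the preceding warning can be sketched in Python. This is a rough estimate only: the helper name `estimate_buffer_bytes` is hypothetical, and the calculation assumes that every record has the same size and ignores any per-record overhead the pipeline might add.

```python
def estimate_buffer_bytes(records_per_window: int, avg_record_bytes: int) -> int:
    """Approximate the extra memory needed to buffer one time window of records.

    Hypothetical helper for illustration; it is not part of Telemetry Pipeline.
    """
    return records_per_window * avg_record_bytes


# 100,000 records of 1 kB (1,000 bytes) each, as in the warning above.
added_bytes = estimate_buffer_bytes(100_000, 1_000)
print(f"{added_bytes / 1_000_000:.0f} MB")  # prints "100 MB"
```

Doubling the time window at a steady ingest rate roughly doubles `records_per_window`, and with it the estimate.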

## Configuration parameters

Use the parameters in this section to configure the {entity_0}. The
Telemetry Pipeline web interface uses the items in the **Name** column to
describe these parameters. [Pipeline configuration files](/ingest/pipeline/v2/configure/config-files)
use the items in the **Key** column as YAML keys.

| Name                                 | Key          | Description                                                                                                                                                                                                                                                  | Default |
| ------------------------------------ | ------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------- |
| **Time window in seconds**           | `window`     | Required. How long to wait (in seconds) for data to accumulate in your pipeline before taking a sample. Depending on how quickly data accumulates in your pipeline, increasing or decreasing the time between samples can affect how much data is preserved. | *none*  |
| **Sample %**                         | `percentage` | Required. The percentage of data to preserve. Within each batch of accumulated data, the individual records to preserve are chosen randomly, and the rest are discarded. This value must be a positive integer between `1` and `100`.                        | *none*  |
| **Seed for random number generator** | `seed`       | A seed to affect the random number generator that determines which records to preserve. This value must be a positive integer.                                                                                                                               | *none*  |
| **Comment**                          | `comment`    | A custom note or description of the rule's function. This text is displayed next to the rule's name in the **Actions** list in the processing rules interface.                                                                                               | *none*  |
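
As a quick way to check values against the constraints in this table before you add them to a configuration file, the following Python sketch might help. The function `validate_sampling_params` is hypothetical and not part of Telemetry Pipeline. Treating `window` as a positive number is an assumption, because the table states only that it is required and measured in seconds.

```python
def validate_sampling_params(window, percentage, seed=None):
    """Return a list of problems; an empty list means the values look valid.

    Hypothetical helper that mirrors the constraints in the table above.
    """
    problems = []

    # `window` is required. Requiring a positive number is an assumption.
    if window is None:
        problems.append("window is required")
    elif isinstance(window, bool) or not isinstance(window, (int, float)) or window <= 0:
        problems.append("window must be a positive number of seconds")

    # `percentage` is required and must be an integer between 1 and 100.
    if percentage is None:
        problems.append("percentage is required")
    elif (
        isinstance(percentage, bool)
        or not isinstance(percentage, int)
        or not 1 <= percentage <= 100
    ):
        problems.append("percentage must be an integer between 1 and 100")

    # `seed` is optional here; when set, it must be a positive integer.
    if seed is not None and (
        isinstance(seed, bool) or not isinstance(seed, int) or seed <= 0
    ):
        problems.append("seed must be a positive integer")

    return problems


print(validate_sampling_params(window=60, percentage=10))  # prints []
```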

## Example

Using the random sampling rule lets you reduce the size of your telemetry data
while still retaining a general snapshot of events that occur during a specified
time frame. For example, given this sample website log data:

```json lines theme={null}
{"page_id":9,"action":"view"}
{"page_id":1,"action":"purchase"}
{"page_id":20,"action":"view"}
{"page_id":14,"action":"click"}
{"page_id":9,"action":"click"}
{"page_id":5,"action":"click"}
{"page_id":14,"action":"purchase"}
{"page_id":16,"action":"purchase"}
{"page_id":14,"action":"click"}
{"page_id":2,"action":"view"}
{"page_id":14,"action":"click"}
{"page_id":11,"action":"click"}
{"page_id":13,"action":"click"}
{"page_id":8,"action":"click"}
{"page_id":20,"action":"purchase"}
{"page_id":4,"action":"click"}
{"page_id":17,"action":"view"}
{"page_id":2,"action":"view"}
{"page_id":15,"action":"click"}
{"page_id":15,"action":"purchase"}
{"page_id":11,"action":"purchase"}
{"page_id":13,"action":"view"}
{"page_id":1,"action":"click"}
{"page_id":15,"action":"click"}
{"page_id":1,"action":"click"}
{"page_id":3,"action":"purchase"}
{"page_id":18,"action":"purchase"}
{"page_id":11,"action":"purchase"}
{"page_id":11,"action":"view"}
{"page_id":12,"action":"click"}
```

A processing rule with the **Time window in seconds** value `60` and the **Sample %** value
`10` returns the following result:

```json theme={null}
{"action":"click","page_id":15}
{"action":"purchase","page_id":15}
{"action":"click","page_id":12}
```

This rule retained 10% of the logs that accumulated within a 60-second time
window. Because all 30 sample logs accumulated within that window, three
randomly chosen logs were retained and the other 27 logs were discarded.
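
The behavior in this example can be approximated with a short Python sketch. It is a conceptual model, not the rule's actual implementation: the function `sample_window` is hypothetical, rounding the number of retained records down is an assumption, and the stand-in records are generated rather than copied from the sample data.

```python
import random


def sample_window(records, percentage, seed=None):
    """Keep roughly `percentage` percent of one window's records, chosen at random.

    Hypothetical model of the rule. Rounding down is an assumption.
    """
    rng = random.Random(seed)  # a fixed seed makes this sketch reproducible
    keep = len(records) * percentage // 100
    return rng.sample(records, keep)


# Stand-in for the 30 logs that accumulated during one 60-second window.
window_records = [{"page_id": i % 20 + 1, "action": "click"} for i in range(30)]

kept = sample_window(window_records, percentage=10, seed=42)
print(len(kept))  # prints 3
```

In this sketch, which three records are kept depends on the seed, but the count depends only on the window size and the percentage.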
