# Deduplicate records

export const entity_0 = "deduplicate records processing rule"

The deduplicate records [processing rule](/ingest/pipeline/processing-rules) looks for
any records that contain identical key/value data during a specified time frame, then
removes all but the first occurrence of those records within that time frame.

<Warning>
  When the deduplicate records rule waits for data to accumulate during the
  specified time window, the pipeline [buffers](/ingest/pipeline/v2/configure/backpressure)
  that data. Increasing the value of the **Time window** parameter also increases
  the memory load on your pipeline. For example, if 100,000 records pass through
  your pipeline during the specified time period, and those records are 1 kB
  each, the deduplicate records rule will add approximately 100 MB of memory
  load.
</Warning>
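
The estimate in this warning is simple multiplication. The following Python sketch shows that the buffered data grows linearly with the **Time window** value. It assumes decimal units (1 kB = 1,000 bytes, 1 MB = 1,000,000 bytes), a steady arrival rate, and no per-record overhead. The 5-second window and 20,000 records per second are hypothetical values chosen to produce the warning's 100,000 records.

```python
def estimated_buffer_bytes(records_per_second: float, window_seconds: float,
                           avg_record_bytes: int) -> float:
    # Every record that arrives during one window is held until the window closes.
    return records_per_second * window_seconds * avg_record_bytes


# Warning example: 100,000 records of 1 kB (1,000 bytes) in one window,
# modeled here as a hypothetical 20,000 records/second over a 5-second window.
print(f"{estimated_buffer_bytes(20_000, 5, 1_000) / 1e6:.0f} MB")   # 100 MB
# Doubling the window at the same rate doubles the estimate.
print(f"{estimated_buffer_bytes(20_000, 10, 1_000) / 1e6:.0f} MB")  # 200 MB
```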

## Configuration parameters

Use the parameters in this section to configure the {entity_0}. The
Telemetry Pipeline web interface uses the items in the **Name** column to
describe these parameters. [Pipeline configuration files](/ingest/pipeline/v2/configure/config-files)
use the items in the **Key** column as YAML keys.

| Name                                    | Key           | Description                                                                                                                                                                                                                                                                                                                                                                         | Default           |
| --------------------------------------- | ------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------- |
| **Time window**                         | `window`      | Required. How long to wait (in seconds) for data to accumulate in your pipeline before searching for duplicate records. For example, for a window length of `5`, two records with an identical key/value pair are considered duplicates if they both occur within the same five-second period, but not if they occur more than five seconds apart.                                 | *none*            |
| **Select key**                          | `key`         | Required. The key to use in your comparison. If multiple records have the same value assigned to this key, this rule removes all but the earliest record that contains that key/value pair within the specified time frame. You can also use [record accessor syntax](/ingest/pipeline/processing-rules#record-accessor-syntax) to reference keys nested within another object.     | *none*            |
| **Ignore records without key** checkbox | `skipMissing` | If selected, skips any records that don't contain your specified **Select key**. Chronosphere recommends selecting this checkbox to prevent processing errors.                                                                                                                                                                                                                      | Selected / `true` |
| **Comment**                             | `comment`     | A custom note or description of the rule's function. This text is displayed next to the rule's name in the **Actions** list in the processing rules interface.                                                                                                                                                                                                                      | *none*            |
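
The keys and defaults in this table can be summarized as a small validator. This is a hypothetical Python helper, not part of Telemetry Pipeline. It takes the settings as a plain dictionary, so it doesn't reflect the full YAML layout of a pipeline configuration file. The positive-window check is an assumption.

```python
from typing import Any

# Defaults from the parameter table. "window" and "key" are required and
# have no default.
REQUIRED = ("window", "key")
DEFAULTS: dict[str, Any] = {"skipMissing": True, "comment": None}


def normalize_dedup_settings(settings: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: check required keys and fill documented defaults."""
    missing = [name for name in REQUIRED if name not in settings]
    if missing:
        raise ValueError(f"missing required parameter(s): {', '.join(missing)}")
    window = settings["window"]
    # Assumption: the window must be a positive number of seconds.
    if isinstance(window, bool) or not isinstance(window, (int, float)) or window <= 0:
        raise ValueError("window must be a positive number of seconds")
    # Values the caller provides override the defaults.
    return {**DEFAULTS, **settings}


print(normalize_dedup_settings({"window": 5, "key": "message"}))
# {'skipMissing': True, 'comment': None, 'window': 5, 'key': 'message'}
```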

## Example

Using the deduplicate records rule lets you remove redundant information from
your pipeline and reduce the amount of data that reaches your backend.

For example, given the following sample log data:

```json lines theme={null}
{"message": "All endpoints are functional."}
{"message": "All endpoints are functional."}
{"message": "All endpoints are functional."}
{"message": "All endpoints are functional."}
{"message": "The /purchase endpoint is unavailable."}
{"message": "The /purchase endpoint is unavailable."}
{"message": "The /purchase endpoint is unavailable."}
{"message": "The /purchase endpoint is partly unavailable."}
{"message": "The /purchase endpoint has been reset."}
{"message": "All endpoints are functional."}
{"message": "All endpoints are functional."}
```

A processing rule with the **Time window** value `5` and the **Select key** value
`message` returns the following result:

```json theme={null}
{"message":"All endpoints are functional."}
{"message":"The /purchase endpoint is unavailable."}
{"message":"The /purchase endpoint is partly unavailable."}
{"message":"The /purchase endpoint has been reset."}
{"message":"All endpoints are functional."}
```

This rule removed all but the first instance of any logs with identical
`message` values that appeared within the specified time frame. Because more
than five seconds elapsed between the value `All endpoints are functional` on
line 1 and the same value on line 10, this rule retained both the log on line 1
and the log on line 10. However, since fewer than five seconds elapsed between
the value `All endpoints are functional` on line 10 and the same value on line 11,
this rule removed the log on line 11.
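
The following Python sketch reproduces this example to make its timing concrete. It is a minimal model, not Chronosphere's implementation, and it makes several assumptions:

- Windows are fixed (tumbling) intervals aligned to multiples of the **Time window** value.
- Records arrive in timestamp order.
- The hypothetical `Record` type carries an explicit `timestamp`.
- Only top-level keys are supported, with no record accessor syntax.
- A `KeyError` stands in for the processing error that **Ignore records without key** prevents.

The sample data above has no timestamps. The timestamps in the sketch are therefore hypothetical, chosen so that lines 1 and 10 are more than five seconds apart and lines 10 and 11 are not.

```python
import json
from dataclasses import dataclass
from typing import Any, Iterable, Iterator


@dataclass
class Record:
    timestamp: float       # seconds; hypothetical field used only by this sketch
    body: dict[str, Any]   # the record's key/value data


def deduplicate(records: Iterable[Record], window: float, key: str,
                skip_missing: bool = True) -> Iterator[Record]:
    """Keep the first record per key value in each window; drop later repeats."""
    current_window = None
    seen: set[str] = set()  # key values already emitted in the current window
    for record in records:
        if key not in record.body:
            if skip_missing:
                yield record  # pass records without the key through untouched
                continue
            # Stand-in for the processing error that skipMissing prevents.
            raise KeyError(f"record has no key {key!r}")
        window_index = int(record.timestamp // window)
        if window_index != current_window:
            current_window = window_index
            seen.clear()  # a new window starts with nothing remembered
        # Serialize the value so nested objects and lists can be compared too.
        fingerprint = json.dumps(record.body[key], sort_keys=True)
        if fingerprint not in seen:
            seen.add(fingerprint)
            yield record


# Sample messages with hypothetical timestamps (seconds).
samples = [
    (0.0, "All endpoints are functional."),
    (0.5, "All endpoints are functional."),
    (1.0, "All endpoints are functional."),
    (1.5, "All endpoints are functional."),
    (2.0, "The /purchase endpoint is unavailable."),
    (2.5, "The /purchase endpoint is unavailable."),
    (3.0, "The /purchase endpoint is unavailable."),
    (3.5, "The /purchase endpoint is partly unavailable."),
    (4.0, "The /purchase endpoint has been reset."),
    (6.0, "All endpoints are functional."),
    (7.0, "All endpoints are functional."),
]
records = [Record(t, {"message": m}) for t, m in samples]

for kept in deduplicate(records, window=5, key="message"):
    print(json.dumps(kept.body))
```

With these timestamps, the sketch prints the same five records as the result above. The sketch emits each record as soon as it is judged unique instead of buffering a full window, so it doesn't model the memory cost described in the warning. Under the tumbling-window assumption, two identical records that straddle a window boundary both survive. A sliding-window variant would instead compare each record's timestamp with the last kept occurrence of the same value.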
