Trace datasets
Understanding your tracing license consumption helps identify where you're spending the most money on your tracing data.
Trace datasets are a control mechanism that lets you map sets of traces to named groups relevant to your organization, and then track processed and persisted bytes for those groups over time.
For example, you might create a Shopper dataset based on data like services, operations, customer IDs, and tags that relate to your shopping app. Viewing that dataset provides a snapshot of trace data volume associated with the entire business unit related to your shopping app.
Chronosphere recommends creating one dataset per team or per environment. Understanding data consumption for individual business units can highlight which sampling rules to adjust so you can better control your trace data license consumption and remain within defined data limits.
Datasets are part of the Trace Control Plane, which also includes trace behaviors and head and tail sampling rules. You need administrative access to use the Trace Control Plane.
To access trace datasets, in the navigation menu, click Go to Admin and then select Control > Trace Control Plane.
View datasets
Select from the following methods to view and filter available trace datasets.
To view trace datasets:
- In the navigation menu, click Go to Admin and then select Control > Trace Control Plane.
- Take any of the following actions to change the displayed data:
  - Click Processed or Persisted to toggle between processed and persisted bytes.
  - On either of the graphs, click the more icon and select Open in Metrics Explorer to visualize the underlying query.
  - Toggle Show unique volume to display only the volume of data that doesn't overlap with another dataset.
The Overview tab displays your total license consumption for the selected period, which defaults to the current month. This view includes graphs that display the daily volume breakdown and the cumulative breakdown over the current week.
A table includes all available datasets. The row for each dataset displays the underlying query that defines the dataset, the total data volume, the percent of data overlap, and any active behaviors.
In the datasets table, select one or more datasets to update the graphs. You can click and drag a section of either graph to zoom in on the selected time period.
To view an individual dataset, click its name in the list. The individual dataset page includes a definition of the underlying Trace Explorer query and the services at the root of all traces in the dataset. To view the underlying queries, in either the Definition or Root services section, click Search in Trace Explorer.
Create datasets
You can create datasets using the following methods. Define and test your query in Trace Explorer, and then map that query to the resource you want to create.
After creating datasets, you can assign trace behaviors for your datasets to set sampling rules from within Chronosphere Observability Platform without needing to write individual head or tail sampling rules.
To create a dataset:
- Define a query in Trace Explorer that represents the data you want included in the dataset. For example, the following query returns all traces where at least one span includes a service called payment-svc, an operation that starts with checkout, and a tag named env=prod:
  service="payment-svc" operation=~"^checkout.*" tag:env=prod
- After defining the underlying query, create a YAML definition in Chronoctl or a Terraform resource to map the query to a dataset that represents the business unit you want to track trace data for.
The scaffold parameter requires Chronoctl version 0.55 or later.
If you don't already have a YAML configuration file, use the scaffold Chronoctl parameter to generate a template for a specific resource type:
chronoctl datasets scaffold
You can redirect the results (using the redirection operator >) to a file for editing.
To create a dataset with Chronoctl:
- Run the following command to generate a sample dataset configuration you can use as a template:
  chronoctl datasets scaffold
  In the template, kind: Dataset defines an individual dataset.
- With a completed definition, submit it with:
  chronoctl datasets create -f FILE_NAME
  Replace FILE_NAME with the name of the YAML definition file you want to use.
See the Chronoctl dataset example for a completed dataset definition.
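As a rough illustration of mapping a query to a dataset, the following sketch translates the example query from the first step (the payment-svc service, an operation that starts with checkout, and the tag env=prod) into a Chronoctl definition. The field names follow the Chronoctl dataset example in this document. The dataset name and slug are hypothetical, and pairing the EXACT operator with a value field for service is an assumption based on the operators that example lists, so compare the result with the template that chronoctl datasets scaffold generates before submitting it.

```yaml
api_version: v1/config
kind: Dataset
spec:
  # Hypothetical name and slug, chosen only for this illustration.
  name: Payment checkout prod
  slug: payment-checkout-prod
  description: Traces with checkout operations in payment-svc for env=prod
  configuration:
    type: TRACES
    trace_dataset:
      match_criteria:
        span:
          # Include traces with at least one span that meets every condition below.
          - match_type: INCLUDE
            # Assumption: EXACT takes a single value, as it does for tag values.
            service:
              match: EXACT
              value: payment-svc
            # Regular expression for operations that start with "checkout".
            operation:
              match: REGEX
              value: "^checkout.*"
            tags:
              - key: env
                value:
                  match: EXACT
                  value: prod
```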
Identify incomplete traces
You can create a dataset specifically for identifying incomplete traces, which are traces with spans that reference other spans outside of the selected trace. Incomplete traces can occur if a service is misconfigured and isn't exporting spans correctly.
Chronosphere recommends creating at least one dataset with the Chronosphere-supplied parent_missing=true key/value pair to help identify and track changes in incomplete trace volume or trace instrumentation over time. As you add more trace instrumentation, fewer traces meet these criteria, which drives down the volume of traces in this dataset. You can also apply behaviors to this dataset to decrease the persisted volume of incomplete traces.
Use the following example to create a dataset for identifying incomplete traces.
name: Partial Traces
slug: partial-traces
description: Track data volume for incomplete traces.
configuration:
  type: TRACES
  trace_dataset:
    match_criteria:
      span:
        - match_type: INCLUDE
          tags:
            - key: parent_missing
              value:
                match: EXACT
                value: true
Chronoctl dataset example
The following YAML definition consists of one dataset named Traces payment service US prod. This dataset includes any spans that include the payment service and the payment_store operation, and that have a tag where deployment.environment=production.
If you want to specify criteria at the trace level rather than the span level, define trace instead of span in your YAML definition.
api_version: v1/config
kind: Dataset
spec:
  # Required name of the dataset. Can be modified after the dataset is created.
  name: Traces payment service US prod
  # Unique identifier of the dataset. If not provided, a slug is generated based
  # on the name field. Can't be modified after the dataset is created.
  slug: traces-payment-service-us-prod
  # Optional description for the dataset.
  description: Traces for payment service in US production environment
  # Defining characteristics of the dataset.
  configuration:
    # Dataset type, which must be TRACES.
    type: TRACES
    trace_dataset:
      # Trace criteria to match for the dataset.
      match_criteria:
        # Object that represents the span conditions to match on. All conditions must
        # be true in a single span for the span to be considered a match.
        span:
          # Determines whether to INCLUDE or EXCLUDE all traces that contain at least
          # one span matching the filter.
          - match_type: INCLUDE
            # The service to match on in candidate spans.
            service:
              # Operator to compare in_values with. Can be one of EXACT, REGEX,
              # EXACT_NEGATION, REGEX_NEGATION, IN, NOT_IN.
              match: IN
              # Values the filter tests against when using IN or NOT_IN match type.
              in_values:
                - payment
            # The operation to match on in candidate spans.
            operation:
              match: REGEX
              # The value the filter compares to the target trace or span field.
              value: /payment_store/.*
            # The tag to match on in candidate spans.
            tags:
              # The key of the span tag to match on in the filter.
              - key: deployment.environment
                value:
                  match: EXACT
                  value: production
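The note about trace-level criteria might translate into a definition like the following. This is a sketch only: the duration and error fields, and their subfields, are assumptions borrowed from the Terraform dataset example in this document rather than a confirmed schema for the trace object, and the name and slug are hypothetical. Check the output of chronoctl datasets scaffold before relying on these fields.

```yaml
api_version: v1/config
kind: Dataset
spec:
  # Hypothetical name and slug, used only for illustration.
  name: Slow failing traces
  slug: slow-failing-traces
  configuration:
    type: TRACES
    trace_dataset:
      match_criteria:
        # Criteria evaluated against the whole trace instead of a single span.
        trace:
          # Assumed field: bounds on the duration of the entire trace, in seconds.
          duration:
            min_secs: 1
            max_secs: 99
          # Assumed field: match traces whose top-level status is an error.
          error:
            value: true
```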
Terraform dataset example
The following Terraform resource creates a dataset that Terraform refers to by prod_payment_us, and with a human-readable name of Traces payment service US prod.
This dataset includes any spans that include the payment service, where the parent service matches either us-east or us-west, the parent operation matches payments/.*, and a tag where environment includes prod.
If you want to specify criteria at the trace level rather than the span level, define trace instead of span in your Terraform resource.
resource "chronosphere_dataset" "prod_payment_us" {
  # Required name of the dataset. Can be modified after the dataset is created.
  name = "Traces payment service US prod"
  # Optional description for the dataset.
  description = "Traces passing through the payment service in US production"
  # Defining characteristics of the dataset.
  configuration {
    # Dataset type, which must be TRACES.
    type = "TRACES"
    trace_dataset {
      # Trace criteria to match for the dataset.
      match_criteria {
        # Object that represents the span conditions to match on. All conditions must
        # be true in a single span for the span to be considered a match.
        span {
          # Matches traces based on the entire duration of the trace.
          duration {
            max_secs = 99
            min_secs = 1
          }
          # Matches traces based on the top-level error status.
          error {
            value = true
          }
          # Determines whether to INCLUDE or EXCLUDE all traces that contain at least
          # one span matching the filter.
          match_type = "INCLUDE"
          # Matches the operation of the candidate span's parent span if it's not a
          # root span.
          parent_operation {
            value = "payments/.*"
            match = "REGEX"
          }
          # Matches the service of the candidate span's parent span if it's not a
          # root span. The group (east|west) matches either alternative.
          parent_service {
            value = "us-(east|west)"
            match = "REGEX"
          }
          # The service to match on in candidate spans.
          service {
            match     = "IN"
            in_values = ["payment"]
          }
          # Defines the number of spans that must match the criteria defined by
          # filter. Defaults to at least one span.
          span_count {
            max = 2
            min = 1
          }
          # The tag to match on in candidate spans.
          tag {
            key = "environment"
            value {
              value = "prod.*"
              match = "REGEX"
            }
          }
          tag {
            key = "client_build"
            value {
              match     = "NOT_IN"
              in_values = ["debug", "beta"]
            }
          }
        }
      }
    }
  }
}
Assign behaviors
When viewing an individual dataset, you can assign a behavior to the dataset to set sampling rates on two levels:
- Assign a main behavior to define the primary behavior for a dataset.
- Assign an override behavior to temporarily override the main behavior.
You can assign only one main behavior and one override behavior to a dataset.
Both the main and override layers can use any of the trace behavior types, which are baseline, allow, and deny. You can also create custom behaviors and assign them to the main or override layers on datasets. When assigning a behavior to the override layer, you can set the behavior to start immediately, or schedule it to start at a future time.
When managing assigned behaviors, you can set the shaping order for overlapping trace datasets. The shaping order determines the priority order to apply behaviors when traces in one dataset overlap with traces in another dataset. For example, if a trace belongs to more than one dataset with an assigned behavior, Observability Platform uses the behavior assigned to the dataset that's first in the shaping order.
The shaping order applies only when the selected behavior is active.
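In a Chronoctl definition, the shaping order appears to correspond to the dataset_priorities list shown in the Chronoctl behavior example. The following fragment is a sketch that assumes two overlapping datasets, reusing the partial-traces and traces-payment-service-us-prod slugs from the dataset examples in this document. It omits the behavior assignments and other fields, so it isn't a complete definition.

```yaml
api_version: v1/config
kind: TraceBehaviorConfigs
spec:
  # Fragment only: main_behavior_assignments, override_behavior_assignments,
  # and baseline_behavior_slug are omitted here.
  dataset_priorities:
    # Position one: a trace that belongs to both datasets follows the behavior
    # assigned to this dataset.
    - "partial-traces"
    # Position two: applies only to traces that don't match the dataset above.
    - "traces-payment-service-us-prod"
```

Listing the dataset for incomplete traces first means its behavior takes precedence for any incomplete trace that also matches the payment dataset.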
Assigning a behavior to a dataset is different than editing the baseline behavior, where you can modify the facets of a baseline behavior based on the sampling strategy you want to use.
Select from the following methods to assign behaviors to a dataset.
You can also manage assigned behaviors from the Behaviors tab of Trace Control Plane.
To assign behaviors to a dataset:
- In the navigation menu, click Go to Admin and then select Control > Trace Control Plane.
- From the list of datasets, click the dataset you want to manage behaviors for.
- In the selected dataset page, in the Behavior pane, click Manage.
- In the Main layer pane, select a main behavior from the dropdown.
- Optional: In the Override layer pane, select an override behavior and choose when the override should start and end, and select a duration for how long the override remains active.
- Select a shaping order for your main behavior. Shaping order is in decreasing priority, so a behavior in position one takes precedence over a behavior in position three.
- Click Save to save the behavior definition for your dataset.
Chronoctl behavior example
The following YAML definition assigns behaviors to the dataset with the slug shopper-dataset. The baseline behavior is the main behavior, and the keep-all behavior is an override that's active for one hour.
api_version: v1/config
kind: TraceBehaviorConfigs
spec:
  # List of assignments for the main behavior. The referenced datasets are datasets
  # to enroll in behaviors. The referenced behaviors are the active behaviors
  # for the dataset when there is no override in place.
  # * Only one main behavior can be assigned to a dataset.
  # * Only one referenced 'TraceBehavior' with 'type' field set to 'TYPE_BASELINE' can
  #   be set, which must match the slug referenced by 'baseline_behavior_slug'.
  main_behavior_assignments:
    - created_at: "2024-08-24T14:15:22Z"
      updated_at: "2024-08-24T13:22:21Z"
      # The slug reference of a TraceDataset
      dataset_slug: "shopper-dataset"
      # The slug reference of a TraceBehavior
      behavior_slug: "baseline"
      # The author or creator of the entry.
      created_by: "someone@example.com"
      # A description of the entry.
      description: "Description of the behavior"
  # List of assignments for the override behavior. OverrideBehaviorAssignments are used to
  # specify the active behavior for a dataset over a specific time range.
  # * Only one override behavior can be assigned to a dataset.
  # * Only one referenced 'TraceBehavior' with 'type' field set to 'TYPE_BASELINE' can
  #   be set, which must match the slug referenced by 'baseline_behavior_slug', and any
  #   baseline behavior referenced in 'main_behavior_assignments'.
  override_behavior_assignments:
    - created_at: "2024-08-24T14:15:22Z"
      updated_at: "2024-08-24T13:22:21Z"
      # The slug reference of a TraceDataset
      dataset_slug: "shopper-dataset"
      # The slug reference of a TraceBehavior
      behavior_slug: "keep-all"
      # The starting time of the override.
      start_time: "2024-08-26T14:15:22Z"
      # The ending time of the override.
      end_time: "2024-08-26T15:15:22Z"
      # The author or creator of the entry.
      created_by: "someone@example.com"
      # A description of the entry.
      description: "Allow all traces for one hour"
  # List of dataset priorities. This list specifies the order in which datasets
  # are considered when determining the behavior to follow for a trace. Dataset
  # priorities are used to break ties when a trace matches more than one dataset
  # with an active behavior.
  # * Each entry in this list must refer to the slug of an existing dataset.
  # * The order of the list is the order in which the datasets are considered.
  # * The list must contain all datasets referenced in either main_behavior_assignments
  #   or override_behavior_assignments.
  # * The list may contain datasets that are not referenced in either of the
  #   previous references.
  dataset_priorities:
    - "shopper-dataset"
  # The baseline behavior to use for behavior assignments and base head sampling rates.
  # Must reference a TraceBehavior entity with type: TYPE_BASELINE.
  baseline_behavior_slug: "baseline"
Delete datasets
Select from the following methods to delete trace datasets.
Users can't modify Terraform-managed resources in the user interface, with Chronoctl, or by using the API.
To delete a dataset with Chronoctl, use the chronoctl datasets delete command:
chronoctl datasets delete SLUG
Replace SLUG with the slug of the dataset you want to delete.
For example, to delete a dataset with the slug infra-example-dataset:
chronoctl datasets delete infra-example-dataset