Understanding your tracing license consumption helps identify where you’re spending
the most money on your tracing data.
Trace datasets are a control mechanism that lets you map sets of traces to
named groups relevant to your organization, and then track processed and persisted
bytes for those groups over time.
For example, you might create a Shopper dataset based on data like services,
operations, customer IDs, and tags that relate to your shopping app. Viewing that
dataset provides a snapshot of the trace data volume associated with the entire business
unit related to your shopping app.
Chronosphere recommends creating one dataset per team or per environment.
Understanding data consumption for individual business units can highlight which
sampling rules to adjust so you can better control your trace
data license consumption and remain within defined data limits.
Datasets are part of the Trace Control Plane, which also includes trace behaviors and
head and tail sampling rules. You need administrative access to use the Trace Control
Plane.
To access trace datasets, in the navigation menu, click
Go to Admin and then select
Control > Trace Control Plane.
When searching traces in Trace Explorer, use the Custom tags field in the Query
builder to search for the behavior_dataset_slug tag. This tag identifies traces that
were sampled by a particular dataset’s behavior during processing. A trace can match
many datasets, but only the behavior of the first dataset it matches in the shaping
order is used to determine the active behavior for sampling purposes.
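The first-match rule can be modeled in a few lines. The following Python sketch assumes you already know which datasets a trace matches and that datasets are ranked by a shaping-order list of slugs. The function name and data shapes are hypothetical; the sketch illustrates the documented rule and doesn’t reflect how Observability Platform implements sampling.

```python
from typing import Iterable, Optional, Sequence


def active_dataset(matched: Iterable[str], shaping_order: Sequence[str]) -> Optional[str]:
    """Return the slug of the dataset whose behavior samples a trace.

    Hypothetical model: a trace can match many datasets, but only the first
    match in shaping order (highest priority) supplies the active behavior.
    """
    matched_set = set(matched)
    for slug in shaping_order:
        if slug in matched_set:
            return slug
    return None  # The trace matched no ranked dataset.


if __name__ == "__main__":
    order = ["checkout", "shopper", "partial-traces"]
    print(active_dataset({"shopper", "partial-traces"}, order))  # shopper
```

In this model, the returned slug is the value you would expect to see in a behavior_dataset_slug tag for that trace.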
Select from the following methods to view and filter available trace datasets.
Web
Chronoctl
Terraform
API
To view trace datasets:
In the navigation menu, click Go to Admin
and then select
Control > Trace Control Plane.
The Overview tab displays your total license consumption for the selected
period, which defaults to the current month to date. This view includes graphs
that display the daily volume breakdown and the cumulative breakdown over the
current week.
Take any of the following actions to change the displayed data:
On either of the Processed or Persisted graphs, click the more icon and select Open in Metrics Explorer to
visualize the underlying query.
Toggle Show unique volume to display only the volume of data that doesn’t
overlap with another dataset.
Toggle Show dropped volume to display only the volume of data that’s being
dropped.
Use the search box to search for a specific dataset. The row for each dataset
displays the total data volume, the percent of data overlap, and any active
behaviors.
In the datasets table, select one or more datasets to update the graphs. You can
click and drag a section of either graph to zoom in on the selected time period.
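To see how unique volume and the overlap percentage relate, here’s a minimal Python sketch. It assumes per-trace byte counts and the set of datasets each trace matches are available, and that unique volume counts only bytes from traces matched by exactly one dataset. This is an illustrative model, not the platform’s license calculation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def volume_breakdown(traces: Iterable[Tuple[int, Set[str]]]) -> Dict[str, Dict[str, float]]:
    """Summarize total bytes, unique bytes, and overlap percent per dataset.

    Each input item is (trace_bytes, slugs_of_matching_datasets).
    """
    total: Dict[str, int] = defaultdict(int)
    unique: Dict[str, int] = defaultdict(int)
    for size, slugs in traces:
        for slug in slugs:
            total[slug] += size
            if len(slugs) == 1:  # No other dataset shares this trace.
                unique[slug] += size

    summary: Dict[str, Dict[str, float]] = {}
    for slug, size in total.items():
        overlap = size - unique[slug]
        summary[slug] = {
            "total": size,
            "unique": unique[slug],
            "overlap_pct": 100.0 * overlap / size if size else 0.0,
        }
    return summary


if __name__ == "__main__":
    sample = [(100, {"shopper"}), (50, {"shopper", "checkout"}), (30, {"checkout"})]
    for slug, stats in sorted(volume_breakdown(sample).items()):
        print(slug, stats)
```

In the sample, the 50-byte trace counts toward both datasets’ totals but toward neither dataset’s unique volume.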
To view an individual dataset, click the name of the dataset you want to view from
the list.
The individual dataset page includes a definition of the underlying Trace Explorer
query and the services at the root of all traces in the dataset. To view the
underlying queries, in either the Definition or Root services section, click
Search in Trace Explorer.
To use Chronoctl to return all trace datasets, use the
chronoctl datasets list command:
chronoctl datasets list
To filter for a specific trace dataset, add the --slugs flag to the command:
chronoctl datasets list --slugs SLUG
Replace SLUG with the slug of the dataset you want to
display.
Use the Code Config tool to view the
dataset’s Chronoctl YAML representation.
Use the Code Config tool in Observability
Platform to view a dataset’s Terraform representation.
To complete this action with the Chronosphere API, use the
ListDatasets
endpoint. Because the Chronosphere API requires authentication, include an API token with your
curl request. For more details, see
Create an API token.
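As a sketch of the ListDatasets request in Python, the following builds an authenticated call with the standard library. The tenant URL, the /api/v1/config/datasets path, the Bearer authorization header, and the repeated slugs query parameter are assumptions; confirm them against the Chronosphere API reference for your tenant.

```python
import json
import urllib.parse
import urllib.request
from typing import Sequence

# Assumed values: replace TENANT and confirm the path in the API reference.
API_BASE = "https://TENANT.chronosphere.io"
DATASETS_PATH = "/api/v1/config/datasets"


def build_list_datasets_request(token: str, slugs: Sequence[str] = ()) -> urllib.request.Request:
    """Build a GET request for ListDatasets, optionally filtered by slug."""
    query = urllib.parse.urlencode([("slugs", slug) for slug in slugs])
    url = API_BASE + DATASETS_PATH + (f"?{query}" if query else "")
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}, method="GET"
    )


def list_datasets(token: str, slugs: Sequence[str] = ()) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_list_datasets_request(token, slugs)) as resp:
        return json.load(resp)
```

Separating request construction from sending keeps the URL and headers easy to inspect before any network call is made.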
To create a dataset, define and test your trace query, and then map that query to the
resource you want to create.
After creating datasets, you can assign trace behaviors for your
datasets. Behaviors let you set sampling rates and the shaping order, which
determines the priority order in which behaviors apply when traces overlap
with other datasets.
Use one of the following methods to create a dataset.
Web
Chronoctl
Terraform
API
In the navigation menu, click Go to Admin
and then select
Control > Trace Control Plane.
Click Create dataset.
Enter a display name for your dataset, which is used to generate a default slug.
If you want the slug to be a different value, edit the Slug field directly.
Enter comments about the dataset, such as the business unit this dataset tracks
trace data for.
Define dataset match criteria to outline the query that matches traces you want
included in the dataset. You can add one or more span filters to additionally
refine the trace results. See Search and filter trace data for
information about how to define an effective search for trace data.
Click View statistics to open Trace Explorer in a new tab with your defined
query. Review the results to ensure your query returns the trace data you expect.
In the Create dataset pane, click Save to create your dataset.
Observability Platform creates your dataset and displays its definition. Next,
assign trace behaviors for your dataset to set sampling rates.
If you don’t already have a YAML configuration file, use the scaffold Chronoctl
parameter to generate a template for a specific resource type:
chronoctl datasets scaffold
You can redirect the results (using the redirection operator >) to a file for
editing.
Define a query in
Trace Explorer that
represents the data you want included in the dataset. For example, you might define a
query that returns all traces where at least one span includes a service called
payment-svc, an operation that starts with checkout, and the tag env=prod.
Create a YAML definition to map the query to a dataset that represents the
business unit you want to track trace data for.
Use the following command to generate a sample dataset configuration you can use
as a template:
chronoctl datasets scaffold
In the template, kind: Dataset defines an individual dataset.
With a completed definition, submit it with:
chronoctl datasets create -f FILE_NAME
Replace FILE_NAME with the name of the YAML definition file you want to use.
See the Chronoctl dataset example for a completed
dataset definition.
After creating your dataset, assign trace behaviors for your
dataset to set sampling rates and the shaping order, which determines the priority
order in which behaviors apply when traces overlap with other datasets.
When you run terraform plan to generate an execution plan, Chronosphere automatically
tests configurations that include notification policies by submitting them as dry runs.
For details, see the
Terraform provider
documentation.
Define a query in
Trace Explorer that
represents the data you want included in the dataset. For example, you might define a
query that returns all traces where at least one span includes a service called
payment-svc, an operation that starts with checkout, and the tag env=prod.
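The key detail in the payment-svc example is that all conditions must hold within a single span, and the trace matches if at least one span does. The following Python sketch models that rule with a hypothetical Span type. It illustrates the matching semantics only and isn’t the Trace Explorer query language.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Span:
    service: str
    operation: str
    tags: Dict[str, str] = field(default_factory=dict)


def span_matches(span: Span) -> bool:
    """All three conditions from the example must hold on the same span."""
    return (
        span.service == "payment-svc"
        and span.operation.startswith("checkout")
        and span.tags.get("env") == "prod"
    )


def trace_matches(spans: List[Span]) -> bool:
    """A trace matches when at least one of its spans matches."""
    return any(span_matches(span) for span in spans)


if __name__ == "__main__":
    one_span = [Span("payment-svc", "checkout/charge", {"env": "prod"})]
    split = [
        Span("payment-svc", "refund", {"env": "prod"}),
        Span("cart-svc", "checkout/start", {"env": "prod"}),
    ]
    print(trace_matches(one_span), trace_matches(split))  # True False
```

The second trace doesn’t match even though, taken together, its spans satisfy every condition, because no single span satisfies all of them.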
To complete this action with the Chronosphere API, use the
CreateDataset
endpoint. Because the Chronosphere API requires authentication, include an API token with your
curl request. For more details, see
Create an API token.
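For illustration, the following Python sketch builds a CreateDataset request for the payment-svc example. It assumes a tenant URL of https://TENANT.chronosphere.io, the /api/v1/config/datasets path, Bearer token authentication, and a request body that wraps the dataset fields (mirroring the Chronoctl YAML spec) in a dataset object. The dataset name and slug are hypothetical. Verify these details in the API reference before relying on them.

```python
import json
import urllib.request

# Assumed values: replace TENANT and confirm the path in the API reference.
API_BASE = "https://TENANT.chronosphere.io"
DATASETS_PATH = "/api/v1/config/datasets"


def payment_dataset() -> dict:
    """Dataset fields for the payment-svc example, mirroring the YAML spec."""
    return {
        "name": "Payment checkout prod",  # Hypothetical name.
        "slug": "payment-checkout-prod",  # Hypothetical slug.
        "description": "Checkout traces through payment-svc in production",
        "configuration": {
            "type": "TRACES",
            "trace_dataset": {
                "match_criteria": {
                    "span": [
                        {
                            "match_type": "INCLUDE",
                            "service": {"match": "EXACT", "value": "payment-svc"},
                            "operation": {"match": "REGEX", "value": "checkout.*"},
                            "tags": [
                                {"key": "env", "value": {"match": "EXACT", "value": "prod"}}
                            ],
                        }
                    ]
                }
            },
        },
    }


def build_create_dataset_request(token: str, dataset: dict) -> urllib.request.Request:
    """Build a POST request whose JSON body carries the new dataset."""
    body = json.dumps({"dataset": dataset}).encode("utf-8")  # Wrapper key is assumed.
    return urllib.request.Request(
        API_BASE + DATASETS_PATH,
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
```

Send the request with urllib.request.urlopen once you’ve confirmed the body matches what your tenant expects.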
You can create a dataset specifically for identifying incomplete traces, which are
traces with spans that reference other spans outside of the selected trace.
Incomplete traces can occur if a service is misconfigured and isn’t exporting spans
correctly.
Chronosphere recommends creating at least one dataset with the Chronosphere-supplied
parent_missing=true key/value pair to help identify and track changes in incomplete
trace volume or trace instrumentation over time. As you add more trace
instrumentation, fewer traces meet this criterion, which drives down the volume of
traces in this dataset. You can also apply behaviors to this dataset to decrease the
persisted volume of incomplete traces.
Use one of the following examples to create a dataset for identifying incomplete
traces.
Chronoctl
Terraform
api_version: v1/config
kind: Dataset
spec:
  name: Partial Traces
  slug: partial-traces
  description: Track data volume for incomplete traces.
  configuration:
    type: TRACES
    trace_dataset:
      match_criteria:
        span:
          - match_type: INCLUDE
            tags:
              - key: parent_missing
                value:
                  match: EXACT
                  value: "true"
resource "chronosphere_dataset" "incomplete_traces" {
  name        = "Incomplete traces"
  description = "Track data volume for incomplete traces."
  configuration {
    type = "TRACES"
    trace_dataset {
      match_criteria {
        span {
          match_type = "INCLUDE"
          tag {
            key = "parent_missing"
            value {
              value = "true"
              match = "EXACT"
            }
          }
        }
      }
    }
  }
}
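To show what the parent_missing criterion captures, here’s a minimal Python sketch that flags spans whose parent span ID doesn’t appear in the same trace. It assumes each span is a (span_id, parent_span_id) pair with None for root spans. How Observability Platform actually derives the parent_missing tag isn’t described here, so treat this as a model of the definition only.

```python
from typing import Iterable, Optional, Set, Tuple

SpanRef = Tuple[str, Optional[str]]  # (span_id, parent_span_id or None for a root)


def spans_missing_parent(spans: Iterable[SpanRef]) -> Set[str]:
    """Return IDs of spans that reference a parent outside the trace."""
    span_list = list(spans)
    present = {span_id for span_id, _ in span_list}
    return {
        span_id
        for span_id, parent_id in span_list
        if parent_id is not None and parent_id not in present
    }


def is_incomplete(spans: Iterable[SpanRef]) -> bool:
    """A trace is incomplete if any span's parent is missing."""
    return bool(spans_missing_parent(spans))


if __name__ == "__main__":
    complete = [("a", None), ("b", "a"), ("c", "b")]
    broken = [("b", "a"), ("c", "b")]  # Span "a" was never exported.
    print(is_incomplete(complete), spans_missing_parent(broken))  # False {'b'}
```

As instrumentation improves and services export every span, fewer traces contain a span like "b" here, which is why the volume of this dataset should fall over time.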
The following YAML definition consists of one dataset named
Traces payment service US prod. This dataset includes any trace with at least one span
from the payment service whose operation matches /payment_store/.* and that has the
tag deployment.environment=production.
If you want to specify criteria at the trace level rather than the span level,
define trace instead of span in your YAML definition.
api_version: v1/config
kind: Dataset
spec:
  # Required name of the dataset. Can be modified after the dataset is created.
  name: Traces payment service US prod
  # Unique identifier of the dataset. If not provided, a slug is generated based
  # on the name field. Can't be modified after the dataset is created.
  slug: traces-payment-service-us-prod
  # Optional description for the dataset.
  description: Traces for payment service in US production environment
  # Defining characteristics of the dataset.
  configuration:
    # Dataset type, which must be TRACES.
    type: TRACES
    trace_dataset:
      # Trace criteria to match for the dataset.
      match_criteria:
        # Object that represents the span conditions to match on. All conditions must
        # be true in a single span for the span to be considered a match.
        span:
          # Determines whether to INCLUDE or EXCLUDE all traces that contain at least
          # one span matching the filter.
          - match_type: INCLUDE
            # The service to match on in candidate spans.
            service:
              # Operator to compare in_values with. Can be one of EXACT, REGEX,
              # EXACT_NEGATION, REGEX_NEGATION, IN, NOT_IN.
              match: IN
              # Values the filter tests against when using IN or NOT_IN match type.
              in_values:
                - payment
            # The operation to match on in candidate spans.
            operation:
              match: REGEX
              # The value the filter compares to the target trace or span field.
              value: /payment_store/.*
            # The tag to match on in candidate spans.
            tags:
              # The key of the span tag to match on in the filter.
              - key: deployment.environment
                value:
                  match: EXACT
                  value: production
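The match operators listed in the YAML comments behave like ordinary string comparisons. The following Python sketch shows one plausible reading of each operator. It assumes REGEX values must match the entire field (which would explain why the examples end patterns with .*); check that assumption against the Chronosphere documentation. The field_matches helper is hypothetical.

```python
import re
from typing import Sequence


def field_matches(match: str, actual: str, value: str = "", in_values: Sequence[str] = ()) -> bool:
    """Evaluate one span-field filter such as service, operation, or a tag value."""
    if match == "EXACT":
        return actual == value
    if match == "EXACT_NEGATION":
        return actual != value
    if match == "REGEX":
        return re.fullmatch(value, actual) is not None  # Assumes full-string matching.
    if match == "REGEX_NEGATION":
        return re.fullmatch(value, actual) is None
    if match == "IN":
        return actual in in_values
    if match == "NOT_IN":
        return actual not in in_values
    raise ValueError(f"unsupported match type: {match}")


if __name__ == "__main__":
    print(field_matches("IN", "payment", in_values=["payment"]))  # True
    print(field_matches("REGEX", "/payment_store/charge", "/payment_store/.*"))  # True
    print(field_matches("EXACT", "staging", "production"))  # False
```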
The following Terraform resource creates a dataset that Terraform refers to as
prod_payment_us, with a human-readable name of Traces payment service US prod.
This dataset includes any trace with at least one span from the payment service where
the parent service matches either us-east or us-west, the parent operation begins with
/payment, and the environment tag value matches prod.*. The example also shows the
optional duration, error, and span_count filters, and a second tag filter requiring
that the client_build tag isn’t debug or beta.
If you want to specify criteria at the trace level rather than the span level,
define trace instead of span in your Terraform resource.
resource "chronosphere_dataset" "prod_payment_us" {
  # Required name of the dataset. Can be modified after the dataset is created.
  name = "Traces payment service US prod"
  # Optional description for the dataset.
  description = "Traces passing through the payment service in US production"
  # Defining characteristics of the dataset.
  configuration {
    # Dataset type, which must be TRACES.
    type = "TRACES"
    trace_dataset {
      # Trace criteria to match for the dataset.
      match_criteria {
        # Object that represents the span conditions to match on. All conditions must
        # be true in a single span for the span to be considered a match.
        span {
          # Matches traces based on the entire duration of the trace.
          duration {
            max_secs = 99
            min_secs = 1
          }
          # Matches traces based on the top-level error status.
          error {
            value = true
          }
          # Determines whether to INCLUDE or EXCLUDE all traces that contain at least
          # one span matching the filter.
          match_type = "INCLUDE"
          # Matches the operation of the candidate span's parent span if it's not a
          # root span.
          parent_operation {
            value = "/payment.*"
            match = "REGEX"
          }
          # Matches the service of the candidate span's parent span if it's not a
          # root span. A group matches us-east or us-west; square brackets would
          # match only a single character.
          parent_service {
            value = "us-(east|west)"
            match = "REGEX"
          }
          # The service to match on in candidate spans.
          service {
            match     = "IN"
            in_values = ["payment"]
          }
          # Defines the number of spans that must match the criteria defined by
          # filter. Defaults to at least one span.
          span_count {
            max = 2
            min = 1
          }
          # The tag to match on in candidate spans.
          tag {
            key = "environment"
            value {
              value = "prod.*"
              match = "REGEX"
            }
          }
          tag {
            key = "client_build"
            value {
              match     = "NOT_IN"
              in_values = ["debug", "beta"]
            }
          }
        }
      }
    }
  }
}
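Two details in this kind of resource are easy to get wrong. First, square brackets in a regular expression define a character class, so a pattern such as us-[east|west] matches us- followed by a single character, not either region name. Under full-string matching it misses both regions, and under substring matching it also accepts values like us-south; a group with alternation, us-(east|west), expresses the intent. Second, span_count bounds how many spans must match the span filter. The following Python sketch demonstrates both with Python’s re module, whose syntax for these constructs is standard but may differ in edge cases from the engine Chronosphere uses. The span_count_ok helper is hypothetical.

```python
import re
from typing import Optional

CHAR_CLASS = r"us-[east|west]"  # "us-" plus ONE of: e, a, s, t, |, w
ALTERNATION = r"us-(east|west)"  # exactly "us-east" or "us-west"


def span_count_ok(matching_spans: int, min_count: int = 1, max_count: Optional[int] = None) -> bool:
    """Check a span_count range; the default models "at least one span"."""
    if matching_spans < min_count:
        return False
    return max_count is None or matching_spans <= max_count


if __name__ == "__main__":
    for name in ["us-east", "us-west", "us-south", "us-e"]:
        print(
            name,
            bool(re.fullmatch(CHAR_CLASS, name)),
            bool(re.fullmatch(ALTERNATION, name)),
        )
    print(span_count_ok(2, 1, 2), span_count_ok(3, 1, 2))  # True False
```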
When viewing an individual dataset, you can assign a
behavior to the dataset to set sampling rates on two
levels:
Assign a main behavior to define the primary behavior for a dataset.
Assign an override behavior to temporarily override the main behavior.
You can assign only one main behavior and one override behavior to a dataset.
Both the main and override layers can use any of the
trace behavior types, which are
baseline, allow, and deny. You can also
create custom behaviors and
assign them to the main or override layers on datasets. When assigning a behavior to
the override layer, you can set the behavior to start immediately, or schedule it to
start at a future time.
When managing assigned behaviors, you can set the shaping order for overlapping trace
datasets. The shaping order determines the priority order to apply behaviors when
traces in one dataset overlap with traces in another dataset. For example, if a trace
belongs to more than one dataset with an assigned behavior, Observability Platform
uses the behavior assigned to the dataset that’s first in the shaping order.
The shaping order applies only when the selected behavior is active.
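The relationship between the main and override layers can be modeled as a time check: while an override is within its scheduled window it’s the active behavior, and otherwise the main behavior applies. The following Python sketch uses hypothetical names and assumes the window includes its start time and excludes its end time. It isn’t the platform’s scheduler.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class BehaviorAssignment:
    """Hypothetical view of one dataset's main and override layers."""

    main: Optional[str] = None
    override: Optional[str] = None
    override_start: Optional[datetime] = None
    override_end: Optional[datetime] = None

    def active_behavior(self, now: datetime) -> Optional[str]:
        """Return the override during its window, else the main behavior."""
        if (
            self.override is not None
            and self.override_start is not None
            and self.override_end is not None
            and self.override_start <= now < self.override_end
        ):
            return self.override
        return self.main


if __name__ == "__main__":
    utc = timezone.utc
    shopper = BehaviorAssignment(
        main="baseline",
        override="keep-all",
        override_start=datetime(2024, 8, 26, 14, 15, 22, tzinfo=utc),
        override_end=datetime(2024, 8, 26, 15, 15, 22, tzinfo=utc),
    )
    print(shopper.active_behavior(datetime(2024, 8, 26, 14, 30, tzinfo=utc)))  # keep-all
    print(shopper.active_behavior(datetime(2024, 8, 26, 16, 0, tzinfo=utc)))  # baseline
```

Once the override window ends, the dataset falls back to its main behavior without any further change to the assignment.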
Assigning a behavior to a dataset is different than
editing the baseline behavior,
where you can modify the facets based on the sampling strategy you want to use.
Select from the following methods to assign behaviors to a dataset.
In the navigation menu, click Go to Admin
and then select
Control > Trace Control Plane.
From the list of datasets, click the dataset you want to manage behaviors for.
In the selected dataset page, in the Behavior pane, click Manage.
If you already have a behavior assigned to a dataset, you can
run a preview of another
behavior to see its effects on the dataset’s volume. This capability
lets you temporarily preview a behavior to understand its impact before assigning
it.
In the Main layer pane, select a main behavior from the dropdown.
Optional: In the Override layer pane, select an override behavior and choose
when the override should start and end, and select a duration for how long the
override remains active.
Select a shaping order for your main behavior. Shaping order is in decreasing
priority, so a behavior in position one takes precedence over a behavior in
position three.
Click Save to save the behavior definition for your dataset.
If you don’t already have a YAML configuration file, use the scaffold Chronoctl
parameter to generate a template for a specific resource type:
chronoctl trace-behavior-config scaffold
You can redirect the results (using the redirection operator >) to a file for
editing.
To complete this action with the Chronosphere API, use the
UpdateTraceBehaviorConfig
endpoint. Because the Chronosphere API requires authentication, include an API token with your
curl request. For more details, see
Create an API token.
The following YAML definition assigns behaviors to one dataset, shopper-dataset. The
baseline behavior is the main behavior, and the keep-all behavior is an override that
allows all traces for one hour. The dataset_priorities list sets the order in which
datasets are considered when a trace matches more than one dataset.
api_version: v1/config
kind: TraceBehaviorConfig
spec:
  # List of assignments for the main behavior. The referenced datasets are datasets
  # to enroll in behaviors. The referenced behaviors are the active behaviors
  # for the dataset when there is no override in place.
  # * Only one main behavior can be assigned to a dataset.
  # * Only one referenced 'TraceBehavior' with 'type' field set to 'TYPE_BASELINE' can
  #   be set, which must match the slug referenced by 'baseline_behavior_slug'.
  main_behavior_assignments:
    - created_at: "2024-08-24T14:15:22Z"
      updated_at: "2024-08-24T13:22:21Z"
      # The slug reference of a TraceDataset
      dataset_slug: "shopper-dataset"
      # The slug reference of a TraceBehavior
      behavior_slug: "baseline"
      # The author or creator of the entry.
      created_by: "someone@example.com"
      # A description of the entry.
      description: "Description of the behavior"
  # List of assignments for the override behavior. OverrideBehaviorAssignments are used to
  # specify the active behavior for a dataset over a specific time range.
  # * Only one override behavior can be assigned to a dataset.
  # * Only one referenced 'TraceBehavior' with 'type' field set to 'TYPE_BASELINE' can
  #   be set, which must match the slug referenced by 'baseline_behavior_slug', and any
  #   baseline behavior referenced in 'main_behavior_assignments'.
  override_behavior_assignments:
    - created_at: "2024-08-24T14:15:22Z"
      updated_at: "2024-08-24T13:22:21Z"
      # The slug reference of a TraceDataset
      dataset_slug: "shopper-dataset"
      # The slug reference of a TraceBehavior
      behavior_slug: "keep-all"
      # The starting time of the override.
      start_time: "2024-08-26T14:15:22Z"
      # The ending time of the override.
      end_time: "2024-08-26T15:15:22Z"
      # The author or creator of the entry.
      created_by: "someone@example.com"
      # A description of the entry.
      description: "Allow all traces for one hour"
  # List of dataset priorities. This list specifies the order in which datasets
  # are considered when determining the behavior to follow for a trace. Dataset
  # priorities are used to break ties when a trace matches more than one dataset
  # with an active behavior.
  # * Each entry in this list must refer to the slug of an existing dataset.
  # * The order of the list is the order in which the datasets are considered.
  # * The list must contain all datasets referenced in either main_behavior_assignments
  #   or override_behavior_assignments.
  # * The list may contain datasets that are not referenced in either of the
  #   previous references.
  dataset_priorities:
    - "shopper-dataset"
  # The baseline behavior to use for behavior assignments and base head sampling rates.
  # Must reference a TraceBehavior entity with type: TYPE_BASELINE.
  baseline_behavior_slug: "baseline"
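The comments in this definition state several constraints, such as that dataset_priorities must include every dataset referenced in the main and override assignments. The following Python sketch checks a few of those rules locally and then builds an UpdateTraceBehaviorConfig request. The /api/v1/config/trace-behavior-config path, the PUT method, the trace_behavior_config wrapper key, and Bearer authentication are assumptions to verify in the API reference, and the check_config helper is hypothetical.

```python
import json
import urllib.request
from typing import List

# Assumed values: replace TENANT and confirm the path and method in the API reference.
API_BASE = "https://TENANT.chronosphere.io"
CONFIG_PATH = "/api/v1/config/trace-behavior-config"


def check_config(spec: dict) -> List[str]:
    """Return problems found with a few rules stated in the YAML comments."""
    problems: List[str] = []
    main = spec.get("main_behavior_assignments", [])
    override = spec.get("override_behavior_assignments", [])
    # Only one main and one override behavior per dataset.
    for layer, entries in (("main", main), ("override", override)):
        slugs = [entry["dataset_slug"] for entry in entries]
        for slug in sorted(set(slugs)):
            if slugs.count(slug) > 1:
                problems.append(f"{slug}: more than one {layer} behavior")
    # Every referenced dataset must appear in dataset_priorities.
    referenced = {entry["dataset_slug"] for entry in main + override}
    listed = set(spec.get("dataset_priorities", []))
    for slug in sorted(referenced - listed):
        problems.append(f"{slug}: missing from dataset_priorities")
    return problems


def build_update_request(token: str, spec: dict) -> urllib.request.Request:
    """Validate the spec, then build the PUT request that replaces the config."""
    problems = check_config(spec)
    if problems:
        raise ValueError("; ".join(problems))
    body = json.dumps({"trace_behavior_config": spec}).encode("utf-8")
    return urllib.request.Request(
        API_BASE + CONFIG_PATH,
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="PUT",
    )
```

For example, a dataset_priorities list that contains only behavior slugs such as baseline and keep-all would cause check_config to report shopper-dataset as missing.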
When creating or editing a dataset, you can
use the Code Config tool to view code
representations of a dataset for
Terraform, Chronoctl, and
the Chronosphere API. The displayed code also responds to
changes you make in the Visual Editor tab.
Entities managed by Terraform or Chronoctl are viewable in Observability Platform,
but can’t be modified there.
Select from the following methods to edit trace datasets.
Web
Chronoctl
Terraform
API
In the navigation menu, click Go to Admin
and then select
Control > Trace Control Plane.
From the list of datasets, click the dataset you want to edit.
On the selected dataset page, click Edit dataset.
Make changes to your dataset, and then click Save.
Observability Platform saves changes to your dataset.
To complete this action with the Chronosphere API, use the
UpdateDataset endpoint. Because the Chronosphere API requires authentication, include an
API token with your curl request. For more details, see
Create an API token.
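A minimal Python sketch of an UpdateDataset call follows. It assumes the dataset’s slug is part of the URL path (/api/v1/config/datasets/SLUG), that the method is PUT, that the body wraps the updated fields in a dataset object, and Bearer authentication; verify all of these in the API reference. Because a slug can’t be modified after the dataset is created, the sketch rejects a body whose slug differs from the one in the path.

```python
import json
import urllib.parse
import urllib.request

# Assumed values: replace TENANT and confirm the path and method in the API reference.
API_BASE = "https://TENANT.chronosphere.io"
DATASETS_PATH = "/api/v1/config/datasets"


def build_update_dataset_request(token: str, slug: str, dataset: dict) -> urllib.request.Request:
    """Build a PUT request that replaces the dataset identified by slug."""
    if dataset.get("slug", slug) != slug:
        raise ValueError("a dataset's slug can't be changed after creation")
    body = json.dumps({"dataset": dict(dataset, slug=slug)}).encode("utf-8")
    url = f"{API_BASE}{DATASETS_PATH}/{urllib.parse.quote(slug, safe='')}"
    return urllib.request.Request(
        url,
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="PUT",
    )
```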
To delete a dataset with Terraform, edit your Terraform configuration file to remove
the dataset’s resource definition.
Run this command to remove the resource from Observability Platform:
terraform apply
To complete this action with the Chronosphere API, use the
DeleteDataset endpoint. Because the Chronosphere API requires authentication, include an
API token with your curl request. For more details, see
Create an API token.
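The following Python sketch builds a DeleteDataset request. It assumes a URL of the form https://TENANT.chronosphere.io/api/v1/config/datasets/SLUG, the DELETE method, and Bearer authentication; verify these in the API reference. Building the request separately from sending it lets you confirm the target slug before anything is deleted.

```python
import urllib.parse
import urllib.request

# Assumed values: replace TENANT and confirm the path in the API reference.
API_BASE = "https://TENANT.chronosphere.io"
DATASETS_PATH = "/api/v1/config/datasets"


def build_delete_dataset_request(token: str, slug: str) -> urllib.request.Request:
    """Build a DELETE request for the dataset identified by slug."""
    url = f"{API_BASE}{DATASETS_PATH}/{urllib.parse.quote(slug, safe='')}"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}, method="DELETE"
    )


def delete_dataset(token: str, slug: str) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(build_delete_dataset_request(token, slug)) as resp:
        return resp.status
```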