Datasets

Understanding your tracing license consumption helps identify where you're spending the most money on your tracing data.

Datasets are a control mechanism that lets you map sets of traces to named groups relevant to your organization, and then track processed and persisted bytes for those groups over time.

For example, you might create a Shopper dataset based on data like services, operations, customer IDs, and tags that relate to your shopping app. Viewing that dataset provides a snapshot of trace data volume associated with the entire business unit related to your shopping app.

Understanding data consumption for individual business units can highlight which sampling rules to adjust so you can better control your trace data license consumption and remain within defined data limits.

Datasets are part of the Trace Control Plane, which also includes head and tail sampling rules. You need administrative access to use the Trace Control Plane.

To access datasets:

  1. Click Go to Admin.
  2. In the navigation menu, select Control > Trace Control Plane.

View datasets

The Overview tab displays your total license consumption for the selected period, which defaults to the current month. This view includes graphs that display the daily volume breakdown and the cumulative breakdown over the current week. A table includes all available datasets. The row for each dataset displays the underlying query that defines the dataset, the total data volume, and the percent of data overlap. You can take any of the following actions:

  • Click Processed or Persisted to toggle between processed and persisted bytes.

  • On either of the graphs, click the more icon and select Open in Metrics Explorer to visualize the underlying query.

  • Toggle Show unique volume to display only the volume of data that doesn't overlap with another dataset.

In the datasets table, select one or more datasets to update the graphs. You can click and drag a section of either graph to zoom in on the selected time period.

To view an individual dataset, click its name in the list. The individual dataset page includes a definition of the underlying Trace Explorer query and the services at the root of all traces in the dataset. To view the underlying queries, click Search in Trace Explorer in either the Definition or Root services section.

Create datasets

You can create datasets using Chronoctl, Terraform, or the CreateDataset API. Define and test your query in Trace Explorer, and then map that query to the resource you want to create.

To create a dataset:

  1. Define a query in Trace Explorer that represents the data you want included in the dataset. For example, the following query returns all traces where at least one span includes a service called payment-svc, an operation that starts with payment, and a tag where env=prod:

    service="payment-svc" operation=~"^payment.*" tag:env=prod
  2. After defining the underlying query, create a YAML definition in Chronoctl or a Terraform resource to map the query to a dataset that represents the business unit you want to track trace data for.
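
As a rough sketch of that mapping, the example query from step 1 might translate to a Chronoctl definition like the following. The field names follow the Chronoctl dataset example later on this page. The dataset name, the EXACT match on the service, and the operation pattern (assumed to mean operations that start with payment) are illustrative assumptions, not confirmed scaffold output.

```yaml
api_version: v1/config
kind: Dataset
spec:
  # Hypothetical name, chosen for illustration only.
  name: Payment service prod
  configuration:
    type: TRACES
    trace_dataset:
      match_criteria:
        span:
          - match_type: INCLUDE
            # Corresponds to service="payment-svc" in the query.
            service:
              match: EXACT
              value: payment-svc
            # Corresponds to the operation regular expression in the query
            # (assumed intent: operations that start with "payment").
            operation:
              match: REGEX
              value: "^payment.*"
            # Corresponds to the env=prod tag condition in the query.
            tags:
              - key: env
                value:
                  match: EXACT
                  value: prod
```

Each clause of the query becomes one filter under the same span entry, so all three conditions must hold in a single span for a trace to match.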

The scaffold parameter requires Chronoctl version 0.55 or later.

If you don't already have a YAML configuration file, use the scaffold Chronoctl parameter to generate a template for a specific resource type:

chronoctl dataset scaffold

You can redirect the results (using the redirection operator >) to a file for editing.

To create a dataset with Chronoctl:

  1. Run the following command to generate a sample dataset configuration you can use as a template:

    chronoctl dataset scaffold

    In the template, kind: Dataset defines an individual dataset.

  2. After you complete the definition, submit it with the following command:

    chronoctl dataset create -f FILE_NAME

    Replace FILE_NAME with the name of the YAML definition file you want to use.

See the Chronoctl dataset example for a completed dataset definition.

Chronoctl dataset example

The following YAML definition consists of one dataset named Traces payment service US prod. This dataset includes traces that contain at least one span from the payment service with an operation that matches /payment_store/.* and a tag where deployment.environment=production.

If you want to specify criteria at the trace level rather than the span level, define trace instead of span in your YAML definition.

api_version: v1/config
kind: Dataset
spec:
  # Required name of the dataset. Can be modified after the dataset is created.
  name: Traces payment service US prod
  # Unique identifier of the dataset. If not provided, a slug is generated based
  # on the name field. Can't be modified after the dataset is created.
  slug: traces-payment-service-us-prod
  # Optional description for the dataset.
  description: Traces for payment service in US production environment
  # Defining characteristics of the dataset.
  configuration:
    # Dataset type, which must be TRACES.
    type: TRACES
    trace_dataset:
      # Trace criteria to match for the dataset.
      match_criteria:
        # Object that represents the span conditions to match on. All conditions must
        # be true in a single span for the span to be considered a match.
        span:
          # Determines whether to INCLUDE or EXCLUDE all traces that contain at least
          # one span matching the filter.
          - match_type: INCLUDE
            # The service to match on in candidate spans.
            service:
              # Operator to compare in_values with. Can be one of EXACT, REGEX,
              # EXACT_NEGATION, REGEX_NEGATION, IN, NOT_IN.
              match: IN
              # Values the filter tests against when using IN or NOT_IN match type.
              in_values:
                - payment
            # The operation to match on in candidate spans.
            operation:
              match: REGEX
              # The value the filter compares to the target trace or span field.
              value: /payment_store/.*
            # The tag to match on in candidate spans.
            tags:
              # The key of the span tag to match on in the filter.
              - key: deployment.environment
                value:
                  match: EXACT
                  value: production
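
For trace-level criteria, a definition might look like the following sketch. It assumes the trace block accepts the same duration and error settings that appear in the Terraform dataset example on this page. Those field names, the dataset name, and the values are assumptions for illustration, so confirm them against the output of chronoctl dataset scaffold before use.

```yaml
api_version: v1/config
kind: Dataset
spec:
  # Hypothetical name, chosen for illustration only.
  name: Long error traces
  configuration:
    type: TRACES
    trace_dataset:
      match_criteria:
        # Assumed trace-level block. Field names mirror the duration and error
        # settings in the Terraform example and aren't confirmed here.
        trace:
          # Match on the duration of the whole trace, in seconds.
          duration:
            min_secs: 1
            max_secs: 99
          # Match on the top-level error status of the trace.
          error:
            value: true
```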

Terraform dataset example

The following Terraform resource creates a dataset that Terraform refers to as prod_payment_us, with the human-readable name Traces payment service US prod.

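This dataset includes traces that contain spans from the payment service where the parent service matches us-east or us-west, the parent operation matches payments/.*, and the environment tag matches prod.*. The example also filters on duration, error status, and span count, and excludes spans whose client_build tag is debug or beta.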

If you want to specify criteria at the trace level rather than the span level, define trace instead of span in your Terraform resource.

resource "chronosphere_dataset" "prod_payment_us" {
  # Required name of the dataset. Can be modified after the dataset is created.
  name        = "Traces payment service US prod"
  # Optional description for the dataset.
  description = "Traces passing through the payment service in US production"
  # Defining characteristics of the dataset.
  configuration {
    # Dataset type, which must be TRACES.
    type = "TRACES"
 
    trace_dataset {
      # Trace criteria to match for the dataset.
      match_criteria {
        # Object that represents the span conditions to match on. All conditions must
        # be true in a single span for the span to be considered a match.
        span {
          # Matches traces based on the entire duration of the trace.
          duration {
            max_secs = 99
            min_secs = 1
          }
 
          # Matches traces based on the top-level error status.
          error {
            value = true
          }
 
          # Determines whether to INCLUDE or EXCLUDE all traces that contain at least
          # one span matching the filter.
          match_type = "INCLUDE"
 
          # Matches the operation of the candidate span's parent span if it's not a
          # root span.
          parent_operation {
            value = "payments/.*"
            match = "REGEX"
          }
 
          # Matches the service of the candidate span's parent span if it's not a
          # root span.
          parent_service {
            value = "us-(east|west)"
            match = "REGEX"
          }
 
          # The service to match on in candidate spans.
          service {
            match = "IN"
            in_values = ["payment"]
          }
 
          # Defines the number of spans that must match the criteria defined by
          # filter. Defaults to at least one span.
          span_count {
            max = 2
            min = 1
          }
 
          # The tag to match on in candidate spans.
          tag {
            key = "environment"
 
            value {
              value = "prod.*"
              match = "REGEX"
            }
          }
 
          tag {
            key = "client_build"
            value {
              match = "NOT_IN"
              in_values = ["debug", "beta"]
            }
          }
        }
      }
    }
  }
}
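
For comparison, the two tag filters in this resource might be written in a Chronoctl YAML definition as the following fragment. This is a sketch that assumes the YAML tags list accepts the same match operators and in_values field as the Terraform tag blocks. It belongs under a span entry in match_criteria and isn't a complete definition on its own.

```yaml
# Fragment only: place under a span entry in match_criteria.
tags:
  # Regular expression match on the tag value.
  - key: environment
    value:
      match: REGEX
      value: prod.*
  # NOT_IN match: the tag value must not be any of the listed values.
  - key: client_build
    value:
      match: NOT_IN
      in_values:
        - debug
        - beta
```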