# Observability Platform concepts

Chronosphere Observability Platform includes several distinct components, utilities,
and features that give you insight into your telemetry data. This guide describes
the most common concepts you encounter while using Observability Platform.

## Control Plane

The [*Control Plane*](/control) is a processing layer between ingestion and storage
that shapes, filters, and routes telemetry data before it counts against your
license. Use the Control Plane to control costs and improve the relevance of stored
data.

* **Metrics**: Analyze [traffic](/investigate/analyze/telemetry-analyzer) and
  [usage](/investigate/analyze/usage) to identify opportunities to reduce the
  overall volume of metrics and understand the impact of proposed
  [shaping rules](/control/shaping).
* **Traces**: Use [sampling](/control/shaping/sample-traces),
  [datasets](/control/shaping/sample-traces/datasets), and
  [behaviors](/control/shaping/sample-traces/behaviors) to manage the trace data you
  keep and discard.
* **Logs**: Apply [shaping rules](/control/shaping/shape-logs) to transform,
  reshape, or exclude log data before it counts against your license.

For all telemetry types, create *partitions* to attribute costs and usage to
appropriate owners in your organization. Define and attach
[budgets](/control/consumption/budgeting) to partitions to safeguard against runaway
usage and overspending.

## Observe telemetry

Observability Platform combines metrics, traces, and logs in a single platform so
you can correlate changes to incidents and monitor service health from one location.

[*Chronosphere Lens*](/observe/services) curates and visualizes service-level views
from your ingested metrics, traces, and change events. Each service gets a dedicated
page with rate, errors, and duration (RED) metrics, related monitors, alert statuses,
and links to traces and events.

[*Dashboards*](/observe/dashboards) are visual representations of your telemetry
data that you can customize and filter to focus on
[query results](/investigate/querying) and gain deeper context about an issue.

[*Change events*](/observe/enable-events) overlay deployment and configuration
changes on many Chronosphere resources such as dashboards, service pages, and traces
to help correlate events with system anomalies during incident investigation.

## Investigate monitors and alerts

[*Monitors*](/investigate/alerts/monitors) define watch criteria that evaluate
telemetry data against thresholds. When the criteria are met, the monitor generates
an [alert](/investigate/alerts), which represents the active state of that
condition. Monitors track conditions like capacity, uptime, and error rates, and
classify them as passing, warning, or critical.
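
As a rough illustration of how threshold evaluation maps a measurement to one of
these states, the following Python sketch classifies a value against a warning and a
critical threshold. The `Status` enum, the `classify` function, and the threshold
values are hypothetical names chosen for this example. They are not part of the
Observability Platform API, and the sketch assumes higher values are worse.

```python
from enum import Enum


class Status(Enum):
    PASSING = "passing"
    WARNING = "warning"
    CRITICAL = "critical"


def classify(value: float, warning: float, critical: float) -> Status:
    """Classify a measurement, assuming higher values are worse."""
    if warning > critical:
        raise ValueError("warning threshold must not exceed critical threshold")
    # Check the most severe condition first so it takes precedence.
    if value >= critical:
        return Status.CRITICAL
    if value >= warning:
        return Status.WARNING
    return Status.PASSING


# Hypothetical error-rate thresholds: warn at 1%, go critical at 5%.
for error_rate in (0.002, 0.02, 0.08):
    status = classify(error_rate, warning=0.01, critical=0.05)
    print(f"{error_rate:.3f} -> {status.value}")
```

A real monitor also accounts for details this sketch omits, such as how long a
condition must hold before the status changes.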

[*Notification policies*](/investigate/alerts/notifications/policies) route alerts
to the appropriate responders.
[Notifiers](/investigate/alerts/notifications/notifiers) define where alerts are
delivered, such as email, Slack, PagerDuty, or a webhook. Use
[signals](/investigate/alerts/notifications/signals) to group alerts and control
how many notifications Observability Platform sends.

## Review collections and services

A [*collection*](/administer/collections) is a group of resources such as
[dashboards](/observe/dashboards) and
[monitors](/investigate/alerts/monitors).

A [*service*](/observe/services) is a type of
[collection](/administer/collections) that represents a logical unit emitting
telemetry data, such as a microservice or endpoint. Observability Platform
[discovers services](/administer/service-discovery) automatically or through
user-defined discovery jobs, and generates a service page with queries, data
visualizations, and related monitors for each one.

## Define service level objectives

[*Service level objectives*](/observe/slo) (SLOs) measure longer-term service
reliability rather than point-in-time threshold breaches. An SLO defines:

* A percentage *objective* representing your reliability goal, such as 99.95%
  uptime.
* An *error budget*, or tolerance for downtime. The error budget is the complement
  of the objective: a 99.95% objective leaves a 0.05% error budget.
* *Indicator* queries that measure performance against the objective.
* A rolling *time window* over which Observability Platform evaluates performance.
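
To make the relationship between the objective, error budget, and time window
concrete, the following Python sketch converts an objective into the amount of
downtime the window can absorb. The function names are hypothetical and exist only
for this example. The sketch also assumes a simple time-based budget, which might
differ from how a given SLO's indicator queries measure errors.

```python
from datetime import timedelta


def error_budget(objective: float) -> float:
    """Return the error budget as a fraction, for an objective such as 0.9995."""
    if not 0.0 < objective < 1.0:
        raise ValueError("objective must be a fraction between 0 and 1")
    return 1.0 - objective


def allowed_downtime(objective: float, window: timedelta) -> timedelta:
    """Return how much of the window can be spent failing the objective."""
    return window * error_budget(objective)


# A 99.95% objective evaluated over a 30-day rolling window.
print(allowed_downtime(0.9995, timedelta(days=30)))
```

With a 99.95% objective and a 30-day window, the budget works out to roughly 21.6
minutes of downtime.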

SLOs complement monitors by detecting gradual degradation that fixed-threshold
monitors might miss. When the error budget is depleted, the SLO reports that the
service failed its objective. Use SLO burn rate alerts to notify teams before budget
exhaustion, and use differential diagnosis to isolate potential causes.
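
Burn rate is commonly defined as the observed error rate divided by the error rate
the objective allows. The following Python sketch uses that general definition. It
is an assumption for illustration and not necessarily the exact calculation
Observability Platform performs, and `burn_rate` and its parameters are hypothetical
names.

```python
def burn_rate(bad_events: int, total_events: int, objective: float) -> float:
    """Return how fast the error budget is being consumed.

    A value of 1.0 means the budget lasts exactly one time window.
    Values above 1.0 mean the budget runs out before the window ends.
    """
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0.0 < objective < 1.0:
        raise ValueError("objective must be a fraction between 0 and 1")
    observed_error_rate = bad_events / total_events
    allowed_error_rate = 1.0 - objective
    return observed_error_rate / allowed_error_rate


# 50 failed requests out of 10,000 against a 99.9% objective.
rate = burn_rate(bad_events=50, total_events=10_000, objective=0.999)
print(f"burn rate: {rate:.1f}x")
```

In this example the service consumes its budget about five times faster than the
objective allows, so a sustained rate at this level would exhaust a 30-day budget in
about 6 days.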

## Investigate and analyze data

Observability Platform provides tools to reduce mean time to repair (MTTR)
during incidents:

* [*Differential diagnosis*](/investigate/analyze/differential-diagnosis) (DDx)
  identifies the most probable sources of issues by ranking and highlighting
  suspicious trends in your trace and metric data without requiring manual query
  construction.
* [*Metrics Explorer*](/investigate/querying/metrics),
  [*Trace Explorer*](/investigate/querying/traces), and
  [*Log Explorer*](/investigate/querying/query-logs) search and visualize metric,
  trace, and log data to identify patterns and anomalies.
* [*Telemetry Analyzer*](/investigate/analyze/telemetry-analyzer) provides
  visibility into traffic volume and composition to identify cost reduction
  opportunities.
