Concepts

This document explains the core concepts, data models, and system behavior of OpenDT.

Overview

OpenDT currently operates in Shadow Mode: it connects to a datacenter (real or mocked) and replays historical workload data through the OpenDC simulator. The system continuously compares the predicted power consumption against the actual measurements.

Key capabilities:

Power consumption prediction based on workload patterns
What-If analysis (e.g., "What if we upgrade CPU architecture?")
Real-time topology calibration
Carbon emission estimation

Data Flow

Workload Data → dc-mock → Kafka → simulator → OpenDC → Results
Results → api → Grafana

dc-mock reads historical workload and power data from Parquet files
dc-mock publishes these records as messages to Kafka topics
simulator consumes workload messages, aggregates them into time windows, and invokes OpenDC
api queries results and serves them to Grafana

Workload Data

Tasks

A Task represents a job submitted to the datacenter. Each task requests compute resources for a specific duration.

Field	Type	Description
id	int	Unique identifier
submission_time	datetime	When the task was submitted
duration	int	Total duration in milliseconds
cpu_count	int	Number of CPU cores requested
cpu_capacity	float	CPU speed in MHz
mem_capacity	int	Memory capacity in MB
fragments	list	Execution profile segments

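Physical interpretation: a task is a request for a number of compute cycles:

Total Cycles = cpu_count × cpu_capacity × duration

With cpu_capacity in MHz and duration in milliseconds, this product is in units of 1,000 cycles.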
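The formula above can be sketched in Python with explicit unit conversion. This is a minimal illustration, not OpenDT code: the function name is hypothetical, and it assumes cpu_capacity is the per-core speed, as the formula implies.

```python
def total_cycles(cpu_count: int, cpu_capacity_mhz: float, duration_ms: int) -> float:
    """Return the compute cycles a task requests, expressed in plain cycles."""
    hertz = cpu_capacity_mhz * 1_000_000   # MHz -> cycles per second
    seconds = duration_ms / 1_000          # milliseconds -> seconds
    return cpu_count * hertz * seconds


# Hypothetical task: 4 cores at 2100 MHz for 60 seconds.
cycles = total_cycles(cpu_count=4, cpu_capacity_mhz=2100.0, duration_ms=60_000)
print(f"{cycles:.3e}")  # 5.040e+11
```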

Fragments

A Fragment describes resource usage during a segment of task execution. Tasks can have varying resource usage over time (e.g., high CPU at start, low CPU during I/O).

Field	Type	Description
id	int	Fragment identifier
task_id	int	Parent task ID
duration	int	Segment duration in milliseconds
cpu_count	int	CPUs used in this segment
cpu_usage	float	CPU utilization value

Consumption

A Consumption record represents actual power telemetry from the datacenter.

Field	Type	Description
timestamp	datetime	Measurement time
power_draw	float	Instantaneous power in Watts
energy_usage	float	Accumulated energy in Joules
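The two measurement fields are related: if energy_usage is a running counter (an assumption, the table does not say when it resets), the mean power between two records is the energy difference divided by the elapsed time. A small sketch with a hypothetical helper name:

```python
from datetime import datetime


def mean_power_watts(t0: datetime, e0_joules: float,
                     t1: datetime, e1_joules: float) -> float:
    """Mean power between two records, assuming energy accumulates monotonically."""
    elapsed = (t1 - t0).total_seconds()
    if elapsed <= 0:
        raise ValueError("records must be in increasing time order")
    return (e1_joules - e0_joules) / elapsed  # Joules per second = Watts


# Hypothetical readings five minutes apart.
start = datetime(2024, 1, 1, 0, 0, 0)
end = datetime(2024, 1, 1, 0, 5, 0)
average = mean_power_watts(start, 0.0, end, 3_000_000.0)
print(average)  # 10000.0
```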

Topology

The Topology defines the datacenter hardware that the simulator uses to calculate power. It is hierarchical: Clusters contain Hosts, which have CPUs, Memory, and a Power Model.

Structure

Topology
└── Cluster (e.g., "C01")
    └── Host
        ├── count: 277 (number of identical hosts)
        ├── CPU
        │   ├── coreCount: 16
        │   └── coreSpeed: 2100 MHz
        ├── Memory
        │   └── memorySize: 128 GB
        └── CPUPowerModel
            ├── modelType: "mse"
            ├── idlePower: 25 W
            ├── maxPower: 174 W
            └── calibrationFactor: 10.0

Power Models

The CPUPowerModel defines how CPU utilization translates to power consumption.

Model Type	Description
mse	Mean Squared Error based model (default)
asymptotic	Non-linear curve with asymptotic behavior
linear	Linear interpolation between idle and max power

Key parameters:

idlePower: Power draw at 0% utilization (Watts)
maxPower: Power draw at 100% utilization (Watts)
calibrationFactor: Scaling factor for the mse model
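As an illustration of the linear model only, the sketch below interpolates between idlePower and maxPower and scales the result by the host count from the example topology. The function name is hypothetical, and it assumes every host runs at the same utilization; the mse and asymptotic models are not reproduced here.

```python
def linear_power(utilization: float, idle_power: float, max_power: float) -> float:
    """Linear interpolation between idle and max power for utilization in [0, 1]."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be within [0, 1]")
    return idle_power + (max_power - idle_power) * utilization


# Values taken from the example topology above.
HOST_COUNT = 277
IDLE_W = 25.0
MAX_W = 174.0

per_host = linear_power(0.5, IDLE_W, MAX_W)  # 25 + 149 * 0.5
cluster = HOST_COUNT * per_host              # all hosts at 50% utilization
print(per_host, cluster)  # 99.5 27561.5
```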

Time Windows

The simulator aggregates tasks into time windows for batch simulation.

Window Behavior

Tasks are assigned to windows based on their submission timestamp
Windows close when a heartbeat message indicates time has progressed past the window end
When a window closes, all accumulated tasks are simulated
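The window rules above can be sketched as follows. This is not the simulator's implementation: the class name, the five-minute window size, and the epoch are all hypothetical, and the sketch treats a heartbeat at or after a window's end as closing that window.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)   # hypothetical window size
EPOCH = datetime(2024, 1, 1)    # hypothetical start of the workload


def window_index(ts: datetime) -> int:
    """Index of the window that contains the given timestamp."""
    return (ts - EPOCH) // WINDOW


class WindowBuffer:
    """Collects task ids per window and releases windows on heartbeats."""

    def __init__(self) -> None:
        self.open = defaultdict(list)  # window index -> task ids

    def add_task(self, task_id: int, submission_time: datetime) -> None:
        self.open[window_index(submission_time)].append(task_id)

    def on_heartbeat(self, ts: datetime) -> list:
        """Close, in order, every open window that ended at or before ts."""
        current = window_index(ts)
        closed = sorted(i for i in self.open if i < current)
        return [(i, self.open.pop(i)) for i in closed]


buf = WindowBuffer()
buf.add_task(1, datetime(2024, 1, 1, 0, 1))
buf.add_task(2, datetime(2024, 1, 1, 0, 4))
buf.add_task(3, datetime(2024, 1, 1, 0, 6))

early = buf.on_heartbeat(datetime(2024, 1, 1, 0, 4, 30))  # window 0 still open
later = buf.on_heartbeat(datetime(2024, 1, 1, 0, 5))      # window 0 closes
print(early, later)  # [] [(0, [1, 2])]
```

Because closing is driven by the heartbeat rather than by the next task, window 0 would be released even if task 3 had never arrived.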

Cumulative Simulation

OpenDT uses cumulative simulation: each window simulates all tasks from the beginning of the workload, not just the tasks in that window. Tasks submitted in earlier windows may still be running, so this keeps predictions accurate over long runs.
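A minimal sketch of the cumulative selection, assuming tasks are stored per window index (the function and variable names are hypothetical):

```python
def tasks_to_simulate(tasks_by_window: dict, closing_window: int) -> list:
    """Return every task submitted in windows 0..closing_window, in window order."""
    selected = []
    for index in sorted(tasks_by_window):
        if index <= closing_window:
            selected.extend(tasks_by_window[index])
    return selected


history = {0: [1, 2], 1: [3], 2: [4, 5]}
first = tasks_to_simulate(history, 0)
second = tasks_to_simulate(history, 1)
print(first, second)  # [1, 2] [1, 2, 3]
```

The input of each run grows with the workload, which is the cost of including earlier tasks in every prediction.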

Heartbeats

Heartbeat messages are synthetic timestamps published by dc-mock to signal time progression. They enable deterministic window closing even when no tasks arrive.

Calibration

When enabled, the calibrator service optimizes topology parameters by comparing simulation output against actual power measurements.

Process

Calibrator runs parallel simulations with different parameter values
Each simulation result is compared against actual power using the Mean Absolute Percentage Error (MAPE)
The parameter value with lowest error is selected
Updated topology is published to Kafka
Simulator uses the calibrated topology for future windows
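The selection step can be illustrated with a small MAPE-based search. This is a sketch under assumptions: the candidate values and power series are made up, the simulations are represented by precomputed predictions rather than real OpenDC runs, and the function names are hypothetical.

```python
def mape(actual: list, predicted: list) -> float:
    """Mean Absolute Percentage Error in percent, skipping zero actual values."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("no non-zero actual values to compare against")
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


def pick_best(candidates: dict, actual: list) -> tuple:
    """Return (parameter value, error) for the candidate with the lowest MAPE."""
    errors = {value: mape(actual, predicted) for value, predicted in candidates.items()}
    best = min(errors, key=errors.get)
    return best, errors[best]


# Hypothetical measured power (W) and one predicted series per candidate value.
actual_power = [100.0, 200.0]
simulated = {
    5.0: [110.0, 220.0],
    10.0: [100.0, 210.0],
    15.0: [80.0, 160.0],
}

best_value, best_error = pick_best(simulated, actual_power)
print(best_value)  # 10.0
```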

Kafka Topics

OpenDT uses Kafka for inter-service communication.

Topic	Purpose
dc.workload	Task submissions and heartbeats
dc.power	Actual power consumption telemetry
dc.topology	Real datacenter topology
sim.topology	Simulated/calibrated topology
sim.results	Simulation predictions

Output Files

Aggregated Results

simulator/agg_results.parquet contains the combined simulation output:

Column	Description
timestamp	Simulation timestamp
power_draw	Predicted power in Watts
carbon_intensity	Grid carbon intensity (gCO2/kWh)
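The two value columns can be combined into a carbon emission estimate. The sketch below assumes a fixed sampling interval between rows, which the table does not specify, and uses made-up rows; the function name is hypothetical.

```python
def carbon_grams(power_w: float, carbon_intensity: float, interval_s: float) -> float:
    """Emissions in gCO2 for one sample held constant over interval_s seconds."""
    energy_kwh = power_w * interval_s / 3_600_000  # W * s = J; 3.6e6 J per kWh
    return energy_kwh * carbon_intensity


# Hypothetical rows: (power_draw in W, carbon_intensity in gCO2/kWh), one per hour.
rows = [(10_000.0, 300.0), (12_000.0, 250.0)]
total = sum(carbon_grams(p, ci, 3600.0) for p, ci in rows)
print(total)  # 6000.0
```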

OpenDC Archives

Each simulation run is archived in simulator/opendc/run_<N>/:

run_1/
├── input/
│   ├── experiment.json    # OpenDC experiment config
│   ├── topology.json      # Topology used
│   ├── tasks.parquet      # Tasks simulated
│   └── fragments.parquet  # Task fragments
├── output/
│   ├── powerSource.parquet  # Power timeseries
│   ├── host.parquet         # Host-level metrics
│   └── service.parquet      # Service-level metrics
└── metadata.json          # Run metadata

Pydantic Models

All data models are defined using Pydantic v2 in libs/common/odt_common/models/:

Model	File	Description
Task	task.py	Workload task
Fragment	fragment.py	Task execution segment
Consumption	consumption.py	Power measurement
Topology	topology.py	Datacenter topology
WorkloadMessage	workload_message.py	Kafka message wrapper

Models provide:

Runtime type validation
JSON serialization/deserialization
Automatic API documentation
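To make the first two points concrete, here is a sketch of how Task and Fragment might look as Pydantic v2 models. The field names follow the tables in this document, but the class bodies are an assumption and may differ from the actual definitions in libs/common/odt_common/models/.

```python
from datetime import datetime

from pydantic import BaseModel, Field, ValidationError


class Fragment(BaseModel):
    id: int
    task_id: int
    duration: int      # milliseconds
    cpu_count: int
    cpu_usage: float


class Task(BaseModel):
    id: int
    submission_time: datetime
    duration: int      # milliseconds
    cpu_count: int
    cpu_capacity: float  # MHz
    mem_capacity: int    # MB
    fragments: list[Fragment] = Field(default_factory=list)


# Hypothetical input, e.g. decoded from a Kafka message.
raw = {
    "id": 1,
    "submission_time": "2024-01-01T00:00:00",
    "duration": 60000,
    "cpu_count": 4,
    "cpu_capacity": 2100.0,
    "mem_capacity": 8192,
    "fragments": [
        {"id": 10, "task_id": 1, "duration": 60000, "cpu_count": 4, "cpu_usage": 0.5}
    ],
}

task = Task.model_validate(raw)                 # runtime type validation
payload = task.model_dump_json()                # JSON serialization
restored = Task.model_validate_json(payload)    # JSON deserialization

try:
    Task.model_validate({**raw, "cpu_count": "many"})
    rejected = False
except ValidationError:
    rejected = True  # a non-numeric cpu_count fails validation
```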
