Merged

Changes from 11 commits
2 changes: 1 addition & 1 deletion docs/hercules_input.md
@@ -131,7 +131,7 @@ The old format is still supported for backward compatibility but will show a dep
### External Data File Format

The CSV file must contain:
- A `time_utc` column with UTC timestamps in ISO 8601 format
- A `time_utc` column with UTC timestamps in ISO 8601 format. Each timestamp marks the **start of a reporting period**; values on that row are treated as period averages. See [Time Interpretation](timing.md#time-interpretation-inputs-vs-internal-values) for how Hercules converts these to instantaneous values.
Copilot AI commented (Apr 21, 2026):

This description appears to conflict with `HerculesModel._read_external_data_file`, which explicitly interpolates external data with "instantaneous_to_instantaneous" (and docs/timing.md also documents external data as instantaneous unless preprocessed for ZOH). Update this line to reflect the external-data convention (instantaneous-to-instantaneous interpolation), and reserve the "start-of-period period-average" wording for wind/solar/SCADA/playback inputs.

Suggested change:
(old) - A `time_utc` column with UTC timestamps in ISO 8601 format. Each timestamp marks the **start of a reporting period**; values on that row are treated as period averages. See [Time Interpretation](timing.md#time-interpretation-inputs-vs-internal-values) for how Hercules converts these to instantaneous values.
(new) - A `time_utc` column with UTC timestamps in ISO 8601 format. Each timestamp represents an **instantaneous sample time** for the values on that row. Hercules interpolates external data using instantaneous-to-instantaneous interpolation. See [Time Interpretation](timing.md#time-interpretation-inputs-vs-internal-values) for the distinction between these external-data inputs and start-of-period period-average inputs such as wind/solar/SCADA/playback data.

Collaborator (author) replied:

Good catch, updated

- One or more data columns with external signals. The column names are arbitrary; any columns will be carried forward and interpolated, but their values must be floats. Some controllers and plotting utilities that operate on external signals may require specific column names such as `lmp_rt`, `lmp_da`, `wind_forecast`, etc.

Example `lmp_data.csv`:
2 changes: 2 additions & 0 deletions docs/output_files.md
@@ -2,6 +2,8 @@

Hercules generates HDF5 output files containing simulation data for analysis and visualization. This page describes the file format, available utilities for reading the data, and how HerculesModel generates these files.

All values in output files represent **instantaneous** quantities at each time step, not period averages. This differs from the convention used by input data files, where timestamps mark the start of a reporting period. See [Time Interpretation](timing.md#time-interpretation-inputs-vs-internal-values) for details on this distinction and the midpoint correction applied during input interpolation.

## File Format

Hercules outputs simulation data in HDF5 (Hierarchical Data Format 5) format.
2 changes: 1 addition & 1 deletion docs/power_playback.md
@@ -32,7 +32,7 @@ power_unit_1:

The input file must contain the following columns:

- `time_utc`: Timestamps in UTC (ISO 8601 format or parseable datetime strings)
- `time_utc`: Timestamps in UTC (ISO 8601 format or parseable datetime strings). Each timestamp marks the **start of a reporting period**; the power value on that row is treated as the period average. See [Time Interpretation](timing.md#time-interpretation-inputs-vs-internal-values) for how Hercules converts these to instantaneous values.
- `power`: Power output in kW

Supported file formats: `.csv`, `.p`, `.pkl` (pickle), `.f`, `.ftr` (feather).
2 changes: 1 addition & 1 deletion docs/solar_pv.md
@@ -12,7 +12,7 @@ Presently only one solar simulator is available

Both models require an input weather file:
1. A CSV file that specifies the weather conditions (e.g. NonAnnualSimulation-sample_data-interpolated-daytime.csv). This file should include:
- timestamp (see [timing](timing.md) for time format requirements)
- timestamp (see [timing](timing.md) for time format requirements). Each `time_utc` timestamp marks the **start of a reporting period**; irradiance and weather values on that row are treated as period averages. See [Time Interpretation](timing.md#time-interpretation-inputs-vs-internal-values) for how Hercules converts these to instantaneous values.
- direct normal irradiance (DNI)
- diffuse horizontal irradiance (DHI)
- global horizontal irradiance (GHI)
91 changes: 90 additions & 1 deletion docs/timing.md
@@ -9,6 +9,89 @@ Timing in Hercules is specified using two complementary representations:
- `time` (float): Simulation time in seconds, where `time=0` corresponds to `starttime_utc`
- `time_utc` (datetime): Absolute UTC timestamp

## Time Interpretation: Inputs vs. Internal Values

### Input files: start-of-period convention

In input data sources such as weather files, SCADA records, and resource
databases, each `time_utc` timestamp marks the **beginning** of a reporting
period and the associated values (irradiance, wind speed, power, etc.)
represent an average or aggregate over that period. For example, an hourly
weather file with a row at `2020-06-15T12:00:00Z` and GHI = 735 W/m² means
that 735 W/m² is the average GHI from 12:00 to 13:00.

### Hercules internal values: instantaneous convention

Inside the simulation, values at a given time step represent **instantaneous**
quantities at that moment. All Hercules output values follow this same
instantaneous convention.

### Interpolation methods

The `interpolate_df` function in `utilities.py` accepts a mandatory
`interpolation_method` parameter that controls how numeric columns are
resampled onto the simulation time grid. Three methods are available:

#### `"averaged_to_instantaneous"` (wind, solar, and similar)

Input values are period averages whose timestamps mark the **start** of each
period. The best single-point estimate of a period-averaged value is at the
**midpoint** of its interval, not the start. For example, the hourly average
from 12:00-13:00 is most representative of conditions at 12:30.

1. Each numeric value is assigned to the midpoint of its input interval
(using `_compute_interval_midpoints`).
2. Linear interpolation is then performed between these midpoints to produce
values at the simulation time steps.

```
Input file (start-of-period):

time_utc value
12:00 100 ← average over [12:00, 13:00)
13:00 200 ← average over [13:00, 14:00)

After midpoint correction:

time value
12:30 100 ← midpoint of [12:00, 13:00)
13:30 200 ← midpoint of [13:00, 14:00)

Querying at 13:00 yields 150 (halfway between midpoints).
```
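The midpoint correction above can be sketched with plain NumPy (an illustrative sketch using the example's numbers with times as fractional hours, not the Hercules implementation):

```python
import numpy as np

# Start-of-period timestamps (hours) and their period-average values
starts = np.array([12.0, 13.0])
values = np.array([100.0, 200.0])

# Shift each value to the midpoint of its interval; the last interval's
# width is assumed equal to the preceding one
midpoints = np.empty_like(starts)
midpoints[:-1] = (starts[:-1] + starts[1:]) / 2.0
midpoints[-1] = starts[-1] + (starts[-1] - starts[-2]) / 2.0  # 13.5

# Linear interpolation between the midpoints
print(np.interp(13.0, midpoints, values))  # 150.0
```

Querying at 13:00 lands halfway between the 12:30 and 13:30 midpoints, reproducing the 150 from the diagram above.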

#### `"zoh_to_instantaneous"` (LMP, external signals)

Input values are piecewise-constant (zero-order hold) with timestamps at the
start of each interval. Each query time receives the value of the last
original timestamp at or before it -- the value is held constant until the
next timestamp. This is appropriate for signals like locational marginal
prices (LMP) that change in discrete steps.

```
Input file:

time_utc value
12:00 100 ← held constant over [12:00, 13:00)
13:00 200 ← held constant over [13:00, 14:00)

Querying at 12:30 yields 100.
Querying at 13:00 yields 200.
```
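Zero-order hold can likewise be sketched in NumPy (illustrative values in fractional hours, not the Hercules implementation):

```python
import numpy as np

starts = np.array([12.0, 13.0])    # start-of-interval timestamps (hours)
values = np.array([100.0, 200.0])  # piecewise-constant values

def zoh(query):
    # Index of the last original timestamp at or before each query time;
    # clipping holds the first value for queries before the first timestamp
    idx = np.searchsorted(starts, query, side="right") - 1
    idx = np.clip(idx, 0, len(values) - 1)
    return values[idx]

print(zoh(np.array([12.5, 13.0])))  # [100. 200.]
```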

#### `"instantaneous_to_instantaneous"`

Input values already represent instantaneous measurements at their
timestamps. Standard linear interpolation is performed directly on the
original timestamps with no midpoint shift.

---

In all three methods, datetime columns (e.g. `time_utc`) are linearly
interpolated on the raw timestamps without any shift, because they are
instantaneous coordinate mappings between simulation time and wall-clock
time, not period-averaged measurements.

## Input Requirements

All Hercules input files must specify start and end times using UTC datetime strings:
@@ -113,7 +196,11 @@ For the example above, `endtime` would be 3600.0 seconds.

### Wind and Solar Input Data

Both wind and solar input CSV/Feather/Parquet files must contain a `time_utc` column with UTC timestamps:
Both wind and solar input CSV/Feather/Parquet files must contain a `time_utc` column with UTC timestamps. Each `time_utc` value marks the **start of a reporting period**; the data values on that row are treated as period averages. These are interpolated with `"averaged_to_instantaneous"`. See [Interpolation methods](#interpolation-methods) above for details.

### External Data (LMP, etc.)

External data files loaded via `_read_external_data_file` are interpolated with `"zoh_to_instantaneous"` (zero-order hold), which is appropriate for signals like LMP prices that are piecewise-constant over each interval rather than time-averaged.

```text
time_utc,wd_mean,ws_000,ws_001,ws_002
@@ -145,6 +232,8 @@ Key Points:

## Output Files

All values in Hercules output files represent **instantaneous** quantities at each time step, not period averages. See [Time Interpretation](#time-interpretation-inputs-vs-internal-values) for the distinction from input files.

Hercules output HDF5 files store:

- `time` array: Simulation time points (seconds from t=0)
2 changes: 1 addition & 1 deletion docs/wind.md
@@ -54,7 +54,7 @@ Required parameters for WindFarmSCADAPower:
**SCADA File Format:**

The SCADA file must contain the following columns:
- `time_utc`: Timestamps in UTC (ISO 8601 format or parseable datetime strings)
- `time_utc`: Timestamps in UTC (ISO 8601 format or parseable datetime strings). Each timestamp marks the **start of a reporting period**; values on that row are treated as period averages. See [Time Interpretation](timing.md#time-interpretation-inputs-vs-internal-values) for how Hercules converts these to instantaneous values.
- `wd_mean`: Mean wind direction in degrees
- `pow_###`: Power output for each turbine (e.g., `pow_000`, `pow_001`, `pow_002`)

17 changes: 12 additions & 5 deletions hercules/hercules_model.py
@@ -172,10 +172,15 @@ def _read_external_data_file(self, filename):
"""
Read and interpolate external data from a CSV, feather, or pickle file.

This method reads external data from the specified file (CSV, feather, or pickle)
and interpolates it according to the simulation time steps. The external data must
include a 'time_utc' column which will be converted to simulation time.
The interpolated data is stored in self.external_signals_all.
This method reads external data from the specified file (CSV, feather, or
pickle) and interpolates it onto the simulation time grid using zero-order
hold (``"zoh_to_instantaneous"``). ZOH is appropriate because external
signals such as LMP prices are piecewise-constant over each reporting
interval, unlike time-averaged weather data used by wind/solar components.

The external data must include a ``time_utc`` column which will be
converted to simulation time. The interpolated data is stored in
``self.external_signals_all``.

Args:
filename (str): Path to the file containing external data. Supported formats:
@@ -216,7 +221,9 @@ def _read_external_data_file(self, filename):
)

# Interpolate using the utility function
df_interpolated = interpolate_df(df_ext, new_times)
df_interpolated = interpolate_df(
df_ext, new_times, interpolation_method="zoh_to_instantaneous"
)

Collaborator commented:

I'm not sure we should hard code this. It makes sense that the LMP prices should use this type of interpolation, but this is also how we input power reference signals. Should those also use the zoh_to_instantaneous method?

Collaborator (author) replied:

I agree this is tricky. I think to date in Hercules we haven't explicitly tracked LMP prices; these are just external signals that end up having signal names that Hycon expects. One idea is to add a dictionary to the Hercules input that specifies how each external channel should be upsampled, with a default to this method for backward compatibility. Or should we force it to be explicit and accept one more breaking change? @misi9170 any thoughts here?

# Convert interpolated DataFrame to dictionary format
for col in df_interpolated.columns:
4 changes: 3 additions & 1 deletion hercules/plant_components/power_playback.py
@@ -122,7 +122,9 @@ def __init__(self, h_dict, component_name):

# Interpolate df_scada on to the time steps
time_steps_all = np.arange(self.starttime, self.endtime, self.dt, dtype=hercules_float_type)
df_scada = interpolate_df(df_scada, time_steps_all)
df_scada = interpolate_df(
df_scada, time_steps_all, interpolation_method="averaged_to_instantaneous"
)

# Confirm that there is a column called "power"
if "power" not in df_scada.columns:
4 changes: 3 additions & 1 deletion hercules/plant_components/solar_pysam_base.py
@@ -126,7 +126,9 @@ def _load_solar_data(self, h_dict):

# Interpolate df_solar on to the time steps
time_steps_all = np.arange(self.starttime, self.endtime, self.dt, dtype=hercules_float_type)
df_solar = interpolate_df(df_solar, time_steps_all)
df_solar = interpolate_df(
df_solar, time_steps_all, interpolation_method="averaged_to_instantaneous"
)

# Can now save the input data as simple columns
self.year_array = df_solar["time_utc"].dt.year.values
4 changes: 3 additions & 1 deletion hercules/plant_components/wind_farm.py
@@ -188,7 +188,9 @@ def __init__(self, h_dict, component_name):

# Interpolate df_wi on to the time steps
time_steps_all = np.arange(self.starttime, self.endtime, self.dt, dtype=hercules_float_type)
df_wi = interpolate_df(df_wi, time_steps_all)
df_wi = interpolate_df(
df_wi, time_steps_all, interpolation_method="averaged_to_instantaneous"
)

# INITIALIZE FLORIS BASED ON WAKE MODEL
if self.wake_method == "precomputed":
4 changes: 3 additions & 1 deletion hercules/plant_components/wind_farm_scada_power.py
@@ -128,7 +128,9 @@ def __init__(self, h_dict, component_name):

# Interpolate df_scada on to the time steps
time_steps_all = np.arange(self.starttime, self.endtime, self.dt, dtype=hercules_float_type)
df_scada = interpolate_df(df_scada, time_steps_all)
df_scada = interpolate_df(
df_scada, time_steps_all, interpolation_method="averaged_to_instantaneous"
)

# Get a list of power columns and infer number of turbines
self.power_columns = sorted([col for col in df_scada.columns if col.startswith("pow_")])
120 changes: 77 additions & 43 deletions hercules/utilities.py
@@ -448,20 +448,52 @@ def close_logging(logger):
logger.removeHandler(handler)


def interpolate_df(df, new_time):
_VALID_INTERPOLATION_METHODS = {
"averaged_to_instantaneous",
"zoh_to_instantaneous",
"instantaneous_to_instantaneous",
}


def interpolate_df(df, new_time, interpolation_method):
Copilot AI commented (Apr 21, 2026):

interpolate_df previously accepted two parameters and now requires interpolation_method with no default, which is a breaking API change for any downstream/internal callers not updated in this PR. If backward compatibility is needed, consider providing a default (and optionally emitting a deprecation warning when omitted) so older call sites keep working while migrating to explicit behavior.

Collaborator (author) replied:

Not truly backward compatible, but this function was not used outside of Hercules.

"""Interpolate DataFrame values to match new time axis.

Uses linear interpolation with Polars backend for better performance and memory efficiency.
Converts datetime columns to timestamps for interpolation.
The ``interpolation_method`` parameter controls how numeric columns are
resampled onto ``new_time``:

- ``"averaged_to_instantaneous"``: Input values are period averages whose
timestamps mark the **start** of each period. Each value is assigned to
the midpoint of its interval and then linearly interpolated. Use for
wind speed, solar irradiance, and similar time-averaged signals.
- ``"zoh_to_instantaneous"``: Input values are piecewise-constant
(zero-order hold) with timestamps at the start of each interval. Each
query time receives the value of the last original timestamp at or
before it. Use for LMP prices and other step-change signals.
- ``"instantaneous_to_instantaneous"``: Input values already represent
instantaneous measurements. Standard linear interpolation is performed
directly on the original timestamps with no midpoint shift.

Datetime columns (e.g. ``time_utc``) are always linearly interpolated on
the raw timestamps regardless of the chosen method, because they map
simulation time to wall-clock time directly.

Args:
df (pd.DataFrame): DataFrame with 'time' column and data columns.
new_time (array-like): New time points for interpolation.
interpolation_method (str): One of ``"averaged_to_instantaneous"``,
``"zoh_to_instantaneous"``, or
``"instantaneous_to_instantaneous"``.

Returns:
pd.DataFrame: DataFrame with new time axis and interpolated data columns.

"""
# Convert new_time to numpy array for consistency
if interpolation_method not in _VALID_INTERPOLATION_METHODS:
raise ValueError(
f"Unknown interpolation_method '{interpolation_method}'. "
f"Must be one of {sorted(_VALID_INTERPOLATION_METHODS)}."
)

new_time = np.asarray(new_time)

# Separate datetime and non-datetime columns for different processing
@@ -475,50 +507,27 @@ def interpolate_df(df, new_time):
else:
numeric_cols.append(col)

return _interpolate_with_polars(df, new_time, datetime_cols, numeric_cols)


def _interpolate_with_polars(df, new_time, datetime_cols, numeric_cols):
"""Interpolate using Polars backend.

Args:
df (pd.DataFrame): Input DataFrame.
new_time (np.ndarray): New time points.
datetime_cols (list): Datetime column names.
numeric_cols (list): Numeric column names.

Returns:
pd.DataFrame: Interpolated DataFrame.
"""
# Convert to Polars for efficient processing
df_pl = pl.from_pandas(df)
result_pl = pl.DataFrame({"time": new_time})

# Create a Polars DataFrame for the new time points
new_time_pl = pl.DataFrame({"time": new_time})

# Start with the time column
result_pl = new_time_pl
time_values = df_pl["time"].to_numpy()

# Process numeric columns using Polars' interpolation
if numeric_cols:
for col in numeric_cols:
# Use Polars' join_asof for efficient interpolation-like behavior
# This is more memory efficient than pandas for large datasets
col_data = df_pl.select(["time", col]).sort("time")

# Perform interpolation using Polars operations
# Note: Polars doesn't have direct linear interpolation, so we use numpy interp
# but with Polars' efficient data extraction
time_values = col_data["time"].to_numpy()
col_values = col_data[col].to_numpy()

# Linear interpolation with float32 precision
interpolated_values = np.interp(new_time, time_values, col_values).astype(
if interpolation_method == "averaged_to_instantaneous":
x_coords = _compute_interval_midpoints(time_values)
else:
x_coords = time_values

for col in numeric_cols:
col_values = df_pl[col].to_numpy()
if interpolation_method == "zoh_to_instantaneous":
indices = np.searchsorted(time_values, new_time, side="right") - 1
indices = np.clip(indices, 0, len(col_values) - 1)
interpolated_values = col_values[indices].astype(hercules_float_type)
else:
interpolated_values = np.interp(new_time, x_coords, col_values).astype(
hercules_float_type
)

# Add interpolated column to result
result_pl = result_pl.with_columns(pl.lit(interpolated_values).alias(col))
result_pl = result_pl.with_columns(pl.lit(interpolated_values).alias(col))

# Process datetime columns
for col in datetime_cols:
Expand All @@ -540,6 +549,31 @@ def _interpolate_with_polars(df, new_time, datetime_cols, numeric_cols):
return result_pl.to_pandas()


def _compute_interval_midpoints(time_values):
"""Compute the midpoints of consecutive time intervals.

For start-of-period timestamps, each value is best represented at the
centre of its interval. The last interval width is assumed equal to the
preceding one.

Args:
time_values (np.ndarray): Sorted array of start-of-period timestamps.

Returns:
np.ndarray: Array of interval midpoints, same length as *time_values*.
"""
# Allow the edge case of a single time value by returning the time value itself
if len(time_values) < 2:
return time_values
# Compute midpoints
midpoints = np.empty_like(time_values, dtype=np.float64)
midpoints[:-1] = (time_values[:-1] + time_values[1:]) / 2.0
midpoints[-1] = (
time_values[-1] + (time_values[-1] - time_values[-2]) / 2.0
) # Last interval is equal to the previous one
return midpoints
Comment on lines +566 to +571

Copilot AI commented (Apr 8, 2026):

_compute_interval_midpoints assumes at least 2 time points; with a single-row input (or after upstream filtering) this will raise an IndexError at time_values[-2]. Add an explicit guard for len(time_values) < 2 (e.g., return time_values.copy() or time_values + 0.0) so interpolate_df can handle degenerate/constant inputs gracefully.

Collaborator (author) replied:

This seems like an unlikely edge case; handled by simply returning that scalar value as its own midpoint.



def find_time_utc_value(df, time_value, time_column="time", time_utc_column="time_utc"):
"""Return UTC timestamp at a given time value via linear interpolation or extrapolation.

4 changes: 2 additions & 2 deletions tests/example_regression_tests/example_00_regression_test.py
@@ -18,8 +18,8 @@

# Test configuration
NUM_TIME_STEPS = 5
EXPECTED_FINAL_WIND_POWER = 3271 # Updated after wind model changes
EXPECTED_FINAL_PLANT_POWER = 3271 # Same as wind power for wind-only case
EXPECTED_FINAL_WIND_POWER = 3265 # Updated for midpoint interpolation correction
EXPECTED_FINAL_PLANT_POWER = 3265 # Same as wind power for wind-only case

# File names
INPUT_FILE = "hercules_input.yaml"