Merged

Changes from 11 commits
2 changes: 1 addition & 1 deletion docs/hercules_input.md
@@ -131,7 +131,7 @@ The old format is still supported for backward compatibility but will show a dep
### External Data File Format

The CSV file must contain:
- A `time_utc` column with UTC timestamps in ISO 8601 format
- A `time_utc` column with UTC timestamps in ISO 8601 format. Each timestamp marks the **start of a reporting period**; values on that row are treated as period averages. See [Time Interpretation](timing.md#time-interpretation-inputs-vs-internal-values) for how Hercules converts these to instantaneous values.
Copilot AI commented (Apr 21, 2026):

This description appears to conflict with `HerculesModel._read_external_data_file`, which explicitly interpolates external data with "instantaneous_to_instantaneous" (and docs/timing.md also documents external data as instantaneous unless preprocessed for ZOH). Update this line to reflect the external-data convention (instantaneous-to-instantaneous interpolation), and reserve the "start-of-period period-average" wording for wind/solar/SCADA/playback inputs.

Suggested change:
(old) - A `time_utc` column with UTC timestamps in ISO 8601 format. Each timestamp marks the **start of a reporting period**; values on that row are treated as period averages. See [Time Interpretation](timing.md#time-interpretation-inputs-vs-internal-values) for how Hercules converts these to instantaneous values.
(new) - A `time_utc` column with UTC timestamps in ISO 8601 format. Each timestamp represents an **instantaneous sample time** for the values on that row. Hercules interpolates external data using instantaneous-to-instantaneous interpolation. See [Time Interpretation](timing.md#time-interpretation-inputs-vs-internal-values) for the distinction between these external-data inputs and start-of-period period-average inputs such as wind/solar/SCADA/playback data.

Collaborator (author) replied:

Good catch, updated

- One or more data columns with external signals. The column names are arbitrary; any columns will be carried forward and interpolated, but their values must be floats. Some controllers and plotting utilities that operate on external signals may require specific column names such as `lmp_rt`, `lmp_da`, `wind_forecast`, etc.

Example `lmp_data.csv`:
2 changes: 2 additions & 0 deletions docs/output_files.md
@@ -2,6 +2,8 @@

Hercules generates HDF5 output files containing simulation data for analysis and visualization. This page describes the file format, available utilities for reading the data, and how HerculesModel generates these files.

All values in output files represent **instantaneous** quantities at each time step, not period averages. This differs from the convention used by input data files, where timestamps mark the start of a reporting period. See [Time Interpretation](timing.md#time-interpretation-inputs-vs-internal-values) for details on this distinction and the midpoint correction applied during input interpolation.

## File Format

Hercules outputs simulation data in HDF5 (Hierarchical Data Format 5) format.
2 changes: 1 addition & 1 deletion docs/power_playback.md
@@ -32,7 +32,7 @@ power_unit_1:

The input file must contain the following columns:

- `time_utc`: Timestamps in UTC (ISO 8601 format or parseable datetime strings)
- `time_utc`: Timestamps in UTC (ISO 8601 format or parseable datetime strings). Each timestamp marks the **start of a reporting period**; the power value on that row is treated as the period average. See [Time Interpretation](timing.md#time-interpretation-inputs-vs-internal-values) for how Hercules converts these to instantaneous values.
- `power`: Power output in kW

Supported file formats: `.csv`, `.p`, `.pkl` (pickle), `.f`, `.ftr` (feather).
2 changes: 1 addition & 1 deletion docs/solar_pv.md
@@ -12,7 +12,7 @@ Presently only one solar simulator is available

Both models require an input weather file:
1. A CSV file that specifies the weather conditions (e.g. NonAnnualSimulation-sample_data-interpolated-daytime.csv). This file should include:
- timestamp (see [timing](timing.md) for time format requirements)
- timestamp (see [timing](timing.md) for time format requirements). Each `time_utc` timestamp marks the **start of a reporting period**; irradiance and weather values on that row are treated as period averages. See [Time Interpretation](timing.md#time-interpretation-inputs-vs-internal-values) for how Hercules converts these to instantaneous values.
- direct normal irradiance (DNI)
- diffuse horizontal irradiance (DHI)
- global horizontal irradiance (GHI)
91 changes: 90 additions & 1 deletion docs/timing.md
@@ -9,6 +9,89 @@ Timing in Hercules is specified using two complementary representations:
- `time` (float): Simulation time in seconds, where `time=0` corresponds to `starttime_utc`
- `time_utc` (datetime): Absolute UTC timestamp

## Time Interpretation: Inputs vs. Internal Values

### Input files: start-of-period convention

In input data sources such as weather files, SCADA records, and resource
databases, each `time_utc` timestamp marks the **beginning** of a reporting
period and the associated values (irradiance, wind speed, power, etc.)
represent an average or aggregate over that period. For example, an hourly
weather file with a row at `2020-06-15T12:00:00Z` and GHI = 735 W/m² means
that 735 W/m² is the average GHI from 12:00 to 13:00.

### Hercules internal values: instantaneous convention

Inside the simulation, values at a given time step represent **instantaneous**
quantities at that moment. All Hercules output values follow this same
instantaneous convention.

### Interpolation methods

The `interpolate_df` function in `utilities.py` accepts a mandatory
`interpolation_method` parameter that controls how numeric columns are
resampled onto the simulation time grid. Three methods are available:

#### `"averaged_to_instantaneous"` (wind, solar, and similar)

Input values are period averages whose timestamps mark the **start** of each
period. The best single-point estimate of a period-averaged value is at the
**midpoint** of its interval, not the start. For example, the hourly average
from 12:00-13:00 is most representative of conditions at 12:30.

1. Each numeric value is assigned to the midpoint of its input interval
(using `_compute_interval_midpoints`).
2. Linear interpolation is then performed between these midpoints to produce
values at the simulation time steps.

```
Input file (start-of-period):

time_utc value
12:00 100 ← average over [12:00, 13:00)
13:00 200 ← average over [13:00, 14:00)

After midpoint correction:

time value
12:30 100 ← midpoint of [12:00, 13:00)
13:30 200 ← midpoint of [13:00, 14:00)

Querying at 13:00 yields 150 (halfway between midpoints).
```
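The midpoint correction above can be sketched with plain NumPy (an illustrative sketch using the example's numbers with times as fractional hours, not the Hercules implementation):

```python
import numpy as np

# Start-of-period timestamps (hours) and their period-average values
starts = np.array([12.0, 13.0])
values = np.array([100.0, 200.0])

# Shift each value to the midpoint of its interval; the last interval's
# width is assumed equal to the preceding one
midpoints = np.empty_like(starts)
midpoints[:-1] = (starts[:-1] + starts[1:]) / 2.0
midpoints[-1] = starts[-1] + (starts[-1] - starts[-2]) / 2.0  # 13.5

# Linear interpolation between the midpoints
print(np.interp(13.0, midpoints, values))  # 150.0
```

Querying at 13:00 lands halfway between the 12:30 and 13:30 midpoints, reproducing the 150 from the diagram above.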

#### `"zoh_to_instantaneous"` (LMP, external signals)

Input values are piecewise-constant (zero-order hold) with timestamps at the
start of each interval. Each query time receives the value of the last
original timestamp at or before it -- the value is held constant until the
next timestamp. This is appropriate for signals like locational marginal
prices (LMP) that change in discrete steps.

```
Input file:

time_utc value
12:00 100 ← held constant over [12:00, 13:00)
13:00 200 ← held constant over [13:00, 14:00)

Querying at 12:30 yields 100.
Querying at 13:00 yields 200.
```
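Zero-order hold can likewise be sketched in NumPy (illustrative values in fractional hours, not the Hercules implementation):

```python
import numpy as np

starts = np.array([12.0, 13.0])    # start-of-interval timestamps (hours)
values = np.array([100.0, 200.0])  # piecewise-constant values

def zoh(query):
    # Index of the last original timestamp at or before each query time;
    # clipping holds the first value for queries before the first timestamp
    idx = np.searchsorted(starts, query, side="right") - 1
    idx = np.clip(idx, 0, len(values) - 1)
    return values[idx]

print(zoh(np.array([12.5, 13.0])))  # [100. 200.]
```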

#### `"instantaneous_to_instantaneous"`

Input values already represent instantaneous measurements at their
timestamps. Standard linear interpolation is performed directly on the
original timestamps with no midpoint shift.

---

In all three methods, datetime columns (e.g. `time_utc`) are linearly
interpolated on the raw timestamps without any shift, because they are
instantaneous coordinate mappings between simulation time and wall-clock
time, not period-averaged measurements.

## Input Requirements

All Hercules input files must specify start and end times using UTC datetime strings:
@@ -113,7 +196,11 @@ For the example above, `endtime` would be 3600.0 seconds.

### Wind and Solar Input Data

Both wind and solar input CSV/Feather/Parquet files must contain a `time_utc` column with UTC timestamps:
Both wind and solar input CSV/Feather/Parquet files must contain a `time_utc` column with UTC timestamps. Each `time_utc` value marks the **start of a reporting period**; the data values on that row are treated as period averages. These are interpolated with `"averaged_to_instantaneous"`. See [Interpolation methods](#interpolation-methods) above for details.

### External Data (LMP, etc.)

External data files loaded via `_read_external_data_file` are interpolated with `"zoh_to_instantaneous"` (zero-order hold), which is appropriate for signals like LMP prices that are piecewise-constant over each interval rather than time-averaged.

```text
time_utc,wd_mean,ws_000,ws_001,ws_002
@@ -145,6 +232,8 @@ Key Points:

## Output Files

All values in Hercules output files represent **instantaneous** quantities at each time step, not period averages. See [Time Interpretation](#time-interpretation-inputs-vs-internal-values) for the distinction from input files.

Hercules output HDF5 files store:

- `time` array: Simulation time points (seconds from t=0)
2 changes: 1 addition & 1 deletion docs/wind.md
@@ -54,7 +54,7 @@ Required parameters for WindFarmSCADAPower:
**SCADA File Format:**

The SCADA file must contain the following columns:
- `time_utc`: Timestamps in UTC (ISO 8601 format or parseable datetime strings)
- `time_utc`: Timestamps in UTC (ISO 8601 format or parseable datetime strings). Each timestamp marks the **start of a reporting period**; values on that row are treated as period averages. See [Time Interpretation](timing.md#time-interpretation-inputs-vs-internal-values) for how Hercules converts these to instantaneous values.
- `wd_mean`: Mean wind direction in degrees
- `pow_###`: Power output for each turbine (e.g., `pow_000`, `pow_001`, `pow_002`)

17 changes: 12 additions & 5 deletions hercules/hercules_model.py
@@ -172,10 +172,15 @@ def _read_external_data_file(self, filename):
"""
Read and interpolate external data from a CSV, feather, or pickle file.

This method reads external data from the specified file (CSV, feather, or pickle)
and interpolates it according to the simulation time steps. The external data must
include a 'time_utc' column which will be converted to simulation time.
The interpolated data is stored in self.external_signals_all.
This method reads external data from the specified file (CSV, feather, or
pickle) and interpolates it onto the simulation time grid using zero-order
hold (``"zoh_to_instantaneous"``). ZOH is appropriate because external
signals such as LMP prices are piecewise-constant over each reporting
interval, unlike time-averaged weather data used by wind/solar components.

The external data must include a ``time_utc`` column which will be
converted to simulation time. The interpolated data is stored in
``self.external_signals_all``.

Args:
filename (str): Path to the file containing external data. Supported formats:
@@ -216,7 +221,9 @@ def _read_external_data_file(self, filename):
)

# Interpolate using the utility function
df_interpolated = interpolate_df(df_ext, new_times)
df_interpolated = interpolate_df(
df_ext, new_times, interpolation_method="zoh_to_instantaneous"
)

Collaborator commented:

I'm not sure we should hard code this. It makes sense that the LMP prices should use this type of interpolation, but this is also how we input power reference signals. Should those also use the zoh_to_instantaneous method?

Collaborator (author) replied:

I agree this is tricky. I think to date in Hercules we haven't explicitly tracked LMP prices; these are just external signals that end up having signal names that Hycon expects. One idea is to add a dictionary to the Hercules input that specifies how each external channel should be upsampled, with a default to this method for backward compatibility. Or should we force it to be explicit and accept one more breaking change? @misi9170 any thoughts here?

# Convert interpolated DataFrame to dictionary format
for col in df_interpolated.columns:
4 changes: 3 additions & 1 deletion hercules/plant_components/power_playback.py
@@ -122,7 +122,9 @@ def __init__(self, h_dict, component_name):

# Interpolate df_scada on to the time steps
time_steps_all = np.arange(self.starttime, self.endtime, self.dt, dtype=hercules_float_type)
df_scada = interpolate_df(df_scada, time_steps_all)
df_scada = interpolate_df(
df_scada, time_steps_all, interpolation_method="averaged_to_instantaneous"
)

# Confirm that there is a column called "power"
if "power" not in df_scada.columns:
4 changes: 3 additions & 1 deletion hercules/plant_components/solar_pysam_base.py
@@ -126,7 +126,9 @@ def _load_solar_data(self, h_dict):

# Interpolate df_solar on to the time steps
time_steps_all = np.arange(self.starttime, self.endtime, self.dt, dtype=hercules_float_type)
df_solar = interpolate_df(df_solar, time_steps_all)
df_solar = interpolate_df(
df_solar, time_steps_all, interpolation_method="averaged_to_instantaneous"
)

# Can now save the input data as simple columns
self.year_array = df_solar["time_utc"].dt.year.values
4 changes: 3 additions & 1 deletion hercules/plant_components/wind_farm.py
@@ -188,7 +188,9 @@ def __init__(self, h_dict, component_name):

# Interpolate df_wi on to the time steps
time_steps_all = np.arange(self.starttime, self.endtime, self.dt, dtype=hercules_float_type)
df_wi = interpolate_df(df_wi, time_steps_all)
df_wi = interpolate_df(
df_wi, time_steps_all, interpolation_method="averaged_to_instantaneous"
)

# INITIALIZE FLORIS BASED ON WAKE MODEL
if self.wake_method == "precomputed":
4 changes: 3 additions & 1 deletion hercules/plant_components/wind_farm_scada_power.py
@@ -128,7 +128,9 @@ def __init__(self, h_dict, component_name):

# Interpolate df_scada on to the time steps
time_steps_all = np.arange(self.starttime, self.endtime, self.dt, dtype=hercules_float_type)
df_scada = interpolate_df(df_scada, time_steps_all)
df_scada = interpolate_df(
df_scada, time_steps_all, interpolation_method="averaged_to_instantaneous"
)

# Get a list of power columns and infer number of turbines
self.power_columns = sorted([col for col in df_scada.columns if col.startswith("pow_")])
120 changes: 77 additions & 43 deletions hercules/utilities.py
@@ -448,20 +448,52 @@ def close_logging(logger):
logger.removeHandler(handler)


def interpolate_df(df, new_time):
_VALID_INTERPOLATION_METHODS = {
"averaged_to_instantaneous",
"zoh_to_instantaneous",
"instantaneous_to_instantaneous",
}


def interpolate_df(df, new_time, interpolation_method):
Copilot AI commented (Apr 21, 2026):

interpolate_df previously accepted two parameters and now requires interpolation_method with no default, which is a breaking API change for any downstream/internal callers not updated in this PR. If backward compatibility is needed, consider providing a default (and optionally emitting a deprecation warning when omitted) so older call sites keep working while migrating to explicit behavior.

Collaborator (author) replied:

Not truly backward compatible, but this function was not used outside of Hercules.

"""Interpolate DataFrame values to match new time axis.

Uses linear interpolation with Polars backend for better performance and memory efficiency.
Converts datetime columns to timestamps for interpolation.
The ``interpolation_method`` parameter controls how numeric columns are
resampled onto ``new_time``:

- ``"averaged_to_instantaneous"``: Input values are period averages whose
timestamps mark the **start** of each period. Each value is assigned to
the midpoint of its interval and then linearly interpolated. Use for
wind speed, solar irradiance, and similar time-averaged signals.
- ``"zoh_to_instantaneous"``: Input values are piecewise-constant
(zero-order hold) with timestamps at the start of each interval. Each
query time receives the value of the last original timestamp at or
before it. Use for LMP prices and other step-change signals.
- ``"instantaneous_to_instantaneous"``: Input values already represent
instantaneous measurements. Standard linear interpolation is performed
directly on the original timestamps with no midpoint shift.

Datetime columns (e.g. ``time_utc``) are always linearly interpolated on
the raw timestamps regardless of the chosen method, because they map
simulation time to wall-clock time directly.

Args:
df (pd.DataFrame): DataFrame with 'time' column and data columns.
new_time (array-like): New time points for interpolation.
interpolation_method (str): One of ``"averaged_to_instantaneous"``,
``"zoh_to_instantaneous"``, or
``"instantaneous_to_instantaneous"``.

Returns:
pd.DataFrame: DataFrame with new time axis and interpolated data columns.

"""
# Convert new_time to numpy array for consistency
if interpolation_method not in _VALID_INTERPOLATION_METHODS:
raise ValueError(
f"Unknown interpolation_method '{interpolation_method}'. "
f"Must be one of {sorted(_VALID_INTERPOLATION_METHODS)}."
)

new_time = np.asarray(new_time)

# Separate datetime and non-datetime columns for different processing
@@ -475,50 +507,27 @@ def interpolate_df(df, new_time):
else:
numeric_cols.append(col)

return _interpolate_with_polars(df, new_time, datetime_cols, numeric_cols)


def _interpolate_with_polars(df, new_time, datetime_cols, numeric_cols):
"""Interpolate using Polars backend.

Args:
df (pd.DataFrame): Input DataFrame.
new_time (np.ndarray): New time points.
datetime_cols (list): Datetime column names.
numeric_cols (list): Numeric column names.

Returns:
pd.DataFrame: Interpolated DataFrame.
"""
# Convert to Polars for efficient processing
df_pl = pl.from_pandas(df)
result_pl = pl.DataFrame({"time": new_time})

# Create a Polars DataFrame for the new time points
new_time_pl = pl.DataFrame({"time": new_time})

# Start with the time column
result_pl = new_time_pl
time_values = df_pl["time"].to_numpy()

# Process numeric columns using Polars' interpolation
if numeric_cols:
for col in numeric_cols:
# Use Polars' join_asof for efficient interpolation-like behavior
# This is more memory efficient than pandas for large datasets
col_data = df_pl.select(["time", col]).sort("time")

# Perform interpolation using Polars operations
# Note: Polars doesn't have direct linear interpolation, so we use numpy interp
# but with Polars' efficient data extraction
time_values = col_data["time"].to_numpy()
col_values = col_data[col].to_numpy()

# Linear interpolation with float32 precision
interpolated_values = np.interp(new_time, time_values, col_values).astype(
if interpolation_method == "averaged_to_instantaneous":
x_coords = _compute_interval_midpoints(time_values)
else:
x_coords = time_values

for col in numeric_cols:
col_values = df_pl[col].to_numpy()
if interpolation_method == "zoh_to_instantaneous":
indices = np.searchsorted(time_values, new_time, side="right") - 1
indices = np.clip(indices, 0, len(col_values) - 1)
interpolated_values = col_values[indices].astype(hercules_float_type)
else:
interpolated_values = np.interp(new_time, x_coords, col_values).astype(
hercules_float_type
)

# Add interpolated column to result
result_pl = result_pl.with_columns(pl.lit(interpolated_values).alias(col))
result_pl = result_pl.with_columns(pl.lit(interpolated_values).alias(col))

# Process datetime columns
for col in datetime_cols:
Expand All @@ -540,6 +549,31 @@ def _interpolate_with_polars(df, new_time, datetime_cols, numeric_cols):
return result_pl.to_pandas()


def _compute_interval_midpoints(time_values):
"""Compute the midpoints of consecutive time intervals.

For start-of-period timestamps, each value is best represented at the
centre of its interval. The last interval width is assumed equal to the
preceding one.

Args:
time_values (np.ndarray): Sorted array of start-of-period timestamps.

Returns:
np.ndarray: Array of interval midpoints, same length as *time_values*.
"""
# Allow the edge case of a single time value by returning the time value itself
if len(time_values) < 2:
return time_values
# Compute midpoints
midpoints = np.empty_like(time_values, dtype=np.float64)
midpoints[:-1] = (time_values[:-1] + time_values[1:]) / 2.0
midpoints[-1] = (
time_values[-1] + (time_values[-1] - time_values[-2]) / 2.0
) # Last interval is equal to the previous one
return midpoints
Comment on lines +566 to +571

Copilot AI commented (Apr 8, 2026):

_compute_interval_midpoints assumes at least 2 time points; with a single-row input (or after upstream filtering) this will raise an IndexError at time_values[-2]. Add an explicit guard for len(time_values) < 2 (e.g., return time_values.copy() or time_values + 0.0) so interpolate_df can handle degenerate/constant inputs gracefully.

Collaborator (author) replied:

This seems like an unlikely edge case; handled by simply returning that scalar value as its own midpoint.



def find_time_utc_value(df, time_value, time_column="time", time_utc_column="time_utc"):
"""Return UTC timestamp at a given time value via linear interpolation or extrapolation.

4 changes: 2 additions & 2 deletions tests/example_regression_tests/example_00_regression_test.py
@@ -18,8 +18,8 @@

# Test configuration
NUM_TIME_STEPS = 5
EXPECTED_FINAL_WIND_POWER = 3271 # Updated after wind model changes
EXPECTED_FINAL_PLANT_POWER = 3271 # Same as wind power for wind-only case
EXPECTED_FINAL_WIND_POWER = 3265 # Updated for midpoint interpolation correction
EXPECTED_FINAL_PLANT_POWER = 3265 # Same as wind power for wind-only case

# File names
INPUT_FILE = "hercules_input.yaml"