# Feature/correct interpolation #249
Changes from 11 commits
```diff
@@ -172,10 +172,15 @@ def _read_external_data_file(self, filename):
         """
         Read and interpolate external data from a CSV, feather, or pickle file.
 
-        This method reads external data from the specified file (CSV, feather, or pickle)
-        and interpolates it according to the simulation time steps. The external data must
-        include a 'time_utc' column which will be converted to simulation time.
-        The interpolated data is stored in self.external_signals_all.
+        This method reads external data from the specified file (CSV, feather, or
+        pickle) and interpolates it onto the simulation time grid using zero-order
+        hold (``"zoh_to_instantaneous"``). ZOH is appropriate because external
+        signals such as LMP prices are piecewise-constant over each reporting
+        interval, unlike time-averaged weather data used by wind/solar components.
+
+        The external data must include a ``time_utc`` column which will be
+        converted to simulation time. The interpolated data is stored in
+        ``self.external_signals_all``.
 
         Args:
             filename (str): Path to the file containing external data. Supported formats:
@@ -216,7 +221,9 @@ def _read_external_data_file(self, filename):
         )
 
         # Interpolate using the utility function
-        df_interpolated = interpolate_df(df_ext, new_times)
+        df_interpolated = interpolate_df(
+            df_ext, new_times, interpolation_method="zoh_to_instantaneous"
+        )
 
         # Convert interpolated DataFrame to dictionary format
         for col in df_interpolated.columns:
```

**Collaborator** (on the `interpolate_df` call): I'm not sure we should hard-code this. It makes sense that the LMP prices should use this type of interpolation, but this is also how we input power reference signals. Should those also use the…

**Author:** I agree this is tricky. To date in Hercules we haven't explicitly tracked LMP prices; these are just external signals that happen to have signal names that Hycon expects. One idea: add a dictionary to the Hercules input that says how each external channel should be upsampled, either defaulting to this for backwards compatibility, or forcing an explicit choice and taking one more breaking change. @misi9170, any thoughts here?
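The zero-order-hold upsampling chosen here can be illustrated with plain NumPy. This is a standalone sketch, not the Hercules source; the hourly price values and time grids are made up:

```python
import numpy as np

# Hourly LMP prices ($/MWh), timestamped at the START of each hour.
price_times = np.array([0.0, 3600.0, 7200.0])
prices = np.array([25.0, 40.0, 31.0])

# Simulation time grid at half-hour steps.
new_times = np.array([0.0, 1800.0, 3600.0, 5400.0, 7200.0])

# Zero-order hold: each query time takes the value of the last
# original timestamp at or before it.
idx = np.searchsorted(price_times, new_times, side="right") - 1
idx = np.clip(idx, 0, len(prices) - 1)
zoh = prices[idx]  # -> 25, 25, 40, 40, 31

# Linear interpolation would instead blend adjacent prices,
# inventing intermediate prices that were never posted:
linear = np.interp(new_times, price_times, prices)  # -> 25, 32.5, 40, 35.5, 31
```

The half-hour sample at 1800 s keeps the posted 25 $/MWh under ZOH, whereas linear interpolation would report 32.5 $/MWh — a value no market participant ever saw, which is the motivation for this change.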
```diff
@@ -448,20 +448,52 @@ def close_logging(logger):
         logger.removeHandler(handler)
 
 
-def interpolate_df(df, new_time):
+_VALID_INTERPOLATION_METHODS = {
+    "averaged_to_instantaneous",
+    "zoh_to_instantaneous",
+    "instantaneous_to_instantaneous",
+}
+
+
+def interpolate_df(df, new_time, interpolation_method):
     """Interpolate DataFrame values to match new time axis.
 
-    Uses linear interpolation with Polars backend for better performance and memory efficiency.
-    Converts datetime columns to timestamps for interpolation.
+    The ``interpolation_method`` parameter controls how numeric columns are
+    resampled onto ``new_time``:
+
+    - ``"averaged_to_instantaneous"``: Input values are period averages whose
+      timestamps mark the **start** of each period. Each value is assigned to
+      the midpoint of its interval and then linearly interpolated. Use for
+      wind speed, solar irradiance, and similar time-averaged signals.
+    - ``"zoh_to_instantaneous"``: Input values are piecewise-constant
+      (zero-order hold) with timestamps at the start of each interval. Each
+      query time receives the value of the last original timestamp at or
+      before it. Use for LMP prices and other step-change signals.
+    - ``"instantaneous_to_instantaneous"``: Input values already represent
+      instantaneous measurements. Standard linear interpolation is performed
+      directly on the original timestamps with no midpoint shift.
+
+    Datetime columns (e.g. ``time_utc``) are always linearly interpolated on
+    the raw timestamps regardless of the chosen method, because they map
+    simulation time to wall-clock time directly.
 
     Args:
         df (pd.DataFrame): DataFrame with 'time' column and data columns.
         new_time (array-like): New time points for interpolation.
+        interpolation_method (str): One of ``"averaged_to_instantaneous"``,
+            ``"zoh_to_instantaneous"``, or
+            ``"instantaneous_to_instantaneous"``.
 
     Returns:
         pd.DataFrame: DataFrame with new time axis and interpolated data columns.
     """
-    # Convert new_time to numpy array for consistency
+    if interpolation_method not in _VALID_INTERPOLATION_METHODS:
+        raise ValueError(
+            f"Unknown interpolation_method '{interpolation_method}'. "
+            f"Must be one of {sorted(_VALID_INTERPOLATION_METHODS)}."
+        )
+
     new_time = np.asarray(new_time)
 
     # Separate datetime and non-datetime columns for different processing
```
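To see why the `averaged_to_instantaneous` midpoint shift matters, here is a short standalone NumPy sketch (hypothetical 10-minute wind-speed averages, not Hercules code): a value stamped at the start of its averaging period is best treated as the instantaneous value at the period's centre before linear interpolation.

```python
import numpy as np

# 10-minute average wind speeds (m/s), timestamped at the START of each period.
t_start = np.array([0.0, 600.0, 1200.0])
ws_avg = np.array([8.0, 10.0, 9.0])

# Shift each averaged value to its interval midpoint; the last interval is
# assumed to be as wide as the one before it.
mid = np.empty_like(t_start)
mid[:-1] = (t_start[:-1] + t_start[1:]) / 2.0
mid[-1] = t_start[-1] + (t_start[-1] - t_start[-2]) / 2.0
# mid -> 300, 900, 1500

# Linear interpolation on the midpoints treats each average as the
# instantaneous value at its period centre.
new_t = np.array([300.0, 600.0, 900.0])
ws_new = np.interp(new_t, mid, ws_avg)  # -> 8, 9, 10
```

Interpolating on the raw start-of-period timestamps instead would effectively shift every signal half a period early, which is the bias this method avoids.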
```diff
@@ -475,50 +507,27 @@ def interpolate_df(df, new_time):
         else:
             numeric_cols.append(col)
 
-    return _interpolate_with_polars(df, new_time, datetime_cols, numeric_cols)
-
-
-def _interpolate_with_polars(df, new_time, datetime_cols, numeric_cols):
-    """Interpolate using Polars backend.
-
-    Args:
-        df (pd.DataFrame): Input DataFrame.
-        new_time (np.ndarray): New time points.
-        datetime_cols (list): Datetime column names.
-        numeric_cols (list): Numeric column names.
-
-    Returns:
-        pd.DataFrame: Interpolated DataFrame.
-    """
     # Convert to Polars for efficient processing
     df_pl = pl.from_pandas(df)
-
-    # Create a Polars DataFrame for the new time points
-    new_time_pl = pl.DataFrame({"time": new_time})
-
-    # Start with the time column
-    result_pl = new_time_pl
+    result_pl = pl.DataFrame({"time": new_time})
 
-    # Process numeric columns using Polars' interpolation
-    if numeric_cols:
-        for col in numeric_cols:
-            # Use Polars' join_asof for efficient interpolation-like behavior
-            # This is more memory efficient than pandas for large datasets
-            col_data = df_pl.select(["time", col]).sort("time")
-
-            # Perform interpolation using Polars operations
-            # Note: Polars doesn't have direct linear interpolation, so we use numpy interp
-            # but with Polars' efficient data extraction
-            time_values = col_data["time"].to_numpy()
-            col_values = col_data[col].to_numpy()
-
-            # Linear interpolation with float32 precision
-            interpolated_values = np.interp(new_time, time_values, col_values).astype(
-                hercules_float_type
-            )
-
-            # Add interpolated column to result
-            result_pl = result_pl.with_columns(pl.lit(interpolated_values).alias(col))
+    time_values = df_pl["time"].to_numpy()
+
+    if interpolation_method == "averaged_to_instantaneous":
+        x_coords = _compute_interval_midpoints(time_values)
+    else:
+        x_coords = time_values
+
+    for col in numeric_cols:
+        col_values = df_pl[col].to_numpy()
+        if interpolation_method == "zoh_to_instantaneous":
+            indices = np.searchsorted(time_values, new_time, side="right") - 1
+            indices = np.clip(indices, 0, len(col_values) - 1)
+            interpolated_values = col_values[indices].astype(hercules_float_type)
+        else:
+            interpolated_values = np.interp(new_time, x_coords, col_values).astype(
+                hercules_float_type
+            )
+
+        result_pl = result_pl.with_columns(pl.lit(interpolated_values).alias(col))
 
     # Process datetime columns
     for col in datetime_cols:
```
|
```diff
@@ -540,6 +549,31 @@ def _interpolate_with_polars(df, new_time, datetime_cols, numeric_cols):
     return result_pl.to_pandas()
 
 
+def _compute_interval_midpoints(time_values):
+    """Compute the midpoints of consecutive time intervals.
+
+    For start-of-period timestamps, each value is best represented at the
+    centre of its interval. The last interval width is assumed equal to the
+    preceding one.
+
+    Args:
+        time_values (np.ndarray): Sorted array of start-of-period timestamps.
+
+    Returns:
+        np.ndarray: Array of interval midpoints, same length as *time_values*.
+    """
+    # Allow the edge case of a single time value by returning the time value itself
+    if len(time_values) < 2:
+        return time_values
+    # Compute midpoints
+    midpoints = np.empty_like(time_values, dtype=np.float64)
+    midpoints[:-1] = (time_values[:-1] + time_values[1:]) / 2.0
+    midpoints[-1] = (
+        time_values[-1] + (time_values[-1] - time_values[-2]) / 2.0
+    )  # Last interval is equal to the previous one
+    return midpoints
+
+
 def find_time_utc_value(df, time_value, time_column="time", time_utc_column="time_utc"):
     """Return UTC timestamp at a given time value via linear interpolation or extrapolation.
```
**Reviewer** (on lines +566 to +571): This description appears to conflict with `HerculesModel._read_external_data_file`, which explicitly interpolates external data with `"instantaneous_to_instantaneous"` (and `docs/timing.md` also documents external data as instantaneous unless preprocessed for ZOH). Update this line to reflect the external-data convention (instantaneous-to-instantaneous interpolation), and reserve the "start-of-period period-average" wording for wind/solar/SCADA/playback inputs.

**Author:** Good catch, updated.
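The midpoint convention under discussion, including the assumed-width last interval and the single-timestamp edge case, can be checked with a short standalone sketch (a reimplementation for illustration, not the Hercules source):

```python
import numpy as np

def interval_midpoints(t):
    """Midpoints of start-of-period timestamps (standalone sketch)."""
    if len(t) < 2:
        # A single timestamp defines no interval; return it unchanged.
        return t
    mid = np.empty_like(t, dtype=np.float64)
    mid[:-1] = (t[:-1] + t[1:]) / 2.0
    # Assume the last interval is as wide as the one before it.
    mid[-1] = t[-1] + (t[-1] - t[-2]) / 2.0
    return mid

print(interval_midpoints(np.array([0.0, 600.0, 1200.0])))  # [ 300.  900. 1500.]
print(interval_midpoints(np.array([42.0])))                # [42.]
```

Note that the last midpoint (1500.0) lies beyond the final timestamp; it is an extrapolated centre based on the equal-width assumption the docstring describes.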