-
Notifications
You must be signed in to change notification settings - Fork 1
feat: Add pandas Series support to Parameter.magnitude #71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: prototype-2
Are you sure you want to change the base?
Conversation
- Update Parameter.magnitude type to accept int | float | list[float] | pd.Series - Add helper methods for series/scalar detection and conversion - Update all arithmetic operations (__add__, __sub__, __mul__, __truediv__, __pow__) for element-wise series operations - Update conversion methods (to, change_currency, change_heating_value) to handle series data - Preserve pandas Series indices through all operations - Add comprehensive test suite (16 new tests) covering series functionality - Update documentation with series usage examples and limitations - Maintain full backward compatibility with existing scalar usage - Add proper Pydantic configuration for pandas Series validation All existing tests (48) continue to pass, ensuring no regression. Total test coverage: 64 tests with 88% coverage of parameter.py.
for more information, see https://pre-commit.ci
…on/technology-data into feature/parameter-series-support
euronion
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments @finozzifa to think about.
|
|
||
| magnitude: Annotated[ | ||
| int | float, Field(description="The numerical value of the parameter.") | ||
| int | float | pd.Series, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get pydantic to validate that the pd.Series is a series with only numerical values?
We don't want something like this:
pd.Series(
["a", "b", "c"],
index=[0, 1, 2]
)Also we want to allow any numericals, not just float, but also e.g. int, and dare I say also complex?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Maybe not directly, but you can add custom validators for pydantic that get triggered whenever a value is assigned. Maybe we need to use that. But checking inside the __init__ is not great.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uuuh this is a tough (but interesting) request I fear. I did try to implement but could not make it work. Hence I inclined to the validation in the constructor. Let us chat about it :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries, there is a package for that: pandera -> Making pandas and pydantic happy together.
Details for DataFrame here, looks like it works similarly for Series (but the docs for Series are not as nice):
https://pandera.readthedocs.io/en/stable/pydantic_integration.html
| if self._is_magnitude_series(): | ||
| # For series data, create a quantity with array magnitude | ||
| magnitude_array = self._get_magnitude_as_series().to_numpy() | ||
| self._pint_quantity = technologydata.ureg.Quantity( | ||
| magnitude_array, self.units | ||
| ) | ||
| else: | ||
| self._pint_quantity = technologydata.ureg.Quantity( | ||
| self.magnitude, self.units | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Internally we could always use an array, even if it is just a scalar? That'd allow you to get rid of a couple of if statements
| # Handle series vs scalar magnitude | ||
| if self._is_magnitude_series(): | ||
| magnitude_series = self._get_magnitude_as_series() | ||
| new_magnitude = magnitude_series * multiplier | ||
| else: | ||
| new_magnitude = self.magnitude * multiplier |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If pd.Series is always a numerical, shouldn't mypy then not accept the multiplication and the if-else block be obsolete?
| magnitude=new_quantity.magnitude, | ||
| units=new_quantity.units, | ||
| magnitude=new_magnitude, | ||
| units=str(new_quantity.units), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you mean the new_magnitude? If so, this was therefore before I started to modify the code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you also think it is not necessary feel free to remove it. I had not checked the code beforehand ;)
| This method checks for parameter compatibility before performing the addition. | ||
| The resulting Parameter retains the carrier, heating value, and combines provenance, | ||
| notes, and sources from both operands. | ||
| When adding series parameters, the operation is performed element-wise. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this special behaviour? Can't we just do self_series + other_series and let pandas handle it? (scalar + Series = Series with scalar applied to each element; Series + Series = element-wise addition on aligned indices or NaN on indices that are only present in one Series).
Then we would have default pandas behaviour + code should be simpler?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same for the other operations :)
Co-authored-by: Johannes HAMPP <[email protected]>
…/open-energy-transition/technology-data into feature/parameter-series-support
|
On hold for now, because more complicated than anticipated. |
Careful - this PR was fully prepared using GH Copilot, including the very verbose description. I didn't review it yet.
Low priority
This PR adds support for pandas Series and lists as magnitude values in the Parameter class, enabling users to work with time series data directly while maintaining full backward compatibility.
Key Changes
Testing
All 64 tests pass (48 existing + 16 new), ensuring no regression while adding new functionality.
Usage Example
What's Included in the PR:
Modified Files:
src/technologydata/parameter.py- Core implementationdocs/user_guide/parameter.md- Updated documentationNew Files:
test/test_parameter_series.py- Comprehensive test suiteCommit:
2d8e469with detailed commit messageThe branch is ready and all tests pass. Just create the PR manually through the GitHub interface! 🚀### What's Included in the PR:
Modified Files:
src/technologydata/parameter.py- Core implementationdocs/user_guide/parameter.md- Updated documentationNew Files:
test/test_parameter_series.py- Comprehensive test suiteCommit:
2d8e469with detailed commit messageThe branch is ready and all tests pass. Just create the PR manually through the GitHub interface! 🚀