Use of native sparse array support in xarray / pandas / netCDF #800
Replies: 9 comments · 4 replies
---
We've investigated sparse arrays a few times in the past. The sticking point has been the data types that can be handled sparsely. If it can now handle Pyomo/Gurobi objects then that's great!
---
Can you elaborate on the incompatibility?
---
I did some tests on this locally and it doesn't work out-of-the-box. Operations between sparse arrays don't work when the data types are objects, e.g.:

```python
import calliope
import xarray as xr

m = calliope.examples.urban_scale()
m.build(force=True)

# you can't transform an existing array into sparse, so this is a quick hack.
foo = xr.DataArray.from_series(m.backend.variables.flow_cap.to_series(), sparse=True)
bar = xr.DataArray.from_series(m.backend.parameters.flow_cap_max.to_series(), sparse=True)
foo * bar
```

```
[Out] TypeError: Implicit conversion of Pyomo numeric value (parameters[flow_cap_max][12]*variables[flow_cap][0]) to float is disabled.
This error is often the result of using Pyomo components as arguments to
one of the Python built-in math module functions when defining
expressions. Avoid this error by using Pyomo-provided math functions or
explicitly resolving the numeric value using the Pyomo value() function.
```

I've also tried setting the fill value of the arrays, but that just moves the failure into `sparse`'s fill-value handling:

```
File ~/miniforge3/envs/calliope/lib/python3.12/site-packages/sparse/_umath.py:542, in _Elemwise._get_fill_value(self)
    540 # Store dtype separately if needed.
    541 if self.dtype is not None:
--> 542     fill_value = fill_value.astype(self.dtype)
    544 self.fill_value = fill_value
    545 self.dtype = self.fill_value.dtype

AttributeError: 'float' object has no attribute 'astype'
```

```
File ~/miniforge3/envs/calliope/lib/python3.12/site-packages/sparse/_umath.py:524, in _Elemwise._get_fill_value(self)
    521 fill_value_array = self.func(*np.broadcast_arrays(*zero_args), **self.kwargs)
    523 try:
--> 524     fill_value = fill_value_array[(0,) * fill_value_array.ndim]
    525 except IndexError:
    526     zero_args = tuple(
    527         arg.fill_value if isinstance(arg, COO) else _zero_of_dtype(arg.dtype) for arg in self.args
    528     )

AttributeError: 'float' object has no attribute 'ndim'
```

```
File ~/miniforge3/envs/calliope/lib/python3.12/site-packages/sparse/_umath.py:545, in _Elemwise._get_fill_value(self)
    542 fill_value = fill_value.astype(self.dtype)
    544 self.fill_value = fill_value
--> 545 self.dtype = self.fill_value.dtype

AttributeError: 'float' object has no attribute 'dtype'
```

It works fine for operations on purely numeric data, just not when working with object arrays.
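For reference, a minimal sketch of the purely numeric case that does work (standalone example, not Calliope code):

```python
import numpy as np
import sparse

# Two mostly-empty numeric arrays: element-wise ops stay sparse and the
# result's fill value (0.0 * 0.0) can be computed without issue.
a = sparse.COO.from_numpy(np.array([0.0, 1.5, 0.0, 2.0]))
b = sparse.COO.from_numpy(np.array([0.0, 2.0, 0.0, 0.5]))
print((a * b).todense())  # [0. 3. 0. 1.]
```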
---
@brynpickering after messing around with the backend, I see what you mean... this one will be tough. The big issue here is that Pyomo does support sparsity natively... but I do not yet know the backend well enough to know how/what needs to change. Algorithmically, a solution should be possible if we have full determinism when flattening matrices, i.e., we always know the position each element maps to (see the sketch below).
Not sure if we currently have this, but it should allow us to lazily fill in sparse vectors and then drop them to the backend. At least in theory...
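A sketch of the determinism in question, with hypothetical dimension names and sizes:

```python
import numpy as np

# Assumed, fixed dimension ordering and sizes: with these pinned down, every
# (tech, node, timestep) combination maps to one reproducible flat position,
# so sparse vectors could be filled lazily and handed to the backend later.
sizes = {"techs": 10, "nodes": 5, "timesteps": 8760}
idx = (3, 2, 100)  # a hypothetical (tech, node, timestep) integer index
flat = np.ravel_multi_index(idx, tuple(sizes.values()))
print(flat)  # 149020: deterministic for a given ordering
```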
---
We're already effectively using sparse arrays as far as Pyomo is concerned. It's the application of operations across N dimensions (incl. broadcasting capabilities) that we benefit from by also representing those objects in NaN-filled arrays. We moved from this full determinism in v0.6 to what we have now because it is very messy to ensure in a generalised way, especially when math components don't share the exact same dimensions.
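For illustration, a minimal sketch of that broadcasting behaviour with made-up dimensions and values (not Calliope's actual math):

```python
import numpy as np
import xarray as xr

# Hypothetical components that don't share the exact same dimensions:
cap = xr.DataArray([1.0, np.nan], dims=["techs"], coords={"techs": ["pv", "ccgt"]})
avail = xr.DataArray(
    [[0.5, 0.0], [np.nan, 1.0]],
    dims=["techs", "timesteps"],
    coords={"techs": ["pv", "ccgt"], "timesteps": [0, 1]},
)

# xarray aligns on the shared dim and broadcasts over the rest; NaNs
# propagate, flagging combinations that don't exist.
print(cap * avail)
```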
---
Yep, I understand that the issue is not in the backend. The increase in memory will only happen on our side...

```python
import itertools

dims = ["techs", "nodes", "steps", "carriers", "foo", "bar", "perrito"]
all_unique_combinations_sorted = set()
for i in range(len(dims)):
    for group in itertools.combinations(dims, i + 1):
        all_unique_combinations_sorted.add(".".join(sorted(group)).lower())
all_unique_combinations_sorted.add("GLOBALS")  # no idea if this is even needed
```

This contains all possible combinations of our dimensions, always in order (127 for seven dimensions, plus the `GLOBALS` entry). Lookups are easy: just sort and join the dimensions you want (see the sketch below). Similarly, the large number of keys does not matter too much: sparse data has little memory impact by design, and you can easily erase empty combinations if you wish.
Would this work, or am I saying something silly?
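For the lookup side, a hypothetical helper (`key_for` is illustrative, not an existing function), reusing the set built above:

```python
def key_for(*dims: str) -> str:
    """Canonical key: lower-cased, sorted, dot-joined dimension names."""
    return ".".join(sorted(d.lower() for d in dims))

# Argument order doesn't matter, so any component can find its entry:
assert key_for("techs", "nodes") == key_for("nodes", "techs") == "nodes.techs"
assert key_for("nodes", "techs") in all_unique_combinations_sorted
```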
---
I'm closing this, as it's not an issue we plan to address: our datasets are small (even with all their NaNs) compared to peak memory consumption in the optimisation step (see this comment).
---
@brynpickering I'm OK with closing this. One closing remark: my worry actually relates to the size of the optimisation problem itself.
However, I should've done more checks on this, so I will only re-open it if I identify a problem that reaches the actual optimisation.
---
@irm-codebase I've moved this to a discussion so we can slowly chip away at it. One thing I've found I can do to circumvent the way `sparse` computes fill values for object arrays is to wrap the fill value in a class that absorbs every arithmetic operation:

```python
import calliope
import numpy as np
import xarray as xr

class FillObj:
    """Fill value that survives `sparse`'s fill-value arithmetic on object arrays."""
    def __init__(self, value):
        self.value = float(value)
    @property
    def dtype(self):
        return np.dtype("O")  # object dtype
    def __add__(self, other):
        return FillObj(self.value)
    def __sub__(self, other):
        return FillObj(self.value)
    def __mul__(self, other):
        return FillObj(self.value)
    def __truediv__(self, other):
        return FillObj(self.value)
    def __floordiv__(self, other):
        return FillObj(self.value)
    def __mod__(self, other):
        return FillObj(self.value)
    def __pow__(self, other):
        return FillObj(self.value)
    # Reverse operations
    def __radd__(self, other):
        return FillObj(self.value)
    def __rsub__(self, other):
        return FillObj(self.value)
    def __rmul__(self, other):
        return FillObj(self.value)
    def __rtruediv__(self, other):
        return FillObj(self.value)
    def __rfloordiv__(self, other):
        return FillObj(self.value)
    def __rmod__(self, other):
        return FillObj(self.value)
    def __rpow__(self, other):
        return FillObj(self.value)
    def __repr__(self):
        return f"<{self.value}>"
    def __float__(self):
        return self.value

m = calliope.examples.national_scale(time_subset=None)  # full year
m.build()

# you can't transform an existing array into sparse, so this is a quick hack.
foo = xr.DataArray.from_series(m.backend.variables.flow_cap.to_series(), sparse=True)
bar = xr.DataArray.from_series(m.backend.parameters.flow_out.to_series(), sparse=True)
for sparse_arr in [foo, bar]:
    sparse_arr.data.fill_value = FillObj(np.nan)

sparse_da = foo * bar
dense_da = m.backend.variables.flow_cap * m.backend.parameters.flow_out
```

The problem I'm having is that the size of the resulting sparse array is larger than the dense one!

```python
def quick_memory_comparison(dense_da, sparse_da):
    """Quick one-liner memory comparison."""
    dense_mb = dense_da.nbytes / (1024**2)
    sparse_mb = sparse_da.data.nbytes / (1024**2) if hasattr(sparse_da.data, "nbytes") else \
        (sum(coord.nbytes for coord in sparse_da.data.coords) + sparse_da.data.data.nbytes) / (1024**2)
    print(f"Dense: {dense_mb:.2f} MB | Sparse: {sparse_mb:.2f} MB | Ratio: {dense_mb/sparse_mb:.2f}x")
    return dense_mb, sparse_mb

quick_memory_comparison(dense_da, sparse_da)
```

```
[Out] Dense: 2.67 MB | Sparse: 5.68 MB | Ratio: 0.47x
```
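That ratio is plausible for COO storage; a rough back-of-the-envelope check with assumed numbers (not measured from the model):

```python
# COO keeps one int64 coordinate per dimension per stored element, plus the
# stored value itself; a dense object array costs one 8-byte pointer per cell.
n_dims = 4                           # assumed number of dimensions in the result
bytes_per_stored = 8 * n_dims + 8    # coords + data pointer = 40 bytes
bytes_per_dense_cell = 8

# Break-even density: COO only wins below ~1 stored element per 5 cells here.
print(bytes_per_dense_cell / bytes_per_stored)  # 0.2
```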
---
What can be improved?
Calliope should be more memory efficient! <- Finally this one is applicable.
While checking how to better support sparsity in our ecosystem, I found out about `sparse`. It is quite literally focused on the common use case of super sparse data in ESOMs. After some investigation, it seems like `xarray` either has, or is planning to, roll out support for this library: pydata/xarray#3213. `pandas` seems to have also rolled out support (see the sketch below).
I propose to evaluate how to integrate this into `calliope`, with two key design goals in mind: compatibility with `xarray` setups, and using the `dim` itself in our constraints for cases where they are ordered (two use cases, for pathways).

Version
v0.7
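For reference, the `pandas` support mentioned above looks roughly like this (a minimal sketch, not Calliope code):

```python
import pandas as pd

# pandas' sparse extension dtype: only values that differ from the fill
# value are actually stored.
s = pd.Series([0.0, 0.0, 1.5, 0.0], dtype=pd.SparseDtype("float", fill_value=0.0))
print(s.sparse.density)  # 0.25 -> one stored value out of four
```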