Commit c88f46a

squash lookup table types (#745)
* remove old LookupTable types
* merge categorical and interpolated paths
1 parent f76183b commit c88f46a

8 files changed, +227 -461 lines changed

docs/source/concepts/lookup.rst

Lines changed: 31 additions & 50 deletions
Original file line number | Diff line number | Diff line change
@@ -41,55 +41,32 @@ population in a simulation.
4141
The lookup table system is built in layers. At the top is the
4242
:class:`Lookup Table <vivarium.framework.lookup.table.LookupTable>` object which
4343
is responsible for providing a uniform interface to the user regardless
44-
of the underlying implementation. From the user's perspective, it takes in
45-
a data set or scalar value on initialization and then lets them query against
46-
that data with a population index.
47-
48-
The next layer is selected at initialization time based on the type of data
49-
provided. The :class:`Lookup Table <vivarium.framework.lookup.table.LookupTable>`
50-
picks a :class:`ScalarTable <vivarium.framework.lookup.table.ScalarTable>`
51-
if a single value is provided as the data, a
52-
:class:`CategoricalTable <vivarium.framework.lookup.table.CategoricalTable>` if a
53-
:class:`pandas.DataFrame` with only categorical variables is provided as the
54-
data, and a :class:`InterpolatedTable <vivarium.framework.lookup.table.InterpolatedTable>`
55-
if a :class:`pandas.DataFrame` which has at least one continuous variable is
56-
provided as the data.
44+
of the underlying data. From the user's perspective, it takes in a data set
45+
or scalar value on initialization and then lets them query against that data
46+
with a population index.
47+
48+
At initialization time, the
49+
:class:`Lookup Table <vivarium.framework.lookup.table.LookupTable>` examines the
50+
provided data and configures itself accordingly. If the data is a scalar value
51+
(or list/tuple of scalars), the table simply broadcasts those values over the
52+
population index when called. If the data is a :class:`pandas.DataFrame`, the
53+
table delegates to an
54+
:class:`Interpolation <vivarium.framework.lookup.interpolation.Interpolation>`
55+
object that handles both categorical and continuous parameter lookups. The
56+
:class:`Interpolation <vivarium.framework.lookup.interpolation.Interpolation>`
57+
groups the data by any categorical (key) columns and then, for each group,
58+
finds the correct bin for any continuous parameters. Tables with only
59+
categorical parameters are simply the special case where there are no
60+
continuous parameters to bin on.
5761

5862
.. note::
5963

60-
The :class:`InterpolatedTable <vivarium.framework.lookup.table.InterpolatedTable>`
61-
is a misnomer here. It confuses the data handling strategy with the
62-
underlying data representation. A better name would be ``BinnedDataTable``
63-
to indicate that it wraps data where the continuous parameters are
64-
represented by bin edges in the provided data. This would allow us
65-
to easily think about and extend the lookup system to wrap data where the
66-
continuous parameters are represented by points and to tables where all
67-
parameters are categorical.
68-
69-
If the underlying data is a single value or consists only of categorical variables,
70-
this is the last layer of abstraction. The
71-
:class:`ScalarTable <vivarium.framework.lookup.table.ScalarTable>` and
72-
:class:`CategoricalTable <vivarium.framework.lookup.table.CategoricalTable>` each
73-
have only one reasonable strategy which is to broadcast the value over the
74-
population index. If we have continuous variables and therefore an
75-
:class:`InterpolatedTable <vivarium.framework.lookup.table.InterpolatedTable>`,
76-
there are additional layers to the lookup system to allow the user to
77-
control the strategy for turning the population index into values based on
78-
the data. The
79-
:class:`InterpolatedTable <vivarium.framework.lookup.table.InterpolatedTable>`
80-
is then responsible for turning the population index into a set of
81-
attributes relevant to the value production based on the structure of
82-
the input data and then providing those attributes to the value production
83-
strategy.
84-
85-
.. note::
86-
87-
I'm being careful with language here. We have objects named
88-
``Interpolation`` and ``InterpolatedTable`` though the operation they
89-
perform is actually disaggregation. If we extend the system to
90-
work with point estimates for the continuous parameters, then
91-
interpolation would appropriately describe what we do. Both are
92-
value production strategies based on the structure of the input data.
64+
The ``Interpolation`` name is somewhat of a misnomer. For order 0
65+
(the only currently supported order), the operation is really
66+
disaggregation -- finding the correct bin a value belongs to rather
67+
than interpolating between points. If the system is extended to work
68+
with point estimates for continuous parameters, then interpolation
69+
would appropriately describe the operation.
9370

9471
More information about the value production strategies can be found in
9572
:ref:`here <interpolation_concept>`.
@@ -223,8 +200,8 @@ When building a lookup table from a :class:`pandas.DataFrame` using ``data_sourc
223200
the component automatically determines key columns, parameter columns, and value columns
224201
based on the data structure:
225202

226-
- **Value columns** are assumed by the structure of the artifact to be ``["value"]``. In principle,
227-
this could be configured by implementing a custom :class:`~vivarium.framework.artifact.manager.ArtifactManager`.
203+
- **Value columns** can be provided as an argument to :meth:`~vivarium.component.Component.build_lookup_table`.
204+
If value columns are not provided, they default to ``"value"``.
228205
- **Parameter columns** are detected by finding columns ending in ``_start``
229206
that have corresponding ``_end`` columns (e.g., ``age_start``/``age_end``).
230207
- **Key columns** are all remaining columns that are neither value columns
@@ -296,11 +273,15 @@ integrating a lookup table into a :term:`component <Component>`, which is primar
296273
how they are used. Assuming you have a valid simulation object named ``sim`` and
297274
the data from the above table in a :class:`pandas.DataFrame` named ``data``, you
298275
can construct a lookup table in the following way, using the interface from the builder.
276+
You don't have to provide a name for the table, but it is recommended to do so for clarity
277+
and for ease of debugging. If you don't provide value column names, they default to
278+
``"value"``.
279+
299280

300281
.. code-block:: python
301282
302283
# value_columns implicitly set to remaining columns
303-
> bmi = sim.builder.lookup.build_table(data, key_columns=['sex'], parameter_columns=['age'])
284+
> bmi = sim.builder.lookup.build_table(data, name="bmi")
304285
> population = sim.get_population()
305286
> bmi(population.index).head() # returns BMI values for the population
306287
@@ -316,7 +297,7 @@ can construct a lookup table in the following way, using the interface from the
316297
Constructing a lookup table currently requires your data meet specific
317298
conditions. These are a consequence of the method the lookup table uses to
318299
arrive at the correct data. Specifically, your parameter columns must
319-
represent bins and they must overlap.
300+
represent bins and they must not overlap or have gaps.
320301

321302
Estimating Unknown Values
322303
-------------------------
Lines changed: 1 addition & 4 deletions
Original file line number | Diff line number | Diff line change
@@ -1,7 +1,4 @@
11
from vivarium.framework.lookup.interface import LookupTableInterface
2-
from vivarium.framework.lookup.manager import (
3-
LookupTableManager,
4-
validate_build_table_parameters,
5-
)
2+
from vivarium.framework.lookup.manager import LookupTableManager
63
from vivarium.framework.lookup.table import DEFAULT_VALUE_COLUMN, LookupTable
74
from vivarium.types import LookupTableData, ScalarValue

src/vivarium/framework/lookup/interpolation.py

Lines changed: 9 additions & 7 deletions
Original file line number | Diff line number | Diff line change
@@ -145,11 +145,6 @@ def validate_parameters(
145145
if data.empty:
146146
raise ValueError("You must supply non-empty data to create the interpolation.")
147147

148-
if len(continuous_parameters) < 1:
149-
raise ValueError(
150-
"You must supply at least one continuous parameter over which to interpolate."
151-
)
152-
153148
for p in continuous_parameters:
154149
if not isinstance(p, (tuple, list)) or len(p) != 3:
155150
raise ValueError(
@@ -160,7 +155,6 @@ def validate_parameters(
160155
)
161156

162157
# break out the individual columns from binned column name lists
163-
param_cols = [col for p in continuous_parameters for col in p]
164158
if not value_columns:
165159
raise ValueError(
166160
f"No non-parameter data. Available columns: {data.columns}, "
@@ -343,12 +337,20 @@ def __call__(self, interpolants: pd.DataFrame) -> pd.DataFrame:
343337
Parameters
344338
----------
345339
interpolants
346-
Data frame containing the parameters to interpolate..
340+
Data frame containing the parameters to interpolate.
347341
348342
Returns
349343
-------
350344
A table with the interpolated values for the given interpolants.
351345
"""
346+
if not self.parameter_bins:
347+
# No continuous parameters — just broadcast the data values.
348+
# With only categorical parameters, each sub-table has a single row.
349+
return pd.DataFrame(
350+
{col: self.data[col].iloc[0] for col in self.value_columns},
351+
index=interpolants.index,
352+
)
353+
352354
# build a dataframe where we have the start of each parameter bin for each interpolant
353355
interpolant_bins = pd.DataFrame(index=interpolants.index)
354356

src/vivarium/framework/lookup/manager.py

Lines changed: 16 additions & 113 deletions
Original file line number | Diff line number | Diff line change
@@ -15,23 +15,14 @@
1515
from __future__ import annotations
1616

1717
from collections.abc import Mapping
18-
from datetime import datetime, timedelta
19-
from typing import TYPE_CHECKING, Any
20-
from typing import SupportsFloat as Numeric
21-
from typing import overload
18+
from typing import TYPE_CHECKING, Any, overload
2219

2320
import pandas as pd
2421
from layered_config_tree import LayeredConfigTree
2522

2623
from vivarium.framework.event import Event
2724
from vivarium.framework.lifecycle import lifecycle_states
28-
from vivarium.framework.lookup.table import (
29-
DEFAULT_VALUE_COLUMN,
30-
CategoricalTable,
31-
InterpolatedTable,
32-
LookupTable,
33-
ScalarTable,
34-
)
25+
from vivarium.framework.lookup.table import DEFAULT_VALUE_COLUMN, LookupTable
3526
from vivarium.manager import Manager
3627
from vivarium.types import LookupTableData
3728

@@ -62,14 +53,14 @@ def __init__(self) -> None:
6253
super().__init__()
6354
self.tables: dict[str, LookupTable[pd.Series[Any]] | LookupTable[pd.DataFrame]] = {}
6455

65-
def setup(self, builder: "Builder") -> None:
56+
def setup(self, builder: Builder) -> None:
6657
self._logger = builder.logging.get_logger(self.name)
6758
self._configuration = builder.configuration
68-
self._pop_view_builder = builder.population.get_view
59+
self._get_view = builder.population.get_view
6960
self.clock = builder.time.clock()
70-
self._interpolation_order = builder.configuration.interpolation.order
71-
self._extrapolate = builder.configuration.interpolation.extrapolate
72-
self._validate = builder.configuration.interpolation.validate
61+
self.interpolation_order = builder.configuration.interpolation.order
62+
self.extrapolate = builder.configuration.interpolation.extrapolate
63+
self.validate_interpolation = builder.configuration.interpolation.validate
7364
self._add_resources = builder.resources.add_resources
7465
self._add_constraint = builder.lifecycle.add_constraint
7566
self._get_current_component = builder.components.get_current_component
@@ -125,7 +116,7 @@ def build_table(
125116
table = self._build_table(component, data, name, value_columns)
126117
self._add_resources(component, table, table.required_resources)
127118
self._add_constraint(
128-
table.call,
119+
table._call,
129120
restrict_during=[
130121
lifecycle_states.INITIALIZATION,
131122
lifecycle_states.SETUP,
@@ -150,107 +141,19 @@ def _build_table(
150141
data = pd.DataFrame(data)
151142

152143
value_columns_ = value_columns if value_columns else DEFAULT_VALUE_COLUMN
153-
validate_build_table_parameters(data, value_columns_)
154144

155-
table: LookupTable[pd.Series[Any]] | LookupTable[pd.DataFrame]
156-
if isinstance(data, pd.DataFrame):
157-
parameter_columns, key_columns = self._get_columns(value_columns_, data)
158-
if parameter_columns:
159-
table = InterpolatedTable(
160-
name=name,
161-
component=component,
162-
data=data,
163-
population_view_builder=self._pop_view_builder,
164-
key_columns=key_columns,
165-
parameter_columns=parameter_columns,
166-
value_columns=value_columns_,
167-
interpolation_order=self._interpolation_order,
168-
clock=self.clock,
169-
extrapolate=self._extrapolate,
170-
validate=self._validate,
171-
)
172-
else:
173-
table = CategoricalTable(
174-
name=name,
175-
component=component,
176-
data=data,
177-
population_view_builder=self._pop_view_builder,
178-
key_columns=key_columns,
179-
value_columns=value_columns_,
180-
)
181-
else:
182-
table = ScalarTable(
183-
name=name, component=component, data=data, value_columns=value_columns_
184-
)
145+
table = LookupTable(
146+
name=name,
147+
component=component,
148+
data=data,
149+
value_columns=value_columns_,
150+
manager=self,
151+
population_view=self._get_view(),
152+
)
185153

186154
self.tables[table.name] = table
187155

188156
return table
189157

190158
def __repr__(self) -> str:
191159
return "LookupTableManager()"
192-
193-
@staticmethod
194-
def _get_columns(
195-
value_columns: list[str] | tuple[str, ...] | str, data: pd.DataFrame
196-
) -> tuple[list[str], list[str]]:
197-
if isinstance(value_columns, str):
198-
value_columns = [value_columns]
199-
200-
all_columns = list(data.columns)
201-
202-
potential_parameter_columns = [
203-
str(col).removesuffix("_start")
204-
for col in all_columns
205-
if str(col).endswith("_start")
206-
]
207-
parameter_columns = []
208-
bin_edge_columns = []
209-
for column in potential_parameter_columns:
210-
if f"{column}_end" in all_columns:
211-
parameter_columns.append(column)
212-
bin_edge_columns += [f"{column}_start", f"{column}_end"]
213-
214-
key_columns = [
215-
col
216-
for col in all_columns
217-
if col not in value_columns and col not in bin_edge_columns
218-
]
219-
220-
return parameter_columns, key_columns
221-
222-
223-
def validate_build_table_parameters(
224-
data: LookupTableData,
225-
value_columns: list[str] | tuple[str, ...] | str,
226-
) -> None:
227-
"""Makes sure the data format agrees with the provided column layout."""
228-
if (
229-
data is None
230-
or (isinstance(data, pd.DataFrame) and data.empty)
231-
or (isinstance(data, (list, tuple)) and not data)
232-
):
233-
raise ValueError("Must supply some data")
234-
235-
acceptable_types = (Numeric, datetime, timedelta, list, tuple, pd.DataFrame)
236-
if not isinstance(data, acceptable_types):
237-
raise TypeError(
238-
f"The only allowable types for data are {acceptable_types}. "
239-
f"You passed {type(data)}."
240-
)
241-
242-
if isinstance(data, (list, tuple)):
243-
if isinstance(value_columns, str):
244-
raise ValueError(
245-
"When supplying multiple values, value_columns must be a list or tuple of strings."
246-
)
247-
if len(value_columns) != len(data):
248-
raise ValueError(
249-
"The number of value columns must match the number of values."
250-
f"You supplied values: {data} and value_columns: {value_columns}"
251-
)
252-
elif not isinstance(data, pd.DataFrame):
253-
if not isinstance(value_columns, str):
254-
raise ValueError(
255-
"When supplying a single value, value_columns must be a string if provided."
256-
)
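
This commit removes ``_get_columns`` from the manager, and the hunks shown here do not reveal where that logic now lives. For reference, the detection rule it implemented, which the docs above still describe, can be sketched as a standalone function. ``split_columns`` is a hypothetical name, and the sketch works on plain lists of column names rather than a :class:`pandas.DataFrame`.

```python
def split_columns(all_columns, value_columns):
    """Hypothetical standalone version of the column-detection rule.

    Returns (parameter_columns, key_columns).
    """
    if isinstance(value_columns, str):
        value_columns = [value_columns]

    # Candidate parameters are the stems of columns ending in "_start".
    suffix = "_start"
    candidates = [
        col[: -len(suffix)] for col in all_columns if col.endswith(suffix)
    ]

    parameter_columns = []
    bin_edge_columns = []
    for name in candidates:
        # A candidate only counts if the matching "_end" column exists.
        if f"{name}_end" in all_columns:
            parameter_columns.append(name)
            bin_edge_columns += [f"{name}_start", f"{name}_end"]

    # Everything that is neither a value nor a bin edge is a key column.
    key_columns = [
        col
        for col in all_columns
        if col not in value_columns and col not in bin_edge_columns
    ]
    return parameter_columns, key_columns


columns = ["sex", "age_start", "age_end", "year_start", "value"]
parameters, keys = split_columns(columns, "value")
```

In this example ``year_start`` has no matching ``year_end``, so it is treated as a key column rather than a parameter.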
