Commit f531645

Add section to lookup.rst about configuring source via component
1 parent 2a048b5 commit f531645

1 file changed

+189
-6
lines changed

docs/source/concepts/lookup.rst

Lines changed: 189 additions & 6 deletions
@@ -142,16 +142,199 @@ Female 40 60 30
142142
Female 60 100 27
143143
====== ========= ======= ======
144144

Constructing Lookup Tables from a Component
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Components can register lookup tables to be built by specifying
a ``data_sources`` block in their ``configuration_defaults`` property.
As a basic example, ``DiseaseModel`` in ``vivarium_public_health`` has the following
``data_sources`` configuration:

.. code-block:: python

    @property
    def configuration_defaults(self) -> dict[str, Any]:
        return {
            f"{self.name}": {
                "data_sources": {
                    "cause_specific_mortality_rate": self.load_cause_specific_mortality_rate,
                },
            },
        }

which specifies a single lookup table named
``cause_specific_mortality_rate`` whose data is provided by the component's
``load_cause_specific_mortality_rate`` method.

Each entry in ``data_sources`` maps a table name to a data source from one of
several supported types (see `Data Source Types`_). Barring edge cases (see
`Limitations and When to Override`_), one should specify all of a component's
lookup tables via ``data_sources``, instead of accessing the builder's lookup
interface directly.

When a component configures ``data_sources``, the base
:class:`Component <vivarium.component.Component>` class automatically builds
the lookup tables before the component's ``setup()`` method is called. The
resulting tables are stored in the component's ``lookup_tables`` dictionary,
keyed by the name specified in ``data_sources``.

This approach separates the *what* (which tables to build and where to get data) from the
*how* (the mechanics of table construction), making components easier to
write and configure. It also allows users to override data sources in model specification files
without modifying component code. Following the example above, a model specification could adjust the
``cause_specific_mortality_rate`` data source to point to different data or a scalar value:

.. code-block:: yaml

    configuration:
        disease_model:
            data_sources:
                cause_specific_mortality_rate: 0.02

Data Source Types
^^^^^^^^^^^^^^^^^

Each entry in ``data_sources`` maps a table name to a data source. The
following data source types are supported:

**Artifact key (string without** ``::`` **):**
    A string path to data in the artifact, e.g.,
    ``"cause.all_causes.cause_specific_mortality_rate"``. The data is loaded
    via ``builder.data.load()``.

**Callable:**
    Any callable (function, lambda, or bound method) that accepts a ``builder``
    argument and returns the data.

**Scalar value:**
    A numeric value (``int``, ``float``), ``datetime``, or ``timedelta`` that
    will be broadcast over the population index when the table is called.

**Method reference (string with** ``self::`` **):**
    A string of the form ``"self::method_name"`` that references a method on
    the component itself. The method should accept a ``builder`` argument and
    return the data. This is primarily for use in the
    :ref:`model specification YAML files <model_specification_concept>`, where
    direct method references are not possible.

**External function reference (string with** ``module.path::`` **):**
    A string of the form ``"module.path::function_name"`` that references a
    function in another module. The function should accept a ``builder``
    argument and return the data. This is primarily for use in the
    :ref:`model specification YAML files <model_specification_concept>`, where
    direct function references are not possible.
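
The distinction between these source types can be sketched as a small
classifier. This is an illustration only: the function name
``classify_data_source``, the returned labels, and the order of the checks are
assumptions made for this sketch, not vivarium's actual implementation.

.. code-block:: python

    from datetime import datetime, timedelta
    from typing import Any


    def classify_data_source(source: Any) -> str:
        """Label a ``data_sources`` value by the type it represents.

        Hypothetical helper for illustration; not part of vivarium.
        """
        if isinstance(source, str):
            # Strings without "::" are treated as artifact keys.
            if "::" not in source:
                return "artifact_key"
            prefix, _, target = source.partition("::")
            if not prefix or not target:
                raise ValueError(f"Malformed reference: {source!r}")
            # "self::name" refers to a method on the component itself;
            # any other prefix is read as a module path.
            if prefix == "self":
                return "method_reference"
            return "external_function_reference"
        if callable(source):
            return "callable"
        if isinstance(source, (int, float, datetime, timedelta)):
            return "scalar"
        raise TypeError(f"Unsupported data source: {source!r}")


    kinds = [
        classify_data_source("cause.all_causes.cause_specific_mortality_rate"),
        classify_data_source("self::load_unmodeled_csmr"),
        classify_data_source("my_module.data::load_unmodeled_csmr"),
        classify_data_source(0.02),
        classify_data_source(lambda builder: None),
    ]

Checking strings first matters in a sketch like this because the three
string forms differ only in whether, and how, they use ``::``.
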

Column Detection
^^^^^^^^^^^^^^^^

When building a lookup table from a :class:`pandas.DataFrame` using ``data_sources``,
the component automatically determines key columns, parameter columns, and value columns
based on the data structure:

- **Value columns** default to ``["value"]`` (configurable via the artifact
  interface).
- **Parameter columns** are detected by finding columns ending in ``_start``
  that have corresponding ``_end`` columns (e.g., ``age_start``/``age_end``).
- **Key columns** are all remaining columns that are neither value columns
  nor parameter bin edge columns.

See the `Construction Parameters`_ section for definitions of these
column types.
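
The detection rules above can be sketched in plain Python, operating on a list
of column names. The function name ``detect_columns`` and its return layout
are hypothetical and chosen for this example; the real logic lives inside
vivarium and may differ in detail.

.. code-block:: python

    def detect_columns(columns, value_columns=("value",)):
        """Split column names into key, parameter, and value columns.

        Hypothetical helper illustrating the rules described above.
        """
        columns = list(columns)

        # Value columns: those named in ``value_columns`` (default "value").
        value_cols = [c for c in columns if c in value_columns]

        # Parameter columns: "<name>_start" with a matching "<name>_end".
        parameter_cols = []
        for column in columns:
            if column.endswith("_start"):
                name = column[: -len("_start")]
                if f"{name}_end" in columns:
                    parameter_cols.append(name)

        # Key columns: everything that is neither a value column nor a bin edge.
        bin_edges = {f"{p}_{edge}" for p in parameter_cols for edge in ("start", "end")}
        key_cols = [c for c in columns if c not in value_cols and c not in bin_edges]

        return key_cols, parameter_cols, value_cols


    key_cols, parameter_cols, value_cols = detect_columns(
        ["sex", "age_start", "age_end", "year_start", "year_end", "value"]
    )
    # key_cols == ["sex"], parameter_cols == ["age", "year"], value_cols == ["value"]
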

Example: Writing a Component with Data Sources
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A more complete example is reproduced from the ``Mortality`` component in ``vivarium_public_health``:

.. code-block:: python

    from typing import Any

    from vivarium import Component

    class Mortality(Component):

        @property
        def configuration_defaults(self) -> dict[str, Any]:
            return {
                "mortality": {
                    "data_sources": {
                        # Artifact key - loaded via builder.data.load()
                        "all_cause_mortality_rate": "cause.all_causes.cause_specific_mortality_rate",
                        # Callable (bound method) - called as self.load_unmodeled_csmr(builder)
                        "unmodeled_cause_specific_mortality_rate": self.load_unmodeled_csmr,
                        # Another artifact key
                        "life_expectancy": "population.theoretical_minimum_risk_life_expectancy",
                    },
                    "unmodeled_causes": [],
                },
            }

Example: Configuring Data Sources as a User
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Users can override the default data sources in a model specification YAML
file. This allows changing where data comes from without modifying component
code:

.. code-block:: yaml

    configuration:
        mortality:
            data_sources:
                # Override with a scalar value instead of artifact data
                all_cause_mortality_rate: 0.01
                # Or point to a module function
                unmodeled_cause_specific_mortality_rate: "my_module.data::load_unmodeled_csmr"
                # Or point to different artifact data
                life_expectancy: "alternative.life_expectancy.data"
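
Conceptually, a user override replaces only the entries it names and leaves
the component's other defaults in place. The sketch below models that with a
recursive merge of plain dictionaries. It is an assumption-laden illustration:
``apply_overrides`` is a hypothetical helper, the ``"self::load_unmodeled_csmr"``
string stands in for the bound method used in the component, and vivarium's own
configuration machinery is not shown here.

.. code-block:: python

    from copy import deepcopy


    def apply_overrides(defaults: dict, overrides: dict) -> dict:
        """Return a copy of ``defaults`` with ``overrides`` layered on top.

        Hypothetical helper; nested dictionaries are merged key by key.
        """
        merged = deepcopy(defaults)
        for key, value in overrides.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = apply_overrides(merged[key], value)
            else:
                merged[key] = value
        return merged


    # Component defaults, mirroring the Mortality example.
    defaults = {
        "mortality": {
            "data_sources": {
                "all_cause_mortality_rate": "cause.all_causes.cause_specific_mortality_rate",
                "unmodeled_cause_specific_mortality_rate": "self::load_unmodeled_csmr",
                "life_expectancy": "population.theoretical_minimum_risk_life_expectancy",
            },
            "unmodeled_causes": [],
        },
    }

    # A user override that replaces a single data source with a scalar.
    overrides = {"mortality": {"data_sources": {"all_cause_mortality_rate": 0.01}}}

    effective = apply_overrides(defaults, overrides)
    sources = effective["mortality"]["data_sources"]

Here ``sources["all_cause_mortality_rate"]`` becomes the scalar ``0.01``, while
the other two entries keep their default values.
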

Limitations and When to Override
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The automatic ``data_sources`` mechanism works well for straightforward cases,
but some scenarios require overriding the ``build_all_lookup_tables()`` method:

**Non-standard value columns:**
    The component defaults to ``["value"]`` as the value column name. If your
    data has differently named value columns or multiple value columns, you
    must call ``build_lookup_table()`` directly with explicit
    ``value_columns``.

**Complex data transformations:**
    When data requires transformation before building tables (e.g., pivoting,
    computing derived parameters, combining multiple data sources), override
    ``build_all_lookup_tables()`` to perform the transformation first.

**Delegation to sub-components:**
    When lookup tables should be built by sub-components rather than the
    parent component, override ``build_all_lookup_tables()`` to skip the
    default behavior.

Examples of these patterns can be found in ``vivarium_public_health``:

- ``RateTransition`` and ``DiseaseState`` in ``vivarium_public_health.disease``
  demonstrate the basic ``data_sources`` pattern with various data source types.
- ``Risk`` in ``vivarium_public_health.risks`` overrides ``build_all_lookup_tables()``
  to delegate table construction to its exposure distribution sub-component.

Using the Lookup Interface Directly
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For cases not covered by ``data_sources``, or when working in an interactive
context, you can build lookup tables directly using the builder's lookup
interface.

145328
Example Usage
146329
~~~~~~~~~~~~~
147330

148331
The following is an example of creating and calling a lookup table in an
149-
:ref:`interactive setting <interactive_tutorial>` using the data above. The
150-
interface and process are the same when integrating a lookup table into a
151-
:term:`component <Component>`, which is primarily how they are used. Assuming
152-
you have a valid simulation object named ``sim`` and the data from the above
153-
table in a :class:`pandas.DataFrame` named ``data``, you can construct a
154-
lookup table in the following way, using the interface from the builder.
332+
:ref:`interactive setting <interactive_tutorial>` using the data from
333+
`Construction Parameters`_ above. The interface and process are the same when
334+
integrating a lookup table into a :term:`component <Component>`, which is primarily
335+
how they are used. Assuming you have a valid simulation object named ``sim`` and
336+
the data from the above table in a :class:`pandas.DataFrame` named ``data``, you
337+
can construct a lookup table in the following way, using the interface from the builder.
155338

156339
.. code-block:: python
157340
