This guide is for contributors adding a new summary table to the `processor/summarize/` package and, optionally, exposing it in the dashboard.
The short version is:
- Add or update a builder function in `processor/summarize/summaries/`.
- Register it in `processor/summarize/summary_specs.py`.
- Declare the builder contract once so that `summarize` can derive the empty fallback schema and dashboard-facing columns from the builder itself.
- Wire it into a dashboard page through `required_summary_ids` if a page needs it.
- Add tests.
A summary builder is a pure data step:
- input: a prepared `RunData` plus a normalized `Config`
- output: one Polars `DataFrame`
The builder should not know whether the dashboard is in live mode or export mode. It should also not special-case weighted versus unweighted logic; the cache layer handles that by swapping the `finalweight` column.
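Summing `finalweight` is what lets one builder serve both weighting modes. The plain-Python sketch below assumes (this guide does not spell it out) that the unweighted pass sets `finalweight` to 1.0 on every row; `weighted_count` and `with_unit_weights` are hypothetical stand-ins for a builder and for the cache layer's swap, and rows are dicts rather than Polars frames.

```python
def weighted_count(rows, key):
    """Builder-style aggregation: sum finalweight per group, never count rows."""
    totals = {}
    for row in rows:
        totals[row[key]] = totals.get(row[key], 0.0) + row["finalweight"]
    return totals


def with_unit_weights(rows):
    """Hypothetical unweighted swap: every row gets finalweight = 1.0 (assumption)."""
    return [{**row, "finalweight": 1.0} for row in rows]


trips = [
    {"trip_mode": "walk", "finalweight": 2.5},
    {"trip_mode": "walk", "finalweight": 1.5},
    {"trip_mode": "bike", "finalweight": 3.0},
]

weighted = weighted_count(trips, "trip_mode")
unweighted = weighted_count(with_unit_weights(trips), "trip_mode")
print(weighted)    # {'walk': 4.0, 'bike': 3.0}
print(unweighted)  # {'walk': 2.0, 'bike': 1.0}
```

The builder never branches on the weighting mode. A builder that counted rows instead would report the unweighted numbers in both modes.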
Each registered builder should also own its summary contract through `processor/summarize/contracts.py`. That contract is the single source of truth for:
- the typed empty fallback frame
- safe preflight prerequisites used by resilient summarize execution
- exported output-column metadata derived by `processor/summarize/schema.py`
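To illustrate how one declaration can feed all three uses, here is a stdlib-only sketch of a contract decorator. It is not the real `processor/summarize/contracts.py`: `SummaryContract`, the `__summary_contract__` attribute, `empty_frame`, `missing_prerequisites`, and `output_columns` are hypothetical names, and column types are plain Python types rather than Polars dtypes so the sketch runs without dependencies.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SummaryContract:
    schema: dict  # output column name -> column type
    required_columns: dict = field(default_factory=dict)  # input table -> required columns


def summary_contract(schema, required_columns=None):
    """Hypothetical decorator: attach one contract to a builder function."""
    def decorate(fn):
        fn.__summary_contract__ = SummaryContract(dict(schema), dict(required_columns or {}))
        return fn
    return decorate


def empty_frame(builder):
    """Typed empty fallback: every contract column, zero rows."""
    return {name: [] for name in builder.__summary_contract__.schema}


def missing_prerequisites(builder, available):
    """Preflight: (table, column) pairs the builder needs but the run lacks."""
    missing = []
    for table, columns in builder.__summary_contract__.required_columns.items():
        present = available.get(table, set())
        missing.extend((table, col) for col in columns if col not in present)
    return missing


def output_columns(builder):
    """Column metadata a schema module could derive instead of hand-maintaining it."""
    return tuple(builder.__summary_contract__.schema)


@summary_contract(
    schema={"trip_mode": str, "distance_bin": int, "freq": float},
    required_columns={"trips": ("trip_mode", "distance", "finalweight")},
)
def trip_distance_by_mode(rd, config):
    ...


print(output_columns(trip_distance_by_mode))
# ('trip_mode', 'distance_bin', 'freq')
print(missing_prerequisites(trip_distance_by_mode, {"trips": {"trip_mode", "finalweight"}}))
# [('trips', 'distance')]
```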
Add the summary to the most relevant existing topic module when possible:
- `processor/summarize/summaries/demographics.py`
- `processor/summarize/summaries/daily_travel.py`
- `processor/summarize/summaries/joint_travel.py`
- `processor/summarize/summaries/long_term.py`
- `processor/summarize/summaries/tour.py`
- `processor/summarize/summaries/trip.py`
- `processor/summarize/summaries/validation.py`
- `processor/summarize/summaries/legacy.py`
Create a new module only when the summary is a distinct topic area rather than just another table in an existing topic.
New summary builders should follow this signature and declare a contract:

```python
@summary_contract(
    schema={
        "trip_mode": pl.Utf8,
        "distance_bin": pl.Int64,
        "freq": pl.Float64,
    },
    required_columns={"trips": ("trip_mode", "distance", "finalweight")},
)
def my_summary(rd: RunData, config: Config) -> pl.DataFrame:
    ...
```

Expectations:
- Read from the prepared runtime tables on `RunData`.
- Aggregate `pl.col("finalweight").sum()` instead of counting rows directly.
- Return a plain `pl.DataFrame`.
- Let the contract define the typed empty fallback schema once.
- If the builder has more nuanced missing-data logic than the contract can express, return `empty_summary_frame(my_summary)` or an equivalent typed empty frame with the same schema.
- Keep domain-specific reshaping in the summary layer only if it is part of the table contract. Page-specific chart shaping belongs in the dashboard page.
Example skeleton:
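
```python
import polars as pl

# Assumed import paths, based on the file layout described in this guide;
# adjust to wherever Config actually lives.
from processor.models import Config, RunData
from processor.summarize.contracts import empty_summary_frame, summary_contract


@summary_contract(
    schema={
        "trip_mode": pl.Utf8,
        "distance_bin": pl.Int64,
        "freq": pl.Float64,
    },
    required_columns={"trips": ("trip_mode", "distance", "finalweight")},
)
def trip_distance_by_mode(rd: RunData, config: Config) -> pl.DataFrame:
    # Custom missing-data path: fall back to the contract's typed empty frame.
    if "distance" not in rd.trips.columns:
        return empty_summary_frame(trip_distance_by_mode)
    return (
        rd.trips
        .filter(pl.col("trip_mode").is_not_null())
        # 5-unit bins: [0, 5) -> 0, [5, 10) -> 5, ...
        .with_columns(
            (pl.col("distance") / 5).floor().cast(pl.Int64).mul(5).alias("distance_bin")
        )
        .group_by(["trip_mode", "distance_bin"])
        # Sum weights rather than counting rows so both weighting modes work.
        .agg(pl.col("finalweight").sum().alias("freq"))
        .sort(["trip_mode", "distance_bin"])
    )
```

Registration still happens in the `SUMMARY_SPECS` tuple: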

```python
SummarySpec("trip_distance_by_mode", "tripDistanceByMode", trip.trip_distance_by_mode)
```

`SummarySpec` stays intentionally small:
- `summary_id`: stable id used by dashboard pages and tests
- `filename`: CSV filename stem written under each weighting-mode directory
- `builder`: function that returns the summary table
The cache module (`processor/summarize/cache.py`) derives these related structures from `SUMMARY_SPECS`:
- `SUMMARY_SPEC_BY_ID`
- `SUMMARY_FILENAME_BY_ID`
- `DEFAULT_SUMMARY_IDS`
If the summary is not in `SUMMARY_SPECS`, it does not exist to the rest of the application.
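A minimal sketch of how those lookups can all follow from a single tuple. The `SummarySpec` fields match the list above; treating `DEFAULT_SUMMARY_IDS` as every registered id in declaration order, rejecting duplicate ids, and the stand-in builders (including the hypothetical `trip_mode_share`) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SummarySpec:
    summary_id: str    # stable id used by dashboard pages and tests
    filename: str      # CSV filename stem under each weighting-mode directory
    builder: Callable  # function that returns the summary table


def trip_distance_by_mode(rd, config):  # stand-in builder
    return None


def trip_mode_share(rd, config):  # hypothetical second builder
    return None


SUMMARY_SPECS = (
    SummarySpec("trip_distance_by_mode", "tripDistanceByMode", trip_distance_by_mode),
    SummarySpec("trip_mode_share", "tripModeShare", trip_mode_share),
)

# Every lookup is derived from SUMMARY_SPECS, so registration happens in one place.
SUMMARY_SPEC_BY_ID = {spec.summary_id: spec for spec in SUMMARY_SPECS}
SUMMARY_FILENAME_BY_ID = {spec.summary_id: spec.filename for spec in SUMMARY_SPECS}
DEFAULT_SUMMARY_IDS = tuple(spec.summary_id for spec in SUMMARY_SPECS)

# Guard against two specs sharing an id (the dict would silently keep the last one).
assert len(SUMMARY_SPEC_BY_ID) == len(SUMMARY_SPECS), "duplicate summary_id"
```

A builder that is defined but missing from `SUMMARY_SPECS` never appears in any of these lookups, which is why an unregistered summary is invisible to pages, exports, and tests.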
If the new table is a reusable dashboard-facing contract, make sure its builder contract schema is correct. `processor/summarize/schema.py` now derives canonical output columns from the registered builders instead of maintaining a separate hand-written column map.
Do this when:
- dashboard pages depend on a stable shape
- the summary has fallback or empty-frame behavior that should keep the same columns
- you want `tests/test_runtime_canonical_columns.py` to enforce the contract
Skip it when the table is private, transitional, or not yet used as a stable dashboard input.
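The sketch below illustrates deriving a canonical column map from registered builders rather than maintaining one by hand. It assumes each builder carries its contract schema on an attribute; `with_schema`, the `__summary_contract__` attribute, `REGISTERED`, `canonical_output_columns`, and `columns_match` are all hypothetical, and dtype names are plain strings instead of Polars dtypes.

```python
from types import SimpleNamespace


def with_schema(schema):
    """Hypothetical stand-in for the contract decorator: record the output schema."""
    def decorate(fn):
        fn.__summary_contract__ = SimpleNamespace(schema=dict(schema))
        return fn
    return decorate


@with_schema({"trip_mode": "Utf8", "distance_bin": "Int64", "freq": "Float64"})
def trip_distance_by_mode(rd, config):
    ...


# (summary_id, builder) pairs standing in for SUMMARY_SPECS.
REGISTERED = (("trip_distance_by_mode", trip_distance_by_mode),)


def canonical_output_columns(registered):
    """Derive {summary_id: column tuple} from builder contracts."""
    return {
        summary_id: tuple(builder.__summary_contract__.schema)
        for summary_id, builder in registered
    }


def columns_match(summary_id, frame_columns, registered=REGISTERED):
    """The kind of check a canonical-columns test might make for one built table."""
    return tuple(frame_columns) == canonical_output_columns(registered)[summary_id]
```

Because the map is computed from the contracts, changing a builder's schema updates the dashboard-facing columns without a second edit, which is the duplication the common-mistakes list warns about.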
If a page should consume the summary:
- Add the summary id to the page's `PAGE.required_summary_ids`.
- Use `require_summary(...)` or `require_summaries(...)` in a section render function.
- Keep page-specific filtering and chart shaping in the page module.
Example:

```python
PAGE = DashboardPageDefinition(
    page_id="trip_distance",
    title="Trip Distance",
    page_cls=TripDistancePage,
    required_summary_ids=("trip_distance_by_mode",),
)
```

Declaring the dependency in `required_summary_ids` is what makes the summary available through `DashboardState` and keeps live mode, export mode, and validation aligned.
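The following sketch shows why one declared dependency list is enough for live mode, export mode, and validation to agree on what a page needs. `DashboardPageDefinition` is reduced to the fields from the example above, `TripDistancePage` is a placeholder, and `missing_summaries` and `lookup_summary` are hypothetical helpers, not the real `DashboardState` or `require_summary(...)` API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DashboardPageDefinition:
    page_id: str
    title: str
    page_cls: type
    required_summary_ids: tuple = ()


class TripDistancePage:  # placeholder page class
    pass


PAGE = DashboardPageDefinition(
    page_id="trip_distance",
    title="Trip Distance",
    page_cls=TripDistancePage,
    required_summary_ids=("trip_distance_by_mode",),
)


def missing_summaries(page, registered_ids):
    """Validation: declared ids that no registered summary provides."""
    return [sid for sid in page.required_summary_ids if sid not in registered_ids]


def lookup_summary(summaries, page, summary_id):
    """Render-time guard: a page may only read summaries it declared."""
    if summary_id not in page.required_summary_ids:
        raise KeyError(f"{page.page_id} did not declare {summary_id!r}")
    return summaries[summary_id]
```

Failing loudly on an undeclared id keeps a page from working in live mode by accident while its HTML export, which only builds declared summaries, would be missing data.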
Use this order when adding a new summary that will appear in the dashboard:
- Add `trip_distance_by_mode()` to `processor/summarize/summaries/trip.py`.
- Register it in `processor/summarize/summary_specs.py` with a stable `summary_id`.
- Add or update the builder contract schema if the page will treat it as a stable reusable table.
- Add a new page or update an existing page in `dashboard/pages/`.
- Declare the page dependency in `PAGE.required_summary_ids`.
- Add export selector metadata only if the page has page-local controls that must work in HTML export.
- Add tests covering the summary output shape and the page wiring.
Prefer adding or extending:
- `tests/test_runtime_canonical_columns.py` for canonical prepared-column usage and output shape
- `tests/test_summary_cache.py` for cache-layer registration or manifest behavior
Test at least:
- weighted and unweighted paths both behave correctly through `finalweight`
- missing-column fallback returns the expected empty schema
- unavailable and failed summaries are recorded with explicit manifest state
- output columns remain stable
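One way to cover the weighting and shape checks above is a small plain-Python reference implementation of the example `trip_distance_by_mode` to compare the real builder against. This is a hedged sketch: `reference_distance_by_mode` and `EXPECTED_COLUMNS` are hypothetical test helpers, rows are dicts rather than Polars frames, and the unweighted case assumes unit weights.

```python
import math

EXPECTED_COLUMNS = ("trip_mode", "distance_bin", "freq")


def reference_distance_by_mode(trips):
    """Plain-Python oracle mirroring the example builder: 5-unit bins, summed weights."""
    # Rows are dicts, so the first row stands in for the table's column list.
    if not trips or "distance" not in trips[0]:
        return []  # empty path: zero rows; columns come from EXPECTED_COLUMNS
    totals = {}
    for row in trips:
        if row["trip_mode"] is None:
            continue  # the builder filters out null modes
        bin_start = math.floor(row["distance"] / 5) * 5
        key = (row["trip_mode"], bin_start)
        totals[key] = totals.get(key, 0.0) + row["finalweight"]
    return [
        dict(zip(EXPECTED_COLUMNS, (mode, bin_start, freq)))
        for (mode, bin_start), freq in sorted(totals.items())
    ]


trips = [
    {"trip_mode": "bike", "distance": 7.3, "finalweight": 2.0},
    {"trip_mode": "bike", "distance": 9.9, "finalweight": 1.0},
    {"trip_mode": "walk", "distance": 0.4, "finalweight": 4.0},
    {"trip_mode": None, "distance": 3.0, "finalweight": 9.0},
]
unit = [{**t, "finalweight": 1.0} for t in trips]  # assumed unweighted swap

weighted = reference_distance_by_mode(trips)
unweighted = reference_distance_by_mode(unit)
```

In a real test you would build the same input as a Polars frame, run the registered builder in both weighting modes, and compare its rows and `columns` against the oracle; the empty path should be checked against the contract schema rather than against an empty list.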
If the summary is used by a page, add or extend:
- `tests/test_dashboard_live.py`
- `tests/test_export_html.py`
Test at least:
- the page validates and refreshes using the new summary id
- export works if the page participates in HTML export
- page selectors still serialize correctly if the new summary changes available options
Common mistakes:
- counting rows instead of summing `finalweight`
- reading raw ActivitySim column names directly when `prepare_data()` already provides canonical aliases
- registering the builder locally but forgetting to add it to `SUMMARY_SPECS`
- duplicating the output schema in both the builder and `processor/summarize/schema.py`
- returning different columns from the empty-data path and the populated-data path
- putting chart-specific reshaping into the summary table when it belongs in the page
Useful references:
- `processor/summarize/cache.py` for registration and weighting behavior
- `processor/summarize/contracts.py` for builder contracts and typed empty fallback helpers
- `processor/models.py` for the prepared `RunData` contract
- `processor/prepare/enrichment/pipeline.py` for the `prepare_data()` entrypoint
- `processor/prepare/enrichment/canonicalize.py` and `processor/prepare/enrichment/columns.py` for canonical column preparation helpers
- `processor/summarize/schema.py` for dashboard-facing output contracts
- `tests/test_runtime_canonical_columns.py` for the expected testing style
- adding-dashboard-pages.md if the summary will be displayed in the UI