Context
We have bike_safety_weighted_edges, a dbt model that enriches ~663k OSMnx bike network edges with bike infrastructure quality (quality_tier, 1–10) and time-decayed crash scores (crash_score_per_meter). We need to collapse these dimensions into a single scalar cost per edge that a shortest-path algorithm can use to produce safety-optimized bike routes.
This model is the last step before the graph is exported for use by a routing service.
Design decisions
Cost function shape
We chose a multiplicative form:
cost = length_m * (1 + alpha * crash_score_per_meter + beta * tier_penalty)
Where tier_penalty = (quality_tier - 1) / 9.0, mapping tiers 1–10 onto 0.0–1.0.
The key property of this form is that safety penalties are always expressed relative to distance: the penalty terms scale the base distance cost up, but never replace it. For typical edges, this keeps the router from sending someone on a massive detour just to reach a marginally safer street. The alpha and beta coefficients control how aggressively the router penalizes crash history and lack of infrastructure, respectively.
We considered an additive form (length_m + alpha * crash_score + beta * tier_penalty) but rejected it because it would decouple safety cost from distance, making short dangerous edges cheap in absolute terms and potentially creating unintuitive routes.
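As a minimal sketch of the multiplicative form, the Python below mirrors the formula for a single edge. The function name edge_cost and its signature are illustrative assumptions; in the dbt model itself this would be a SQL expression rather than a Python function.

```python
def edge_cost(
    length_m: float,
    crash_score_per_meter: float,
    quality_tier: int,
    alpha: float = 100.0,
    beta: float = 0.5,
) -> float:
    """Multiplicative routing cost for one edge (hypothetical helper)."""
    if not 1 <= quality_tier <= 10:
        raise ValueError("quality_tier must be between 1 and 10")
    # Map tiers 1-10 onto 0.0-1.0.
    tier_penalty = (quality_tier - 1) / 9.0
    # Penalties scale the base distance cost up; they never replace it.
    return length_m * (1 + alpha * crash_score_per_meter + beta * tier_penalty)


# A crash-free tier-1 cycleway costs exactly its length.
print(edge_cost(100.0, 0.0, 1))   # 100.0
# The same length with no infrastructure (tier 10) costs 1.5x.
print(edge_cost(100.0, 0.0, 10))  # 150.0
```

Because every penalty sits inside the parentheses, doubling an edge's length doubles its cost regardless of the coefficients.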
Starting coefficients
alpha = 100: calibrated against the crash score distribution. The median crash_score_per_meter is 0 (most edges have no crash history). The 90th percentile is ~0.005. At alpha = 100, a p90 crash edge costs about 50% more than an equivalent crash-free edge. Extreme outliers (max ~6.2, likely short edges near crash clusters) get penalized very heavily, which is appropriate.
beta = 0.5: a street with no bike infrastructure (tier 10, penalty = 1.0) costs 1.5x a dedicated cycleway (tier 1, penalty = 0.0) of the same length. This is a moderate preference — noticeable but not so strong that the router avoids all non-infra streets.
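To make these starting values concrete, the sketch below prints the cost multiplier (cost divided by length_m) for a few representative edges. The helper name cost_multiplier and the case labels are illustrative; the crash scores are the approximate percentiles quoted above, not fresh measurements.

```python
ALPHA = 100.0
BETA = 0.5


def cost_multiplier(crash_score_per_meter: float, quality_tier: int) -> float:
    """Factor applied to length_m under the starting coefficients."""
    tier_penalty = (quality_tier - 1) / 9.0
    return 1 + ALPHA * crash_score_per_meter + BETA * tier_penalty


cases = {
    "crash-free cycleway (tier 1)": (0.0, 1),
    "p90 crash score (~0.005), tier 1": (0.005, 1),
    "crash-free, no infrastructure (tier 10)": (0.0, 10),
    "max crash score (~6.2), tier 1": (6.2, 1),
}
for label, (score, tier) in cases.items():
    print(f"{label}: {cost_multiplier(score, tier):.2f}x")
```

Under these assumptions the p90 edge and the tier-10 edge both come out at 1.5x, while the extreme outlier is roughly 621x, which is what "penalized very heavily" amounts to in practice.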
Reverse cost
Bidirectional edges get reverse_cost = cost. One-way edges get reverse_cost = -1, which signals to the routing layer that traversal in the reverse direction is not permitted.
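The rule can be sketched as a small Python function. The boolean oneway flag is an assumption about how the upstream edges mark direction; the actual column name and type in the model may differ.

```python
def reverse_cost(cost: float, oneway: bool) -> float:
    """Cost of traversing an edge against its stored direction (hypothetical helper)."""
    # -1 is the sentinel described above: reverse traversal is not permitted.
    return -1.0 if oneway else cost


print(reverse_cost(150.0, oneway=False))  # 150.0
print(reverse_cost(150.0, oneway=True))   # -1.0
```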
Columns included
The model outputs u, v, key, cost, reverse_cost, geom, plus length_m, quality_tier, and crash_score_per_meter. The last three are not needed by the routing algorithm but are useful for debugging and understanding why a particular route was chosen.
Tuning guidelines
The coefficients alpha and beta are defined as Jinja variables at the top of the model file. To tune them:
- Pick 5–10 origin/destination pairs you know well in Chicago (e.g., home to work, routes you actually ride).
- Run the router with the current coefficients and inspect the suggested routes on a map.
- Ask: does the route take reasonable detours to prefer bike infrastructure? Does it avoid known dangerous intersections? Does it ever suggest absurdly long detours?
- Adjust alpha up if the router doesn't avoid crashes enough, down if it detours too aggressively around crash history. Same logic for beta and infrastructure preference.
A more systematic approach (for later): build an evaluation set of 20–30 origin/destination pairs with manually ranked candidate routes, then grid-search alpha and beta to minimize disagreement with your rankings.
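A rough sketch of that grid search follows, under simplifying assumptions: each candidate route is given as a fixed list of edges, the manual ranking is the list order (best first), and "disagreement" is the number of route pairs whose cost ordering contradicts that ranking. All names and the sample data are hypothetical, and a real evaluation would re-run the router rather than score fixed candidates.

```python
from itertools import combinations, product

# One edge as (length_m, crash_score_per_meter, quality_tier); a route is a list of edges.
Edge = tuple[float, float, int]


def route_cost(route: list[Edge], alpha: float, beta: float) -> float:
    """Sum the multiplicative edge cost over a route."""
    return sum(
        length * (1 + alpha * crash + beta * (tier - 1) / 9.0)
        for length, crash, tier in route
    )


def disagreements(ranked_routes: list[list[Edge]], alpha: float, beta: float) -> int:
    """Count pairs where a better-ranked route costs more than a worse-ranked one."""
    costs = [route_cost(route, alpha, beta) for route in ranked_routes]
    return sum(1 for i, j in combinations(range(len(costs)), 2) if costs[i] > costs[j])


def grid_search(eval_set, alphas, betas):
    """Return the (alpha, beta) pair with the fewest total disagreements."""
    return min(
        product(alphas, betas),
        key=lambda ab: sum(disagreements(routes, ab[0], ab[1]) for routes in eval_set),
    )


# Made-up example: the rider prefers a 1200 m cycleway over a 1000 m street
# with no bike infrastructure.
eval_set = [
    [[(1200.0, 0.0, 1)], [(1000.0, 0.0, 10)]],
]
best_alpha, best_beta = grid_search(
    eval_set, alphas=[50, 100, 200], betas=[0.0, 0.25, 0.5, 1.0]
)
print(best_alpha, best_beta)
```

Ties are broken by grid order, so with a small evaluation set many coefficient pairs may score equally well; more pairs and more varied routes narrow the choice.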
Interpreting the crash score distribution
- Median = 0: most edges have no crash history at all
- 90th percentile ≈ 0.005: edges with any meaningful crash exposure
- Max ≈ 6.2: extreme outliers, likely very short edges adjacent to crash clusters
- The time-decay factor (lambda = 0.4 in bike_safety_weighted_edges) means older crashes contribute less: a 2-year-old crash contributes ~45% of a recent one
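The ~45% figure is consistent with exponential decay on crash age measured in years. The sketch below assumes that form, weight = exp(-lambda * age_years); the exact expression used in bike_safety_weighted_edges is not shown here, so treat it as an assumption.

```python
import math


def decay_weight(age_years: float, lam: float = 0.4) -> float:
    """Weight of a crash relative to one that just happened (assumed exponential decay)."""
    return math.exp(-lam * age_years)


print(round(decay_weight(0.0), 2))  # 1.0
print(round(decay_weight(2.0), 2))  # 0.45
```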
Extending the cost function
The multiplicative form makes it straightforward to add new penalty terms. Some possibilities:
- Speed limit penalty: higher speed limits → more dangerous for cyclists. Add a term like gamma * speed_penalty(maxspeed).
- Lighting penalty: unlit edges at night could carry extra cost. This would require a time-of-day parameter at query time.
- Surface quality: unpaved or poor-surface edges could be penalized.
- Elevation change: if elevation data is added to edges, uphill segments could carry higher cost (modeling effort, not just safety).
Each new term follows the same pattern: normalize to a 0–1 scale, multiply by a coefficient, add it inside the parentheses.
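One way to picture this pattern is a cost function that accepts extra terms as (coefficient, normalized value) pairs. This is a sketch only: the extended_cost name, the extra_penalties dictionary, and the gamma value of 0.3 in the usage line are hypothetical and not part of the current model.

```python
def extended_cost(
    length_m: float,
    crash_score_per_meter: float,
    quality_tier: int,
    extra_penalties: dict[str, tuple[float, float]],
    alpha: float = 100.0,
    beta: float = 0.5,
) -> float:
    """Multiplicative cost with additional normalized penalty terms (hypothetical)."""
    tier_penalty = (quality_tier - 1) / 9.0
    multiplier = 1 + alpha * crash_score_per_meter + beta * tier_penalty
    for name, (coefficient, value) in extra_penalties.items():
        # Each new term is expected to be normalized to 0-1 before it is weighted.
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} penalty must be in [0, 1], got {value}")
        multiplier += coefficient * value
    return length_m * multiplier


# Hypothetical speed term: gamma = 0.3, speed penalty already normalized to 0.5.
print(extended_cost(100.0, 0.0, 1, {"speed": (0.3, 0.5)}))
```

With an empty extra_penalties dictionary the function reduces to the current two-term cost, so new terms can be introduced one at a time and tuned independently.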
Implementation
- bike_safety_routing_costs.sql to the marts layer
- reverse_cost = -1