
Add safety-weighted routing cost model (bike_safety_routing_costs) #113

@MattTriano

Context

We have bike_safety_weighted_edges, a dbt model that enriches ~663k OSMnx bike network edges with bike infrastructure quality (quality_tier, 1–10) and time-decayed crash scores (crash_score_per_meter). We need to collapse these dimensions into a single scalar cost per edge that a shortest-path algorithm can use to produce safety-optimized bike routes.

This model is the last step before the graph is exported for use by a routing service.

Design decisions

Cost function shape

We chose a multiplicative form:

cost = length_m * (1 + alpha * crash_score_per_meter + beta * tier_penalty)

Where tier_penalty = (quality_tier - 1) / 9.0, mapping tiers 1–10 onto 0.0–1.0.
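As a minimal sketch, the formula above can be written in Python for spot-checking values outside the warehouse. The function names are illustrative only; the actual model is a dbt SQL file, and alpha and beta are passed in explicitly here rather than read from the model's Jinja variables.

```python
def tier_penalty(quality_tier: int) -> float:
    """Map quality_tier 1..10 linearly onto 0.0..1.0."""
    if not 1 <= quality_tier <= 10:
        raise ValueError(f"quality_tier must be in 1..10, got {quality_tier}")
    return (quality_tier - 1) / 9.0


def edge_cost(
    length_m: float,
    crash_score_per_meter: float,
    quality_tier: int,
    alpha: float,
    beta: float,
) -> float:
    """cost = length_m * (1 + alpha * crash_score_per_meter + beta * tier_penalty)."""
    multiplier = 1.0 + alpha * crash_score_per_meter + beta * tier_penalty(quality_tier)
    return length_m * multiplier
```

With both penalties at zero (no crash history, tier 1) the cost reduces to length_m, so a crash-free cycleway is the baseline every other edge is compared against.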

The key property of this form is that distance stays the base of the cost: the penalty terms scale the distance cost up, but never replace it. Because the multiplier stays modest for all but the most crash-heavy edges, the router will not send someone on a massive detour just to reach a marginally safer street. The alpha and beta coefficients control how aggressively the router penalizes crash history and lack of infrastructure, respectively.

We considered an additive form (length_m + alpha * crash_score + beta * tier_penalty) but rejected it because it would decouple safety cost from distance, making short dangerous edges cheap in absolute terms and potentially creating unintuitive routes.

Starting coefficients

alpha = 100: calibrated against the crash score distribution. The median crash_score_per_meter is 0 (most edges have no crash history). The 90th percentile is ~0.005. At alpha = 100, a p90 crash edge costs about 50% more than an equivalent crash-free edge (100 * 0.005 = 0.5). Extreme outliers (max ~6.2, likely short edges near crash clusters) get penalized very heavily, roughly 621x at the maximum, which is appropriate.

beta = 0.5: a street with no bike infrastructure (tier 10, penalty = 1.0) costs 1.5x a dedicated cycleway (tier 1, penalty = 0.0) of the same length. This is a moderate preference — noticeable but not so strong that the router avoids all non-infra streets.
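To make the arithmetic behind these starting values concrete, the sketch below prints the cost multiplier (cost divided by length_m) for a few reference edges. The crash scores are the approximate percentile values quoted above; the scenario labels and the function name are illustrative.

```python
ALPHA = 100.0  # starting crash-history coefficient
BETA = 0.5     # starting infrastructure coefficient


def cost_multiplier(crash_score_per_meter, quality_tier, alpha=ALPHA, beta=BETA):
    """Return cost / length_m for an edge under the multiplicative form."""
    tier_penalty = (quality_tier - 1) / 9.0
    return 1.0 + alpha * crash_score_per_meter + beta * tier_penalty


# (crash_score_per_meter, quality_tier) for a few reference edges
scenarios = {
    "cycleway, no crashes": (0.0, 1),
    "cycleway, p90 crash score": (0.005, 1),
    "no infrastructure, no crashes": (0.0, 10),
    "no infrastructure, p90 crashes": (0.005, 10),
    "cycleway, max crash score": (6.2, 1),
}

for label, (crash, tier) in scenarios.items():
    print(f"{label:<32} {cost_multiplier(crash, tier):8.2f}x")
```

The first four rows stay between 1x and 2x, which is the range where the router trades modest extra distance for safety. Only the extreme crash outlier produces a multiplier large enough to make an edge effectively impassable.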

Reverse cost

Bidirectional edges get reverse_cost = cost. One-way edges get reverse_cost = -1, which signals to the routing layer that traversal in the reverse direction is not permitted.
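The rule above is small enough to state as code. This is a sketch only: the boolean `oneway` flag is an assumption about how one-way edges are identified upstream, and the row dictionaries stand in for rows of the model output.

```python
def reverse_cost(cost: float, oneway: bool) -> float:
    """Return the cost of traversing an edge against its digitized direction.

    -1.0 is the sentinel meaning "reverse traversal not permitted".
    """
    return -1.0 if oneway else cost


def find_reverse_cost_violations(rows):
    """Return rows whose reverse_cost disagrees with the rule above.

    Each row is a dict with keys "cost", "reverse_cost" and "oneway"
    (the "oneway" key is hypothetical, see the note above).
    """
    return [
        row for row in rows
        if row["reverse_cost"] != reverse_cost(row["cost"], row["oneway"])
    ]
```

Running `find_reverse_cost_violations` over a sample of exported rows should return an empty list if the model applies the rule consistently.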

Columns included

The model outputs u, v, key, cost, reverse_cost, geom, plus length_m, quality_tier, and crash_score_per_meter. The last three are not needed by the routing algorithm but are useful for debugging and understanding why a particular route was chosen.

Tuning guidelines

The coefficients alpha and beta are defined as Jinja variables at the top of the model file. To tune them:

  1. Pick 5–10 origin/destination pairs you know well in Chicago (e.g., home to work, routes you actually ride).
  2. Run the router with the current coefficients and inspect the suggested routes on a map.
  3. Ask: does the route take reasonable detours to prefer bike infrastructure? Does it avoid known dangerous intersections? Does it ever suggest absurdly long detours?
  4. Increase alpha if the router does not avoid crash-prone edges enough; decrease it if the router detours too aggressively around crash history. Apply the same logic to beta and infrastructure preference.

A more systematic approach (for later): build an evaluation set of 20–30 origin/destination pairs with manually ranked candidate routes, then grid-search alpha and beta to minimize disagreement with your rankings.
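One possible shape for that grid search is sketched below. It assumes each origin/destination pair is represented as a list of candidate routes ordered best-first by hand, and each route as a list of (length_m, crash_score_per_meter, quality_tier) edge tuples. That data layout and all function names are hypothetical, and counting pairwise disagreements is just one of several reasonable loss functions.

```python
from itertools import combinations, product


def route_cost(route, alpha, beta):
    """Sum the multiplicative edge cost over every edge in a route."""
    return sum(
        length_m * (1.0 + alpha * crash + beta * (tier - 1) / 9.0)
        for length_m, crash, tier in route
    )


def disagreements(ranked_routes, alpha, beta):
    """Count route pairs whose cost order contradicts the manual ranking.

    ranked_routes is ordered best-first, so a preferred route (index i)
    should be strictly cheaper than any route ranked after it (index j).
    """
    costs = [route_cost(route, alpha, beta) for route in ranked_routes]
    return sum(
        1 for i, j in combinations(range(len(costs)), 2) if costs[i] >= costs[j]
    )


def total_disagreements(eval_set, alpha, beta):
    """Sum disagreements over every origin/destination pair."""
    return sum(disagreements(ranked, alpha, beta) for ranked in eval_set)


def grid_search(eval_set, alphas, betas):
    """Return the (alpha, beta) pair with the fewest total disagreements."""
    return min(
        product(alphas, betas),
        key=lambda ab: total_disagreements(eval_set, ab[0], ab[1]),
    )
```

Ties are broken by grid order, so it is worth listing the preferred (for example, smaller) coefficient values first.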

Interpreting the crash score distribution

  • Median = 0: most edges have no crash history at all
  • 90th percentile ≈ 0.005: edges with any meaningful crash exposure
  • Max ≈ 6.2: extreme outliers, likely very short edges adjacent to crash clusters
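  • The time-decay factor (lambda = 0.4 in bike_safety_weighted_edges) means older crashes contribute less: a 2-year-old crash contributes ~45% as much as one that just occurred (consistent with exponential decay, e^(-0.4 * 2) ≈ 0.45, with age measured in years)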
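The ~45% figure is consistent with an exponential decay weight of exp(-lambda * age_in_years). The sketch below assumes that form; the exact expression used in bike_safety_weighted_edges is not shown in this issue, so treat the function as illustrative.

```python
import math


def decay_weight(age_years: float, lam: float = 0.4) -> float:
    """Weight of a crash relative to one that just occurred (assumed exponential decay)."""
    return math.exp(-lam * age_years)


for age in (0, 1, 2, 5):
    print(f"{age}-year-old crash weight: {decay_weight(age):.2f}")
```

Under this assumption a crash's weight halves roughly every ln(2) / 0.4 ≈ 1.7 years.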

Extending the cost function

The multiplicative form makes it straightforward to add new penalty terms. Some possibilities:

  • Speed limit penalty: higher speed limits → more dangerous for cyclists. Add a term like gamma * speed_penalty(maxspeed).
  • Lighting penalty: unlit edges at night could carry extra cost. This would require a time-of-day parameter at query time.
  • Surface quality: unpaved or poor-surface edges could be penalized.
  • Elevation change: if elevation data is added to edges, uphill segments could carry higher cost (modeling effort, not just safety).

Each new term follows the same pattern as tier_penalty: normalize to a 0–1 scale, multiply by a coefficient, and add it inside the parentheses.
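A sketch of that pattern, using the speed limit term as the example. The gamma value, the 20–45 mph ramp, and the function names are all made up for illustration; none of them are part of the current model.

```python
def speed_penalty(maxspeed_mph: float, low: float = 20.0, high: float = 45.0) -> float:
    """Hypothetical normalization: 0.0 at or below `low`, 1.0 at or above `high`."""
    return min(1.0, max(0.0, (maxspeed_mph - low) / (high - low)))


def edge_cost(length_m: float, terms) -> float:
    """Multiplicative cost with any number of (coefficient, penalty) terms."""
    return length_m * (1.0 + sum(coef * penalty for coef, penalty in terms))


ALPHA, BETA = 100.0, 0.5  # starting coefficients from this issue
GAMMA = 0.3               # hypothetical speed-limit coefficient

# A 100 m crash-free edge with no bike infrastructure and a 45 mph limit.
cost = edge_cost(
    100.0,
    [
        (ALPHA, 0.0),                # crash_score_per_meter
        (BETA, (10 - 1) / 9.0),      # tier_penalty for tier 10
        (GAMMA, speed_penalty(45)),  # new term
    ],
)
print(f"cost = {cost:.1f}")
```

Keeping each penalty on a 0–1 scale means its coefficient reads directly as the maximum fractional cost increase that term can cause.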

Implementation

  • Add bike_safety_routing_costs.sql to the marts layer
  • Run the model and spot-check cost values at various percentiles
  • Verify that one-way edges have reverse_cost = -1
  • Test a few shortest-path queries manually to sanity-check routes
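For the manual shortest-path check, a small in-memory Dijkstra is enough to confirm that the costs and the reverse_cost sentinel behave as intended before involving the routing service. The sketch below uses only the standard library and a toy three-node graph; the node labels and edge values are invented, and the real routing layer may implement this differently.

```python
import heapq
from collections import defaultdict


def build_adjacency(edges):
    """Build a directed adjacency list from (u, v, cost, reverse_cost) tuples.

    A negative cost in either direction means that direction is not traversable.
    """
    adj = defaultdict(list)
    for u, v, cost, reverse_cost in edges:
        if cost >= 0:
            adj[u].append((v, cost))
        if reverse_cost >= 0:
            adj[v].append((u, reverse_cost))
    return adj


def shortest_path(adj, source, target):
    """Dijkstra's algorithm; returns (total_cost, path) or (inf, []) if unreachable."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, weight in adj.get(node, ()):
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    if target not in dist:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


# Toy graph: a direct 100 m street (tier 10, p90 crashes, cost 200) versus
# a 120 m cycleway detour via C. The A->C leg is one-way (reverse_cost = -1).
edges = [
    ("A", "B", 200.0, 200.0),
    ("A", "C", 60.0, -1.0),
    ("C", "B", 60.0, 60.0),
]
adj = build_adjacency(edges)
print(shortest_path(adj, "A", "B"))
print(shortest_path(adj, "B", "A"))
```

From A to B the cycleway detour wins despite being longer, while from B to A the one-way leg forces the direct street.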
