
Replace CD stacking with block-based county FIPS filtering for NYC.h5 #654

@baogorek


Problem

NYC.h5 is currently built using "congressional district stacking" — filtering to 13 CDs that overlap NYC, then probabilistically scaling weights by P(NYC county | CD). This approach is deprecated because:

  1. CDs are redrawn every decade — the hardcoded NYC_CDS list of 13 CDs is fragile
  2. NYC is not a collection of CDs — CDs straddle NYC boundaries, requiring probabilistic weight scaling
  3. We now have census blocks: GeographyAssignment.county_fips gives us the 5-digit county FIPS derived from block_geoid[:5]
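The derivation in point 3 works because a census block GEOID is 15 digits: a 2-digit state FIPS, a 3-digit county FIPS, a 6-digit tract code and a 4-digit block code. The first five characters are therefore the state+county FIPS. Below is a minimal sketch. The helper name `county_fips_from_block_geoid` and the example GEOID are hypothetical illustrations, not the actual `GeographyAssignment` code.

```python
# A census block GEOID is 15 digits: 2-digit state FIPS, 3-digit county FIPS,
# 6-digit tract code, and 4-digit block code.
# Hypothetical helper; the real project derives county_fips inside GeographyAssignment.
def county_fips_from_block_geoid(block_geoid: str) -> str:
    """Return the 5-digit state+county FIPS prefix of a 15-digit block GEOID."""
    if len(block_geoid) != 15 or not block_geoid.isdigit():
        raise ValueError(f"expected a 15-digit block GEOID, got {block_geoid!r}")
    return block_geoid[:5]


example = "360610001001000"  # hypothetical block in New York County (Manhattan)
print(county_fips_from_block_geoid(example))  # 36061
```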

Solution

Replace CD stacking with a direct county FIPS filter:

NYC_COUNTY_FIPS = {"36005", "36047", "36061", "36081", "36085"}

Each clone IS or IS NOT in NYC based on its assigned block's county — no probabilistic scaling needed. This is simpler, more correct, and doesn't depend on congressional district boundaries.
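To make the contrast concrete, here is a sketch of how one clone's weight is treated under each approach. Under CD stacking, a clone in an overlapping CD keeps a fractional weight scaled by P(NYC county | CD). Under the county filter, the weight is either kept or zeroed. The function names and the probability value 0.25 are hypothetical. The five codes are the Bronx, Kings (Brooklyn), New York (Manhattan), Queens and Richmond (Staten Island) counties.

```python
NYC_COUNTY_FIPS = {"36005", "36047", "36061", "36081", "36085"}


def cd_stacked_weight(weight: float, p_nyc_given_cd: float) -> float:
    # Deprecated approach (hypothetical sketch): every clone in an overlapping CD
    # keeps a fractional weight scaled by P(NYC county | CD).
    return weight * p_nyc_given_cd


def county_filtered_weight(weight: float, county_fips: str) -> float:
    # County filter: the clone's assigned block is either in one of the
    # five NYC counties (weight kept) or it is not (weight zeroed).
    return weight if county_fips in NYC_COUNTY_FIPS else 0.0


print(cd_stacked_weight(100.0, 0.25))          # 25.0
print(county_filtered_weight(100.0, "36061"))  # 100.0 (New York County)
print(county_filtered_weight(100.0, "36059"))  # 0.0 (Nassau County, outside NYC)
```

The filtered weight never depends on district boundaries. It depends only on the block each clone was assigned to.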

Changes

  • Add county_fips_filter parameter to build_h5() that zeros out weights for clones outside target counties
  • Update build_cities() to use county_fips_filter=NYC_COUNTY_FIPS instead of cd_subset + county_filter
  • Remove NYC_COUNTIES (enum name set) and NYC_CDS (13 hardcoded CD codes)
  • Remove now-unused get_county_filter_probability() and get_filtered_block_distribution() from block_assignment.py
  • Update modal_app/worker_script.py accordingly
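The first change can be sketched as a standalone weight mask. This is an assumption about how a `county_fips_filter` argument might behave inside `build_h5()`, not its actual implementation. The helper name `apply_county_fips_filter` and its parameters are hypothetical, and plain lists stand in for whatever array type the build uses.

```python
from typing import Iterable, Optional, Sequence


# Hypothetical helper illustrating what a county_fips_filter argument could do.
def apply_county_fips_filter(
    weights: Sequence[float],
    clone_county_fips: Sequence[str],
    county_fips_filter: Optional[Iterable[str]] = None,
) -> list[float]:
    """Zero the weight of every clone whose county FIPS is not in the filter.

    Passing None leaves the weights unchanged (no county restriction).
    """
    if len(weights) != len(clone_county_fips):
        raise ValueError("weights and clone_county_fips must have the same length")
    if county_fips_filter is None:
        return list(weights)
    targets = set(county_fips_filter)
    # Keep a clone's weight only if its assigned block lies in a target county.
    return [w if fips in targets else 0.0 for w, fips in zip(weights, clone_county_fips)]


nyc = {"36005", "36047", "36061", "36081", "36085"}
print(apply_county_fips_filter([10.0, 20.0, 30.0], ["36047", "36059", "36085"], nyc))
# [10.0, 0.0, 30.0]
```

Zeroing weights rather than dropping rows keeps the clone count and ordering unchanged. That is one plausible reason to filter this way, but it is an assumption here.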
