Data Contract — USDA Rural Opportunity Navigator

Version: 0.1
Last Updated: 2025-10-11 13:45 CT
Maintainers: Dataset Team (Victor + Cameron)
Storage Limit: ≤ 0.4 GB total (.parquet, Snappy-compressed)


1. Purpose

This contract defines the structure, validation rules, and exchange standards for the datasets that power the Rural Opportunity Navigator prototype.
All data must satisfy this contract before it is ingested into the backend API or dashboard.


2. Dataset Overview

| Dataset | Description | File | Target Size | Update Frequency |
|---|---|---|---|---|
| Programs Clean | Normalized USDA + federal program data | data-samples/programs_clean.parquet | ≤ 150 MB | Static |
| Programs + Income | Joined with Census/ERS income tiers | data-samples/programs_with_income.parquet | ≤ 250 MB | Static |

3. Unified Schema

| Field | Type | Example | Description | Validation |
|---|---|---|---|---|
| state | string (2) | "MO" | U.S. state code | Must match /^[A-Z]{2}$/ |
| county_fips | string (5) | "29095" | County FIPS code | 5-digit numeric |
| program_name | string | "Farm Loan Program" | Program title | Non-null, ≤ 120 chars |
| agency | string | "USDA Rural Development" | Admin agency | Optional |
| intent_category | string (enum) | "equipment_purchase" | User intent | Must match set |
| industry | string | "Livestock" | Sector | Optional |
| income_band | string (enum) | "Low" | Derived tier | Low / Mid / High |
| funding_type | string | "Loan" | Grant / Loan / Aid | Optional |
| application_deadline | date | 2024-12-31 | Deadline | ISO-8601 |
| resolved_flag | bool | true | Case resolved | true/false |
| contact_reference | string (URL) | https://rd.usda.gov/contact | Contact link | Valid URL |
| source | string | "USDA_RD_Portal" | Provenance | Required |
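
The per-field rules in the schema can be sketched as a small record validator. This is an illustrative, standard-library-only sketch, not the contents of scripts/validate_schema.py (which relies on pydantic models). The function name `validate_record` and the error strings are hypothetical. It also assumes that `income_band`, `application_deadline`, `resolved_flag`, and `contact_reference` are checked only when a value is present, and it skips `intent_category` because the contract does not list the allowed set.

```python
import re
from datetime import date
from urllib.parse import urlparse

STATE_RE = re.compile(r"[A-Z]{2}")    # used with fullmatch, so anchors are implied
FIPS_RE = re.compile(r"[0-9]{5}")
INCOME_BANDS = ("Low", "Mid", "High")
MAX_PROGRAM_NAME_CHARS = 120


def _is_iso_date(value) -> bool:
    """True if value is a string that date.fromisoformat accepts."""
    try:
        date.fromisoformat(value)
        return True
    except (TypeError, ValueError):
        return False


def _is_http_url(value) -> bool:
    """Assumption: 'valid URL' means an http(s) URL with a host."""
    if not isinstance(value, str):
        return False
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def validate_record(rec: dict) -> list[str]:
    """Return a list of rule violations for one row (empty list = valid)."""
    errors = []
    state = rec.get("state")
    if not (isinstance(state, str) and STATE_RE.fullmatch(state)):
        errors.append("state: must match /^[A-Z]{2}$/")
    fips = rec.get("county_fips")
    if not (isinstance(fips, str) and FIPS_RE.fullmatch(fips)):
        errors.append("county_fips: must be a 5-digit numeric string")
    name = rec.get("program_name")
    if not (isinstance(name, str) and len(name) <= MAX_PROGRAM_NAME_CHARS):
        errors.append("program_name: must be non-null and at most 120 chars")
    if not rec.get("source"):
        errors.append("source: required")
    # Assumption: the remaining fields are validated only when present.
    band = rec.get("income_band")
    if band is not None and band not in INCOME_BANDS:
        errors.append("income_band: must be Low, Mid, or High")
    deadline = rec.get("application_deadline")
    if deadline is not None and not _is_iso_date(deadline):
        errors.append("application_deadline: must be an ISO-8601 date")
    flag = rec.get("resolved_flag")
    if flag is not None and not isinstance(flag, bool):
        errors.append("resolved_flag: must be true or false")
    url = rec.get("contact_reference")
    if url is not None and not _is_http_url(url):
        errors.append("contact_reference: must be a valid URL")
    return errors


# Example row built from the "Example" column of the schema table.
example = {
    "state": "MO",
    "county_fips": "29095",
    "program_name": "Farm Loan Program",
    "agency": "USDA Rural Development",
    "intent_category": "equipment_purchase",
    "industry": "Livestock",
    "income_band": "Low",
    "funding_type": "Loan",
    "application_deadline": "2024-12-31",
    "resolved_flag": True,
    "contact_reference": "https://rd.usda.gov/contact",
    "source": "USDA_RD_Portal",
}
print(validate_record(example))
```

Returning a list of messages rather than raising on the first failure lets a reviewer see every violation in a row at once.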

4. Data Quality Rules

  • No nulls in state, county_fips, program_name.
  • ≤ 200,000 rows; ≤ 30 columns.
  • Combined size ≤ 0.4 GB.
  • UTF-8 encoding; .parquet Snappy compression.
  • Drop or hash PII before commit.
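
The row, column, and null rules above, along with the PII hashing step, might be checked along these lines. This is a minimal sketch that treats a dataset as a list of dicts and uses only the standard library. The names `check_quality` and `hash_pii`, the salt parameter, and the choice of SHA-256 are assumptions for illustration; the contract does not specify a hashing scheme or name any PII columns.

```python
import hashlib

MAX_ROWS = 200_000
MAX_COLUMNS = 30
REQUIRED_NON_NULL = ("state", "county_fips", "program_name")


def hash_pii(value: str, salt: str) -> str:
    """Replace a PII value with a salted SHA-256 hex digest (assumed scheme)."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def check_quality(rows: list[dict]) -> list[str]:
    """Return dataset-level rule violations (empty list = all rules pass)."""
    problems = []
    if len(rows) > MAX_ROWS:
        problems.append(f"too many rows: {len(rows)} > {MAX_ROWS}")
    # Column count is taken as the union of keys seen across all rows.
    columns = set().union(*(row.keys() for row in rows))
    if len(columns) > MAX_COLUMNS:
        problems.append(f"too many columns: {len(columns)} > {MAX_COLUMNS}")
    for field in REQUIRED_NON_NULL:
        nulls = sum(1 for row in rows if row.get(field) is None)
        if nulls:
            problems.append(f"{field}: {nulls} null value(s)")
    return problems


rows = [
    {"state": "MO", "county_fips": "29095", "program_name": "Farm Loan Program"},
    {"state": None, "county_fips": "29095", "program_name": "Farm Loan Program"},
]
print(check_quality(rows))
```

A salted hash keeps values joinable across files while making simple dictionary lookups of common values harder; dropping the column entirely remains the safer option when the value is not needed downstream.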

5. Validation Process

  1. Run scripts/validate_schema.py.
  2. Confirm that all pydantic models pass validation.
  3. Check that the total size is ≤ 400 MB via
    du -sh data-samples/
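
The size check in step 3 could also be automated so that it fails loudly rather than relying on someone reading the `du` output. The sketch below is a hypothetical helper, not part of the repository. It sums file sizes rather than disk blocks, so its result can differ slightly from `du`, and it assumes the 400 MB limit is decimal (400,000,000 bytes).

```python
from pathlib import Path

# Assumption: "400 MB" is interpreted as decimal megabytes.
MAX_TOTAL_BYTES = 400 * 1000 * 1000


def total_bytes(directory) -> int:
    """Sum the sizes of all regular files under directory, recursively."""
    return sum(p.stat().st_size for p in Path(directory).rglob("*") if p.is_file())


def within_size_limit(directory, limit: int = MAX_TOTAL_BYTES) -> bool:
    """True if the directory's total file size does not exceed the limit."""
    return total_bytes(directory) <= limit


if __name__ == "__main__":
    folder = Path("data-samples")
    if folder.is_dir():
        size = total_bytes(folder)
        status = "OK" if size <= MAX_TOTAL_BYTES else "OVER LIMIT"
        print(f"{folder}: {size / 1_000_000:.1f} MB ({status})")
```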