
Takshg/Operational-Efficiency-NYC-311


NYC 311 Operational Efficiency Analysis

Statistical inference project analyzing 764K+ NYC 311 service requests to evaluate workload patterns, closure-time behavior, and on-time service performance across boroughs and workload conditions.

This project turns public-service request data into an operations-focused analytics case study. Using NYC 311 service request records, I built a Python-based inference workflow to study whether service efficiency differs across boroughs and whether higher daily workload is associated with slower closures and lower on-time performance.


Summary

Built an end-to-end statistical analysis pipeline in Python to evaluate operational efficiency across 764,584 NYC 311 service requests. Engineered closure-time, workload, and on-time performance metrics; diagnosed strong overdispersion in daily closure counts using Poisson and negative binomial models; compared borough-level on-time rates with two-proportion z-tests; and ran Monte Carlo power simulations to assess whether practical efficiency gaps are detectable.


Project Highlights

  • Analyzed 764,584 valid NYC 311 service requests across a defined closure cohort.
  • Engineered key operational metrics:
    • Daily workload: number of requests closed per day.
    • Time-to-close: days between request creation and closure.
    • On-time closure: whether a request closed within 7 days.
  • Found strong overdispersion in daily closure counts:
    • Mean daily closures: 10,475.90
    • Variance: 2,857,086.14
    • Variance-to-mean ratio: 272.73
  • Estimated an overall on-time closure rate of 85.70%.
  • Compared service performance across boroughs using confidence intervals and two-proportion z-tests.
  • Identified an association between higher workload days and lower on-time performance:
    • High-load days: 84.17% on-time closure
    • Low-load days: 87.62% on-time closure
  • Compared Gamma and lognormal models for right-skewed closure times.
  • Built Monte Carlo simulations to evaluate statistical power for detecting realistic service-performance gaps.

Problem

City service teams handle large volumes of 311 requests across many agencies, neighborhoods, and complaint types. Even when the overall service rate looks strong, operational bottlenecks can appear in specific locations or under heavier workload conditions.

This project frames NYC 311 data as an operational analytics problem:

Can statistical inference help identify workload pressure, service-delay patterns, and meaningful differences in on-time performance?

The goal was not only to run statistical tests, but to translate public-service data into insights that could support workload monitoring, performance reporting, and resource-allocation discussions.


Research Questions

RQ1: Do on-time closure rates differ across boroughs?

The project compares borough-level on-time closure performance using large-sample confidence intervals and pooled two-proportion z-tests. The main comparison focused on the largest borough samples.

RQ2: Are higher workload days associated with lower service efficiency?

Daily closure volume was split into high-load and low-load days using the median daily count. Requests were then linked to the workload condition of their closure day to compare on-time closure rates and median time-to-close.

This comparison is interpreted as descriptive and associational, not causal.
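The median split and request-level linkage described above can be sketched with pandas. This is a minimal sketch under stated assumptions, not the repository's implementation. The column names `closed_date`, `time_to_close_days`, and `on_time_7_days` are assumed. The helpers `label_workload` and `summarize_by_load` are hypothetical. The tie rule, which places days exactly at the median in the low-load group, is also an assumption.

```python
import numpy as np
import pandas as pd


def label_workload(requests: pd.DataFrame) -> pd.DataFrame:
    """Attach a high/low workload label to each request based on its closure day.

    Assumes (hypothetically) a datetime column 'closed_date'.
    """
    closure_day = requests["closed_date"].dt.normalize()
    # Daily workload = number of requests closed on each calendar day.
    daily_counts = closure_day.value_counts()
    median_count = daily_counts.median()
    # Assumption: days strictly above the median are "high"; ties at the median count as "low".
    day_label = pd.Series(
        np.where(daily_counts > median_count, "high", "low"), index=daily_counts.index
    )
    out = requests.copy()
    # Link every request to the workload condition of its closure day.
    out["load_group"] = closure_day.map(day_label)
    return out


def summarize_by_load(labeled: pd.DataFrame) -> pd.DataFrame:
    """On-time rate, median time-to-close, and request count per workload group."""
    return labeled.groupby("load_group").agg(
        on_time_rate=("on_time_7_days", "mean"),
        median_ttc_days=("time_to_close_days", "median"),
        n_requests=("on_time_7_days", "count"),
    )
```

Calling `summarize_by_load(label_workload(cohort))` on a prepared cohort returns one row per group, `high` and `low`. Its layout matches the high-load and low-load rows in the results table.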


Methodology

1. Data Preparation

The analysis uses a filtered NYC 311 closure cohort with valid creation and closure timestamps. The workflow calculates:

  • time_to_close_days
  • on_time_7_days
  • daily closure counts
  • borough-level service summaries
  • high-load versus low-load group labels
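The derived fields above can be computed roughly as follows. This is a sketch, not the code in `distribution_estimation.py`. It assumes NYC 311-style input columns named `Created Date`, `Closed Date`, and `Borough`. It also assumes that "within 7 days" means a time-to-close of at most 7 days. The function names are hypothetical.

```python
import pandas as pd

ON_TIME_THRESHOLD_DAYS = 7  # on-time definition from this README; "<=" is an assumption


def prepare_cohort(raw: pd.DataFrame) -> pd.DataFrame:
    """Keep requests with valid timestamps and add closure-time metrics.

    Assumes (hypothetically) raw columns 'Created Date', 'Closed Date', 'Borough'.
    """
    df = raw.copy()
    df["created_date"] = pd.to_datetime(df["Created Date"], errors="coerce")
    df["closed_date"] = pd.to_datetime(df["Closed Date"], errors="coerce")
    # Valid cohort: both timestamps parse and closure is not before creation.
    valid = (
        df["created_date"].notna()
        & df["closed_date"].notna()
        & (df["closed_date"] >= df["created_date"])
    )
    df = df.loc[valid].copy()
    elapsed = df["closed_date"] - df["created_date"]
    df["time_to_close_days"] = elapsed.dt.total_seconds() / 86_400
    df["on_time_7_days"] = df["time_to_close_days"] <= ON_TIME_THRESHOLD_DAYS
    return df


def daily_closure_counts(df: pd.DataFrame) -> pd.Series:
    """Number of requests closed on each calendar day."""
    return df.groupby(df["closed_date"].dt.normalize()).size()


def borough_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Per-borough request count, on-time rate, and median time-to-close."""
    return df.groupby("Borough").agg(
        n=("on_time_7_days", "count"),
        on_time_rate=("on_time_7_days", "mean"),
        median_ttc_days=("time_to_close_days", "median"),
    )
```

Real 311 exports may store timestamps in a non-ISO format. Passing an explicit `format=` to `pd.to_datetime` is usually faster and avoids inference surprises.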

2. Workload Count Modeling

Daily closure counts were first compared against a Poisson benchmark. Since the variance was much larger than the mean, the project used a negative binomial model as a more flexible working model for overdispersed count data.
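One way to check overdispersion and obtain a negative binomial working model is a method-of-moments fit in SciPy's `(n, p)` parameterization. For that family, mean = n(1 − p)/p and variance = n(1 − p)/p², so p = mean/variance and n = mean²/(variance − mean). The README does not say which estimator or variance convention the project uses. The method of moments, the sample variance with `ddof=1`, and the helper names below are all assumptions.

```python
import numpy as np
from scipy import stats


def dispersion_summary(counts):
    """Return (mean, sample variance, variance-to-mean ratio) of daily counts."""
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean()
    var = counts.var(ddof=1)  # assumption: sample variance
    return mean, var, var / mean


def fit_negative_binomial_mom(counts):
    """Method-of-moments negative binomial fit in scipy.stats.nbinom's (n, p) form."""
    mean, var, _ = dispersion_summary(counts)
    if var <= mean:
        raise ValueError("No overdispersion: variance must exceed the mean.")
    p = mean / var
    n = mean**2 / (var - mean)
    return n, p


def compare_loglik(counts):
    """Total log-likelihood under a Poisson fit vs. the moment-matched NB fit."""
    counts = np.asarray(counts)
    n, p = fit_negative_binomial_mom(counts)
    ll_poisson = stats.poisson.logpmf(counts, mu=counts.mean()).sum()
    ll_negbin = stats.nbinom.logpmf(counts, n, p).sum()
    return float(ll_poisson), float(ll_negbin)
```

Plugging in the reported mean (10,475.90) and variance (2,857,086.14) gives p ≈ 0.0037 and n ≈ 38.55. A small n relative to the mean is the signature of heavy overdispersion.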

3. Closure-Time Distribution Modeling

Time-to-close was modeled using Gamma and lognormal distributions because request closure times are non-negative and strongly right-skewed. Visual diagnostics and QQ plots were used to assess how well each distribution captured the long right tail.
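A numerical companion to the visual diagnostics described above fits both families with the location fixed at 0. It then reports each fit's log-likelihood and the correlation from `scipy.stats.probplot`, which is the straightness of the QQ plot. This is a sketch, not the project's code, and `compare_closure_time_fits` is a hypothetical name. Dropping zero-duration closures before fitting is also an assumption, because neither density is defined at exactly 0 with `loc=0`.

```python
import numpy as np
from scipy import stats


def compare_closure_time_fits(times_days):
    """Fit Gamma and lognormal models (loc fixed at 0) to closure times in days.

    Returns, per model: fitted shape and scale, total log-likelihood, and the
    QQ-plot correlation r (closer to 1 means a straighter QQ line).
    """
    t = np.asarray(times_days, dtype=float)
    t = t[t > 0]  # assumption: zero-duration closures are excluded before fitting
    candidates = {"gamma": stats.gamma, "lognormal": stats.lognorm}
    results = {}
    for name, dist in candidates.items():
        # Both families have one shape parameter, so fit returns (shape, loc, scale).
        shape, loc, scale = dist.fit(t, floc=0)
        loglik = dist.logpdf(t, shape, loc=loc, scale=scale).sum()
        # probplot returns ((theoretical quantiles, ordered data), (slope, intercept, r)).
        _, (_, _, qq_r) = stats.probplot(t, sparams=(shape, loc, scale), dist=dist)
        results[name] = {
            "shape": shape,
            "scale": scale,
            "loglik": float(loglik),
            "qq_r": float(qq_r),
        }
    return results
```

Both models have two free parameters here, so comparing log-likelihoods is equivalent to comparing AIC. Consistent with the Limitations section, this is best read as a descriptive summary alongside the QQ plots rather than a formal model-selection result.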

4. On-Time Performance Inference

The project uses:

  • Wald confidence intervals for proportions
  • confidence intervals for differences in proportions
  • pooled two-proportion z-tests
  • group-level comparisons across borough and workload conditions
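The three interval and test formulas listed above are short enough to write out directly. This is a minimal sketch assuming raw counts are available: x on-time closures out of n requests per group. The function names are hypothetical, and the project's own helpers may differ in signature.

```python
import math

from scipy.stats import norm


def wald_ci(successes, n, level=0.95):
    """Large-sample Wald interval for a single proportion."""
    p = successes / n
    z = norm.ppf(0.5 + level / 2)
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


def diff_ci(x1, n1, x2, n2, level=0.95):
    """Wald interval for p1 - p2 using unpooled standard errors."""
    p1, p2 = x1 / n1, x2 / n2
    z = norm.ppf(0.5 + level / 2)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


def pooled_z_test(x1, n1, x2, n2):
    """Two-sided pooled two-proportion z-test of H0: p1 == p2."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # common rate under the null hypothesis
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * norm.sf(abs(z))
    return z, p_value
```

For a borough comparison, pass each borough's count of on-time closures and total requests. The test uses the pooled standard error, while the difference interval uses the unpooled one, which is the conventional pairing.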

5. Design-Based Power Simulation

Monte Carlo simulation was used to estimate the power of two-proportion tests under different sample sizes and effect sizes. This helped evaluate whether practical differences in service performance would be detectable under realistic study designs.
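The power simulation idea can be sketched in vectorized form. Draw binomial on-time counts for two groups with assumed true rates, apply the pooled two-proportion z-test to each replicate, and report the share of replicates that reject at level alpha. This is not the code in `efficiency_gap_power.py`. The equal group sizes, the default number of replicates, and the function name `simulate_power` are assumptions.

```python
import numpy as np
from scipy.stats import norm


def simulate_power(p1, p2, n_per_group, n_sims=2000, alpha=0.05, seed=0):
    """Monte Carlo power of a two-sided pooled two-proportion z-test.

    Assumes (hypothetically) equal sample sizes in both groups.
    """
    rng = np.random.default_rng(seed)
    x1 = rng.binomial(n_per_group, p1, size=n_sims)
    x2 = rng.binomial(n_per_group, p2, size=n_sims)
    p_hat1 = x1 / n_per_group
    p_hat2 = x2 / n_per_group
    pooled = (x1 + x2) / (2 * n_per_group)
    se = np.sqrt(pooled * (1 - pooled) * (2 / n_per_group))
    # A pooled rate of exactly 0 or 1 gives se == 0; treat those replicates as z = 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        z = np.where(se > 0, (p_hat1 - p_hat2) / se, 0.0)
    p_values = 2 * norm.sf(np.abs(z))
    return float(np.mean(p_values < alpha))
```

Using the reported high-load and low-load rates as an example effect size, `simulate_power(0.8417, 0.8762, n)` shows how power grows with the per-group sample size `n`. Setting `p1 == p2` should return a rejection rate close to `alpha`.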


Key Results

| Metric | Result |
|---|---|
| Final cohort size | 764,584 requests |
| Daily closure count, mean | 10,475.90 |
| Daily closure count, variance | 2,857,086.14 |
| Variance-to-mean ratio | 272.73 |
| Overall on-time closure rate | 85.70% |
| Brooklyn on-time rate | 84.61% |
| Bronx on-time rate | 87.89% |
| High-load on-time rate | 84.17% |
| Low-load on-time rate | 87.62% |
| High-load median time-to-close | 0.4174 days (about 10.0 hours) |
| Low-load median time-to-close | 0.1655 days (about 4.0 hours) |

Interpretation

The results show that NYC 311 closure activity is highly variable. The variance of daily closure counts is roughly 273 times their mean, so a Poisson model, which assumes the variance equals the mean, is too restrictive for daily workload counts. Closure times are also strongly right-skewed, meaning that a small share of requests remain open much longer than typical requests.

The group comparisons suggest that service efficiency is not uniform across boroughs and that higher workload days are associated with slower closures and lower on-time performance. These patterns could help operations teams flag workload pressure, monitor service performance, and identify areas for deeper investigation.

Because the analysis is observational, the results should not be interpreted as proof that borough or workload directly causes slower service. Future work could incorporate request type, agency, seasonality, day-of-week effects, and regression-based modeling.


Technical Skills Demonstrated

  • Data cleaning and cohort construction
  • Feature engineering for operational metrics
  • Exploratory data analysis
  • Statistical inference for proportions
  • Poisson and negative binomial count modeling
  • Gamma and lognormal distribution fitting
  • Monte Carlo simulation
  • Power analysis
  • Data visualization
  • Public-sector operations analytics
  • Python-based reproducible analysis

Tools and Libraries

  • Python
  • pandas
  • NumPy
  • SciPy
  • Matplotlib
  • Jupyter Notebook

Repository Structure

.
├── Analysis_Report.ipynb              # Main notebook for analysis, figures, and results
├── EDA.ipynb                          # Exploratory analysis notebook
├── distribution_estimation.py         # Core data preparation, modeling, inference, and plotting functions
├── efficiency_gap_power.py            # Monte Carlo power simulation for two-proportion tests
├── negative_binomial_check.py         # Legacy/wrapper script for negative binomial checks
├── distribution_estimation_legacy.py  # Older reference version of the modeling workflow
├── Project_Report.pdf                 # Research report detailing the full analysis and results
├── data/                              # Local NYC 311 CSV exports
├── outputs/                           # Generated figures, exported artifacts
├── requirements.txt
└── README.md

How to Run

1. Clone the repository

git clone <repo-url>
cd Operational-Efficiency-NYC-311

2. Create and activate a virtual environment

python -m venv .venv
source .venv/bin/activate

For Windows:

.venv\Scripts\activate

3. Install dependencies

pip install -r requirements.txt

4. Add the data

Place a NYC 311 CSV export inside the data/ folder. The expected file pattern is:

311_Service_Requests_from_2020_to_Present_*.csv
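To confirm that the exports in data/ match this pattern, you can load them with a small helper like the one below. This sketch is hypothetical: `load_311_exports` is not part of the repository, and the actual loading logic lives in `distribution_estimation.py`. It also assumes that multiple matching exports should simply be concatenated.

```python
from pathlib import Path

import pandas as pd

DATA_PATTERN = "311_Service_Requests_from_2020_to_Present_*.csv"


def load_311_exports(data_dir="data"):
    """Read and concatenate every CSV in data_dir matching DATA_PATTERN."""
    files = sorted(Path(data_dir).glob(DATA_PATTERN))
    if not files:
        raise FileNotFoundError(f"No files matching {DATA_PATTERN!r} in {data_dir!r}")
    # low_memory=False avoids mixed-dtype chunk inference on large exports.
    frames = (pd.read_csv(path, low_memory=False) for path in files)
    return pd.concat(frames, ignore_index=True)
```

Files that do not match the pattern are ignored. If nothing matches, the helper raises an error immediately rather than failing later with an empty DataFrame.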

5. Run the main analysis

Open Jupyter from the repository root and run:

Analysis_Report.ipynb

Or run the main script:

python distribution_estimation.py

To run the power simulation separately:

python efficiency_gap_power.py

Main Outputs

The project produces:

  • daily closure count plots
  • overdispersion diagnostics
  • time-to-close histograms
  • Gamma versus lognormal diagnostic plots
  • borough-level on-time closure comparisons
  • high-load versus low-load performance comparisons
  • Monte Carlo power simulation results

Generated figures and reports should be saved in the outputs/ directory.


Limitations

This project is designed as an inference-focused public-sector analytics case study. It does not claim causal effects because the comparisons do not fully adjust for agency, request type, borough composition, seasonality, or day-of-week patterns.

The negative binomial model is used as a practical overdispersion model rather than a full time-series model. The time-to-close distribution comparisons are descriptive and visual rather than a formal model-selection claim.


Future Improvements

  • Add regression models controlling for agency, complaint type, borough, and calendar effects.
  • Build a dashboard for monitoring on-time closure performance.
  • Add forecasting for daily request volume.
  • Compare service performance by complaint category.
  • Include geospatial analysis of delay hotspots.
  • Develop an automated reporting pipeline for city operations teams.

Project Takeaway

This project demonstrates how statistical inference can be applied to a real operational dataset to evaluate service efficiency, quantify performance differences, and identify workload-related service patterns.
