NYC 311 Operational Efficiency Analysis

Statistical inference project analyzing 764K+ NYC 311 service requests to evaluate workload patterns, closure-time behavior, and on-time service performance across boroughs and workload conditions.

This project turns public-service request data into an operations-focused analytics case study. Using NYC 311 service request records, I built a Python-based inference workflow to study whether service efficiency differs across boroughs and whether higher daily workload is associated with slower closures and lower on-time performance.

Summary

Built an end-to-end statistical analysis pipeline in Python to evaluate operational efficiency across 764,584 NYC 311 service requests. Engineered closure-time, workload, and on-time performance metrics; diagnosed strong count overdispersion using Poisson and negative binomial models; compared borough-level service rates with two-proportion z-tests; and ran Monte Carlo power simulations to assess detectability of practical efficiency gaps.

Project Highlights

Analyzed 764,584 valid NYC 311 service requests across a defined closure cohort.
Engineered key operational metrics:
- Daily workload: number of requests closed per day.
- Time-to-close: days between request creation and closure.
- On-time closure: whether a request closed within 7 days.
Found strong overdispersion in daily closure counts:
- Mean daily closures: 10,475.90
- Variance: 2,857,086.14
- Variance-to-mean ratio: 272.73
Estimated an overall on-time closure rate of 85.70%.
Compared service performance across boroughs using confidence intervals and two-proportion z-tests.
Identified an association between higher workload days and lower on-time performance:
- High-load days: 84.17% on-time closure
- Low-load days: 87.62% on-time closure
Compared Gamma and lognormal models for right-skewed closure times.
Built Monte Carlo simulations to evaluate statistical power for detecting realistic service-performance gaps.

Problem

City service teams handle large volumes of 311 requests across many agencies, neighborhoods, and complaint types. Even when the overall service rate looks strong, operational bottlenecks can appear in specific locations or under heavier workload conditions.

This project frames NYC 311 data as an operational analytics problem:

Can statistical inference help identify workload pressure, service-delay patterns, and meaningful differences in on-time performance?

The goal was not only to run statistical tests, but to translate public-service data into insights that could support workload monitoring, performance reporting, and resource-allocation discussions.

Research Questions

RQ1: Do on-time closure rates differ across boroughs?

The project compares borough-level on-time closure performance using large-sample confidence intervals and pooled two-proportion z-tests. The main comparison focuses on the boroughs with the largest samples.

RQ2: Are higher workload days associated with lower service efficiency?

Daily closure volume was split into high-load and low-load days using the median daily count. Requests were then linked to the workload condition of their closure day to compare on-time closure rates and median time-to-close.

This comparison is interpreted as descriptive and associational, not causal.
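The median split described above can be sketched in pandas as follows. This is a minimal illustration on made-up data, not the project's code: the column names (`closed_day`, `time_to_close_days`, `on_time_7_days`, `load_group`) are assumptions, and so is the choice to place days exactly at the median in the low-load group.

```python
import pandas as pd

# Hypothetical closed requests: one row per request (illustrative values only).
requests = pd.DataFrame({
    "closed_day": pd.to_datetime(
        ["2024-01-01"] * 2 + ["2024-01-02"] * 5 + ["2024-01-03"] * 3 + ["2024-01-04"] * 6
    ),
    "time_to_close_days": [
        0.1, 0.2,                        # 2024-01-01
        0.5, 9.0, 0.4, 8.0, 0.6,         # 2024-01-02
        0.1, 0.3, 0.2,                   # 2024-01-03
        10.0, 0.3, 12.0, 0.7, 0.2, 7.5,  # 2024-01-04
    ],
})
requests["on_time_7_days"] = requests["time_to_close_days"] <= 7

# Daily workload: number of requests closed on each day.
daily_counts = requests.groupby("closed_day").size()

# Split days at the median daily count (assumption: ties go to "low").
threshold = daily_counts.median()
day_label = (daily_counts > threshold).map({True: "high", False: "low"})

# Attach each request to the workload condition of its closure day.
requests["load_group"] = requests["closed_day"].map(day_label)

summary = requests.groupby("load_group").agg(
    n_requests=("on_time_7_days", "size"),
    on_time_rate=("on_time_7_days", "mean"),
    median_days=("time_to_close_days", "median"),
)
print(summary)
```

Because the label is attached at the day level, every request closed on the same day shares one workload condition, which is why the comparison stays associational.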

Methodology

1. Data Preparation

The analysis uses a filtered NYC 311 closure cohort with valid creation and closure timestamps. The workflow calculates:

time_to_close_days
on_time_7_days
daily closure counts
borough-level service summaries
high-load versus low-load group labels
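As a rough sketch of how these fields could be derived, the example below builds them from a tiny made-up table. The raw column names (`Created Date`, `Closed Date`, `Borough`) are assumed to match the NYC Open Data export, `build_cohort` is a hypothetical helper, and the validity rule (both timestamps present, closure not before creation) and the `<= 7` cutoff are assumptions rather than the project's documented logic.

```python
import pandas as pd

# Hypothetical miniature of a 311 export (illustrative values only).
raw = pd.DataFrame({
    "Created Date": ["2024-01-01 08:00", "2024-01-01 09:00",
                     "2024-01-02 10:00", "2024-01-03 12:00"],
    "Closed Date": ["2024-01-01 20:00", "2024-01-10 09:00",
                    None, "2024-01-02 12:00"],
    "Borough": ["BROOKLYN", "BRONX", "QUEENS", "BROOKLYN"],
})


def build_cohort(df, on_time_days=7):
    """Keep requests with valid timestamps and add closure metrics."""
    out = df.copy()
    out["created"] = pd.to_datetime(out["Created Date"], errors="coerce")
    out["closed"] = pd.to_datetime(out["Closed Date"], errors="coerce")

    # Assumed validity rule: both timestamps parse, closure not before creation.
    out = out.dropna(subset=["created", "closed"])
    out = out[out["closed"] >= out["created"]].copy()

    elapsed = out["closed"] - out["created"]
    out["time_to_close_days"] = elapsed.dt.total_seconds() / 86400
    out["on_time_7_days"] = out["time_to_close_days"] <= on_time_days
    out["closed_day"] = out["closed"].dt.normalize()
    return out


cohort = build_cohort(raw)

# Daily workload and borough-level on-time summaries.
daily_counts = cohort.groupby("closed_day").size()
borough_summary = cohort.groupby("Borough")["on_time_7_days"].agg(["size", "mean"])
print(cohort[["Borough", "time_to_close_days", "on_time_7_days"]])
```

In this toy table, the request with a missing closure timestamp and the one closed before it was created are both dropped, leaving two rows in the cohort.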

2. Workload Count Modeling

Daily closure counts were first compared against a Poisson benchmark. Since the variance was much larger than the mean, the project used a negative binomial model as a more flexible working model for overdispersed count data.
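One simple way to illustrate this check is a method-of-moments fit, sketched below on simulated counts. This is an assumption-laden example, not the project's fitting code: `nb_moment_fit` is a hypothetical helper, and the parameterization follows `scipy.stats.nbinom(n, p)`, whose mean is `n(1-p)/p` and whose variance is `n(1-p)/p**2`.

```python
import numpy as np
from scipy import stats


def nb_moment_fit(counts):
    """Method-of-moments negative binomial fit for overdispersed counts."""
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean()
    var = counts.var(ddof=1)
    if var <= mean:
        raise ValueError("variance <= mean: no overdispersion to model")
    # Matching mean and variance gives p = mean / var and r = mean^2 / (var - mean).
    p = mean / var
    r = mean**2 / (var - mean)
    return {"mean": mean, "var": var, "ratio": var / mean, "r": r, "p": p}


# Simulated daily counts (not the project data).
rng = np.random.default_rng(0)
sim_counts = rng.negative_binomial(n=40, p=0.004, size=5000)

fit = nb_moment_fit(sim_counts)
nb = stats.nbinom(fit["r"], fit["p"])
poisson = stats.poisson(fit["mean"])

print(f"variance-to-mean ratio: {fit['ratio']:.1f}")
print("Poisson 95% interval:", poisson.interval(0.95))
print("NegBin  95% interval:", nb.interval(0.95))
```

Under a Poisson benchmark the variance-to-mean ratio would be close to 1, so a ratio in the hundreds signals that Poisson intervals are far too narrow. Plugging the reported mean (10,475.90) and variance (2,857,086.14) into the same moment formulas gives roughly r ≈ 38.6 and p ≈ 0.0037, though the project's own fitting procedure may differ.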

3. Closure-Time Distribution Modeling

Time-to-close was modeled using Gamma and lognormal distributions because request closure times are non-negative and strongly right-skewed. Visual diagnostics and QQ plots were used to assess how well each distribution captured the long right tail.
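A minimal sketch of such a comparison with SciPy is shown below, using simulated closure times rather than the project data. Fixing the location at zero (`floc=0`) and comparing log-likelihoods are choices made for this illustration; both fits also assume strictly positive times, so zero-duration closures would need separate handling.

```python
import numpy as np
from scipy import stats

# Simulated right-skewed closure times in days (not the project data).
rng = np.random.default_rng(1)
times = rng.lognormal(mean=-1.0, sigma=1.5, size=2000)

# Fit both families with the location fixed at zero (assumption).
gamma_params = stats.gamma.fit(times, floc=0)      # (shape, loc, scale)
lognorm_params = stats.lognorm.fit(times, floc=0)  # (shape, loc, scale)

# Total log-likelihood as a rough, descriptive comparison.
gamma_ll = stats.gamma.logpdf(times, *gamma_params).sum()
lognorm_ll = stats.lognorm.logpdf(times, *lognorm_params).sum()

# QQ coordinates: sorted data against fitted quantiles at plotting positions.
probs = (np.arange(1, times.size + 1) - 0.5) / times.size
empirical_q = np.sort(times)
gamma_q = stats.gamma.ppf(probs, *gamma_params)
lognorm_q = stats.lognorm.ppf(probs, *lognorm_params)

print(f"Gamma log-likelihood:     {gamma_ll:.1f}")
print(f"Lognormal log-likelihood: {lognorm_ll:.1f}")
```

Plotting `empirical_q` against `gamma_q` and `lognorm_q` on log axes makes tail misfit visible: points that bend away from the diagonal in the upper quantiles indicate a family that underestimates the long right tail.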

4. On-Time Performance Inference

The project uses:

Wald confidence intervals for proportions
confidence intervals for differences in proportions
pooled two-proportion z-tests
group-level comparisons across borough and workload conditions
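The standard formulas behind these tools can be sketched as below. The function names (`wald_ci`, `diff_ci`, `pooled_two_prop_ztest`) and the counts in the example are hypothetical and are not taken from the project's code or data.

```python
import math

from scipy.stats import norm


def wald_ci(successes, n, level=0.95):
    """Wald interval for one proportion: p_hat +/- z * sqrt(p_hat(1-p_hat)/n)."""
    p_hat = successes / n
    z = norm.ppf(0.5 + level / 2)
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def diff_ci(x1, n1, x2, n2, level=0.95):
    """Wald interval for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    z = norm.ppf(0.5 + level / 2)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) - z * se, (p1 - p2) + z * se


def pooled_two_prop_ztest(x1, n1, x2, n2):
    """Two-sided z-test of H0: p1 == p2 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 2 * norm.sf(abs(z))


# Hypothetical counts: 850 of 1,000 on time versus 880 of 1,000 on time.
z, p_value = pooled_two_prop_ztest(850, 1000, 880, 1000)
print(f"z = {z:.3f}, p = {p_value:.4f}")
print("95% CI for group 1:", wald_ci(850, 1000))
print("95% CI for difference:", diff_ci(850, 1000, 880, 1000))
```

The test uses the pooled standard error because it is evaluated under the null hypothesis of equal rates, while the interval for the difference uses the unpooled standard error because it does not assume the rates are equal.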

5. Design-Based Power Simulation

Monte Carlo simulation was used to estimate the power of two-proportion tests under different sample sizes and effect sizes. This helped evaluate whether practical differences in service performance would be detectable under realistic study designs.
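The idea can be sketched as follows: simulate many pairs of binomial samples at assumed true rates, run the pooled z-test on each pair, and report the share of rejections. This is an illustration only. `simulate_power`, the rates 0.85 and 0.88, and the sample sizes are hypothetical choices and do not reproduce `efficiency_gap_power.py`.

```python
import numpy as np
from scipy.stats import norm


def simulate_power(p1, p2, n1, n2, alpha=0.05, n_sims=5000, seed=0):
    """Estimate power of the two-sided pooled two-proportion z-test."""
    rng = np.random.default_rng(seed)

    # One simulated study per draw: on-time counts in each group.
    x1 = rng.binomial(n1, p1, size=n_sims)
    x2 = rng.binomial(n2, p2, size=n_sims)

    p_hat1, p_hat2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = np.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))

    # Guard against a zero standard error (all successes or all failures).
    with np.errstate(divide="ignore", invalid="ignore"):
        z = np.where(se > 0, (p_hat1 - p_hat2) / se, 0.0)

    reject = np.abs(z) > norm.ppf(1 - alpha / 2)
    return reject.mean()


# Power to detect a 3-point gap at several per-group sample sizes.
power_by_n = {n: simulate_power(0.85, 0.88, n, n) for n in (500, 2000, 8000)}
for n, power in power_by_n.items():
    print(f"n per group = {n:>5}: estimated power = {power:.3f}")

# Sanity check: with no true gap, the rejection rate should sit near alpha.
print("type I error estimate:", simulate_power(0.85, 0.85, 2000, 2000))
```

Running the same function with equal true rates is a useful calibration step, since a rejection rate far from `alpha` would point to a bug in the test logic rather than a property of the design.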

Key Results

Area	Result
Final cohort size	764,584 requests
Daily closure count mean	10,475.90
Daily closure count variance	2,857,086.14
Variance-to-mean ratio	272.73
Overall on-time closure rate	85.70%
Brooklyn on-time rate	84.61%
Bronx on-time rate	87.89%
High-load on-time rate	84.17%
Low-load on-time rate	87.62%
High-load median time-to-close	0.4174 days
Low-load median time-to-close	0.1655 days

Interpretation

The results show that NYC 311 closure activity is highly variable, making a simple Poisson model too restrictive for daily workload counts. Closure times are also strongly right-skewed, meaning that a small share of requests remain open much longer than typical requests.

The group comparisons suggest that service efficiency is not uniform across boroughs and that higher workload days are associated with slower closures and lower on-time performance. These patterns could help operations teams flag workload pressure, monitor service performance, and identify areas for deeper investigation.

Because the analysis is observational, the results should not be interpreted as proof that borough or workload directly causes slower service. Future work could incorporate request type, agency, seasonality, day-of-week effects, and regression-based modeling.

Technical Skills Demonstrated

Data cleaning and cohort construction
Feature engineering for operational metrics
Exploratory data analysis
Statistical inference for proportions
Poisson and negative binomial count modeling
Gamma and lognormal distribution fitting
Monte Carlo simulation
Power analysis
Data visualization
Public-sector operations analytics
Python-based reproducible analysis

Tools and Libraries

Python
pandas
NumPy
SciPy
Matplotlib
Jupyter Notebook

Repository Structure

.
├── Analysis_Report.ipynb              # Main notebook for analysis, figures, and results
├── EDA.ipynb                          # Exploratory analysis notebook
├── distribution_estimation.py         # Core data preparation, modeling, inference, and plotting functions
├── efficiency_gap_power.py            # Monte Carlo power simulation for two-proportion tests
├── negative_binomial_check.py         # Legacy/wrapper script for negative binomial checks
├── distribution_estimation_legacy.py  # Older reference version of the modeling workflow
├── Project_Report.pdf                 # Research report detailing the full analysis and results
├── data/                              # Local NYC 311 CSV exports
├── outputs/                           # Generated figures, exported artifacts
├── requirements.txt
└── README.md

How to Run

1. Clone the repository

git clone <repo-url>
cd Operational-Efficiency-NYC-311

2. Create and activate a virtual environment

python -m venv .venv
source .venv/bin/activate

For Windows:

.venv\Scripts\activate

3. Install dependencies

pip install -r requirements.txt

4. Add the data

Place a NYC 311 CSV export inside the data/ folder. The expected file pattern is:

311_Service_Requests_from_2020_to_Present_*.csv
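To confirm that a file in data/ matches this pattern before running the analysis, a quick check along these lines may help. How the project's scripts actually locate the file is not shown here, and `find_exports` is a hypothetical helper written for this example.

```python
from pathlib import Path

PATTERN = "311_Service_Requests_from_2020_to_Present_*.csv"


def find_exports(data_dir="data"):
    """Return CSV exports in data_dir that match the expected file pattern."""
    folder = Path(data_dir)
    if not folder.is_dir():
        return []
    return sorted(folder.glob(PATTERN))


exports = find_exports()
if exports:
    print("Found:", ", ".join(path.name for path in exports))
else:
    print("No matching export found in data/")
```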

5. Run the main analysis

Open Jupyter from the repository root and run:

Analysis_Report.ipynb

Or run the main script:

python distribution_estimation.py

To run the power simulation separately:

python efficiency_gap_power.py

Main Outputs

The project produces:

daily closure count plots
overdispersion diagnostics
time-to-close histograms
Gamma versus lognormal diagnostic plots
borough-level on-time closure comparisons
high-load versus low-load performance comparisons
Monte Carlo power simulation results

Generated figures and reports should be saved in the outputs/ directory.

Limitations

This project is designed as an inference-focused public-sector analytics case study. It does not claim causal effects because the comparisons do not fully adjust for agency, request type, borough composition, seasonality, or day-of-week patterns.

The negative binomial model is used as a practical overdispersion model rather than a full time-series model. The time-to-close distribution comparisons are descriptive and visual rather than a formal model-selection claim.

Future Improvements

Add regression models controlling for agency, complaint type, borough, and calendar effects.
Build a dashboard for monitoring on-time closure performance.
Add forecasting for daily request volume.
Compare service performance by complaint category.
Include geospatial analysis of delay hotspots.
Develop an automated reporting pipeline for city operations teams.

Project Takeaway

This project demonstrates how statistical inference can be applied to a real operational dataset to evaluate service efficiency, quantify performance differences, and identify workload-related service patterns.
