This project analyzes the risk of latent tuberculosis (TB) reactivation among high-risk populations in the United States.
It was developed as part of my research at Harvard Global Health and focuses on how historical TB infections contribute to current TB incidence and how reactivation risk varies across population groups.
In low-transmission settings like the United States, a large proportion of TB cases arise from reactivation of latent TB infections acquired in the past. Identifying high-risk populations is critical for targeted prevention strategies such as TB preventive therapy (TPT).
This project estimates TB reactivation risk by combining multiple public health datasets and propagating the uncertainty in each input through to the final estimates.
- Integrated multiple datasets, including:
  - NHANES (demographic, laboratory, and questionnaire data)
  - HIV surveillance data
  - National health surveys and administrative sources
- Applied complex survey design methods using weighted analyses (`survey` package in R)
- Built logistic regression models to estimate associations between TB infection and risk factors:
  - HIV
  - Diabetes
  - Renal disease
  - Prior TB exposure
  - Household TB contact
  - Immunosuppression (e.g., prednisone use)
- Used Monte Carlo simulation (10,000 draws) to propagate uncertainty across:
  - TB case counts
  - IGRA positivity estimates
  - Test sensitivity/specificity
  - Population estimates
- Generated:
  - Odds ratios for risk factors
  - Population-level estimates
  - 95% uncertainty intervals
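
The Monte Carlo step listed above can be sketched in base R as follows. This is a minimal illustration, not the repository's actual script: every distribution, parameter value, and variable name is a hypothetical placeholder. The Rogan-Gladen correction used to adjust IGRA positivity for test sensitivity and specificity is an assumption made for this sketch; the adjustment method actually used in the analysis may differ.

```r
set.seed(42)
n_draws <- 10000  # number of draws stated in the project description

# Hypothetical placeholder inputs (NOT results from this project).
cases    <- rpois(n_draws, lambda = 500)          # annual TB case count
igra_pos <- rbeta(n_draws, 100, 900)              # apparent IGRA positivity (~10%)
sens     <- rbeta(n_draws, 80, 20)                # test sensitivity (~0.80)
spec     <- rbeta(n_draws, 980, 20)               # test specificity (~0.98)
pop      <- rnorm(n_draws, mean = 1e6, sd = 2e4)  # population size estimate

# Rogan-Gladen correction (assumed here): convert apparent positivity
# into an estimate of true infection prevalence, clamped to [0, 1].
true_prev <- (igra_pos + spec - 1) / (sens + spec - 1)
true_prev <- pmin(pmax(true_prev, 0), 1)

# Number of people with latent infection, and cases per 100,000 infected.
infected <- true_prev * pop
rate <- ifelse(infected > 0, cases / infected * 1e5, NA_real_)

# Median and 95% uncertainty interval across draws.
ui <- quantile(rate, probs = c(0.5, 0.025, 0.975), na.rm = TRUE)
print(round(ui, 1))
```

Each draw samples all inputs at once, so the spread of `rate` reflects the combined uncertainty in all of them. The 2.5th and 97.5th percentiles then give the 95% uncertainty interval.
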
- Data ingestion and cleaning (NHANES + external datasets)
- Feature engineering (risk factors, demographic variables)
- Survey design setup (weights, strata, PSU)
- Logistic regression models (survey-adjusted)
- Post-processing (odds ratios, export tables)
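
The survey design, regression, and post-processing steps could look roughly like the sketch below. It uses simulated toy data, and all column names (`psu`, `strata`, `wt`, `tb_infection`, and so on) are hypothetical. In real NHANES files these roles are typically played by `SDMVPSU`, `SDMVSTRA`, and an examination weight such as `WTMEC2YR`, but the exact variables used in this project are not stated here.

```r
library(survey)

set.seed(1)
n <- 2000

# Simulated toy data standing in for a cleaned analysis file (hypothetical).
toy <- data.frame(
  psu      = rep(1:2, times = n / 2),    # 2 PSUs nested within each stratum
  strata   = rep(1:10, each = n / 10),   # 10 strata
  wt       = runif(n, 500, 5000),        # sampling weights
  diabetes = rbinom(n, 1, 0.15),
  hiv      = rbinom(n, 1, 0.10)
)
toy$tb_infection <- rbinom(
  n, 1, plogis(-2.5 + 0.7 * toy$diabetes + 1.2 * toy$hiv)
)

# Survey design setup: weights, strata, and PSUs.
des <- svydesign(
  ids = ~psu, strata = ~strata, weights = ~wt,
  nest = TRUE, data = toy
)

# Survey-adjusted logistic regression; quasibinomial avoids warnings
# about non-integer counts that arise with weighted data.
fit <- svyglm(tb_infection ~ diabetes + hiv, design = des,
              family = quasibinomial())

# Post-processing: odds ratios with 95% confidence intervals.
or_table <- exp(cbind(OR = coef(fit), confint(fit)))
print(round(or_table, 2))
```

The resulting `or_table` can be written out with `write.csv()` to produce the exported tables.
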
- Survey data analysis (complex sampling, weighting)
- Statistical modeling (logistic regression, uncertainty propagation)
- Data integration across multiple public datasets
- Reproducible research workflows in R
- This repository contains selected scripts used for analysis.
- Data sources are publicly available but may require preprocessing or access permissions.