Project Report: Aadhaar Biometric Forecast

Predicting Mandatory Update Demand Among Youth (Age 15)

1. Executive Summary

This project developed a machine learning model to forecast the volume of mandatory biometric updates for 15-year-olds across Indian states. By analyzing historical enrolment and update trends, we built a Gradient Boosting Regressor that reaches a score of roughly 0.70 (R² ≈ 0.70), which this report refers to as ~70% accuracy. The model flags likely bottlenecks in advance (e.g., a predicted surge in Uttar Pradesh in Jan 2026), enabling policymakers to allocate resources proactively rather than reactively.

2. Problem Statement

The Challenge: Every Indian resident must update their biometrics (fingerprint, iris, photo) upon turning 15. Failure to do so leads to authentication failures.
The Gap: Current administrative planning is often reactive. There is no predictive tool to estimate when and where teenagers will show up for updates.
The Goal: Build a time-series forecasting model to predict monthly biometric update volume by state for the next quarter.

3. Methodology: How We Did It

A. Data Strategy

We utilized official Open Data from UIDAI, specifically:

Biometric Update Dataset: The target variable (specifically bio_age_5_17).
Aadhaar Enrolment Dataset: The predictor signal (population density of age_5_17).
Demographic Update Dataset: Tested as a leading indicator (discarded later due to noise).

B. Data Preparation

Aggregation: We aggregated millions of raw rows into a structured time-series format: [State, Month, Year, Count].
Cleaning: Handled missing dates and aligned inconsistent time formats.
Feature Engineering: This was the critical success factor. We created "Lag Features" (Memory), teaching the model that updates last month are the strongest predictor of updates this month.
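
The aggregation and lag-feature steps above can be sketched in pandas. This is a minimal illustration, not the project's actual pipeline: the raw column names `state` and `date`, the helper name `build_monthly_panel`, and the choice to fill a missing month with 0 updates are all assumptions made for the example. Only `bio_age_5_17` and `updates_last_month` come from the report.

```python
import pandas as pd


def build_monthly_panel(raw: pd.DataFrame) -> pd.DataFrame:
    """Aggregate raw rows to one row per state and month, then add a one-month lag."""
    df = raw.copy()
    # Assumed raw columns: "state", "date", "bio_age_5_17".
    df["date"] = pd.to_datetime(df["date"], errors="coerce")
    df = df.dropna(subset=["date"])
    df["month_start"] = df["date"].dt.to_period("M").dt.to_timestamp()

    monthly = df.groupby(["state", "month_start"], as_index=False)["bio_age_5_17"].sum()

    # Put every state on a complete monthly calendar. Filling gaps with 0 is an
    # assumption; without it, shift(1) could return a value from two months back.
    calendar = pd.date_range(
        monthly["month_start"].min(), monthly["month_start"].max(), freq="MS"
    )
    full_index = pd.MultiIndex.from_product(
        [monthly["state"].unique(), calendar], names=["state", "month_start"]
    )
    monthly = (
        monthly.set_index(["state", "month_start"])
        .reindex(full_index, fill_value=0)
        .reset_index()
        .sort_values(["state", "month_start"])
        .reset_index(drop=True)
    )

    # Lag feature ("memory"): last month's updates within the same state.
    monthly["updates_last_month"] = monthly.groupby("state")["bio_age_5_17"].shift(1)
    monthly["month"] = monthly["month_start"].dt.month
    monthly["year"] = monthly["month_start"].dt.year
    return monthly


# Placeholder rows for illustration only (not UIDAI figures).
raw = pd.DataFrame({
    "state": ["A", "A", "A", "B", "B"],
    "date": ["2025-01-05", "2025-01-20", "2025-03-02", "2025-01-10", "2025-02-11"],
    "bio_age_5_17": [10, 5, 7, 3, 4],
})
panel = build_monthly_panel(raw)
print(panel)
```

The first month of each state has no previous value, so its lag is left as missing rather than guessed.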

4. The Experiment Log: What Worked vs. What Failed

We iterated through nine model versions (V1–V9).

Experiment	Approach	Result (R²)	Verdict
V1–V2	Baseline Random Forest

	Grid Search for optimal settings.	0.65	Failed. Confirmed that our V5 baseline settings were already optimal.

5. The Solution

We selected Model V5 as the final production model.

Algorithm: HistGradientBoostingRegressor (scikit-learn's histogram-based gradient boosting estimator, inspired by LightGBM).
Key Features:
updates_last_month: The primary driver (Momentum).
age_5_17: The population base (Capacity).
month: Seasonality (School holidays/Exam cycles).
Validation Strategy: Time-series split (Training on past, Testing on "future" unseen data).
Final Accuracy: 70% (R² ≈ 0.70). For forecasts of human behaviour, we consider this a strong score.

6. Key Findings & Actionable Insights

A. Bottleneck Alert (Jan 2026)

The model forecasts a massive surge in update demand for Q1 2026.

Hotspot: Uttar Pradesh.
Projected Volume: >600,000 updates required in January alone.
Recommendation: Deploy 30% of mobile enrolment units to UP districts immediately to prevent overcrowding.
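
Because `updates_last_month` is only known one month ahead, a quarter-long forecast needs some way to supply that feature for the second and third months. One common option is recursive forecasting, where each prediction is fed back in as the next month's lag. The report does not say how its Q1 2026 forecast was produced, so the sketch below is an illustration under that assumption. `recursive_forecast`, `GrowthStub`, and the column `month_start` are hypothetical names, and the numbers are placeholders.

```python
import pandas as pd

FEATURES = ["updates_last_month", "age_5_17", "month"]


def recursive_forecast(model, last_row, horizon, features=FEATURES):
    """Roll a one-step-ahead model forward `horizon` months for a single state."""
    current_month = pd.Timestamp(last_row["month_start"])
    last_value = float(last_row["bio_age_5_17"])
    rows = []
    for _ in range(horizon):
        current_month = current_month + pd.DateOffset(months=1)
        x = pd.DataFrame([{
            "updates_last_month": last_value,
            "age_5_17": last_row["age_5_17"],  # held constant: an assumption
            "month": current_month.month,
        }])[features]
        # The prediction becomes the lag feature for the following month.
        last_value = float(model.predict(x)[0])
        rows.append({"month_start": current_month, "predicted_updates": last_value})
    return pd.DataFrame(rows)


class GrowthStub:
    """Stand-in for a fitted model: predicts 10% growth over last month."""

    def predict(self, X):
        return (1.1 * X["updates_last_month"]).to_numpy()


# Placeholder last observed month for one state (not real data).
last_row = {"month_start": "2025-12-01", "bio_age_5_17": 100, "age_5_17": 1000}
fc = recursive_forecast(GrowthStub(), last_row, horizon=3)
print(fc)
```

Any fitted estimator with a `predict` method can replace the stub. Errors compound across steps in this scheme, so the third month of a quarter is usually less reliable than the first.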

7. Conclusion

This project demonstrates that machine learning can effectively guide Aadhaar administrative planning. By shifting from reactive counting to proactive forecasting, UIDAI can ensure smoother service delivery for millions of Indian teenagers.

While we explored granular District-Level modeling (R² ≈ 0.64), we concluded that State-Level forecasting (R² ≈ 0.70) offers the best balance of accuracy and reliability for national resource allocation.

