wildcraft958/wb-election-2026


West Bengal 2026 Assembly Election: Data Analytics

By Animesh Raj. Twelve analyses across 293 constituencies and five Assembly cycles. Numbers reported as found.

Disclaimer

This is a portfolio data-engineering and statistics exercise on publicly available election data. It is not partisan commentary, journalism, or a definitive account of the 2026 West Bengal Assembly Election outcome. Every finding here is an illustration of methodology applied to public sources (Wikipedia AC tables, CEO West Bengal SIR PDF, Datameet shapefiles, 2011 Census, hand-curated defection news clippings) and is conditional on those sources plus the documented data caveats below. Headline state numbers (45.4%, 40.8%, 92.93%) are the published ECI values; per-AC numbers are computed from Wikipedia tables and may differ by 1 to 2 percentage points from official tallies. Anyone is welcome to replicate, dispute, or extend any specific number by running the script that produced it (./run_all.sh).

Headline findings (real numbers from the published ECI tally)

| Metric | Value |
| --- | --- |
| BJP seats | 207 (up from 77 in 2021) |
| AITC seats | 80 (down from 215) |
| BJP vote share swing | +7.3 pp (38.2% to 45.4%) |
| AITC vote share swing | -7.1 pp (47.9% to 40.8%) |
| Statewide turnout | 92.93% (highest ever) |
| Voters removed during SIR | ~91 lakh (11.88% of pre-SIR roll) |
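
As a quick sanity check on the last row, the two published SIR figures together imply a roll size. The sketch below only rearranges the numbers in the table; because "~91 lakh" is itself rounded, the result is approximate, and the variable names are illustrative rather than taken from the pipeline.

```python
# Back out the roll size implied by the two SIR figures in the table above.
LAKH = 100_000
CRORE = 10_000_000

deleted = 91 * LAKH        # "~91 lakh" voters removed (rounded in the table)
deleted_share = 0.1188     # 11.88% of the pre-SIR roll

# deleted = deleted_share * pre_sir_roll, so divide to recover the roll.
implied_pre_sir_roll = deleted / deleted_share
implied_post_sir_roll = implied_pre_sir_roll - deleted

print(f"implied pre-SIR roll:  {implied_pre_sir_roll / CRORE:.2f} crore")
print(f"implied post-SIR roll: {implied_post_sir_roll / CRORE:.2f} crore")
```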

What the analyses say

| # | Analysis | Key result |
| --- | --- | --- |
| 02 | District-level swing | BJP gained vote share in 23 of 24 districts. Three swung over 30 pp (Purulia, Bankura, Purba Bardhaman). |
| 03 | Turnout regression (OLS) | Slope +0.53, R² = 0.34, p < 0.001. Where turnout fell, TMC fell harder. |
| 04 | SIR deletion impact | In 27 of 207 BJP wins, estimated voter deletions exceed the BJP margin. Spearman ρ = +0.39, p < 0.001. |
| 05 | Opposition fragmentation | In 68 of 207 BJP wins (33%), 3rd-place-onward votes alone exceed the BJP margin. |
| 06 | Spatial autocorrelation | Moran's I = 0.46, p = 0.001. Wave clustered in Jangalmahal; cold spot in Murshidabad-Malda. |
| 07 | Random Forest + SHAP | 88% CV accuracy. 2021 baseline vote shares dominate predictions. |
| 08 | K-means typology (k=5) | Five archetypes: Jangalmahal Sweep, Minority Fortress, SIR Battlegrounds, Urban Flippers, Mixed. |
| 09 | OLS margin model | Adjusted R², coefficient table, residual diagnostics. |
| 10 | DiD on SIR exposure | +9.2 pp gap; the parallel-trends assumption fails (5.02 pp/yr gap), so the estimate is descriptive, not causal. |
| 11 | 5-cycle panel 2006-2026 | 85 ACs voted Left → TMC → BJP across the three cycles. |
| 12 | Defection network | Welch t-test p = 0.33; defections did not, on their own, predict where the wave broke. |
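
The margin comparisons in analyses 04 and 05 reduce to a per-seat test: does some vote pool exceed the winner's margin? The sketch below shows that test on made-up seats. The `Seat` class, its field names, and the three example rows are hypothetical and are not the schema of `data/processed/master.csv`; it also assumes a per-seat total for 3rd-place-onward votes is available.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Seat:
    name: str             # placeholder label, not a real constituency
    winner_votes: int
    runner_up_votes: int
    other_votes: int      # votes for 3rd place onward, summed (assumed available)
    est_deletions: int    # estimated SIR deletions attributed to the seat

    @property
    def margin(self) -> int:
        return self.winner_votes - self.runner_up_votes


def seats_exceeding_margin(seats, field):
    """Names of seats where the given vote pool is larger than the margin."""
    return [s.name for s in seats if getattr(s, field) > s.margin]


# Made-up numbers purely for illustration.
seats = [
    Seat("AC_A", 90_000, 85_000, 12_000, 30_000),   # margin 5,000
    Seat("AC_B", 110_000, 70_000, 9_000, 28_000),   # margin 40,000
    Seat("AC_C", 95_000, 80_000, 20_000, 10_000),   # margin 15,000
]

deletion_flagged = seats_exceeding_margin(seats, "est_deletions")
fragmentation_flagged = seats_exceeding_margin(seats, "other_votes")
print(deletion_flagged, fragmentation_flagged)
```

A flagged seat only says the pool is larger than the margin. It says nothing about how those voters would have voted, which is why the counts are best read as upper bounds on where the factor could have mattered.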

A booth-level Form-20 + Benford's Law analysis is deferred until CEO West Bengal publishes Form-20 (expected June 2026). Parser scaffold lives in scripts/07_parse_form20.py.
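
For orientation, a first-digit Benford check can be sketched with the standard library alone. This is not the contents of scripts/07_parse_form20.py; the function names and the synthetic input (powers of two, which are known to track Benford's Law closely) are assumptions for illustration, since no Form-20 data exists yet.

```python
import math
from collections import Counter


def benford_expected(d: int) -> float:
    """Benford's Law probability that the leading digit is d (1..9)."""
    return math.log10(1 + 1 / d)


def first_digit(n: int) -> int:
    return int(str(abs(n))[0])


def benford_chi_square(counts) -> float:
    """Pearson chi-square of observed leading digits against Benford (df = 8)."""
    values = [c for c in counts if c > 0]   # zero counts have no leading digit
    n = len(values)
    observed = Counter(first_digit(v) for v in values)
    stat = 0.0
    for d in range(1, 10):
        expected = n * benford_expected(d)
        stat += (observed.get(d, 0) - expected) ** 2 / expected
    return stat


# Synthetic stand-in for booth-level vote counts.
synthetic_counts = [2 ** k for k in range(1, 200)]
chi2 = benford_chi_square(synthetic_counts)
print(f"chi-square = {chi2:.3f} (5% critical value at df=8 is about 15.51)")
```

A large statistic would flag a distribution worth inspecting more closely; on its own it would not establish anything about how the counts were produced.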

Reproduce

python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
playwright install chromium  # only used by future scrapers

./run_all.sh

The pipeline regenerates data/processed/master.csv, the figures under data/outputs/, and the LinkedIn carousel PDF.

To open the dashboard locally:

cd dashboard && python -m http.server 8765
# open http://localhost:8765

Method and caveats

  • Headline state numbers (45.4% / 40.8% / 92.93%) are the published ECI values.
  • Per-AC vote shares come from Wikipedia AC tables, which expose only the top two candidates per seat. Per-AC totals therefore under-attribute votes to 3rd-place candidates by 1 to 2 pp on average. The relative pattern (correlation, regression, clustering) is unaffected.
  • AC-level SIR deletion totals are estimated by allocating the published state aggregate proportionally to polling-station counts. The voter-level deletion lists themselves are captcha-gated on the ECI portal.
  • Falta (AC 144) is excluded because of a repoll on 21 May 2026, with results on 24 May.
  • DiD's parallel-trends assumption fails on the 2006-2021 panel, so the +9.2 pp DiD estimate is reported as descriptive, not causal.
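
The proportional allocation of SIR deletions described in the caveats can be sketched as follows. This is a minimal illustration, not the pipeline's code: `allocate_deletions` is a hypothetical helper, the station counts are made up, and the largest-remainder rounding is an added assumption so that the integer estimates sum back to the state total.

```python
def allocate_deletions(state_total: int, stations_by_ac: dict[str, int]) -> dict[str, int]:
    """Split a state-level total across ACs in proportion to polling stations."""
    total_stations = sum(stations_by_ac.values())
    if total_stations <= 0:
        raise ValueError("need at least one polling station")

    # Integer quotient and remainder of each AC's exact proportional share.
    shares = {
        ac: divmod(state_total * n, total_stations)
        for ac, n in stations_by_ac.items()
    }
    allocation = {ac: quotient for ac, (quotient, _) in shares.items()}

    # Largest-remainder step: hand leftover units to the biggest remainders.
    leftover = state_total - sum(allocation.values())
    by_remainder = sorted(shares, key=lambda ac: shares[ac][1], reverse=True)
    for ac in by_remainder[:leftover]:
        allocation[ac] += 1
    return allocation


# Made-up polling-station counts for three placeholder ACs.
stations = {"AC_A": 250, "AC_B": 300, "AC_C": 450}
estimates = allocate_deletions(10_001, stations)
print(estimates)
```

Under this scheme the estimate varies across ACs only through their polling-station counts, so it cannot capture any real difference in deletion rates between constituencies.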

Data sources

License

Code under MIT. Data under the licenses of the underlying sources (Datameet CC-BY 4.0, Wikipedia CC-BY-SA 4.0, ECI public-record tabulations, Census of India public release).
