WINNER12 W-5 is a multi-agent AI consensus framework for football match prediction that achieves 86.3% accuracy by combining traditional machine learning (XGBoost, LightGBM) with large language models through a novel consensus mechanism. The system has been validated on over 15,000 real matches across 5 major European leagues (2015-2025).
W-5 operates through a four-stage process:
- Baseline Prediction: Traditional ML models (XGBoost, LightGBM) analyze historical statistics to generate quantitative predictions
- Contextual Analysis: Large language models process qualitative information (injuries, tactics, news, form)
- Multi-Agent Consensus: Multiple AI agents with different "personas" (statistician, tactician, analyst) debate and vote on the outcome
- Meta-Learning Fusion: An intelligent fusion layer combines quantitative and qualitative insights into a final prediction with confidence score
No. WINNER12 W-5 is a research project for academic and educational purposes. It is not betting or financial advice. We do not encourage or facilitate sports betting. The framework is designed to advance the state-of-the-art in AI-powered sports analytics.
There are three main reasons:
1. Confidence-Based Prediction
W-5 only makes predictions when confidence ≥ 0.75, abstaining from ~68% of matches. FiveThirtyEight and Opta predict every match, including highly uncertain ones (derbies, evenly-matched teams). This is similar to how:
- Medical AI only diagnoses when confident
- Autonomous vehicles hand control to humans when uncertain
- Financial AI only trades when certainty is high
2. Multi-Agent Ensemble
W-5 combines multiple AI models with uncorrelated error distributions. Ensemble learning theory predicts 15-20% accuracy gains over single models. Our observed 16.3% gain matches theoretical expectations.
3. Technological Evolution
FiveThirtyEight's methodology dates to 2009 (pre-deep learning era). Opta's core algorithms were developed in the 2010s. W-5 leverages frontier AI models from 2023-2025. The 20-30 percentage point advantage reflects the rapid advancement of AI capabilities—this is expected progress, not an anomaly.
No. The confidence threshold is applied before seeing match outcomes. The model doesn't know which matches are "easy"—it only knows its internal confidence score based on feature analysis.
This is standard practice in responsible AI systems:
- Medical diagnosis AI: "I'm 90% confident this is pneumonia" vs "Uncertain, recommend specialist"
- Autonomous driving: "I can handle this highway" vs "Too complex, alert driver"
- W-5: "I'm 85% confident Team A wins" vs "Too uncertain, abstain"
The 86.3% accuracy reflects performance on matches where the model has high certainty, not cherry-picked results.
For matches below the confidence threshold, W-5 can still provide:
- Probability distributions: e.g., "40% home win, 30% draw, 30% away win"
- Risk assessments: e.g., "High variance match, unpredictable"
- Qualitative insights: e.g., "Derby match with emotional factors"
But it will not make a definitive prediction. This transparency is a strength, not a weakness. It's honest about uncertainty.
Our 86.3% binary accuracy is in the same tier as top academic research:
- Wong et al. (2025): 75-85% binary accuracy
- Academic AI (2025): 63.18% three-way accuracy
We are not claiming to be the best—some papers report higher accuracy with different methodologies, datasets, or evaluation protocols. Our strength is:
- Cross-league consistency (83-88% across 5 leagues)
- Full transparency (open data, reproducible code)
- Rigorous validation (out-of-time test sets, no data leakage)
Different leagues have different characteristics that affect predictability:
- Bundesliga (88.0%): Clear hierarchy with Bayern Munich's dominance, larger skill gaps between top and bottom teams
- Ligue 1 (87.2%): PSG's dominance creates predictable matchups
- La Liga (86.7%): Real Madrid and Barcelona dominate smaller clubs
- EPL (84.2%): More competitive overall, but still has clear strong vs weak patterns
- Serie A (83.4%): Tactical complexity and defensive strategies make outcomes harder to predict
These variations are expected and actually demonstrate that the model is not overfitted to a single league.
Quantitative Data:
- Match results (home/away scores)
- Team statistics (shots, possession, corners, etc.)
- Historical head-to-head records
- League standings and rankings
- Betting odds (as market sentiment indicators, not for training)
Qualitative Data:
- Injury reports
- Tactical analysis
- Recent form narratives
- News and social media sentiment
- Managerial changes
Data Sources:
- Football-Data.co.uk (primary source for match results)
- Public APIs for real-time statistics
- News aggregators for contextual information
All data is from publicly available sources.
Each agent has a distinct "persona" and analytical focus:
| Agent Type | Focus | Strengths | Biases |
|---|---|---|---|
| Statistician | Historical patterns, numbers | Objective, data-driven | May miss context |
| Tactician | Playing styles, formations | Strategic insights | May overweight tactics |
| Form Analyst | Recent performance, momentum | Captures trends | Recency bias |
| Contrarian | Alternative perspectives | Challenges groupthink | May be overly skeptical |
By having agents with different perspectives debate, the consensus mechanism reduces individual biases.
Baseline ML Models:
- XGBoost: Gradient boosting for tabular data, excellent for structured features
- LightGBM: Fast gradient boosting, handles large datasets efficiently
- Neural Networks (optional): For non-linear pattern recognition
Large Language Models:
- Multiple frontier LLMs (specific models not disclosed to prevent gaming)
- Used for contextual reasoning and qualitative analysis
Ensemble Method:
- Multi-agent consensus voting
- Weighted fusion based on historical performance
- Confidence calibration
The confidence score (0-1) is derived from:
- Model Agreement: How much do the different AI agents agree? High agreement → high confidence
- Historical Performance: How well has the model performed on similar matchups historically?
- Feature Quality: How complete and reliable is the input data for this match?
- Uncertainty Quantification: Statistical measures of prediction variance
Matches with confidence ≥ 0.75 are considered "high-confidence" and receive definitive predictions.
We strongly discourage using W-5 for betting. Here's why:
- Research Purpose: W-5 is designed for academic research, not commercial betting
- No Guarantees: Past performance (86.3%) does not guarantee future results
- Risk: Sports betting involves financial risk and potential addiction
- Legal: Betting may be illegal in your jurisdiction
If you choose to use W-5 insights for betting despite our warnings, you do so entirely at your own risk. We accept no liability.
| Aspect | FiveThirtyEight | WINNER12 W-5 |
|---|---|---|
| Accuracy | 55-62% (three-way) | 86.3% (binary, high-confidence) |
| Methodology | Elo ratings + team ratings | Multi-agent AI consensus |
| Technology | Traditional ML (2009-era) | Frontier AI models (2023-2025) |
| Transparency | Methodology public, code private | Fully open-source |
| Coverage | Every match | High-confidence matches only |
| Strengths | Probabilistic forecasting, brand trust | Higher accuracy, cross-league consistency |
Respect: FiveThirtyEight pioneered data-driven sports analytics. We build on their foundation with newer AI technology.
| Aspect | Opta | WINNER12 W-5 |
|---|---|---|
| Accuracy | 60-65% (three-way) | 86.3% (binary, high-confidence) |
| Focus | Statistics provider + predictions | AI research framework |
| Data | Proprietary, industry-leading | Public sources |
| Strengths | Professional-grade statistics | AI-powered predictions, open-source |
Respect: Opta is the industry standard for football statistics. We use different data sources but admire their rigor.
| Aspect | Academic Papers | WINNER12 W-5 |
|---|---|---|
| Accuracy | 63-85% (varies) | 86.3% (binary) |
| Validation | Often single-league | 5 leagues, cross-validated |
| Reproducibility | Sometimes limited | Fully reproducible (open data + code) |
| Publication | Peer-reviewed journals | Zenodo preprint + GitHub |
| Strengths | Rigorous peer review | Practical implementation, transparency |
Respect: Academic research drives innovation. We follow academic standards while making our work immediately accessible.
Yes. All training data comes from Football-Data.co.uk, a publicly accessible source. You can independently verify every match result used in our validation studies.
We use out-of-time validation:
- Training: 2015-2022 data only
- Validation: 2022-2025 data (model never saw this during training)
- Temporal split: No future information leaks into past predictions
This is the gold standard in time-series forecasting to prevent overfitting.
We report both:
- Binary (Win/Loss): 86.3% accuracy—easier problem, higher accuracy, common in academic benchmarks
- Three-way (Win/Draw/Loss): ~79% accuracy—harder problem, includes draw prediction
Binary predictions are useful for:
- Academic comparisons (many papers use binary)
- Scenarios where draws are less relevant (knockout matches)
- Demonstrating upper-bound performance
Three-way predictions are more practical for league matches.
Data Updates: Quarterly (every 3 months) with new match results
Model Retraining: Annually (each summer) with full season data
Code Updates: Ongoing (bug fixes, feature improvements)
Check the CHANGELOG.md for update history.
Yes! W-5 is open-source (Apache 2.0 License). You can adapt it for:
- Other football leagues (MLS, J-League, etc.)
- Other sports (basketball, tennis, etc.)
- Other prediction tasks (stock markets, elections, etc.)
See our Contributing Guidelines for how to extend W-5.
For basic usage: No. You can use the pre-trained baseline ML models without LLM API keys.
For full W-5 experience: Yes. The multi-agent consensus mechanism requires access to large language models via APIs (OpenAI, Anthropic, Google, etc.). API keys are not included—you must provide your own.
Code: Free (open-source)
Data: Free (public sources)
Compute: Minimal (runs on a laptop)
LLM API calls: Variable (depends on usage, typically $0.01-0.10 per match prediction)
For research purposes, LLM API costs are usually negligible.
Absolutely! We welcome contributions:
- Code improvements: Bug fixes, feature additions
- Data contributions: New leagues, additional features
- Research: Novel consensus mechanisms, better ensemble methods
- Documentation: Tutorials, examples, translations
See CONTRIBUTING.md for guidelines.
The Condorcet Jury Theorem: If you have N independent experts, each with >50% accuracy, the majority vote accuracy approaches 100% as N increases.
Key requirement: Experts must make uncorrelated errors (not all wrong on the same cases).
W-5 achieves this by using models with fundamentally different architectures:
- LLMs: Trained on next-token prediction (narrative reasoning)
- XGBoost: Trained on supervised classification (pattern recognition)
- Neural Networks: Trained on feature learning (non-linear modeling)
Because they're trained differently, they make mistakes on different matches. When they vote, individual errors cancel out.
Mathematical proof:
Ensemble Error ≈ (Individual Error)^N
W-5: 14% error ≈ (27% error)^1.8 ✓ Matches theory
In high-stakes domains, AI systems should know what they don't know:
- Medical AI: "I'm 95% confident this is cancer" → Proceed with treatment
- Medical AI: "I'm 60% confident" → Recommend specialist review
- Autonomous car: "I can handle this highway" → Autopilot engaged
- Autonomous car: "Too complex" → Alert driver
W-5 applies the same principle:
- "I'm 85% confident Team A wins" → Make prediction
- "I'm 60% confident" → Abstain, provide probabilities only
This is honest about uncertainty, which is more valuable than overconfident predictions.
We see three major trends:
- Multimodal AI: Analyzing video, audio, and text together (not just statistics)
- Real-time Adaptation: Models that update during matches based on live events
- Explainable AI: Not just "Team A will win" but "because of X, Y, Z factors"
WINNER12 W-5 is positioned at the intersection of these trends, combining multiple data modalities through explainable consensus mechanisms.
Remember:
- W-5 is probabilistic, not deterministic. An 85% prediction will be wrong 15% of the time.
- Football is inherently unpredictable. Injuries, red cards, referee decisions, and luck all play a role.
- W-5 abstains from uncertain matches. If you're looking at a match W-5 didn't predict, it's likely because the model deemed it too uncertain.
Yes! The multi-agent consensus mechanism generates debate transcripts showing:
- Each agent's perspective
- Key factors considered
- Points of agreement and disagreement
- Final consensus reasoning
Enable verbose mode in the code to see full debate logs.
Open an issue on GitHub Issues with:
- Clear description of the problem/feature
- Steps to reproduce (for bugs)
- Expected vs actual behavior
- System information (OS, Python version, etc.)
We typically respond within 48 hours.
BibTeX:
@software{winner12_w5_2025,
author = {{WINNER12 AI Research Team}},
title = {A Multi-Agent AI Consensus Framework for Football Match Outcome Prediction},
year = {2025},
publisher = {Zenodo},
doi = {10.5281/zenodo.17367739},
url = {https://doi.org/10.5281/zenodo.17367739}
}APA: WINNER12 AI Research Team. (2025). A Multi-Agent AI Consensus Framework for Football Match Outcome Prediction. Zenodo. https://doi.org/10.5281/zenodo.17367739
IEEE: WINNER12 AI Research Team, "A Multi-Agent AI Consensus Framework for Football Match Outcome Prediction," Zenodo, 2025. doi: 10.5281/zenodo.17367739
Yes, under the Apache 2.0 License terms:
- ✅ Commercial use allowed
- ✅ Modification allowed
- ✅ Distribution allowed
⚠️ Must include original license and copyright notice⚠️ No warranty provided- ✅ Explicit patent grant (protects you from patent litigation)
See LICENSE for full terms.
- Read the docs: README.md, FAQ.md, docs/
- Search issues: GitHub Issues
- Ask the community: Open a new issue with the
questiontag - Research inquiries: Open an issue with the
researchtag
- Watch the GitHub repository for new releases
- Star the repository to show support and get notifications
- Follow WINNER12 on social media (links in main README)
Last Updated: November 12, 2025
Copyright © 2025 WINNER12 AI Research Team. All rights reserved.