This repository contains a version of the basic Metaculus forecasting bot, forked from metac-bot-template.
Info and tournament rules are here
The bot began well but gradually fell down the rankings and by May 13 was in last place. 🙀
On May 15 I hard-reset the repo to the Metaculus main fork. Next we will:
- Add back logging functionality
- Revert to Deepseek r1 (Groq) for primary analysis (see the sketch after this list)
- Keep better records of how each question was handled (logging added for this; resolved May 13)
- Add and adapt the question parser from Orac
- Add core SMEs (subject-matter experts): geopolitics, markets, climate
- Add benchmarking and feedback module (see below)
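As a rough sketch of what the Deepseek r1 revert might look like, assuming the GeneralLlm wrapper and a litellm-style model identifier as used in forecasting-tools; the exact model string and arguments are assumptions, not code from this repo:

```python
from forecasting_tools import GeneralLlm

# Hypothetical configuration: route primary analysis through Deepseek r1
# hosted on Groq. The model id and keyword arguments are assumptions
# based on forecasting-tools/litellm conventions, not repo code.
primary_llm = GeneralLlm(
    model="groq/deepseek-r1-distill-llama-70b",  # assumed litellm model id
    temperature=0.3,
)
```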
Based on the Ideas for bot improvement and work on the Decis Reasoning Engine, we plan to test and implement:
- 🎓 Expert personalities (SMEs)
- 🌎 WorldSIM
- 🧭 Orchestration to direct questions to the appropriate SME (see the routing sketch after this list)
- 🦹 Adversarial agents to challenge predictions
- 🔍 Supplemental research via the Decis country stability database
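As an illustration of the planned orchestration step, a minimal keyword router might look like the sketch below. The SME names mirror the core SMEs above; every function and keyword list here is hypothetical, nothing is implemented yet:

```python
# Hypothetical keyword-based router for directing a question to an SME
# persona. Purely a sketch; none of this exists in the repo.
SME_KEYWORDS: dict[str, list[str]] = {
    "geopolitics": ["election", "war", "treaty", "sanctions"],
    "markets": ["stock", "gdp", "inflation", "interest rate"],
    "climate": ["temperature", "emissions", "hurricane", "sea level"],
}

def route_to_sme(question_text: str) -> str:
    """Return the SME whose keyword list best matches the question text."""
    text = question_text.lower()
    scores = {
        sme: sum(keyword in text for keyword in keywords)
        for sme, keywords in SME_KEYWORDS.items()
    }
    best_sme, best_score = max(scores.items(), key=lambda item: item[1])
    return best_sme if best_score > 0 else "generalist"
```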
Core Metaculus functions and the imported forecasting-tools will remain unchanged, as these ensure proper performance in the tournaments.
Contact [email protected]
This project includes an example of how to benchmark your bot's forecasts against the community prediction for questions on Metaculus. Running community_benchmark.py will run the versions of your bot that you define (e.g. with different LLMs or research paths) and score them on how close they come to the community prediction, using expected baseline score (a proper score that assumes the community prediction is the true probability). Edit the file to choose which bot configurations to test and how many questions to test on. Any class inheriting from forecasting-tools' ForecastBot can be passed into the benchmarker. As of March 28, 2025, the benchmarker only works with binary questions.
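As a minimal sketch of how that scoring works, assuming the standard Metaculus binary baseline formula, 100 * (1 + log2(p)); the function names are illustrative:

```python
import math

def baseline_score(p_assigned_to_outcome: float) -> float:
    # Metaculus baseline score for a binary question: 100 * (1 + log2(p)),
    # where p is the probability assigned to the outcome that occurred.
    return 100.0 * (1.0 + math.log2(p_assigned_to_outcome))

def expected_baseline_score(p_bot: float, p_community: float) -> float:
    # Expected baseline score if the community prediction were the true
    # probability: weight the Yes/No scores by the community probability.
    return (
        p_community * baseline_score(p_bot)
        + (1.0 - p_community) * baseline_score(1.0 - p_bot)
    )

# Example: the bot says 0.70 on a question where the community says 0.60.
print(round(expected_baseline_score(0.70, 0.60), 1))
```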
To run a benchmark:
```bash
poetry run python community_benchmark.py --mode run
```
To run a custom benchmark (e.g. remove background info from questions to test retrieval):
```bash
poetry run python community_benchmark.py --mode custom
```
To view a UI showing your scores, statistical error bars, and your bot's reasoning:
```bash
poetry run streamlit run community_benchmark.py
```
See the benchmarking section of the forecasting-tools repo for more information.