agsheves/decis-metaculus-bot1

Decis Forecasting Bots: Amended Metaculus bot

This repository contains a version of the basic Metaculus forecasting bot, forked from metac-bot-template.

Info and tournament rules are here

Performance

The bot began well but gradually fell down the rankings and by May 13 was in last place. 🙀

Hard Reset

On May 15 I performed a hard reset back to the Metaculus main fork. Next we will:

  • Add back logging functionality
  • Revert to Deepseek r1 (Groq) for primary analysis

Issues, Tasks, Changelog

  • Need better records of how each question was handled
    • added logging to keep a better record of how each question was handled (resolved May 13)
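The per-question logging mentioned above could look something like the structured record below. This is an illustrative sketch only; the function and field names are hypothetical, not this repo's actual implementation.

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("forecast_bot")
logging.basicConfig(level=logging.INFO)

def log_question_record(question_id: int, prediction: float, rationale: str) -> dict:
    """Build and log a structured record of how one question was handled."""
    record = {
        "question_id": question_id,
        "prediction": prediction,          # final probability submitted
        "rationale": rationale,            # model's reasoning summary
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    logger.info(json.dumps(record))        # one JSON line per question
    return record
```

Emitting one JSON line per question makes it easy to grep or load the log into pandas later when reviewing why the bot's rank moved.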

General Tasks

  • Add and amend question parser from Orac
  • Add core SMEs: geopolitics, markets, climate
  • Add benchmarking and feedback module (see below)

Bot Development

Based on the Ideas for bot improvement and work on the Decis Reasoning Engine we plan to test and implement:

  • 🎓 Expert personalities (SMEs)
  • 🌎 WorldSIM
  • 🧭 Orchestration to direct questions to the appropriate SME
  • 🦹 Adversarial agents to challenge predictions
  • 🔍 Supplemental research via the Decis country stability database
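The orchestration step above, directing each question to the appropriate SME, could be as simple as keyword matching. Everything in this sketch (names, keyword lists) is an illustrative assumption, not the Decis implementation:

```python
# Hypothetical keyword-based router from question text to an SME persona.
SME_KEYWORDS = {
    "geopolitics": ["election", "war", "treaty", "sanctions"],
    "markets": ["stock", "gdp", "inflation", "interest rate"],
    "climate": ["temperature", "emissions", "hurricane", "sea level"],
}

def route_question(text: str, default: str = "geopolitics") -> str:
    """Return the first SME whose keywords appear in the question text."""
    lowered = text.lower()
    for sme, keywords in SME_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return sme
    return default
```

In practice an LLM classifier would likely replace the keyword lists, but the same interface (question text in, SME name out) keeps the orchestrator decoupled from the experts.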

Core Metaculus functions and the imported forecasting-tools will remain unchanged, as these ensure proper performance in the tournaments.

Contact

Contact [email protected]

Benchmarking

Provided in this project is an example of how to benchmark your bot's forecasts against the community prediction for questions on Metaculus. Running community_benchmark.py will run the bot versions you define (e.g. with different LLMs or research paths) and score them on how close they come to the community prediction, using the expected baseline score (a proper score that treats the community prediction as the true probability). Edit the file to choose which bot configurations to test and how many questions to test on. Any class inheriting from forecasting_tools.ForecastBot can be passed into the benchmarker. As of March 28, 2025 the benchmarker only works with binary questions.
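As a sketch of the scoring idea: Metaculus's binary baseline score awards 100 × (log2(p) + 1) for the probability p assigned to the realized outcome, and the expected baseline score takes the expectation of that over both outcomes, weighted by the community probability c. (Formula paraphrased from Metaculus's scoring documentation; check the current rules before relying on it.)

```python
import math

def expected_baseline_score(p: float, c: float) -> float:
    """Expected baseline score of a binary forecast p, treating the
    community prediction c as the true probability of Yes."""
    score_if_yes = 100 * (math.log2(p) + 1)      # payoff if question resolves Yes
    score_if_no = 100 * (math.log2(1 - p) + 1)   # payoff if it resolves No
    return c * score_if_yes + (1 - c) * score_if_no
```

Because this is a proper score it is maximized when p equals c, and a 50% forecast scores exactly zero, which is why it works as a distance-to-community measure.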

To run a benchmark: `poetry run python community_benchmark.py --mode run`

To run a custom benchmark (e.g. removing background info from questions to test retrieval): `poetry run python community_benchmark.py --mode custom`

To view a UI showing your scores, statistical error bars, and your bot's reasoning: `poetry run streamlit run community_benchmark.py`

See more information in the benchmarking section of the forecasting-tools repo
