This project implements a groundbreaking automated coding tournament where multiple advanced Large Language Models (LLMs) collaboratively compete and iteratively refine their solutions to complex coding challenges across multiple rounds. It's designed to harness collective intelligence, pushing LLM-generated code toward unprecedented quality and efficiency through continual peer feedback and integration.
The best way to understand this project and where it came from is to read the article that led directly to its creation: LLM Multi-Round Coding Tournament.
Traditional methods for utilizing LLMs in software development typically involve single-query, single-response interactions. This project transforms that paradigm by:
- Iterative Collaboration: Each model critically analyzes and improves upon solutions generated by others, progressively enhancing code quality across rounds.
- Automated Code Synthesis: Merges the strongest features from multiple solutions into optimized, robust, and elegant code.
- Deep Analysis: Tracks comprehensive performance metrics, including complexity estimates, execution efficiency, and solution robustness, enabling fine-grained insight into model capabilities.
- Multi-Round Refinement: Models iteratively build upon and refine each other's solutions.
- Adaptive Prompt Engineering: Automatically constructs sophisticated prompts to guide models toward meaningful integration and optimization of solutions.
- Robust Code Extraction: Utilizes AI-driven extraction to structure raw LLM outputs into well-formed, executable Python classes.
- Concurrent Execution & Error Handling: Manages multiple LLM queries simultaneously, with built-in retry logic and error recovery for reliability (see the sketch after this list).
- Automated Testing & Metrics Collection: Generates automated test suites and comprehensive metrics to objectively evaluate each solution.
- Analytics: Produces detailed markdown reports to clearly represent improvements and performance across tournament rounds.
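For orientation, here is a minimal, hypothetical sketch of what the concurrent round loop could look like. The model names are illustrative, and the provider-specific query function is passed in as a parameter because the script's actual client API isn't reproduced here:

```python
import asyncio
from typing import Awaitable, Callable

MODELS = ["claude", "gpt", "mistral"]  # illustrative model identifiers, not the script's actual list

QueryFn = Callable[[str, str], Awaitable[str]]  # (model, prompt) -> response text

async def query_with_retry(query: QueryFn, model: str, prompt: str, retries: int = 3) -> str:
    """Call one model, retrying with exponential backoff if the request fails."""
    for attempt in range(1, retries + 1):
        try:
            return await query(model, prompt)
        except Exception:
            if attempt == retries:
                raise
            await asyncio.sleep(2 ** attempt)  # back off before the next attempt

async def run_round(query: QueryFn, prompt: str, models: list[str] = MODELS) -> dict[str, str]:
    """Query every model concurrently and map each model name to its response."""
    responses = await asyncio.gather(*(query_with_retry(query, m, prompt) for m in models))
    return dict(zip(models, responses))
```

In later rounds, the prompt passed into such a loop would embed the previous round's solutions, as sketched further below.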
The script employs an innovative approach by harnessing LLM capabilities to convert raw model responses into structured, executable Python code:
- LLM-powered prompts precisely instruct a model to encapsulate each solution's code into a cohesive, self-contained class.
- Resulting code is cached and versioned, ensuring consistency and ease of reuse across rounds.
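As an illustration of this extraction-and-caching step, the sketch below hashes the raw response, reuses a cached extraction when one exists, and otherwise asks an LLM (via a caller-supplied query function) to restructure the response into a single class. The function name, prompt wording, and cache location are assumptions, not the script's actual interface:

```python
import hashlib
from pathlib import Path
from typing import Callable

EXTRACTION_PROMPT = (
    "Rewrite the following response as a single, self-contained Python class named "
    "Solution, with all imports included and no prose outside the code:\n\n{raw}"
)

def extract_solution_class(
    raw_response: str,
    query: Callable[[str], str],           # any function that sends a prompt to an LLM
    cache_dir: Path = Path("code_cache"),  # hypothetical cache location
) -> str:
    """Ask an LLM to wrap a raw answer into one executable class, caching the result."""
    cache_dir.mkdir(exist_ok=True)
    key = hashlib.sha256(raw_response.encode()).hexdigest()[:16]
    cached = cache_dir / f"{key}.py"
    if cached.exists():  # identical raw responses reuse the earlier extraction
        return cached.read_text()
    code = query(EXTRACTION_PROMPT.format(raw=raw_response))
    cached.write_text(code)
    return code
```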
Each tournament round dynamically synthesizes prompts by carefully analyzing solutions from the previous round, ensuring each iteration meaningfully integrates new insights and improvements. This method maximizes model performance by systematically leveraging collective strengths.
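A hypothetical prompt builder along these lines might look as follows; the wording is illustrative, not the project's actual template:

```python
def build_refinement_prompt(challenge: str, prior_solutions: dict[str, str]) -> str:
    """Assemble a round prompt that asks a model to merge the best ideas so far.

    `prior_solutions` maps a model name to its previous-round code.
    """
    sections = "\n\n".join(
        f"### Solution from {model}\n{code}" for model, code in prior_solutions.items()
    )
    return (
        f"{challenge}\n\n"
        "Below are the solutions produced in the previous round. Analyze their "
        "strengths and weaknesses, then produce a single improved solution that "
        "integrates the best ideas while fixing any remaining bugs:\n\n"
        f"{sections}"
    )
```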
The project meticulously tracks and computes metrics such as:
- Code complexity (functions, classes, decision points)
- Efficiency metrics (execution time, output size)
- Robustness indicators (error handling, edge-case coverage)
These metrics are critical for evaluating not just correctness, but the overall quality and maintainability of generated solutions.
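As a concrete illustration, complexity-style metrics like these can be computed directly from a solution's source with Python's `ast` module. This is a sketch of the idea, not necessarily the script's implementation:

```python
import ast

DECISION_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.With, ast.BoolOp)

def complexity_metrics(source: str) -> dict[str, int]:
    """Count functions, classes, decision points, and error handlers in a solution."""
    nodes = list(ast.walk(ast.parse(source)))
    return {
        "functions": sum(isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef)) for n in nodes),
        "classes": sum(isinstance(n, ast.ClassDef) for n in nodes),
        "decision_points": sum(isinstance(n, DECISION_NODES) for n in nodes),
        "error_handlers": sum(isinstance(n, ast.ExceptHandler) for n in nodes),
    }
```

Running such a function over each round's solutions yields counts that can be compared across rounds and models.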
- Python 3.13
- API keys for LLM providers (Anthropic, OpenAI, Mistral)
- Clone the repository:

  ```bash
  git clone https://github.com/Dicklesworthstone/llm-tournament
  cd llm-tournament
  ```

- Install dependencies using `uv`:

  ```bash
  curl -LsSf https://astral.sh/uv/install.sh | sh
  uv venv --python 3.13
  source .venv/bin/activate
  uv pip install -r requirements.txt
  ```

- Configure API keys in the `.env` file:

  ```bash
  ANTHROPIC_API_KEY="your_anthropic_key"
  OPENAI_API_KEY="your_openai_key"
  MISTRAL_API_KEY="your_mistral_key"
  ```
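If you want to confirm the keys are picked up before launching a tournament, a quick check like the following works, assuming the widely used `python-dotenv` package (the script's own loading code may differ):

```python
import os
from dotenv import load_dotenv  # assumes python-dotenv is installed

load_dotenv()  # reads the .env file in the current directory

missing = [k for k in ("ANTHROPIC_API_KEY", "OPENAI_API_KEY", "MISTRAL_API_KEY") if not os.getenv(k)]
if missing:
    raise SystemExit(f"Missing API keys in .env: {', '.join(missing)}")
```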
- `llm_tournament.py`: Main automation script orchestrating the tournament.
- `challenge_prompt.md`: Initial coding challenge.
- `messy_csv_sample.csv`: Test data for evaluating solution accuracy.
- `README.md`: Comprehensive project documentation.
Run a full tournament cycle with default parameters:
```bash
python llm_tournament.py --prompt challenge_prompt.md --test-file messy_csv_sample.csv
```
If previous responses exist, the script intelligently skips redundant calls, ensuring efficiency.
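The reuse check presumably works along these lines; the file layout shown here (per-round directories under the output directory) is an assumption for illustration only:

```python
from pathlib import Path
from typing import Callable

def get_or_query(model: str, round_num: int, prompt: str,
                 query: Callable[[str, str], str],
                 out_dir: Path = Path("tournament_results")) -> str:
    """Reuse a saved response for this model and round if present; otherwise query and save."""
    response_file = out_dir / f"round_{round_num}" / f"{model}_response.md"  # hypothetical layout
    if response_file.exists():
        return response_file.read_text()
    response = query(model, prompt)
    response_file.parent.mkdir(parents=True, exist_ok=True)
    response_file.write_text(response)
    return response
```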
Customize your tournament with detailed control:
```bash
python llm_tournament.py --prompt challenge_prompt.md --test-file messy_csv_sample.csv --rounds 3 --temperature 0.8 --concurrent-requests 4 --verbose
```
- `--prompt`: Initial challenge prompt (required).
- `--rounds`: Number of iterative refinement rounds (default: 5).
- `--output-dir`: Directory for storing tournament artifacts (default: `tournament_results`).
- `--test-file`: Test file for validating solutions.
- `--temperature`: Controls creativity/randomness of LLM responses (default: 0.7).
- `--concurrent-requests`: Limits concurrent API calls (default: 4).
- `--skip-tests`: Skips solution validation tests.
- `--verbose`: Enables detailed logging.
Running the script generates a comprehensive suite of outputs:
- Individual model solutions organized by rounds.
- Hybrid synthesized solutions representing collective model intelligence.
- Detailed performance metrics, visual analytics, and insightful markdown reports.
- A test harness facilitating straightforward evaluation and comparison of solutions.
A typical execution:
```bash
python llm_tournament.py --prompt challenge_prompt.md --test-file messy_csv_sample.csv --rounds 3
```
This workflow will:
- Prompt multiple LLMs with the challenge.
- Collect, analyze, and refine solutions iteratively.
- Automatically test each refined solution.
- Generate detailed performance reports and visualization of improvement.
MIT License