
Commit 0ab50fa

first commit
0 parents  commit 0ab50fa

39 files changed

Lines changed: 7019 additions & 0 deletions

.gitignore

Lines changed: 21 additions & 0 deletions
```
# Python-generated files
__pycache__/
*.py[oc]
build/
dist/
wheels/
*.egg-info

# Virtual environments
.venv

# Keys and secrets
.env

# Data files
data/

# Frontend
frontend/node_modules/
frontend/dist/
frontend/.vite/
```

.python-version

Lines changed: 1 addition & 0 deletions
3.10

CLAUDE.md

Lines changed: 166 additions & 0 deletions
# CLAUDE.md - Technical Notes for LLM Council

This file contains technical details, architectural decisions, and important implementation notes for future development sessions.

## Project Overview

LLM Council is a 3-stage deliberation system where multiple LLMs collaboratively answer user questions. The key innovation is anonymized peer review in Stage 2, which prevents models from playing favorites.

## Architecture

### Backend Structure (`backend/`)

**`config.py`**
- Contains `COUNCIL_MODELS` (list of OpenRouter model identifiers)
- Contains `CHAIRMAN_MODEL` (model that synthesizes the final answer)
- Uses environment variable `OPENROUTER_API_KEY` from `.env`
- Backend runs on **port 8001** (NOT 8000 - the user had another app on 8000)

**`openrouter.py`**
- `query_model()`: single async model query
- `query_models_parallel()`: parallel queries using `asyncio.gather()`
- Returns a dict with 'content' and optional 'reasoning_details'
- Graceful degradation: returns None on failure, continues with successful responses
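
The parallel fan-out with graceful degradation can be sketched as follows; the function names come from this file, but the stubbed body and return shape are illustrative assumptions:

```python
import asyncio

async def query_model(model: str, prompt: str):
    """Query one model; return None on failure so the council degrades gracefully.
    The body is stubbed here -- the real module POSTs to OpenRouter via httpx."""
    try:
        if "bad" in model:              # simulate a provider failure
            raise RuntimeError("provider error")
        await asyncio.sleep(0)          # stand-in for network latency
        return {"content": f"{model} answers: {prompt}"}
    except Exception:
        return None

async def query_models_parallel(models, prompt):
    """Fan out to all council models concurrently; keep only the successes."""
    results = await asyncio.gather(*(query_model(m, prompt) for m in models))
    return {m: r for m, r in zip(models, results) if r is not None}
```

A failed model is simply absent from the returned dict, so downstream stages never see partial errors.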

**`council.py`** - The Core Logic
- `stage1_collect_responses()`: parallel queries to all council models
- `stage2_collect_rankings()`:
  - Anonymizes responses as "Response A, B, C, etc."
  - Creates a `label_to_model` mapping for de-anonymization
  - Prompts models to evaluate and rank (with strict format requirements)
  - Returns a tuple: (rankings_list, label_to_model_dict)
  - Each ranking includes both the raw text and a `parsed_ranking` list
- `stage3_synthesize_final()`: Chairman synthesizes from all responses + rankings
- `parse_ranking_from_text()`: extracts the "FINAL RANKING:" section, handles both numbered lists and plain format
- `calculate_aggregate_rankings()`: computes the average rank position across all peer evaluations
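
A sketch of what the average-rank aggregation could look like; the function name is from this file, but the exact input/output shapes are assumptions:

```python
from collections import defaultdict

def calculate_aggregate_rankings(parsed_rankings):
    """Average each label's 1-based position across all peer rankings.
    `parsed_rankings` is a list of lists like ["Response C", "Response A", ...].
    A sketch -- the real function in council.py may differ in details."""
    positions = defaultdict(list)
    for ranking in parsed_rankings:
        for pos, label in enumerate(ranking, start=1):
            positions[label].append(pos)
    aggregate = [
        {"label": label, "avg_rank": sum(p) / len(p), "votes": len(p)}
        for label, p in positions.items()
    ]
    return sorted(aggregate, key=lambda r: r["avg_rank"])
```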

**`storage.py`**
- JSON-based conversation storage in `data/conversations/`
- Each conversation: `{id, created_at, messages[]}`
- Assistant messages contain: `{role, stage1, stage2, stage3}`
- Note: metadata (label_to_model, aggregate_rankings) is NOT persisted to storage, only returned via the API
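
A minimal sketch of the one-JSON-file-per-conversation scheme, using the conversation shape above (the helper names are hypothetical, not necessarily those in `storage.py`):

```python
import json
import os
import uuid
from datetime import datetime, timezone

DATA_DIR = "data/conversations"

def new_conversation() -> dict:
    """Create an empty conversation with the documented shape."""
    return {"id": str(uuid.uuid4()),
            "created_at": datetime.now(timezone.utc).isoformat(),
            "messages": []}

def save_conversation(conv: dict) -> None:
    """Write one conversation to its own JSON file under DATA_DIR."""
    os.makedirs(DATA_DIR, exist_ok=True)
    with open(os.path.join(DATA_DIR, f"{conv['id']}.json"), "w") as f:
        json.dump(conv, f, indent=2)

def load_conversation(conv_id: str) -> dict:
    with open(os.path.join(DATA_DIR, f"{conv_id}.json")) as f:
        return json.load(f)
```

Note that metadata never passes through these helpers, which is exactly why it is ephemeral.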

**`main.py`**
- FastAPI app with CORS enabled for localhost:5173 and localhost:3000
- POST `/api/conversations/{id}/message` returns metadata in addition to the stages
- Metadata includes the label_to_model mapping and aggregate_rankings

### Frontend Structure (`frontend/src/`)

**`App.jsx`**
- Main orchestration: manages the conversations list and the current conversation
- Handles message sending and metadata storage
- Important: metadata is stored in UI state for display but not persisted to backend JSON

**`components/ChatInterface.jsx`**
- Multiline textarea (3 rows, resizable)
- Enter to send, Shift+Enter for a new line
- User messages are wrapped in the markdown-content class for padding

**`components/Stage1.jsx`**
- Tab view of individual model responses
- ReactMarkdown rendering with the markdown-content wrapper

**`components/Stage2.jsx`**
- **Critical feature**: tab view showing the RAW evaluation text from each model
- De-anonymization happens CLIENT-SIDE for display (models receive anonymous labels)
- Shows an "Extracted Ranking" below each evaluation so users can validate the parsing
- Aggregate rankings shown with average position and vote count
- Explanatory text clarifies that boldface model names are for readability only

**`components/Stage3.jsx`**
- Final synthesized answer from the chairman
- Green-tinted background (#f0fff0) to highlight the conclusion

**Styling (`*.css`)**
- Light mode theme (not dark mode)
- Primary color: #4a90e2 (blue)
- Global markdown styling in `index.css` with the `.markdown-content` class
- 12px padding on all markdown content to prevent a cluttered appearance
## Key Design Decisions

### Stage 2 Prompt Format
The Stage 2 prompt is very specific to ensure parseable output:
```
1. Evaluate each response individually first
2. Provide "FINAL RANKING:" header
3. Numbered list format: "1. Response C", "2. Response A", etc.
4. No additional text after ranking section
```

This strict format allows reliable parsing while still eliciting thoughtful evaluations.

### De-anonymization Strategy
- Models receive: "Response A", "Response B", etc.
- Backend creates the mapping: `{"Response A": "openai/gpt-5.1", ...}`
- Frontend displays model names in **bold** for readability
- Users see an explanation that the original evaluation used anonymous labels
- This prevents bias while maintaining transparency
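
A sketch of the labeling step, assuming Stage 1 responses arrive as a model-to-text dict (the helper name `anonymize` is hypothetical):

```python
import string

def anonymize(responses: dict):
    """Map model ids to neutral labels ("Response A", "Response B", ...).
    Returns the anonymized texts plus the label_to_model mapping used
    later for client-side de-anonymization."""
    labeled, label_to_model = {}, {}
    for letter, (model, text) in zip(string.ascii_uppercase, responses.items()):
        label = f"Response {letter}"
        labeled[label] = text
        label_to_model[label] = model
    return labeled, label_to_model
```

Only `labeled` ever reaches the ranking prompts; `label_to_model` stays on the backend until the API response is assembled.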

### Error Handling Philosophy
- Continue with the successful responses if some models fail (graceful degradation)
- Never fail the entire request due to a single model failure
- Log errors but don't expose them to the user unless all models fail

### UI/UX Transparency
- All raw outputs are inspectable via tabs
- Parsed rankings are shown below the raw text for validation
- Users can verify the system's interpretation of model outputs
- This builds trust and allows debugging of edge cases

## Important Implementation Details

### Relative Imports
All backend modules use relative imports (e.g., `from .config import ...`), not absolute imports. This is critical for Python's module system to work correctly when running as `python -m backend.main`.

### Port Configuration
- Backend: 8001 (changed from 8000 to avoid a conflict)
- Frontend: 5173 (Vite default)
- Update both `backend/main.py` and `frontend/src/api.js` if changing

### Markdown Rendering
All ReactMarkdown components must be wrapped in `<div className="markdown-content">` for proper spacing. This class is defined globally in `index.css`.

### Model Configuration
Models are hardcoded in `backend/config.py`. The chairman can be the same as or different from the council members. The current default is Gemini as chairman, per user preference.
## Common Gotchas

1. **Module import errors**: always run the backend as `python -m backend.main` from the project root, not from the backend directory
2. **CORS issues**: the frontend origin must match the allowed origins in `main.py`'s CORS middleware
3. **Ranking parse failures**: if models don't follow the format, a fallback regex extracts any "Response X" patterns in order
4. **Missing metadata**: metadata is ephemeral (not persisted), only available in API responses
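
The fallback behavior in gotcha 3 can be sketched like this; a guess at the approach, not the actual parser in `council.py`:

```python
import re

def parse_ranking_from_text(text: str):
    """Extract a ranking from a model's evaluation. Prefer whatever follows
    the strict "FINAL RANKING:" header; fall back to any "Response X"
    labels in order of appearance, deduplicated."""
    section = text.split("FINAL RANKING:")[-1]
    seen, ranking = set(), []
    for label in re.findall(r"Response [A-Z]", section):
        if label not in seen:
            seen.add(label)
            ranking.append(label)
    return ranking
```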

## Future Enhancement Ideas

- Configurable council/chairman via the UI instead of the config file
- Streaming responses instead of batch loading
- Export conversations to markdown/PDF
- Model performance analytics over time
- Custom ranking criteria (not just accuracy/insight)
- Support for reasoning models (o1, etc.) with special handling

## Testing Notes

Use `test_openrouter.py` to verify API connectivity and to test different model identifiers before adding them to the council. The script tests both streaming and non-streaming modes.

## Data Flow Summary

```
User Query
    ↓
Stage 1: Parallel queries → [individual responses]
    ↓
Stage 2: Anonymize → Parallel ranking queries → [evaluations + parsed rankings]
    ↓
Aggregate Rankings Calculation → [sorted by avg position]
    ↓
Stage 3: Chairman synthesis with full context
    ↓
Return: {stage1, stage2, stage3, metadata}
    ↓
Frontend: Display with tabs + validation UI
```

The entire flow is async/parallel where possible to minimize latency.
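
The whole pipeline reduces to a short async orchestrator. Here the stage functions are stubbed so the sketch is runnable; the real implementations live in `backend/council.py`:

```python
import asyncio

# Stubbed stages -- the real ones query models over the network.
async def stage1_collect_responses(query):
    return {"model-a": "answer a", "model-b": "answer b"}

async def stage2_collect_rankings(query, responses):
    rankings = [["Response B", "Response A"]]
    label_to_model = {"Response A": "model-a", "Response B": "model-b"}
    return rankings, label_to_model

async def stage3_synthesize_final(query, responses, rankings):
    return "final synthesized answer"

async def run_council(query):
    """Chain the three stages and return stages plus ephemeral metadata."""
    responses = await stage1_collect_responses(query)
    rankings, label_to_model = await stage2_collect_rankings(query, responses)
    final = await stage3_synthesize_final(query, responses, rankings)
    return {"stage1": responses, "stage2": rankings, "stage3": final,
            "metadata": {"label_to_model": label_to_model}}
```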

README.md

Lines changed: 87 additions & 0 deletions
# LLM Council

![llmcouncil](header.jpg)

The idea of this repo is that instead of asking a question of your favorite LLM provider (e.g. OpenAI GPT 5.1, Google Gemini 3.0 Pro, Anthropic Claude Sonnet 4.5, xAI Grok 4, etc.), you can group them into your "LLM Council". This repo is a simple, local web app that essentially looks like ChatGPT, except it uses OpenRouter to send your query to multiple LLMs, then asks them to review and rank each other's work, and finally has a Chairman LLM produce the final response.

In a bit more detail, here is what happens when you submit a query:

1. **Stage 1: First opinions**. The user query is given to all LLMs individually, and the responses are collected. The individual responses are shown in a "tab view" so that the user can inspect them one by one.
2. **Stage 2: Review**. Each LLM is given the responses of the other LLMs. Under the hood, the LLM identities are anonymized so that a model can't play favorites when judging the outputs. Each LLM is asked to rank the responses for accuracy and insight.
3. **Stage 3: Final response**. The designated Chairman of the LLM Council takes all of the models' responses and compiles them into a single final answer that is presented to the user.

## Vibe Code Alert

This project was 99% vibe coded as a fun Saturday hack because I wanted to explore and evaluate a number of LLMs side by side in the process of [reading books together with LLMs](https://x.com/karpathy/status/1990577951671509438). It's nice and useful to see multiple responses side by side, along with the LLMs' cross-opinions of each other's outputs. I'm not going to support it in any way; it's provided here as-is for other people's inspiration, and I don't intend to improve it. Code is ephemeral now and libraries are over; ask your LLM to change it in whatever way you like.

## Setup

### 1. Install Dependencies

The project uses [uv](https://docs.astral.sh/uv/) for project management.

**Backend:**
```bash
uv sync
```

**Frontend:**
```bash
cd frontend
npm install
cd ..
```

### 2. Configure API Key

Create a `.env` file in the project root:

```bash
OPENROUTER_API_KEY=sk-or-v1-...
```

Get your API key at [openrouter.ai](https://openrouter.ai/). Make sure to purchase the credits you need, or sign up for automatic top-up.

### 3. Configure Models (Optional)

Edit `backend/config.py` to customize the council:

```python
COUNCIL_MODELS = [
    "openai/gpt-5.1",
    "google/gemini-3-pro-preview",
    "anthropic/claude-sonnet-4.5",
    "x-ai/grok-4",
]

CHAIRMAN_MODEL = "google/gemini-3-pro-preview"
```

## Running the Application

**Option 1: Use the start script**
```bash
./start.sh
```

**Option 2: Run manually**

Terminal 1 (Backend):
```bash
uv run python -m backend.main
```

Terminal 2 (Frontend):
```bash
cd frontend
npm run dev
```

Then open http://localhost:5173 in your browser.

## Tech Stack

- **Backend:** FastAPI (Python 3.10+), async httpx, OpenRouter API
- **Frontend:** React + Vite, react-markdown for rendering
- **Storage:** JSON files in `data/conversations/`
- **Package Management:** uv for Python, npm for JavaScript

backend/__init__.py

Lines changed: 1 addition & 0 deletions
"""LLM Council backend package."""

backend/config.py

Lines changed: 31 additions & 0 deletions
```python
"""Configuration for the LLM Council."""

import os
from dotenv import load_dotenv

load_dotenv()

# OpenRouter API key
OPENROUTER_API_KEY = os.getenv("OPENROUTER_API_KEY")

# Council members - list of OpenRouter model identifiers
COUNCIL_MODELS = [
    "mistralai/devstral-2512:free",
    "xiaomi/mimo-v2-flash:free",
    "kwaipilot/kat-coder-pro:free",
    "tngtech/deepseek-r1t2-chimera:free",
    # "z-ai/glm-4.5-air:free",
    "nvidia/nemotron-3-nano-30b-a3b:free",
    # "qwen/qwen3-coder:free",
    "deepseek/deepseek-r1-0528:free",
    "meta-llama/llama-3.3-70b-instruct:free",
]

# Chairman model - synthesizes the final response
CHAIRMAN_MODEL = "openai/gpt-oss-120b:free"

# OpenRouter API endpoint
OPENROUTER_API_URL = "https://openrouter.ai/api/v1/chat/completions"

# Data directory for conversation storage
DATA_DIR = "data/conversations"
```
