Goal: a working demo, competing for $8,500+ in prizes, built in 8–12 hours.
The core problem with naive parallel development: Person 3 needs Person 1's DB models AND Person 2's LLMClient before they can write simulation logic. Person 2 needs Person 1's DB models before they can write claims generation. In the naive plan, Person 3 is blocked until hour 3.
The solution: contract-first development. In the first 30-45 minutes, Person 1 writes skeleton model files and Person 2 writes a skeleton LLMClient. These stubs define the exact field names, method signatures, and import paths. Everyone else writes real logic against those stubs from minute one. Persons 1 and 2 then fill in the real implementations behind the agreed interfaces.
Result: All 4 people write real code from minute one. Everything is consistent because everyone imports from the same stubs.
Once stubs are committed to git (target: minute 45), no one changes a field name, method signature, or import path without telling the whole team first.
These names are the glue between all four workstreams. Changing one unilaterally silently breaks someone else's code.
All backend code lives under backend/app/. All frontend code lives under frontend/.
Backend layout:
- app/core/config.py: Person 1
- app/core/database.py: Person 1
- app/core/llm_client.py: Person 2 (stub by minute 30)
- app/models/market.py: Person 1
- app/models/session.py: Person 1
- app/models/claim.py: Person 1 (stub by minute 45)
- app/models/agent.py: Person 1 (stub by minute 45)
- app/models/simulation.py: Person 1 (stub by minute 45)
- app/models/claim_share.py: Person 1 (stub by minute 45)
- app/models/report.py: Person 1
- app/schemas/market.py: Person 1
- app/schemas/claim.py: Person 1
- app/schemas/simulation.py: Person 1 and Person 3 agree on this together (critical shared contract)
- app/schemas/report.py: Person 1
- app/services/market_ingestion.py: Person 1
- app/services/polymarket_client.py: Person 1
- app/services/claims_generator.py: Person 2
- app/services/apollo_service.py: Person 2
- app/services/world_builder.py: Person 3
- app/services/simulation_runner.py: Person 3
- app/services/report_agent.py: Person 3
- app/api/routes/markets.py: Person 1
- app/api/routes/claims.py: Person 1 (route shell) + Person 2 (service logic)
- app/api/routes/simulations.py: Person 1 (route shell) + Person 3 (service logic)
- app/api/routes/reports.py: Person 1 (route shell) + Person 3 (service logic)
- app/workers/simulation_worker.py: Person 3
- alembic/: Person 1
Frontend layout:
- app/page.tsx: Person 4 (market import page)
- app/simulation/[id]/page.tsx: Person 4 (simulation dashboard and replay)
- app/reports/[id]/page.tsx: Person 4 (report view)
- components/SimulationReplay.tsx: Person 4 (THE critical component)
- components/AgentDebateFeed.tsx: Person 4
- components/BeliefChart.tsx: Person 4
- components/TrustNetwork.tsx: Person 4
- components/MarketImport.tsx: Person 4
- lib/api.ts: Person 4 (typed API client)
- lib/types.ts: Person 4 (TypeScript types that mirror the backend schemas exactly)
This is the most important phase of the entire hackathon. Before anyone writes real logic, the following stub files must exist and be committed to git.
The LLMClient stub must define:
- A complete() async method that accepts a prompt, an optional system string, a model name, and a response format flag. It returns a string. The stub can return a hardcoded placeholder string; that is enough for Person 3 to write simulation logic against it.
- A call_apollo() async method that accepts job titles, keywords, and a limit. It returns a list of dicts. The stub can return an empty list.
- A module-level llm_client singleton instance. Everyone imports this singleton; no one creates their own instance.
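Here is a minimal sketch of what that stub could look like, assuming Python 3.9+ and plain asyncio. The keyword names (`system`, `model`, `response_format`, `job_titles`, `keywords`, `limit`) and their defaults are assumptions. Whatever Person 2 actually commits becomes the contract. Returning a valid `update_belief` payload in JSON mode is an optional extra, so Person 3's parsing code has something realistic to work with.

```python
# app/core/llm_client.py: stub committed in the first 30 minutes.
# Parameter names and defaults are assumptions for illustration.
import json
from typing import Any, Optional


class LLMClient:
    async def complete(
        self,
        prompt: str,
        system: Optional[str] = None,
        model: str = "gemini-flash",
        response_format: Optional[str] = None,
    ) -> str:
        # Placeholder: no network call yet, always returns a string.
        if response_format == "json":
            # Optional: a well-formed update_belief action so JSON callers work today.
            return json.dumps({
                "action": "update_belief",
                "new_probability": 0.5,
                "confidence": 0.5,
                "reasoning": "stub response",
            })
        return "stub response"

    async def call_apollo(
        self, job_titles: list[str], keywords: list[str], limit: int = 10
    ) -> list[dict[str, Any]]:
        # Placeholder: no Apollo results until the real implementation lands.
        return []


# Module-level singleton: everyone imports this instance; nobody constructs their own.
llm_client = LLMClient()
```

Callers then write `from app.core.llm_client import llm_client` and never touch the class directly, so swapping in the real implementation changes no import.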
Model name conventions that must be consistent everywhere:
- "gemini-flash" routes to gemini-1.5-flash-002 via Lava (fast, for simulation ticks)
- "gemini-pro" routes to gemini-1.5-pro-002 via Lava (smart, for claims and reports)
- "k2-think" routes to Kindo/K2-Think-V2 via LiteLLM (reasoning, for world-building and report planning)
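A small lookup table keeps the three aliases in one place. This is a sketch. The table layout, the provider labels `"lava"` and `"litellm"`, and the `resolve_model` helper are hypothetical. Only the aliases and model ids come from the list above. Raising on an unknown alias surfaces a typo like `"gemini"` at the first call, instead of producing a confusing provider error.

```python
# Hypothetical routing table: alias -> (provider, model id).
MODEL_ROUTES: dict[str, tuple[str, str]] = {
    "gemini-flash": ("lava", "gemini-1.5-flash-002"),  # fast: simulation ticks
    "gemini-pro": ("lava", "gemini-1.5-pro-002"),      # smart: claims and reports
    "k2-think": ("litellm", "Kindo/K2-Think-V2"),      # reasoning: world-building, report planning
}


def resolve_model(alias: str) -> tuple[str, str]:
    """Return (provider, model_id) for an alias; fail loudly on unknown names."""
    try:
        return MODEL_ROUTES[alias]
    except KeyError:
        raise ValueError(
            f"Unknown model alias {alias!r}; expected one of {sorted(MODEL_ROUTES)}"
        ) from None
```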
Each model stub must define the SQLAlchemy class with the correct tablename and all column names — even if the column definitions are minimal. The field names are the contract. Person 2 and Person 3 will write code that reads and writes these exact field names.
Claim model fields: id, session_id, market_id, text, stance, strength_score, novelty_score
Agent model fields: id, simulation_id, name, archetype, initial_belief, current_belief, confidence, professional_background, trust_scores
- archetype values: bayesian_updater, trend_follower, contrarian, data_skeptic, narrative_focused, quantitative_analyst
- professional_background is a JSON column storing: title, company, industry, apollo_enriched (bool)
- trust_scores is a JSON column storing a dict mapping other_agent_id to a float (0.0–1.0)
Simulation model fields: id, session_id, market_id, status, current_tick, total_ticks, tick_data, created_at, completed_at
- status values: pending, building, running, complete, failed
- total_ticks always defaults to 30
- tick_data is a JSON column storing a list of TickSnapshot dicts (see the schema contracts below)
ClaimShare model fields: id, simulation_id, from_agent_id, to_agent_id, claim_id, commentary, tick_number, delivered
- delivered is a boolean, defaults to False
- tick_number is the tick when the share was created (the recipient sees it on tick_number + 1)
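The four stub models can be mirrored as plain dataclasses. This illustrates the field-name contract, and it doubles as the local-dataclass fallback Person 3 can use until the real models are committed. These are not the SQLAlchemy models. The string ids, Python types, and most defaults are assumptions. Only the field names and the documented defaults (`total_ticks=30`, `delivered=False`) come from the plan.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class Claim:
    id: str
    session_id: str
    market_id: str
    text: str
    stance: str            # "yes" or "no"; fixed at generation time
    strength_score: float  # 0.0-1.0
    novelty_score: float   # 0.0-1.0


@dataclass
class Agent:
    id: str
    simulation_id: str
    name: str
    archetype: str
    initial_belief: float
    current_belief: float
    confidence: float
    professional_background: dict[str, Any] = field(default_factory=dict)
    trust_scores: dict[str, float] = field(default_factory=dict)  # other_agent_id -> 0.0-1.0


@dataclass
class Simulation:
    id: str
    session_id: str
    market_id: str
    status: str = "pending"  # pending | building | running | complete | failed
    current_tick: int = 0
    total_ticks: int = 30
    tick_data: list[dict[str, Any]] = field(default_factory=list)  # TickSnapshot dicts
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    completed_at: Optional[datetime] = None


@dataclass
class ClaimShare:
    id: str
    simulation_id: str
    from_agent_id: str
    to_agent_id: str
    claim_id: str
    commentary: str
    tick_number: int  # tick the share was created; recipient sees it at tick_number + 1
    delivered: bool = False
```

When Person 1 commits the real models, only the import line changes, because the attribute names are identical.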
These are the exact data shapes that flow between all four workstreams. Person 3 produces them. Person 1 stores and returns them. Person 4 reads and displays them.
ClaimSchema fields: id, text, stance (yes or no), strength_score (0.0–1.0), novelty_score (0.0–1.0)
ProfessionalBackground fields: title, company, industry, apollo_enriched (bool)
AgentSummary fields: id, name, archetype, initial_belief, current_belief, confidence, professional_background (ProfessionalBackground shape)
AgentTickState fields: agent_id, name, belief (probability at the end of this tick), confidence, action_taken (either "update_belief" or "share_claim"), reasoning (the agent's reasoning text; this is what appears in the frontend debate feed)
ClaimShareRecord fields: from_agent_id, from_agent_name, to_agent_id, to_agent_name, claim_id, claim_text, commentary, tick
TrustUpdate fields: from_agent_id, to_agent_id, old_trust, new_trust
TickSnapshot fields: tick (integer 1–30), agent_states (list of AgentTickState), claim_shares (list of ClaimShareRecord for this tick), trust_updates (list of TrustUpdate for this tick), faction_clusters (list of lists of agent_ids: groups of agents whose beliefs are close)
SimulationResponse fields: id, session_id, market_id, status, current_tick, total_ticks, agents (list of AgentSummary), tick_data (list of TickSnapshot; grows as the simulation runs), created_at, completed_at
MarketResponse fields: id, polymarket_id, question, resolution_criteria, current_probability (float 0.0–1.0), volume
ReportResponse fields: id, simulation_id, market_probability (Polymarket's number), simulation_probability (emergent consensus), summary, key_drivers (list of strings), faction_analysis, trust_insights, recommendation
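The tick-level shapes are where Person 3 and Person 4 meet, so pinning them down in code can catch drift early. Below is a sketch using `typing.TypedDict`. The real schema files may use Pydantic models instead, and the Python types (string ids, float beliefs) are assumptions. Field names are copied from the contracts above.

```python
from typing import Literal, TypedDict


class AgentTickState(TypedDict):
    agent_id: str
    name: str
    belief: float  # probability at the end of this tick
    confidence: float
    action_taken: Literal["update_belief", "share_claim"]
    reasoning: str  # shown in the frontend debate feed


class ClaimShareRecord(TypedDict):
    from_agent_id: str
    from_agent_name: str
    to_agent_id: str
    to_agent_name: str
    claim_id: str
    claim_text: str
    commentary: str
    tick: int


class TrustUpdate(TypedDict):
    from_agent_id: str
    to_agent_id: str
    old_trust: float
    new_trust: float


class TickSnapshot(TypedDict):
    tick: int  # 1-30
    agent_states: list[AgentTickState]
    claim_shares: list[ClaimShareRecord]
    trust_updates: list[TrustUpdate]
    faction_clusters: list[list[str]]  # groups of agent_ids with close beliefs
```

Person 4's `lib/types.ts` interfaces should list exactly these keys, which makes a side-by-side diff a quick consistency check.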
These are the exact endpoint signatures. Person 4 builds the entire API client against these from minute one, using mock data that matches the shapes above.
- POST /api/markets/import: accepts a Polymarket URL, returns MarketResponse
- POST /api/sessions/{market_id}/claims/generate: returns ClaimsGenerateResponse (session_id, market_id, list of ClaimSchema)
- POST /api/simulations/build-world: accepts session_id, returns SimulationResponse with status="building", agents populated, and tick_data empty
- POST /api/simulations/{id}/start: returns SimulationResponse with status="running"
- GET /api/simulations/{id}: returns SimulationResponse with tick_data growing as the simulation runs; this is the polling endpoint
- GET /api/reports/{simulation_id}: returns ReportResponse
On each tick, every agent LLM call must return exactly one of two JSON shapes.
Shape 1 — update_belief: action field set to "update_belief", new_probability (float 0.0–1.0), confidence (float 0.0–1.0), reasoning (string shown in debate feed)
Shape 2 — share_claim: action field set to "share_claim", claim_id (must be an id from the visible claims or incoming claims in the agent's prompt), target_agent_ids (list of agent ids, usually 1–2 trusted agents), commentary (string shown in debate feed), reasoning (internal reasoning not shared with other agents)
Person 3 owns the prompt format that produces these responses. Person 1 does not need to know the prompt internals — only the output shape matters for routing and storage.
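The action shape is the only thing the rest of the backend depends on, so a strict validator at the boundary is cheap insurance against malformed LLM output. Here is a sketch, assuming the raw response arrives as a JSON string. The function names are hypothetical. Rejecting unknown `claim_id` values enforces the rule that an agent may only share a claim it was shown.

```python
import json
from typing import Any


def _validate(data: dict[str, Any], allowed_claim_ids: set[str]) -> dict[str, Any]:
    action = data.get("action")
    if action == "update_belief":
        p = float(data["new_probability"])
        c = float(data["confidence"])
        if not (0.0 <= p <= 1.0 and 0.0 <= c <= 1.0):
            raise ValueError("new_probability and confidence must be in [0, 1]")
        return {"action": action, "new_probability": p, "confidence": c,
                "reasoning": str(data.get("reasoning", ""))}
    if action == "share_claim":
        claim_id = str(data["claim_id"])
        if claim_id not in allowed_claim_ids:
            raise ValueError(f"claim_id {claim_id!r} was not in this agent's prompt")
        targets = [str(t) for t in data["target_agent_ids"]]
        if not targets:
            raise ValueError("share_claim needs at least one target agent")
        return {"action": action, "claim_id": claim_id, "target_agent_ids": targets,
                "commentary": str(data.get("commentary", "")),
                "reasoning": str(data.get("reasoning", ""))}
    raise ValueError(f"unknown action {action!r}")


def parse_agent_action(raw: str, allowed_claim_ids: set[str]) -> dict[str, Any]:
    """Validate one agent's tick response; raise ValueError unless it matches a contract shape."""
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("agent response must be a JSON object")
    try:
        return _validate(data, allowed_claim_ids)
    except (KeyError, TypeError) as exc:  # missing fields or wrong types
        raise ValueError(f"malformed agent action: {exc!r}") from exc
```

Funnelling every failure into `ValueError` lets the tick loop catch one exception type. For example, it could treat that agent as holding its belief for the tick rather than crashing the run. That fallback is a suggestion, not part of the contract.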
Full day. Must deliver stubs in the first 45 minutes before writing any real implementations.
- Create full FastAPI scaffold with routers and app entry point
- Create async SQLAlchemy session setup in database.py
- Write all model stub files with correct field names (see Hour 0 section above)
- Write all schema files with correct shapes (see Shared Data Contracts above)
- Commit and announce in #backend-status: "Stubs ready. Everyone can start."
- Flesh out all model files with indexes, relationships, and constraints
- Write and run Alembic migrations against Supabase
- Announce in #backend-status: "DB migrations done."
- Build the Polymarket HTTP wrapper with httpx and retry logic (call the HTTP API directly rather than depending on an SDK)
- Build market ingestion service: parses URL slug, fetches market data, saves to DB
- Build route shells for all endpoints listed in API Route Contracts
- Wire Person 2's claims_generator into the claims route
- Wire Person 3's world_builder into POST /api/simulations/build-world
- Wire Person 3's simulation_worker into POST /api/simulations/{id}/start
- Deploy to Railway
- Post Railway URL to #backend-status
- To Person 2: DB models importable from app.models.claim and app.models.session
- To Person 3: DB models importable from app.models.agent, app.models.simulation, app.models.claim_share
- To Person 4: All API endpoints at the exact paths listed above
Full day. LLMClient stub must be committed within 30 minutes — this is the single most important early action of the whole hackathon.
- Create llm_client.py with the stub interface described in Hour 0
- The stub just returns placeholder strings — that is enough to unblock Person 3
- Commit and announce in #llm-status: "LLMClient stub committed. Person 3 can start."
- Implement complete() to route by model name: gemini-flash and gemini-pro go through Lava, k2-think goes through LiteLLM
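- When response_format is "json", append a JSON instruction to the prompt and verify that the response parses as JSON before returning it (complete() still returns a string; callers parse it into the action shape)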
- Implement call_apollo() via Lava's Apollo.io endpoint
- Announce in #llm-status: "Real LLMClient done."
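The JSON mode described above could be implemented like this. This is a sketch: the instruction wording, the regex for stripping markdown fences, and both helper names are assumptions. Re-serializing the parsed object keeps `complete()` returning a string while guaranteeing that the string is valid JSON.

```python
import json
import re

# Assumed wording; tune per model.
JSON_INSTRUCTION = (
    "\n\nRespond with a single JSON object only, with no prose and no markdown code fences."
)

# Matches an opening ``` or ```json at the start, or a closing ``` at the end.
_FENCE = re.compile(r"^```(?:json)?\s*|\s*```$", re.IGNORECASE)


def with_json_instruction(prompt: str) -> str:
    return prompt + JSON_INSTRUCTION


def clean_json_response(text: str) -> str:
    """Strip optional code fences and confirm the payload parses.

    Returns a compact JSON string; raises json.JSONDecodeError if the model
    did not return valid JSON (a caller could retry once at that point).
    """
    stripped = _FENCE.sub("", text.strip())
    return json.dumps(json.loads(stripped))
```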
- claims_generator.py takes market_id and session_id
- Loads market question and resolution criteria from DB
- Sends one Gemini Pro prompt asking for 20-30 structured claims with stance, strength_score, and novelty_score
- Parses the JSON response and saves each claim to the DB as a Claim model
- Returns ClaimsGenerateResponse
The claims generation prompt must ask for: claim text, stance (yes or no — where yes means the claim supports the market resolving YES), strength_score (how strong is this evidence), novelty_score (how surprising or non-obvious is this claim)
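The prompt and the post-processing step before claims are saved could look like the sketch below. The wording, the `{"claims": [...]}` wrapper object, and defaulting missing scores to 0.5 are assumptions. The four requested fields and their meanings come from the paragraph above.

```python
import json
from typing import Any


def build_claims_prompt(question: str, resolution_criteria: str,
                        n_min: int = 20, n_max: int = 30) -> str:
    # Hypothetical wording; only the requested fields come from the plan.
    return (
        f"Market question: {question}\n"
        f"Resolution criteria: {resolution_criteria}\n\n"
        f"Generate {n_min}-{n_max} distinct, concrete claims relevant to this market. "
        'Return a JSON object {"claims": [...]} where each claim has:\n'
        '- "text": the claim\n'
        '- "stance": "yes" if it supports the market resolving YES, otherwise "no"\n'
        '- "strength_score": 0.0-1.0, how strong this evidence is\n'
        '- "novelty_score": 0.0-1.0, how surprising or non-obvious it is\n'
    )


def normalize_claims(raw_json: str) -> list[dict[str, Any]]:
    """Keep only well-formed claims and clamp scores into [0, 1]."""
    items = json.loads(raw_json).get("claims", [])
    clean = []
    for item in items:
        if not isinstance(item, dict):
            continue
        stance = str(item.get("stance", "")).strip().lower()
        text = str(item.get("text", "")).strip()
        if stance not in ("yes", "no") or not text:
            continue  # drop claims without a usable stance or text
        clean.append({
            "text": text,
            "stance": stance,
            "strength_score": min(1.0, max(0.0, float(item.get("strength_score", 0.5)))),
            "novelty_score": min(1.0, max(0.0, float(item.get("novelty_score", 0.5)))),
        })
    return clean
```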
- apollo_service.py exposes get_relevant_professionals(market_question) returning a list of ProfessionalBackground
- First calls Gemini Pro to extract relevant job titles and keywords from the market question
- Then calls llm_client.call_apollo() with those titles and keywords
- Maps the Apollo response to the ProfessionalBackground shape
- Falls back to K2 Think-generated synthetic profiles if Apollo returns fewer than 6 results
- Announce in #llm-status: "Apollo service ready."
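The mapping and fallback step could be sketched as follows. The Apollo keys read here (`title`, plus `name` and `industry` under `organization`) are hypothetical placeholders, so check them against what `call_apollo()` actually returns. This version tops Apollo results up with synthetic profiles rather than discarding them. Replacing them outright is an equally valid reading of the rule. `synthesize` stands in for the K2 Think call.

```python
from typing import Any, Awaitable, Callable

MIN_PROFILES = 6
Profile = dict[str, Any]


def to_background(person: Profile) -> Profile:
    # Hypothetical Apollo keys; adjust to the real call_apollo() output.
    org = person.get("organization") or {}
    return {
        "title": person.get("title") or "Unknown role",
        "company": org.get("name") or "Unknown company",
        "industry": org.get("industry") or "Unknown industry",
        "apollo_enriched": True,
    }


async def get_backgrounds(
    apollo_people: list[Profile],
    synthesize: Callable[[int], Awaitable[list[Profile]]],
) -> list[Profile]:
    backgrounds = [to_background(p) for p in apollo_people]
    if len(backgrounds) >= MIN_PROFILES:
        return backgrounds
    # Fallback: fill the gap with synthetic (non-Apollo) profiles.
    synthetic = await synthesize(MIN_PROFILES - len(backgrounds))
    return backgrounds + [{**s, "apollo_enriched": False} for s in synthetic]
```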
- To Person 3: llm_client singleton importable from app.core.llm_client
- To Person 3: apollo_service importable from app.services.apollo_service
- To Person 1: claims_generator service for Person 1 to wire into the claims route
Full day. Start writing real simulation logic from minute one: Person 2's stub lands within 30 minutes, and the real implementations arrive behind it. Write against the stubs.
- Do not wait. Person 2's LLMClient stub is available within 30 minutes.
- Import from app.core.llm_client and app.models immediately — stubs exist
- Start writing simulation_runner.py structure, agent tick logic, and the tick loop
- If a model file is not committed yet, write a local dataclass with the same field names and swap to the real import when Person 1 commits
- world_builder.py exposes build_world(session_id, simulation_id) returning a list of Agent DB objects
- Calls apollo_service.get_relevant_professionals() to get real personas, or falls back to K2 Think-generated synthetic backgrounds
- Creates 12 agents covering all 6 archetypes (2 of each): bayesian_updater, trend_follower, contrarian, data_skeptic, narrative_focused, quantitative_analyst
- Assigns initial beliefs spread across 0.35–0.65 based on archetype
- Assigns initial trust scores between agents (0.4–0.8 range, partially randomized)
- Saves all Agent models to DB
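The seeding step, with 2 agents per archetype, could look like the sketch below. The per-archetype belief centers and the ±0.05 jitter are invented for illustration. The plan only fixes the 0.35–0.65 belief band and the 0.4–0.8 trust range. The id and name formats are also hypothetical. Passing a `seed` makes a demo run reproducible.

```python
import random
from typing import Any, Optional

ARCHETYPES = ["bayesian_updater", "trend_follower", "contrarian",
              "data_skeptic", "narrative_focused", "quantitative_analyst"]

# Hypothetical per-archetype starting beliefs, all inside the 0.35-0.65 band.
BELIEF_CENTER = {
    "bayesian_updater": 0.50, "trend_follower": 0.55, "contrarian": 0.40,
    "data_skeptic": 0.45, "narrative_focused": 0.60, "quantitative_analyst": 0.50,
}


def seed_agents(backgrounds: list[dict[str, Any]], per_archetype: int = 2,
                seed: Optional[int] = None) -> list[dict[str, Any]]:
    """Build agent dicts; `backgrounds` must be non-empty (reused round-robin)."""
    rng = random.Random(seed)
    agents: list[dict[str, Any]] = []
    for archetype in ARCHETYPES:
        for i in range(per_archetype):
            jittered = BELIEF_CENTER[archetype] + rng.uniform(-0.05, 0.05)
            belief = round(min(0.65, max(0.35, jittered)), 3)
            agents.append({
                "id": f"agent-{len(agents) + 1}",
                "name": f"{archetype.replace('_', ' ').title()} {i + 1}",
                "archetype": archetype,
                "initial_belief": belief,
                "current_belief": belief,
                "confidence": 0.5,  # assumed starting confidence
                "professional_background": backgrounds[len(agents) % len(backgrounds)],
                "trust_scores": {},
            })
    # Partially randomized trust toward every other agent, in the 0.4-0.8 range.
    for a in agents:
        a["trust_scores"] = {b["id"]: round(rng.uniform(0.4, 0.8), 3)
                             for b in agents if b["id"] != a["id"]}
    return agents
```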
The loop runs for 30 ticks. Each tick:
- Load all agents for the simulation from DB
- Load all claims for the session from DB
- Select visible_claims: rank all claims by (0.7 × strength_score + 0.3 × novelty_score), take the top 4 yes-stance claims and top 4 no-stance claims
- Load pending ClaimShares created in the previous tick (tick_number equals the current tick minus 1) where delivered is False, grouped by to_agent_id; these become incoming_claims for each agent
- For each agent, build a private prompt containing: the agent's name and archetype, current belief and confidence, list of trusted agents with trust weights, the visible_claims list, and the incoming_claims specific to that agent
- Call llm_client.complete() with model="gemini-flash" and response_format="json"
- Parse the response into either an update_belief action or a share_claim action (see Agent Action Contract)
- Apply the action: if update_belief, update agent.current_belief and agent.confidence in DB; if share_claim, create a new ClaimShare record with delivered=False
- Update trust scores: when an agent shares a claim, increase trust toward the target by 0.02; when an agent ignores a received claim, decrease trust toward the sender by 0.01
- Detect factions: group agents whose beliefs are within 0.08 of each other
- Build a TickSnapshot dict using the shapes defined in Shared Data Contracts and append it to simulation.tick_data
- Update simulation.current_tick
- Mark all ClaimShares processed this tick as delivered=True
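Three of the per-tick steps above are small pure functions that are easy to test in isolation. The sketch below uses hypothetical names. Claim ranking uses the 0.7/0.3 weighting from the plan. Clamping trust to [0, 1] and defaulting an unknown trust to 0.5 are assumptions. Faction detection uses one interpretation of "within 0.08": sort beliefs and start a new faction wherever the gap to the previous agent exceeds 0.08. This is single-linkage, so a long chain of close agents ends up in one faction.

```python
from typing import Any


def select_visible_claims(claims: list[dict[str, Any]], per_stance: int = 4) -> list[dict[str, Any]]:
    """Top `per_stance` yes-claims plus top `per_stance` no-claims by 0.7*strength + 0.3*novelty."""
    def score(c: dict[str, Any]) -> float:
        return 0.7 * c["strength_score"] + 0.3 * c["novelty_score"]

    visible: list[dict[str, Any]] = []
    for stance in ("yes", "no"):
        ranked = sorted((c for c in claims if c["stance"] == stance), key=score, reverse=True)
        visible.extend(ranked[:per_stance])
    return visible


def adjust_trust(trust_scores: dict[str, float], other_id: str, delta: float) -> dict[str, float]:
    """Apply +0.02 (shared to) or -0.01 (ignored sender); returns old/new for a TrustUpdate."""
    old = trust_scores.get(other_id, 0.5)
    new = round(min(1.0, max(0.0, old + delta)), 4)
    trust_scores[other_id] = new
    return {"old_trust": old, "new_trust": new}


def detect_factions(beliefs: dict[str, float], threshold: float = 0.08) -> list[list[str]]:
    """Sort agents by belief and split wherever the adjacent gap exceeds `threshold`."""
    ordered = sorted(beliefs.items(), key=lambda kv: kv[1])
    clusters: list[list[tuple[str, float]]] = []
    for agent_id, belief in ordered:
        if clusters and belief - clusters[-1][-1][1] <= threshold:
            clusters[-1].append((agent_id, belief))
        else:
            clusters.append([(agent_id, belief)])
    return [[agent_id for agent_id, _ in cluster] for cluster in clusters]
```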
Key simulation rules:
- Claim stance belongs to the claim and never changes — it is set at generation time
- Agent belief belongs to the agent — it changes every tick based on evidence
- Shares are not instant: a claim shared at Tick N appears in the recipient's prompt at Tick N+1
- The backend is the mailman — agents never directly read each other's prompts
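The delivery rule amounts to a mailbox keyed by recipient. Below is a sketch over in-memory dicts standing in for the ClaimShare rows; the function names are hypothetical. A share created at tick N is only picked up when the loop is at tick N+1, and it is flagged as delivered once processed.

```python
from collections import defaultdict
from typing import Any

Share = dict[str, Any]


def collect_incoming(shares: list[Share], current_tick: int) -> dict[str, list[Share]]:
    """Group undelivered shares created at current_tick - 1 by recipient."""
    inbox: defaultdict[str, list[Share]] = defaultdict(list)
    for share in shares:
        if not share["delivered"] and share["tick_number"] == current_tick - 1:
            inbox[share["to_agent_id"]].append(share)
    return dict(inbox)


def mark_delivered(inbox: dict[str, list[Share]]) -> None:
    """Flag every share handed out this tick so it is never injected twice."""
    for batch in inbox.values():
        for share in batch:
            share["delivered"] = True
```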
- report_agent.py generates the final report after the simulation completes
- First calls K2 Think to plan the report structure with multi-step reasoning over the tick_data summary
- Then calls Gemini Pro to draft each section: executive summary, probability comparison, key evidence drivers, faction analysis, trust network insights, recommendation
- Saves to Report DB model and returns ReportResponse
The report's simulation_probability field is the average agent belief at the final tick (tick 30, or the last tick if the run is shortened).
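Here is a sketch of that computation over `tick_data`, using a hypothetical function name. It takes the snapshot with the highest `tick` instead of hardcoding 30, so it stays correct if the run is cut to 10 ticks as a fallback.

```python
from statistics import fmean
from typing import Any


def simulation_probability(tick_data: list[dict[str, Any]]) -> float:
    """Average agent belief in the last recorded TickSnapshot."""
    if not tick_data:
        raise ValueError("simulation has no ticks yet")
    final = max(tick_data, key=lambda snap: snap["tick"])
    return fmean(state["belief"] for state in final["agent_states"])
```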
- To Person 1: simulation_worker.run_simulation(simulation_id) async function for Person 1 to wire into the start route
- To Person 4: The SimulationResponse shape with tick_data (Person 4 builds the replay UI against this)
Full day. Start immediately with mock data. Never blocked.
- Initialize Next.js 15 app with TypeScript and Tailwind
- Create lib/types.ts with TypeScript interfaces that exactly mirror every shape in the Shared Data Contracts section above — field names and types must match exactly
- Create lib/mockData.ts with hardcoded data matching those types (3 agents and 5 ticks are enough to build all UI components)
Priority order:
- SimulationReplay.tsx — the critical demo component. Split-screen. Left pane: horizontal tick scrubber (1–30) plus a Recharts line chart with one line per agent (x=tick, y=belief 0–1) plus trust network visualization plus faction cluster display. Right pane: chat-style debate feed showing each agent's reasoning and claim shares at the selected tick. Clicking a tick on the scrubber updates both panes simultaneously.
- AgentDebateFeed.tsx — at each tick, shows each agent's reasoning field and any ClaimShareRecords where they are the sender
- BeliefChart.tsx — standalone Recharts line chart reused inside SimulationReplay
- Market import page (app/page.tsx) — text input for Polymarket URL and a submit button
- Report view page (app/reports/[id]/page.tsx) — displays all ReportResponse fields
- Build lib/api.ts with typed functions for every endpoint in the API Route Contracts section
- Use TanStack Query for all data fetching
- Polling: use refetchInterval of 2000ms while simulation status is "running"
- Stop polling when status is "complete" or "failed"
- As tick_data grows with each poll, the scrubber and chart update automatically
- When Person 1 posts the Railway URL to #backend-status, set NEXT_PUBLIC_API_URL and test each route
- Replace mock data with real API calls one page at a time
- Test the full user flow: import market → generate claims → build world → start simulation → watch replay → view report
- Deploy to Vercel, set NEXT_PUBLIC_API_URL env var in the Vercel dashboard
- Post Vercel URL to #frontend-status
- To everyone: the demo. The split-screen simulation replay is the judge-facing proof the system works.
- Person 1: FastAPI scaffold + all model stubs + all schema files committed
- Person 2: LLMClient stub committed, announced in #llm-status
- Person 3: Start writing simulation_runner.py and world_builder.py structure against stubs
- Person 4: Next.js setup + create lib/types.ts from the Shared Data Contracts section + build mock data
Checkpoint: Can Person 3 import llm_client without error? Can Person 2 import Claim model without error? Resolve before anything else.
- Person 1: Full model implementations + Alembic migrations. Announce "DB ready" in #backend-status.
- Person 2: Real LLMClient implementation. Start claims generator.
- Person 3: World builder logic. Simulation loop with real DB model imports.
- Person 4: Build SimulationReplay component with mock data.
- Person 1: API routes for market import and claims. Wire in Person 2's claims_generator.
- Person 2: Claims generator finished. Start Apollo service.
- Person 3: Start simulation runner. Wire in real LLMClient when Person 2 announces it's done.
- Person 4: AgentDebateFeed and BeliefChart components.
- Person 1: Deploy backend. Support incoming bugs. Wire in Person 3's world_builder.
- Person 2: Apollo service finished. Help Person 3 debug LLM prompt issues.
- Person 3: Simulation running end-to-end. Announce in #simulation-status.
- Person 4: Full mock simulation replay working. Start polling logic.
- Person 1: Post Railway URL to #backend-status.
- Person 2: Report agent service. Write K2 Think submission docs.
- Person 3: Report generation. Test full pipeline.
- Person 4: Replace mock data with real API. Test end-to-end.
- Person 1: Monitor backend, fix bugs.
- Person 2: Help with LLM debugging.
- Person 3: Full pipeline test. Reduce to 10 ticks if running slow.
- Person 4: Deploy to Vercel. Polish UI. Record demo.
- Person 2: Apollo.io creative submission writeup
- Person 3: K2 Think reasoning documentation
- Person 4: Screenshots and demo video
- Everyone: Prize submission forms
- #backend-status: Person 1 posts "Stubs committed", "DB migrations done", "Railway URL: ..."
- #llm-status: Person 2 posts "LLMClient stub committed", "Real LLM ready", "Apollo ready"
- #simulation-status: Person 3 posts "World builder working", "Simulation running end-to-end"
- #frontend-status: Person 4 posts "Mock replay working", "Vercel URL: ..."
- #blockers: anyone blocked posts here immediately
Standups:
- Hour 1: "Stubs committed? Can everyone import without error?"
- Hour 3: "DB ready? LLMClient real? Simulation loop started?"
- Hour 5: "Simulation running? Frontend mock done?"
- Hour 7: "Integration working? Any blockers?"
- Hour 9: "Demo working end-to-end?"
Minute 30: Person 2 commits LLMClient stub → Person 3 can write all simulation logic
Minute 45: Person 1 commits all model stubs and schemas → Person 2 can write claims generator, Person 3 can write world builder, Person 4 has all TypeScript types and can build all UI components
Hour 2: Person 1 finishes real DB migrations → Person 2 and Person 3 can persist to DB for real
Hour 3: Person 2 finishes real LLMClient → Person 3 simulation loop makes real LLM calls
Hour 6: Person 3 simulation running end-to-end → Person 4 can test with real data shapes
Hour 6: Person 1 deploys to Railway → Person 4 can replace mock data with real API calls
Hour 8: Full pipeline working
If Person 2 does not commit LLMClient stub by minute 30, Person 3 is blocked from writing any simulation logic. This is the only hard early blocker.
If Person 1 does not commit model stubs by minute 45, Person 2 cannot write claims generator and Person 3 cannot write world builder. Model stubs must be committed before real implementations are written.
Person 4 is never blocked. Mock data matches the real schema exactly because lib/types.ts is written from the Shared Data Contracts section. When the real API is ready, it just works.
Hour 6 — simulation not running: reduce to 6 agents and 10 ticks. Skip Apollo.io and use K2 Think-generated synthetic personas instead.
Hour 8 — frontend not integrated: demo with mock data. The UI is the demo, not the backend.
Hour 10 — major issues: strip down to market import → static simulation output → report. Demo the vision, not the full implementation.
Must have:
- Import a Polymarket market
- Claims generated for the market
- Simulation starts and runs
- Simulation replay shows agents updating beliefs over ticks
- Final report generated
Can skip if tight: Apollo.io (fall back to K2-generated personas), trust network visualization, fancy animations, what-if scenarios
Try hard to include Apollo.io — it is the $500 Most Creative prize differentiator and only takes 2-3 hours.
Every person's code must be consistent with this mental model.
The claim pool: One LLM call at the start generates 20-30 claims for the market. Each claim has a stance (yes or no) indicating which direction it points relative to the market outcome. Stance belongs to the claim and never changes. It is set once at generation time.
Agent beliefs: Each of the 12 agents has its own belief (probability 0.0–1.0). Belief belongs to the agent, not the claim. Two agents can see the same claim and react differently.
Each tick: Every agent gets its own private prompt containing their current belief and confidence, a selection of visible claims from the shared pool (top 4 yes + top 4 no ranked by strength and novelty), and any claims specifically sent to them by other agents in the previous tick. Each agent returns exactly one action: update their own belief OR share one claim with a targeted peer.
Claim shares are not instant: If Agent A shares a claim to Agent B at Tick 1, Agent B sees it in Tick 2. The backend stores the share and injects it into the recipient's next prompt. The backend is the mailman — agents never directly see each other's prompts.
The frontend output: The primary deliverable is the Simulation Replay with Agent Debates. Split-screen. Left: tick scrubber 1–30, belief convergence line chart per agent, trust network, faction clusters. Right: chat-style feed showing each agent's reasoning and claim shares at the selected tick. This is what judges see. This is the demo.