0x0funky/vibehq-hub


🌐 Language: English | 繁體中文 | 日本語

⚡ VibeHQ

Running 5 AI agents in parallel is easy.
Making them not break each other's code is the hard part.

VibeHQ adds contracts, task tracking, and idle-aware messaging to Claude Code, Codex & Gemini CLI, so they work like an actual engineering team, not 5 interns editing the same file.


The Problem Nobody Talks About

Every "multi-agent" tool lets you run multiple CLI agents in parallel. But parallel ≠ collaboration. Here's what actually happens when 5 agents build the same app:

| What Goes Wrong | Real Example from Our Logs |
| --- | --- |
| Schema conflicts: each agent invents its own JSON format | Frontend expects { data: [] }, backend writes { results: [] }, third agent creates its own copy |
| Orchestrator role drift: the PM starts writing code | PM made 6 manual JS patches to fix integration bugs instead of coordinating |
| Ghost files: agents publish 43-byte stubs instead of real content | Agent writes full file via share_file, then puts "See local file..." in publish_artifact. Loop repeats for 68 minutes |
| Premature execution: agents start before dependencies are ready | Agent sees QUEUED task description, ignores the status, starts coding with hardcoded data |
| Silent failures: crashed agents produce no signal | Orchestrator waits 18 minutes for a response from a dead process |

These aren't edge cases. They're LLM-native behavioral patterns that reliably appear across model families. We documented 7 of them with full session logs.

📖 Read the full analysis: 7 LLM-Native Problems →


What VibeHQ Actually Does

VibeHQ is a teamwork protocol layer that sits on top of real CLI agents. Each agent stays a full Claude Code / Codex / Gemini process with all native features; VibeHQ adds the coordination they're missing:

| Problem | VibeHQ's Fix |
| --- | --- |
| Schema conflicts | Contract system: agents must sign API specs before coding begins |
| Role drift | Structured task lifecycle: create → accept → in_progress → done with required artifacts |
| Ghost files | Hub-side validation: rejects publish_artifact calls with stub content (<200 bytes) |
| Premature execution | Idle-aware queue: withholds task details until dependencies are ready |
| Silent failures | Heartbeat monitoring: auto-detects offline agents, notifies orchestrator |
| No quality check | Independent QA: separate agent validates data against source docs |
| No post-mortem | 13 automated detection rules: analyzes session logs for failure patterns |
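
The task lifecycle in the table above can be read as a small state machine. The sketch below is illustrative only: the names TaskStatus, Task, and advance are hypothetical and not taken from the VibeHQ source, and it assumes a task may only move one step forward and cannot reach done without at least one artifact.

```typescript
// Hypothetical sketch of a create -> accept -> in_progress -> done lifecycle.
// Names and rules are assumptions, not VibeHQ's actual implementation.
type TaskStatus = "created" | "accepted" | "in_progress" | "done";

interface Task {
  id: string;
  status: TaskStatus;
  artifacts: string[]; // names of artifacts attached to the task
}

// The only legal successor of each state; null marks the terminal state.
const NEXT: Record<TaskStatus, TaskStatus | null> = {
  created: "accepted",
  accepted: "in_progress",
  in_progress: "done",
  done: null,
};

function advance(task: Task, to: TaskStatus): Task {
  if (NEXT[task.status] !== to) {
    throw new Error(`illegal transition: ${task.status} -> ${to}`);
  }
  if (to === "done" && task.artifacts.length === 0) {
    throw new Error("a task cannot be completed without an artifact");
  }
  return { ...task, status: to };
}
```

Rejecting skipped steps at the data layer, rather than asking agents to follow the order, is in the same spirit as the hard enforcement this README argues for.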

Self-Improving Coordination: The Framework That Debugs Itself

VibeHQ doesn't just coordinate agents; it analyzes its own failures and writes code to fix them. Fully automated, zero human intervention.

We built a closed-loop system: run a benchmark → analyze the logs → /optimize-protocol reads the analysis and implements real code changes → rebuild → run again and measure:

┌──────────────┐     ┌──────────────────┐     ┌────────────────────┐
│  Benchmark   │────▶│  vibehq-analyze  │────▶│ /optimize-protocol │
│  (run team)  │     │  --with-llm      │     │   (Claude skill)   │
└──────────────┘     └──────────────────┘     └────────────────────┘
       ▲                                                 │
       │            writes real code changes             │
       └─────────────────────────────────────────────────┘

Benchmark Results: Todo App (V1 → V5, 4 agents)

| Metric | V1 | V2 | V3 | V4 | V5 |
| --- | --- | --- | --- | --- | --- |
| Total Tokens | 7.2M | 3.9M | 14.6M | 15.0M | 5.7M |
| PM Tokens | 0.3M | 0.2M | 10.1M | 9.8M | 1.8M |
| PM % of Total | 4% | 5% | 69% | 65% | 32% |
| Turns | 233 | 164 | 326 | 308 | 216 |
| Duration | 47min | 13min | 10min | 9min | 14min |
| Flags (issues) | 4 | 3 | 5 | 3 | 0 |
| Context Bloat (PM) | 7.07x | 10.56x | 6.62x | 7.04x | 2.84x |

Benchmark Results: Classroom Quiz (fully automated loop)

| Metric | V1 (Before) | V2 (After Loop) | Change |
| --- | --- | --- | --- |
| Total Tokens | 23.1M | 13.8M | -40% |
| PM Tokens | ~15.2M | ~1.3M | -91% |
| Turns | 460 | 353 | -23% |
| Flags | 14 | 3 | -79% |
| STUB_FILE | 8 | 0 | eliminated |
| Context Bloat (PM) | 7.87x | 2.84x | -64% |
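
The Change column follows directly from the before and after values. As a quick check, the sketch below recomputes it with a hypothetical helper, percentChange, rounding to the nearest whole percent; the input numbers are copied from the table, and the PM token figures are the approximate values shown there.

```typescript
// Recomputes the "Change" column from the V1 and V2 values in the table above.
// percentChange is an illustrative helper, not part of VibeHQ.
function percentChange(before: number, after: number): number {
  return Math.round(((after - before) / before) * 100);
}

const quizRows: Array<[string, number, number]> = [
  ["Total Tokens (M)", 23.1, 13.8],
  ["PM Tokens (M, approx.)", 15.2, 1.3],
  ["Turns", 460, 353],
  ["Flags", 14, 3],
  ["Context Bloat (PM)", 7.87, 2.84],
];

for (const [metric, before, after] of quizRows) {
  console.log(`${metric}: ${percentChange(before, after)}%`);
}
```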

What the system learned and built

| Iteration | Problem Found | What Was Built |
| --- | --- | --- |
| V1→V2 | Hub falsely kills agents during boot; PM writes code | Startup grace period (180s); role presets with tool bans |
| V2→V3 | Codex PM ignores prompt constraints (shell_command 4→42x) | --disallowedTools CLI enforcement; switched PM to Claude |
| V3→V4 | PM uses Glob to monitor workers; artifacts overwritten to 0 bytes | Expanded disallowed tools; 0-byte content rejection at MCP layer |
| V4→V5 | PM polling explodes (28x check_status); stubs pass validation | McpRateLimiter (5 calls/60s); CODE_MIN enforcement; post-completion quiesce |
| CQ V1→V2 | 8 stub files; PM 66% of tokens on polling | Same fixes applied automatically: stubs eliminated, tokens -40% |
       23.1M ┤                         * CQ-V1
             │
       15.0M ┤               * V3  * V4
       13.8M ┤                            * CQ-V2
             │
        7.2M ┤  * V1
        5.7M ┤                                  * V5
        3.9M ┤      * V2
             │
           0 ┼──────────────────────────────────────
             V1   V2   V3   V4  CQ1  CQ2   V5

Key insight: Prompt constraints are suggestions. CLI-level enforcement is law. Agents adapt and route around soft limits, so the fix must be architectural.

📖 Full blog post: Self-Improving Multi-Agent Coordination →
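
The V4→V5 fix mentions a McpRateLimiter capped at 5 calls per 60 seconds. Its real implementation is not shown in this README, so the following is only a minimal sliding-window sketch of that idea; the class name SlidingWindowLimiter, the tryCall method, and the per-key bookkeeping are assumptions.

```typescript
// Hypothetical sliding-window limiter: at most `limit` calls per `windowMs`
// for each key (for example, one key per agent and tool). Not VibeHQ's code.
class SlidingWindowLimiter {
  private calls = new Map<string, number[]>();
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the call is allowed. `now` is injectable so the
  // behavior can be tested without waiting for real time to pass.
  tryCall(key: string, now: number = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    const recent = (this.calls.get(key) || []).filter((t) => t > cutoff);
    if (recent.length >= this.limit) {
      this.calls.set(key, recent);
      return false;
    }
    recent.push(now);
    this.calls.set(key, recent);
    return true;
  }
}
```

With `new SlidingWindowLimiter(5, 60000)`, a sixth check_status call inside the same minute is refused at the tool layer, regardless of what the prompt says.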


📱 Web Dashboard: Desktop & Mobile

Start agents on your PC, monitor from your phone.

Mobile

vibehq-app.mp4

Desktop

vibehq.mp4

🚀 Quick Start

git clone https://github.com/0x0funky/vibehq-hub.git
cd vibehq-hub && npm install
npm run build

Terminal (TUI)

VibeHQ TUI

vibehq

Interactive menu: select a team, configure agents, start. Everything runs in your terminal.

Web Dashboard

npm run build:web
vibehq-web

Open http://localhost:3100, create a team, add agents, and hit Start. Manage everything from a browser.

# With auth (recommended for LAN/mobile access)
VIBEHQ_AUTH=admin:secret vibehq-web

The server prints your LAN IP. Open it on your phone and you're in.


🔧 20 MCP Tools

Every agent gets 20 collaboration tools auto-injected via Model Context Protocol:

Communication (6): ask_teammate, reply_to_team, post_update, get_team_updates, list_teammates, check_status

Tasks (5): create_task, accept_task, update_task, complete_task, list_tasks

Artifacts (5): publish_artifact, list_artifacts, share_file, read_shared_file, list_shared_files

Contracts (3): publish_contract, sign_contract, check_contract

System (1): get_hub_info
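
To make the contract tools concrete, here is a rough sketch of the gate they imply: a contract is published, each required agent signs it, and coding is allowed only once every signature is in. The ContractStore class and its methods are hypothetical stand-ins for publish_contract, sign_contract, and check_contract; the real tool arguments and storage are not documented here.

```typescript
// Hypothetical in-memory contract store; not VibeHQ's actual data model.
interface Contract {
  name: string;
  spec: string;              // e.g. an API schema as text
  requiredSigners: string[]; // agents that must sign before coding starts
  signedBy: Set<string>;
}

class ContractStore {
  private contracts = new Map<string, Contract>();

  publish(name: string, spec: string, requiredSigners: string[]): void {
    this.contracts.set(name, { name, spec, requiredSigners, signedBy: new Set<string>() });
  }

  sign(name: string, agent: string): void {
    const contract = this.contracts.get(name);
    if (!contract) throw new Error(`unknown contract: ${name}`);
    if (contract.requiredSigners.indexOf(agent) === -1) {
      throw new Error(`${agent} is not a required signer of ${name}`);
    }
    contract.signedBy.add(agent);
  }

  // Coding may begin only once every required signer has signed.
  isFullySigned(name: string): boolean {
    const contract = this.contracts.get(name);
    if (!contract) return false;
    return contract.requiredSigners.every((agent) => contract.signedBy.has(agent));
  }
}
```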

🎬 Watch 7 agents collaborate in real-time →

MCP tools in action (videos)

List Teammates

list_teammate.mp4

Teammate Talk

Assign_task.mp4

Assign Task

Discuss_teammate.mp4

📊 Post-Run Analytics & Auto-Optimization

Analyze

vibehq-analyze ./data                        # Analyze session logs
vibehq-analyze --team my-team --with-llm     # Auto-resolve team logs + LLM insights
vibehq-analyze --team my-team --with-llm --save --run-id v1  # Save for optimization
vibehq-analyze compare v1 v2                 # Compare two runs side-by-side
vibehq-analyze history --last 10             # View past runs

13 automated detection rules: artifact regression, orchestrator role drift, stub files, task timeout, incomplete tasks, coordination overhead, unresponsive agents, zero artifacts, context bloat, duplicate artifacts, premature task accept, excessive MCP polling, task reassignment.

Skills: /run-teamwork, /benchmark-loop & /optimize-protocol

VibeHQ ships three skills. Skills work on both Claude Code and Codex CLI: same format, different directory.

Cross-Platform Skill Locations

| Platform | Project-level | User-level |
| --- | --- | --- |
| Claude Code | .claude/skills/<name>/SKILL.md | ~/.claude/skills/ |
| Codex CLI | .agents/skills/<name>/SKILL.md | ~/.codex/skills/ |

The SKILL.md format is an emerging cross-platform standard: same frontmatter (name, description), same markdown body. A skill created for one platform works on the other.

Setup

Claude Code: skills are already included in .claude/skills/. Just use them:

# In Claude Code, type:
/run-teamwork "Build an AI investment analysis platform"
/benchmark-loop "Build a todo app" --grade A
/optimize-protocol v1

Codex CLI: copy the skills to Codex's directory:

# Project-level (committed to repo)
mkdir -p .agents/skills
cp -r .claude/skills/run-teamwork .agents/skills/
cp -r .claude/skills/optimize-protocol .agents/skills/
cp -r .claude/skills/benchmark-loop .agents/skills/

# Or user-level (available in all projects)
cp -r .claude/skills/run-teamwork ~/.codex/skills/
cp -r .claude/skills/optimize-protocol ~/.codex/skills/
cp -r .claude/skills/benchmark-loop ~/.codex/skills/

Then in Codex CLI, invoke with /skills or type $ to mention a skill.

/run-teamwork: One-Shot Team Builder

Give it a project description, and it designs the team, spawns the agents, and builds the project. No analysis, no loop.

/run-teamwork "Build an e-commerce site with payments and admin panel"
  1. Analyzes the prompt to determine required domains and team size
  2. Generates PM system prompt with research-first workflow (research before implementation)
  3. Spawns agents in tmux (macOS/Linux) or Windows Terminal
  4. Waits for all tasks to complete
  5. Reports the output directory and file count

/optimize-protocol: Framework Engineer

Reads analysis data and writes real code fixes (not parameter tuning):

/optimize-protocol v1    # Read analysis for run v1, implement fixes
  1. Loads current run + all previous optimization reports
  2. Builds cross-run trend table (what's improving, what regressed, what's a side-effect)
  3. Classifies each problem as NEW, RECURRING, or SIDE-EFFECT of a previous fix
  4. Implements real TypeScript changes to the framework
  5. Verifies build passes
  6. Saves a detailed changelog to ~/.vibehq/analytics/optimizations/

/benchmark-loop: Autonomous Runner

Runs the full self-improving cycle automatically:

/benchmark-loop "Build a Todo app with REST API, React frontend, and WebSocket real-time updates"
  1. Spawns a fresh team with a standardized project
  2. Waits for the team to finish (heartbeat monitoring)
  3. Analyzes session logs (13 rules + LLM grading)
  4. Triggers /optimize-protocol to write code fixes
  5. Rebuilds the framework (npx tsup)
  6. Repeats with a new team, with zero human intervention

Manual Step-by-Step (works with any CLI)

The underlying tools are regular CLI commands, so no skills are required:

# 1. Run a benchmark
vibehq start --team your-team

# 2. Analyze
vibehq-analyze --team your-team --with-llm --save --run-id v1

# 3. Auto-optimize (Claude Code / Codex skill)
/optimize-protocol v1

# 4. Run again, compare
vibehq start --team your-team
vibehq-analyze --team your-team --with-llm --save --run-id v2
vibehq-analyze compare v1 v2

All optimization reports are saved to ~/.vibehq/analytics/optimizations/ for tracking and auditing.

Supports both Claude Code and Codex CLI native JSONL log formats.

📱 Remote Access

The web platform is accessible on your LAN by default. For external access:

⚠️ Always set VIBEHQ_AUTH before exposing remotely, because the web UI gives full terminal access.

| Method | Best For |
| --- | --- |
| Tailscale | Personal use: private VPN, no config, free |
| Cloudflare Tunnel | Sharing: public URL behind Cloudflare, free |
| ngrok | Quick testing: ngrok http 3100, temporary URL |
| SSH Tunnel | VPS: ssh -R 8080:localhost:3100 your-server |

Tailscale (recommended): Install on PC + phone → sign in on both → VIBEHQ_AUTH=admin:secret vibehq-web → open http://<tailscale-ip>:3100 on phone.

Configuration

vibehq.config.json

{
  "teams": [{
    "name": "my-project",
    "hub": { "port": 3001 },
    "agents": [
      { "name": "Alex", "role": "Project Manager", "cli": "codex", "cwd": "D:\\project" },
      { "name": "Jordan", "role": "Frontend Engineer", "cli": "claude", "cwd": "D:\\project\\frontend",
        "dangerouslySkipPermissions": true, "additionalDirs": ["D:\\project\\shared"] }
    ]
  }]
}
| Field | Description |
| --- | --- |
| name | Agent display name (unique per team) |
| role | Role; auto-loads a preset if no systemPrompt is set |
| cli | claude, codex, or gemini |
| cwd | Working directory (isolated per agent) |
| systemPrompt | Custom prompt (overrides preset) |
| dangerouslySkipPermissions | Auto-approve Claude permissions |
| additionalDirs | Extra directories the agent can access |

Built-in presets: Project Manager, Product Designer, Frontend Engineer, Backend Engineer, AI Engineer, QA Engineer
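
For readers who want to check a config before launching, the documented fields map naturally onto TypeScript types. The sketch below is an assumption-laden illustration: the interfaces mirror the field table above, while validateTeam and its two rules (unique agent names, known cli values) are hypothetical and may not match what VibeHQ itself checks.

```typescript
// Types mirror the fields documented above; the validation rules are assumptions.
type Cli = "claude" | "codex" | "gemini";

interface AgentConfig {
  name: string;
  role: string;
  cli: Cli;
  cwd: string;
  systemPrompt?: string;
  dangerouslySkipPermissions?: boolean;
  additionalDirs?: string[];
}

interface TeamConfig {
  name: string;
  hub: { port: number };
  agents: AgentConfig[];
}

const KNOWN_CLIS: string[] = ["claude", "codex", "gemini"];

// Returns human-readable problems; an empty list means no issue was found.
function validateTeam(team: TeamConfig): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();
  for (const agent of team.agents) {
    if (seen.has(agent.name)) {
      problems.push(`duplicate agent name: ${agent.name}`);
    }
    seen.add(agent.name);
    // JSON loaded at runtime is untyped, so the cli value is checked explicitly.
    if (KNOWN_CLIS.indexOf(agent.cli) === -1) {
      problems.push(`unknown cli for ${agent.name}: ${agent.cli}`);
    }
  }
  return problems;
}
```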

🛠 CLI Reference
vibehq              # Interactive TUI
vibehq-web          # Web platform (browser + mobile)
vibehq-hub          # Standalone hub server
vibehq-spawn        # Spawn single agent
vibehq-analyze      # Post-run analytics

Manual Spawn

vibehq-spawn --name "Jordan" --role "Frontend Engineer" \
  --team "my-team" --hub "ws://localhost:3001" \
  --skip-permissions --add-dir "/shared" -- claude
πŸ— Architecture
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                   VibeHQ Hub                      β”‚
β”‚               (WebSocket Server)                  β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚  β”‚ Tasks  β”‚ β”‚Artifacts β”‚ β”‚Contractβ”‚ β”‚ Message β”‚ β”‚
β”‚  β”‚ Store  β”‚ β”‚ Registry β”‚ β”‚ Store  β”‚ β”‚  Queue  β”‚ β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”β”‚
β”‚  β”‚  Agent Registry β€” idle/working detection     β”‚β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
    β”Œβ”€β”€β”€β”€β–Όβ”€β”€β”€β” β”Œβ”€β”€β”€β”€β–Όβ”€β”€β”€β” β”Œβ”€β”€β”€β–Όβ”€β”€β”€β”€β” β”Œβ”€β”€β”€β–Όβ”€β”€β”€β”€β”
    β”‚ Claude β”‚ β”‚ Claude β”‚ β”‚ Codex  β”‚ β”‚ Claude β”‚
    β”‚  (FE)  β”‚ β”‚  (BE)  β”‚ β”‚  (PM)  β”‚ β”‚  (QA)  β”‚
    β”‚ 20 MCP β”‚ β”‚ 20 MCP β”‚ β”‚ 20 MCP β”‚ β”‚ 20 MCP β”‚
    β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β–²          β–²          β–²          β–²
         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                    β”Œβ”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                    β”‚  Web Dashboard   β”‚
                    β”‚ Desktop & Mobile β”‚
                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
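
The Message Queue and Agent Registry boxes in the diagram work together: messages for a busy agent are held and released once that agent is reported idle. The sketch below shows one way such a queue could behave. IdleAwareQueue, its methods, and the choice to treat unknown agents as busy are all assumptions; the real hub detects idleness with a JSONL watcher and a PTY timeout, which this example replaces with an explicit setState call.

```typescript
// Hypothetical idle-aware mailbox: messages for a busy agent are held and
// delivered in order once the agent is reported idle. Not VibeHQ's code.
type AgentState = "idle" | "working";

class IdleAwareQueue {
  private state = new Map<string, AgentState>();
  private pending = new Map<string, string[]>();
  private deliver: (agent: string, message: string) => void;

  constructor(deliver: (agent: string, message: string) => void) {
    this.deliver = deliver;
  }

  // Delivers immediately to an idle agent; otherwise queues the message.
  // Agents with no recorded state are treated as busy.
  send(agent: string, message: string): void {
    if (this.state.get(agent) === "idle") {
      this.deliver(agent, message);
      return;
    }
    const queue = this.pending.get(agent) || [];
    queue.push(message);
    this.pending.set(agent, queue);
  }

  // Records the new state and flushes queued messages when the agent goes idle.
  setState(agent: string, next: AgentState): void {
    this.state.set(agent, next);
    if (next !== "idle") return;
    const queue = this.pending.get(agent) || [];
    this.pending.delete(agent);
    for (const message of queue) {
      this.deliver(agent, message);
    }
  }
}
```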

Key design:

  • Process isolation: each agent is a separate OS process. Crashes don't cascade.
  • Contract-driven: specs must be signed before coding begins.
  • Idle-aware queue: messages queue when busy, flush when idle (JSONL watcher + PTY timeout).
  • State persistence: all data survives hub restarts (~/.vibehq/teams/<team>/hub-state.json).
  • MCP-native: 20 purpose-built tools, type-safe, auto-configured per agent.
  • Orchestrator enforcement: Claude PMs get --disallowedTools (CLI-level hard block on Bash/Write/Edit/Read/Glob); Codex PMs get --sandbox read-only.
  • Content validation: MCP rejects 0-byte artifacts, stub patterns, and >80% size regressions at the tool level.
  • Self-improving: analyze→optimize loop with cross-run trend tracking and automated changelogs.
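
The content validation rules in the list above can be sketched as a single check. This is a guess at the shape, not the real validator: the 200-byte stub threshold and the 80% regression limit come from this README, while validateArtifact, STUB_PATTERN, and the order of the checks are hypothetical.

```typescript
// Hypothetical artifact check combining the rules listed above.
// The stub regex is an illustrative guess based on the "See local file..." example.
interface ValidationResult {
  ok: boolean;
  reason?: string;
}

const STUB_PATTERN = /^\s*see (the )?local file/i;

// UTF-8 byte length computed by hand so the sketch needs no runtime APIs.
function utf8ByteLength(text: string): number {
  let bytes = 0;
  for (const ch of text) {
    const cp = ch.codePointAt(0) as number;
    bytes += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
  }
  return bytes;
}

function validateArtifact(content: string, previousBytes?: number): ValidationResult {
  const size = utf8ByteLength(content);
  if (size === 0) return { ok: false, reason: "empty artifact" };
  if (size < 200 || STUB_PATTERN.test(content)) {
    return { ok: false, reason: "stub content" };
  }
  // Shrinking by more than 80% means the new size is under one fifth of the old one.
  if (previousBytes !== undefined && size * 5 < previousBytes) {
    return { ok: false, reason: "size regression over 80%" };
  }
  return { ok: true };
}
```

Integer arithmetic (`size * 5 < previousBytes`) avoids floating-point surprises at the 80% boundary.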
⚠️ Platform Support

| Feature | Windows | Mac | Linux |
| --- | --- | --- | --- |
| Web Platform | ✅ Tested | ✅ Should work | ✅ Should work |
| TUI | ✅ Tested | ✅ Tested | ⚠️ Untested |
| Hub + Spawn | ✅ Tested | ✅ Tested | ✅ Should work |
| JSONL Watcher | ✅ Tested | ✅ Tested | ⚠️ Path encoding |
| node-pty | ✅ Tested | ✅ Tested | ⚠️ Untested |

Mac: requires xcode-select --install. If posix_spawnp failed: chmod +x node_modules/node-pty/prebuilds/*/spawn-helper

Linux: requires build-essential and python3.

Project Structure

agent-hub/
├── bin/                  # CLI entry points (start, spawn, hub, web, analyze)
├── src/
│   ├── hub/              # WebSocket hub, agent registry, message relay
│   ├── spawner/          # PTY manager, JSONL watcher, idle detection
│   ├── web/              # Express server, REST API, WebSocket handlers
│   ├── mcp/              # 20 MCP tools + hub-client bridge
│   ├── analyzer/         # Post-run analytics pipeline (13 rules)
│   ├── shared/           # TypeScript types
│   └── tui/              # Terminal UI screens + role presets
├── web/                  # React frontend (Vite + xterm.js)
├── blog/                 # Technical articles on LLM behavioral patterns
└── benchmarks/           # V1 vs V2 comparison reports

🤝 Contributing

PRs welcome. Modular architecture:

  • New MCP tool? → src/mcp/tools/ + register in hub-client.ts
  • New CLI? → detection in spawner.ts + MCP config in autoConfigureMcp()
  • New widget? → web/src/components/ or src/tui/screens/

📄 License

MIT


𝕏 @0x0funky
