
AIWB (AI Workbench) - Architecture & Context Management Analysis

Executive Summary

AIWB is a Bash-based CLI toolkit (~6,200 lines) that orchestrates multiple AI models through a Generator-Verifier loop architecture. It provides mode-based workflows for code generation, modification, and debugging across multiple AI providers (Gemini, Claude, OpenAI, Groq, xAI, Ollama).

Key Finding: The system uses a sequential, single-stream architecture with basic context management - it builds entire prompts in memory and sends them synchronously to APIs. There is no parallel processing, distributed task management, or advanced context optimization.


1. ARCHITECTURE & AGENT STRUCTURE

1.1 System Overview

┌─ aiwb (Main Entry Point, 1,886 lines)
│  ├─ Interrupt Handling
│  ├─ Bootstrap & Library Loading
│  ├─ Initialization
│  ├─ Command Dispatch
│  └─ Main REPL Loop
│
└─ lib/ (6 Modules, ~3,200 lines)
   ├─ common.sh (384 lines)     - Platform utilities, logging, JSON handling
   ├─ config.sh (320 lines)     - Configuration management, workspace init
   ├─ api.sh (1,304 lines)      - AI provider integrations (6 providers)
   ├─ modes.sh (1,146 lines)    - Mode workflows (/make, /tweak, /debug)
   ├─ ui.sh (414 lines)         - Terminal UI with gum fallbacks
   ├─ error.sh (339 lines)      - Error handling & diagnostics
   └─ security.sh (416 lines)   - API key encryption & management

1.2 Agent/Model Configuration

Supported Providers (6 total):

  1. Google Gemini: gemini-2.5-flash, gemini-2.0-flash-lite, gemini-2.0-pro (16K tokens)
  2. Anthropic Claude: 3-haiku, 3.5-haiku, 3.5-sonnet, 3-opus, sonnet-4-5 (4-8K tokens)
  3. OpenAI: gpt-4o, gpt-4o-mini, o1, o3 families (16K tokens)
  4. Groq: llama-3.3-70b, llama-4 families, mixtral (16K tokens)
  5. xAI/Grok: grok-beta, grok-4, grok-3 families (16K tokens)
  6. Ollama: Any local model (16K tokens)

Model Configuration (lib/config.sh):

  • Default models per provider are hardcoded in get_default_model()
  • Available models listed in get_available_models()
  • Max token defaults: Claude (4-8K), others (16K)
  • No dynamic model discovery or context window adaptation
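The hardcoded dispatch described above can be sketched as a pair of case statements. The gemini default below matches the shipped config shown later in this document; the other model names are illustrative placeholders, not the actual values in lib/config.sh:

```shell
# Sketch of get_default_model/max-token selection from lib/config.sh.
# Only the gemini default is confirmed by the config; others are illustrative.
get_default_model() {
    local provider="$1"
    case "$provider" in
        gemini) echo "2.5-flash" ;;
        claude) echo "3.5-haiku" ;;       # illustrative
        openai) echo "gpt-4o-mini" ;;     # illustrative
        groq)   echo "llama-3.3-70b" ;;   # illustrative
        xai)    echo "grok-beta" ;;       # illustrative
        ollama) echo "llama3" ;;          # illustrative
        *)      return 1 ;;
    esac
}

# Max-token defaults per provider (Claude lower, others 16K):
get_max_tokens() {
    case "$1" in
        claude) echo 8192 ;;
        *)      echo 16384 ;;
    esac
}
```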

1.3 Agent Dispatch Flow

# Main API dispatcher (lib/api.sh:1086)
call_api() 
  ├─ get provider & model from config
  ├─ determine max_tokens based on provider
  └─ dispatch to provider-specific function:
     ├─ call_gemini()        # Synchronous curl
     ├─ call_claude()        # Synchronous curl
     ├─ call_openai()        # Synchronous curl
     ├─ call_groq()          # Synchronous curl
     ├─ call_xai()           # Synchronous curl
     └─ call_ollama()        # Local HTTP endpoint

Key Architecture Traits:

  • Synchronous, single-threaded: Each API call blocks until completion
  • Sequential processing: No parallel requests
  • Single prompt per call: Generator-Verifier happens as separate sequential calls
  • Interrupt-aware: Gracefully handles Ctrl+C with SIGINT trap

2. CONTEXT MANAGEMENT & MEMORY SYSTEMS

2.1 Context Building Flow

Context Assembly (lib/modes.sh:750-838):

mode_run()
  ├─ 1. Load base prompt (text or file)
  ├─ 2. Add mode context (make/tweak/debug prefix)
  ├─ 3. Identify image files in uploads
  ├─ 4. Build text context section:
  │   ├─ === CONTEXT FILES ===
  │   ├─ Files: Show file path + full content
  │   ├─ Directories: Show first 5 files, head -20 lines each
  │   └─ Images: List with metadata
  ├─ 5. Separate images for vision API
  └─ 6. Estimate tokens & ask for cost confirmation

2.2 Context Limitations & Safeguards

NO ACTIVE LIMITS:

  • No context window size checking
  • No file size limits
  • No token counting before send
  • No automatic context truncation

Token Estimation (Very Basic):

estimate_tokens() {
    # Rough approximation: 1 token ≈ 4 characters
    local text="$1"
    local chars=${#text}
    echo $((chars / 4))
}

Supported Context Types:

Type                          Handler          Notes
Text prompt                   Direct string    Unlimited size
Instruction files             cat              Full content read into memory
Source code files             cat              Full content read into memory
Directories                   find + head      First 5 files only, 20 lines each
Images (PNG, JPG, GIF, WebP)  base64 encode    Sent to vision APIs separately

2.3 Image Handling

Vision API Support (lib/api.sh:1139-1189):

call_api_with_images()
  ├─ Gemini: call_gemini_vision()  ✓
  ├─ Claude: call_claude_vision()  ✓
  └─ Others: Text-only fallback

Image Encoding (lib/api.sh:164-171):

  • Base64 encoding: base64 < file | tr -d '\n'
  • MIME types: jpeg, png, gif, webp, bmp
  • No size optimization or compression
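The encoding step described above can be sketched as two small helpers. The function names here are assumptions for illustration; the actual code lives in lib/api.sh:

```shell
# Base64-encode an image for a vision API payload (newlines stripped,
# matching the base64 < file | tr -d '\n' pattern noted above):
encode_image() {
    base64 < "$1" | tr -d '\n'
}

# Map a file extension to the MIME types the document lists:
mime_for() {
    case "${1##*.}" in
        jpg|jpeg) echo "image/jpeg" ;;
        png)      echo "image/png" ;;
        gif)      echo "image/gif" ;;
        webp)     echo "image/webp" ;;
        bmp)      echo "image/bmp" ;;
        *)        echo "application/octet-stream" ;;
    esac
}
```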

2.4 Prompt Assembly Example

Generated prompt structure:

Generate code from scratch:

Create a REST API for user management

=== CONTEXT FILES ===

--- File: ./src/server.js ---
[Full file content here]

--- Directory: ./docs ---
File: ./docs/README.md
[First 20 lines]
...
[Shows only 5 files from directory]

=== CONTEXT IMAGES (2) ===
- /path/to/screenshot1.png
- /path/to/screenshot2.png

3. TASK DISTRIBUTION & PARALLEL PROCESSING CAPABILITIES

3.1 Current Capabilities

NONE - System is Sequential:

  • Single prompt → Single API call → Wait for response → Save output
  • No parallel API calls
  • No task queuing
  • No distributed task management
  • No background job management
  • No worker pool

3.2 Workflow Control

Mode-Based Workflows (lib/modes.sh):

/make  → Generate code → Optional verification → Save output
/tweak → Modify code   → Optional verification → Save output
/debug → Fix bugs      → Optional verification → Save output

Each mode is a single interactive menu with state variables:

MODE_CURRENT=""
MODE_PROMPT=""
MODE_INSTRUCT_FILE=""
MODE_UPLOADS=()
MODE_MODEL_PROVIDER=""
MODE_MODEL_NAME=""
MODE_CHECK_PROVIDER=""
MODE_CHECK_MODEL=""

3.3 Generator-Verifier Pattern

Current Implementation (Sequential):

# Step 1: Generate
output=$(call_api "$final_prompt" "$MODE_MODEL_PROVIDER" "$MODE_MODEL_NAME")

# Step 2: [Optional] Verify
if [[ -n "$MODE_CHECK_PROVIDER" ]]; then
    feedback=$(call_api "$output_for_verification" "$check_provider" "$check_model")
fi

Roadmap Vision (Phase 4 - Not Implemented):

  • Autonomous Verifier iterations
  • Convergence detection
  • Configurable iteration limits
  • Parallel workflow orchestration
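For illustration only, a Phase-4 autonomous loop might wrap the existing call_api dispatcher like this. The APPROVED convergence sentinel and the iteration limit are assumptions, not implemented behavior:

```shell
# Hypothetical autonomous Generator-Verifier loop (NOT current AIWB code).
MAX_VERIFY_ITERATIONS=3

generate_with_verification() {
    local prompt="$1" output feedback i
    output=$(call_api "$prompt" "$MODE_MODEL_PROVIDER" "$MODE_MODEL_NAME")
    for ((i = 1; i <= MAX_VERIFY_ITERATIONS; i++)); do
        feedback=$(call_api "Review this output; reply APPROVED if correct: $output" \
            "$MODE_CHECK_PROVIDER" "$MODE_CHECK_MODEL")
        # Convergence detection via an agreed sentinel string (assumption):
        [[ "$feedback" == *APPROVED* ]] && break
        output=$(call_api "Revise per feedback ($feedback): $output" \
            "$MODE_MODEL_PROVIDER" "$MODE_MODEL_NAME")
    done
    printf '%s\n' "$output"
}
```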

3.4 Interrupt & Cleanup

Interrupt Handling (aiwb:15-44):

cleanup_on_interrupt() {
    # Kill all background jobs
    local bg_jobs=$(jobs -p 2>/dev/null)
    kill $bg_jobs 2>/dev/null || true
    # Clean temp files
    rm -f /tmp/aiwb_curl_*
    exit 130  # SIGINT exit code
}
trap cleanup_on_interrupt INT TERM

4. CURRENT LIMITATIONS & LARGE CODEBASE HANDLING

4.1 Hard Limitations

Limitation     Details                                   Impact
Memory         Entire prompt held in a Bash variable     Large directories/files slow the shell
Token window   No automatic context truncation           Exceeding API limits = error
File depth     Directories scan only the first 5 files   Deep projects miss relevant files
Sync only      Single-threaded, blocking API calls       Long requests block the UI
No caching     Each request re-sends all context         Wasted tokens on repeated requests
Vision limit   Only Gemini & Claude support images       Others fall back to text
Local files    cat reads entire files                    Large files cause memory bloat

4.2 How Large Codebases Are Handled

Current Approach: Naive Context Assembly

  1. User selects directory → find lists files
  2. First 5 files taken → first 20 lines of each (head -20)
  3. Entire selection concatenated into prompt
  4. No deduplication, no filtering, no prioritization

Example:

# From modes.sh:817-821
find "$item" -type f -name "*.sh" -o -name "*.py" -o -name "*.js" -o -name "*.md" 2>/dev/null | 
  head -5 | 
  while read f; do
    echo "File: $f"
    head -20 "$f"  # Only first 20 lines!
    echo "..."
  done
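Note that in the snippet above, -o binds looser than find's implicit AND, so -type f applies only to the "*.sh" predicate; a directory whose name happens to match "*.py" or "*.md" would slip through. A corrected grouping, wrapped as a helper for illustration (the function name is an assumption):

```shell
# Corrected predicate grouping: -type f now applies to every -name pattern.
list_context_files() {
    local item="$1"
    find "$item" -type f \
        \( -name "*.sh" -o -name "*.py" -o -name "*.js" -o -name "*.md" \) \
        2>/dev/null | head -5
}
```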

Scaling Issues:

  • Small project (< 1MB, < 50 files) → Works fine
  • Medium project (1-100MB) → Slow shell operations, may exceed token limits
  • Large codebase (> 100MB) → Likely to hit API context limits or timeout
  • Deep structure → Only sees shallow files, misses relevant context

4.3 Error Handling for Large Inputs

Rate Limiting (lib/error.sh):

if echo "$error_msg" | grep -iq "rate limit\|quota"; then
    die "$E_RATE_LIMIT" "Rate limit exceeded: $error_msg"
    # 60-second retry wait
fi
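The 60-second wait noted in the comment could be generalized into a retry wrapper. This helper is a sketch, not code that exists in lib/error.sh:

```shell
# Generic retry-with-delay wrapper (illustrative; not in the current codebase).
# Usage: with_retry <attempts> <delay_seconds> <command> [args...]
with_retry() {
    local attempts="$1" delay="$2"
    shift 2
    local i
    for ((i = 1; i <= attempts; i++)); do
        "$@" && return 0                      # success: stop retrying
        (( i < attempts )) && sleep "$delay"  # wait before the next attempt
    done
    return 1                                  # all attempts failed
}

# Example: with_retry 3 60 call_api "$prompt" "$provider" "$model"
```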

Token Overflow: No specific handling

  • Error from API caught and displayed
  • User must reduce context size and retry

4.4 Cost Estimation (Basic)

# Estimate tokens BEFORE sending (rough approximation)
input_tokens=$(estimate_tokens "$final_prompt")
output_tokens=$((input_tokens * 2))
gen_cost=$(calculate_cost "$provider" "$model" "$input_tokens" "$output_tokens")

# Display: "Estimated cost: $X.XX - Proceed? (yes/no)"
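Since Bash has no floating-point arithmetic, calculate_cost must shell out for the math. A minimal sketch using awk, with placeholder rates (the real per-model pricing tables live in lib/api.sh):

```shell
# Sketch of calculate_cost; the rates here are illustrative, not actual pricing.
calculate_cost() {
    local provider="$1" model="$2" input_tokens="$3" output_tokens="$4"
    local in_rate=0.10 out_rate=0.40   # $ per 1M tokens (placeholder values)
    awk -v i="$input_tokens" -v o="$output_tokens" \
        -v ir="$in_rate" -v outr="$out_rate" \
        'BEGIN { printf "%.4f\n", (i * ir + o * outr) / 1000000 }'
}
```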

Actual Cost Tracking:

# After execution, log to usage.jsonl:
{
    "timestamp": "2025-11-10T12:34:56Z",
    "provider": "gemini",
    "model": "2.5-flash",
    "input_tokens": 1250,
    "output_tokens": 450,
    "cost": 0.0045,
    "mode": "make"
}
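Because usage.jsonl is append-only JSON Lines, total spend can be aggregated with a one-liner. The sketch below uses awk to stay dependency-free; where jq is available, `jq -s 'map(.cost) | add'` does the same:

```shell
# Sum the "cost" field across all records in a usage.jsonl file (sketch).
sum_costs() {
    awk -F'"cost": ' 'NF > 1 { split($2, rest, ","); total += rest[1] }
        END { printf "%.4f\n", total + 0 }' "$1"
}
```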

5. WORKSPACE & STATE MANAGEMENT

5.1 Workspace Structure

~/.aiwb/
├── config.json              # User configuration
├── .aiwb.env                # Unencrypted API keys
├── .keys.age                # Age-encrypted keys (optional)
├── .session                 # Last session state
└── workspace/
    ├── projects/            # Project folders
    ├── tasks/               # Task files (.prompt.md)
    ├── outputs/             # Generated outputs
    ├── logs/
    │   ├── chat_*.log       # Interaction logs
    │   └── usage.jsonl      # Cost tracking
    ├── templates/           # User templates
    ├── history/             # Session history
    └── snapshots/           # Workspace backups

5.2 Session Management

Session State (lib/config.sh:218-261):

save_session() {
    # Saved: workspace, provider, model, task, project
    # Written to: ~/.aiwb/.session (JSON)
}

load_session() {
    # Restore previous session state on startup
}

5.3 Configuration Management

Default Config (lib/config.sh:122-150):

{
  "version": "2.0.0",
  "workspace": "",
  "model_provider": "gemini",
  "model_name": "2.5-flash",
  "current_task": "",
  "current_project": "",
  "preferences": {
    "auto_estimate": true,
    "confirm_before_generate": true,
    "show_costs": true,
    "stream_output": false,
    "tier_default": "Medium"
  },
  "cost_tracking": {
    "enabled": true,
    "monthly_budget": 0,
    "currency": "USD"
  },
  "security": {
    "encrypt_keys": false,
    "warn_on_exposure": true
  }
}

6. PERFORMANCE & SCALABILITY CHARACTERISTICS

6.1 Measured Baseline

Script Size: ~6,200 lines of Bash

  • Startup time: < 500ms (with dependencies)
  • Library loading: sources the 6 lib/ modules at startup (necessarily sequential; Bash cannot source in parallel)
  • Config initialization: < 50ms

6.2 Bottlenecks

  1. Context Assembly: O(n) file reading

    • Reading 100 files: ~5-10 seconds
    • Building prompt string: ~1-2 seconds
  2. API Calls: Network I/O bound

    • Small request (< 2K tokens): ~2-5 seconds
    • Large request (> 4K tokens): ~5-15 seconds
    • Includes curl overhead and JSON parsing
  3. JSON Operations: Each config update parses & re-writes file

    • jq overhead: ~50-100ms per operation
    • No batching of config updates

6.3 Concurrency Model

Current: None (single-threaded)

What's Used:

# Interrupt-safe background wait pattern
set +e
curl ... &
local curl_pid=$!
wait $curl_pid
set -e
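The same background-plus-wait pattern could run two provider calls concurrently. AIWB does not do this today; the fetch function below is a stand-in for a curl call:

```shell
# Illustrative concurrent extension of the existing wait pattern (not current
# AIWB behavior). fetch simulates a slow network request.
fetch() { sleep 0.1; echo "response-from-$1"; }

run_concurrent() {
    local out_a out_b pid_a pid_b
    out_a=$(mktemp); out_b=$(mktemp)
    fetch "$1" > "$out_a" & pid_a=$!
    fetch "$2" > "$out_b" & pid_b=$!
    wait "$pid_a" "$pid_b"       # wait remains interruptible by the SIGINT trap
    cat "$out_a" "$out_b"
    rm -f "$out_a" "$out_b"
}
```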

7. SECURITY ARCHITECTURE

7.1 API Key Management

Storage Options (lib/security.sh):

  1. Environment Variables (Plain text)

    • GEMINI_API_KEY, ANTHROPIC_API_KEY, etc.
    • Sourced from ~/.aiwb/.aiwb.env
  2. Age Encryption (Recommended)

    • Keys encrypted with age tool
    • Stored in ~/.aiwb/.keys.age
    • Decrypted at runtime (passphrase prompt)
  3. Interactive Setup

    • Command: aiwb keys
    • Encrypts keys during setup

7.2 Secure File Permissions

# Config file permissions
chmod 600 "$config_file"

# Keys file permissions
chmod 600 "$keys_file"
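A startup check could verify the expected 600 mode before loading keys. This is a sketch (not confirmed to exist in lib/security.sh); the stat fallback covers the differing GNU coreutils and BSD/macOS flag syntax:

```shell
# Return success only if the file has mode 600 (sketch).
check_key_perms() {
    local mode
    mode=$(stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1" 2>/dev/null)
    [[ "$mode" == "600" ]]
}
```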

7.3 Input Sanitization

Safe Input Reading (lib/common.sh:58-101):

safe_read() {
    # Platform-aware input handling
    # Termux: Prefers /dev/tty
    # Linux/macOS: Falls back to stdin
    # No string interpolation vulnerabilities
}

8. PLANNED ENHANCEMENTS (Roadmap)

Phase 3 (In Progress - v2.1):

  • Smart Context Management
    • Automatic context relevance detection
    • Smart file filtering by task type
    • Context size optimization
    • Incremental context loading

Phase 4 (Future - v3.0):

  • Autonomous Workflows

    • Autonomous Generator-Verifier loops
    • Convergence detection
    • Quality metrics tracking
  • Parallel Task Execution

    • Multi-step workflow definitions
    • Conditional branching
    • Parallel task execution (explicit goal)

Phase 5 (Future - v3.5+):

  • Distributed/Cloud
    • API & SDK for remote execution
    • Webhook support
    • Custom provider plugins

9. COMPARISON TO OTHER SYSTEMS

Feature             AIWB                 Claude Code    LangChain    LlamaIndex
CLI First           ✓
Multi-Provider      ✓                    Limited
Context Management  Basic                Advanced       Advanced     Advanced (RAG)
Parallel Tasks      None                 Limited
Agent Loops         Generator-Verifier   N/A            Limited
Local Models        ✓ (Ollama)
Cost Tracking       ✓
Lines of Code       6.2K                 Proprietary    100K+        50K+

10. KEY INSIGHTS & RECOMMENDATIONS

Strengths

  1. Clean, modular Bash architecture - Easy to understand & extend
  2. Multi-provider support - 6 major AI providers
  3. Cost transparency - Built-in tracking and estimation
  4. Cross-platform - Linux, macOS, Termux/Android
  5. Developer-friendly - CLI-first, keyboard-driven

Current Gaps

  1. No context optimization - Naive file selection
  2. Single-threaded only - All API calls sequential
  3. No intelligent caching - Resends context each time
  4. Limited scalability - Not designed for large codebases
  5. Basic token estimation - 1 token = 4 chars approximation
  6. No task queuing - Memory-only state

Recommended Improvements

  1. Smart Context Selection

    • Implement semantic relevance scoring
    • Cache frequently-used context
    • Implement context compression (summaries)
  2. Parallel Processing

    • Background cost estimation while user types
    • Parallel verification workflows
    • Batch API calls for cost reduction
  3. Advanced Token Management

    • Integrate with tokenizers.js
    • Dynamic model selection based on context
    • Automatic context truncation
  4. Caching Layer

    • Store embeddings of uploaded files
    • Deduplicate context across requests
    • Smart invalidation
  5. Scalability

    • Move to Go/Python for better concurrency
    • Implement task queue (Redis/RabbitMQ)
    • Distributed context management
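Even before a Go/Python rewrite, the caching idea in point 4 could start as a content-addressed file cache in Bash. The sketch below is one possible approach; the cache path and function name are assumptions:

```shell
# Content-addressed context cache (sketch): identical file content maps to one
# cache entry keyed by SHA-256, deduplicating repeat uploads across requests.
cache_context_file() {
    local file="$1" cache_dir="${AIWB_CACHE_DIR:-$HOME/.aiwb/cache}"
    mkdir -p "$cache_dir"
    local hash entry
    hash=$(sha256sum "$file" | cut -d' ' -f1)
    entry="$cache_dir/$hash"
    [[ -f "$entry" ]] || cp "$file" "$entry"   # store only if not cached yet
    echo "$entry"
}
```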

11. CODE STRUCTURE QUICK REFERENCE

Most Important Files:

File              Lines   Purpose                 Key Functions
aiwb              1,886   Main REPL & dispatch    main loop, cmd handlers
lib/modes.sh      1,146   Workflow menus          mode_run(), menu_model()
lib/api.sh        1,304   Provider integrations   call_api(), call_gemini(), etc.
lib/config.sh       320   Config management       config_get/set, init_workspace
lib/ui.sh           414   Terminal UI             gum wrappers, menus
lib/common.sh       384   Utilities               safe_read, logging, paths
lib/error.sh        339   Error handling          error codes, diagnostics
lib/security.sh     416   API key management      encrypt/decrypt, key loading