@jnorthrup jnorthrup commented Nov 20, 2025

Major Feature Release: Interactive REPL, NVIDIA Provider, Async Database, Memory Systems

This PR adds a lite, minimalist CLI wheel install with a --repl option.

🎯 Major Features

1. Interactive REPL Mode (aider-style CLI)


Changelog

collector branch (2025-11-19)

Features

  • Python 3.14 compatibility: Forward-compatible pydantic, puremagic instead of imghdr
  • REPL mode: Interactive CLI with tab completion for model/provider selection
  • NVIDIA provider: Qwen3-coder-480b default, Bayesian model ranking
  • Memory systems: Hashtable, memvid QR, DuckDB vector search
  • Installation options: Full (2GB) or lite (200MB via requirements-lite.txt)

Implementation

  • src/ii_agent/cli/: REPL mode entry point and implementation
  • src/ii_agent/db/: Async SQLite, DuckDB integration
  • src/ii_agent/storage/: Three memory backend options
  • src/ii_agent/server/api/nvidia_models.py: NVIDIA model fetching
  • scripts/fetch_nvidia_models.py: Bayesian ranking script
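The "Bayesian ranking" mentioned for `scripts/fetch_nvidia_models.py` is not spelled out in this PR text; a common minimal form is a Beta-Bernoulli posterior mean over per-model success/trial counts, which avoids ranking a model with 1/1 successes above one with 95/100. A hedged sketch under that assumption (priors and names are illustrative, not the script's actual code):

```python
# Illustrative Beta-Bernoulli ranking; the actual scoring in
# scripts/fetch_nvidia_models.py may differ.
def bayesian_score(successes: int, trials: int,
                   prior_alpha: float = 1.0, prior_beta: float = 1.0) -> float:
    """Posterior mean success rate under a Beta(alpha, beta) prior."""
    return (successes + prior_alpha) / (trials + prior_alpha + prior_beta)

def rank_models(stats: dict[str, tuple[int, int]]) -> list[str]:
    """Sort model names by posterior mean score, best first."""
    return sorted(stats, key=lambda m: bayesian_score(*stats[m]), reverse=True)
```

With a uniform Beta(1, 1) prior, a model with no data scores 0.5, so unseen models land mid-pack instead of at either extreme.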

Breaking Changes

  • Unpinned pydantic (was ==2.11.7, now >=2.11.7)
  • Removed deprecated test files

Documentation

  • docs/INSTALL.md: Installation guide
  • requirements-lite.txt: Minimal REPL-only install

…ystems

Features:
- Python 3.14 compatibility (forward-compatible pydantic, puremagic)
- Interactive REPL mode with tab completion
- NVIDIA provider with qwen3-coder-480b default
- Three memory backends: hashtable, memvid QR, DuckDB
- Split install: full (2GB) vs lite (200MB)

Implementation:
- src/ii_agent/cli/: REPL entry point
- src/ii_agent/db/: Async SQLite, DuckDB
- src/ii_agent/storage/: Memory backends
- requirements-lite.txt: Minimal install

35 files changed, 3904 insertions(+), 1644 deletions(-)
- Checkpoint without eviction (Strategy 5)
- Microkernel generation before checkpoint creation
- Dynamic model-specific thresholds (90% for a 64K window → 15% for a 1M-token window)
- Context mode transitions: SUSPENDED, HIGH_DETAIL, HIGH_CAPACITY, NORMAL
- RecallContext tool for breadcrumb trail queries
- MicrocontextSubroutine for temporary context expansion
- Tile generation at 33% context for future work streams
- Full integration in ChatService with mode transitions
- Harmonic miss tracking for context pressure analysis
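The "dynamic model-specific thresholds" bullet above gives only two endpoints (90% at 64K, 15% at 1M); how the PR interpolates between them is not shown. One plausible scheme is log-interpolation over window size — a sketch under that assumption:

```python
import math

# Hypothetical interpolation of the checkpoint threshold between the two
# endpoints named in the changelog; the log-linear scheme is an assumption.
def checkpoint_threshold(window: int,
                         lo: tuple[int, float] = (64_000, 0.90),
                         hi: tuple[int, float] = (1_000_000, 0.15)) -> float:
    """Fraction of the context window at which to checkpoint."""
    if window <= lo[0]:
        return lo[1]
    if window >= hi[0]:
        return hi[1]
    t = (math.log(window) - math.log(lo[0])) / (math.log(hi[0]) - math.log(lo[0]))
    return lo[1] + t * (hi[1] - lo[1])
```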
jnorthrup and others added 7 commits November 20, 2025 18:58
Implemented comprehensive context management system to prevent performance degradation:

Core ACE Components:
- ContextWindowManager: Orchestrates context window monitoring and auto-summarization
- ContextBandOptimizer: Golden band tiling for optimal breadcrumb retrieval
- ContextCliffBenchmark: Background needle-in-haystack testing for cliff detection
- SlabCheckpoint: Non-evicting checkpoint system with microkernel summaries
- TileGenerator: Future work tile generation from TODO structure
- DictionaryStorage: LRU cache-backed memvid storage integration

Model-Specific Features:
- Model constants with context windows and performance cliffs
- Dynamic checkpoint thresholds based on cliff data
- Golden band placement for Claude 3.5, GPT-4o, Gemini, Llama 4, DeepSeek
- Harmonic miss tracking for error-induced threshold adjustment

REPL Enhancements:
- install-repl.sh: Local installation script to ~/.local/bin
- REPL with memvid and duckdb support
- Context window status display
- Token counting infrastructure
- Checkpoint and tile generation integration

Storage Systems:
- MemVid QR-encoded MP4 checkpoints
- DuckDB analytics support
- Slab checkpointing (context NOT evicted)
- Dictionary storage with hashtable indexing

Progressive Reduction Strategy:
- 33% context: Dump and generate tiles
- 90% context: Reduce message tokens
- 95% context: Force summarization
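The three-stage strategy above maps directly to a utilization check; a minimal sketch, with action names as illustrative stand-ins for the PR's internal handlers:

```python
# Minimal sketch of the progressive reduction ladder described above.
def reduction_action(used_tokens: int, window_tokens: int) -> str:
    ratio = used_tokens / window_tokens
    if ratio >= 0.95:
        return "force_summarize"          # 95%: force summarization
    if ratio >= 0.90:
        return "reduce_message_tokens"    # 90%: trim message tokens
    if ratio >= 0.33:
        return "dump_and_generate_tiles"  # 33%: dump and generate tiles
    return "none"
```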

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
The REPL uses DuckDB for local storage and doesn't need PostgreSQL migrations.
Set IIAGENT_SKIP_MIGRATIONS and IIAGENT_SKIP_SERVER_APP_IMPORT environment
variables in the ii-repl wrapper script to prevent migration errors.

Fixes:
- Error running migrations: 'duckdb'
- Prevents unnecessary server app initialization in REPL mode
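The `ii-repl` wrapper itself is presumably a shell script; the Python-side equivalent of what it does, per the text above, is to set both skip flags before the package is imported. A hedged sketch (the flag names come from this PR; the `"1"` value is an assumption):

```python
import os

# Set the skip flags before importing anything from ii_agent, so the
# server app and its PostgreSQL migrations are never touched in REPL mode.
os.environ.setdefault("IIAGENT_SKIP_MIGRATIONS", "1")
os.environ.setdefault("IIAGENT_SKIP_SERVER_APP_IMPORT", "1")
```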

Replaced static hardcoded model lists with dynamic API-based model fetching:

Model Fetcher:
- Query the NVIDIA models/ endpoint for 200+ available models
- Cache fetched models (1-hour TTL) in ~/.ii_agent/model_cache/
- Fetch OpenAI models dynamically from API
- Include known Anthropic and Gemini models

Tab Completion Improvements:
- Support provider/model-slug/sub-slug format (e.g., nvidia/qwen/qwen3-coder-480b)
- Dynamic completion from cached model lists
- File path completion for /add and /drop commands
- Workspace-relative path completion with directory traversal
- CompositeCompleter merges file and model completers

Command Updates:
- /model now accepts provider/model-slug format
- Backward compatible with /model provider model format
- Auto-detects format based on slash presence

Fixes:
- Removed static hardcoded NVIDIA model list (5 models → 200+)
- Proper model slug parsing for NVIDIA multi-slash format
- File completion skips hidden files unless explicitly requested
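The slash-based auto-detection in the Command Updates list reduces to one branch; a hedged sketch (function name is hypothetical, and note that `partition` splits on the first slash, so NVIDIA's multi-slash slugs like `qwen/qwen3-coder-480b` survive intact):

```python
# Illustrative parser for the two /model argument forms described above.
def parse_model_arg(args: list[str]) -> tuple[str, str]:
    """Return (provider, model_slug) from either argument format."""
    if len(args) == 1 and "/" in args[0]:
        provider, _, slug = args[0].partition("/")  # split on first slash only
        return provider, slug
    if len(args) == 2:                              # legacy: provider model
        return args[0], args[1]
    raise ValueError("usage: /model <provider>/<slug> or /model <provider> <slug>")
```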

Updated model tab completion to show only one directory level at a time:

Before:
- /model nvidia/[TAB] → shows all 200+ models
- Overwhelming and hard to navigate

After:
- /model nvidia/[TAB] → shows: qwen/, meta/, google/, nvidia/, kimi/, etc.
- /model nvidia/qwen/[TAB] → shows only qwen models
- /model nvidia/meta/[TAB] → shows only meta models

Implementation:
- Group models by next path segment after typed prefix
- Show unique prefixes with trailing slash
- Only show final model names when no more slashes
- Applied to both provider/model and provider model formats
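The grouping step in the Implementation list above can be sketched as follows (a minimal illustration, not the PR's actual code):

```python
# Given a typed prefix, offer only the next path segment, with a trailing
# slash when deeper segments remain -- like bash cd completion.
def next_level_completions(prefix: str, models: list[str]) -> list[str]:
    out: set[str] = set()
    for m in models:
        if not m.startswith(prefix):
            continue
        rest = m[len(prefix):]
        head, slash, _ = rest.partition("/")
        out.add(head + "/" if slash else head)  # slash means more levels below
    return sorted(out)
```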

UX matches bash cd completion behavior for better usability.

Added slash command completion and improved path-like navigation:

Command Completion:
- / + TAB → shows all available commands (/help, /model, /add, /drop, etc.)
- /mo + TAB → completes to /model
- /ad + TAB → completes to /add

Model Completion (bash-like):
- /model nvidia/ + TAB → qwen/, meta/, google/, nvidia/
- /model nvidia/qwen/ + TAB → shows only qwen models at this level
- Hierarchical navigation exactly like bash cd

File Completion (bash-like):
- /add src/ + TAB → shows files in src/ directory
- /add src/ii_agent/ + TAB → shows files in ii_agent/
- Shows just filenames at current level (not full paths)
- Directories have trailing slash
- Preserves typed path prefix

Implementation:
- CommandCompleter: handles / command completion
- FileCompleter: bash-like file path navigation
- ModelCompleter: hierarchical model path navigation
- CompositeCompleter: merges all three completers

UX now matches bash completion behavior exactly.
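The merge step of the CompositeCompleter can be sketched as below. This is an assumption about shape only — the PR text does not say which completion framework (readline, prompt_toolkit, etc.) backs these classes, so child completers are modeled as plain callables:

```python
# Hypothetical sketch of CompositeCompleter: try each child completer in
# turn and concatenate their candidates, preserving registration order.
class CompositeCompleter:
    def __init__(self, *completers):
        self.completers = completers  # e.g. command, file, model completers

    def complete(self, text: str) -> list[str]:
        results: list[str] = []
        for completer in self.completers:
            results.extend(completer(text))
        return results
```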

…unds

Replaced all unreadable colors with readable ones on black backgrounds:
- Warnings: YELLOW → WHITE
- Info: CYAN → WHITE
- Commands: CYAN → WHITE
- Env vars: YELLOW → WHITE
- Header: CYAN → BLUE

Only using readable ANSI colors on black backgrounds:
✓ RED - error messages
✓ GREEN - success, files, models
✓ BLUE - prompts, workspace, headers
✓ WHITE - info, warnings, commands

Removed: CYAN, YELLOW, MAGENTA (unreadable on black)
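The reduced palette above corresponds to the standard ANSI SGR escape codes; a minimal sketch (helper name is illustrative, not the PR's actual function):

```python
# Standard ANSI escape codes for the readable-on-black palette above.
RED, GREEN, BLUE, WHITE, RESET = (
    "\033[31m", "\033[32m", "\033[34m", "\033[37m", "\033[0m"
)

def colorize(msg: str, color: str) -> str:
    """Wrap msg in a color code and reset afterwards."""
    return f"{color}{msg}{RESET}"
```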
…l completion

Replaces text.split() tokenization with cursor position-based word extraction
to enable bash-like file completion behavior for model paths with slashes.

Key changes:
- Extract current word at cursor position instead of tokenizing entire command
- Maintain one-level-at-a-time completion like bash cd
- Handle partial model paths correctly at any cursor position
- Add model_suggestions attribute for test compatibility
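The cursor-position word extraction described above can be sketched in a few lines — find the last space before the cursor and take everything up to the cursor, so slash-containing model paths stay one word (a simplified illustration; the real change presumably lives inside the completer classes):

```python
# Extract the word under the cursor instead of tokenizing the whole line,
# so partial paths like "nvidia/qw" complete correctly mid-command.
def word_at_cursor(text: str, cursor: int) -> str:
    start = text.rfind(" ", 0, cursor) + 1  # -1 + 1 == 0 when no space found
    return text[start:cursor]
```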
@jnorthrup (Author) commented:
This adds a new shortcut script, ii-repl.

There is tab completion for models, and the NVIDIA models are working well, with all ~200 of them available under tab completion.


Anthropic Claude produced a long chain of reasoning steps to wire up the TODO, which I handed over to the Raptor model for persistent clean contexts.

The ability of sonnet-4.5 to fail dozens of times on simple tab completions, while GLM solves them in under a minute, must be commended.

Wiring up the tools remains to be completed, and a productive finite state machine has yet to be mined from the server code or borrowed by walking backwards through Hugging Face papers in search of a good one.

I brought over some adaptive context engineering notions I've had, which to my knowledge are not often designed for boundless collections of providers and quotas.

These await lacing up the tools; the model does not yet appear to receive the combined prompt and tool listings.
