Problem
Current wake-up = static top-N memory bullets (~440 tokens). For projects with rich memory history (decisions, resolved errors, blockers, licensing stages, smoke test states), this is too shallow — agents have to fire icm_memory_recall to get useful context, adding latency and turn cost.
When the user asks "where are we?" / "what's missing?", the static bullets give a surface view; the real signal lives in the broader memory store.
Proposal
Replace (or augment) the static wake-up with an LLM-generated narrative briefing, compiled out-of-band whenever project memories change.
Flow
- Memory write → enqueue summarization job for the affected project
- Job reads all memories (project-scoped + relevant globals)
- LLM produces a structured briefing:
- State of work (in flight, blocked)
- Recent decisions and their rationale
- Known errors / resolutions to remember
- Project-specific user preferences
- Briefing cached at
~/.icm/cache/wake-up-{project}.md
- Session start loads cached briefing — zero added latency
Auto-detect the invoker AI tool
ICM should reuse the CLI already authenticated on the user's machine. No extra API key to manage.
Detection priority: | Prio | Mechanism | Notes |
|---|---|---|
| 1 | ICM_INVOKER env var | Explicit override (settable by hooks) |
| 2 | Tool-specific env vars | CLAUDECODE, CURSOR_TRACE_ID, GEMINI_CLI, CODEX_HOME, etc. |
| 3 | MCP clientInfo.name | When ICM is invoked via MCP |
| 4 | Parent process walk | Fallback (/proc/$PPID/comm on Linux) |
| 5 | Config TOML default | fallback_provider = "claude" |
Provider mapping (cheap defaults):
| Invoker |
Command |
Default model |
| Claude Code |
claude -p --model <m> |
claude-haiku-4-5 |
| Codex CLI |
codex exec --model <m> |
cheap codex equivalent (TBD) |
| Gemini CLI |
gemini -p --model <m> |
gemini-2.5-flash |
| Ollama |
HTTP API |
qwen2.5:7b |
| API direct |
Anthropic/Google/OpenAI SDK |
provider-specific cheap default |
Why cheap defaults: summarization is a simple task. No need for Opus/Sonnet tier. User pays nothing extra in most cases (their own CLI quota).
Configuration (3 levels)
~/.config/icm/config.toml:
[wakeup.summarizer]
provider = "auto" # auto | claude | codex | gemini | ollama | api
model = "" # empty = provider default
fallback_provider = "claude"
max_tokens = 1500
cache_dir = "~/.icm/cache"
regen_on_memory_write = true
CLI override: icm wakeup --summarizer-provider ollama --summarizer-model qwen2.5:7b
TUI: provider/model picker in icm interactive menu.
Per-project: .icm/config.toml in repo root.
Suggested PR breakdown (3 sub-PRs targeting v0.11)
PR 1 — Foundation (~500 LOC)
Provider trait
- Claude CLI provider (only)
- Auto-detect invoker (env vars + fallback, no MCP/pproc yet)
- Config schema extension
- Synchronous regen at session start (no cache yet)
- MVP works end-to-end for Claude Code users
PR 2 — Cache & async hooks (~400 LOC)
- File-based cache with hash-based invalidation
- Memory write hook → enqueue async regen
- Wake-up loads cache if present, falls back to static bullets
- Zero added latency at session start
PR 3 — Multi-provider + TUI (~500 LOC)
- Codex / Gemini / Ollama providers
- TUI picker for summarizer config
- CLI flags
--summarizer-*
- Docs + integrations updates
Non-goals
- Replace
icm_memory_recall — raw recall stays for deep dives
- Real-time regeneration on session start (must be pre-computed)
- Cross-project synthesis — scope is per-project briefings
Open questions
- Invalidation: hash of all memory rows vs explicit dirty flag?
- Should briefing include "delta since last session"?
- Token budget: 1500 default, hard cap at 3000?
- Fallback on LLM unreachable: stale cache vs static bullets vs error?
- Cost guardrail: rate-limit regen (e.g., max 1/minute per project)?
Acceptance criteria
Milestone
Target: v0.11 (no milestone created yet — to be assigned).
Problem
Current wake-up = static top-N memory bullets (~440 tokens). For projects with rich memory history (decisions, resolved errors, blockers, licensing stages, smoke test states), this is too shallow — agents have to fire
icm_memory_recallto get useful context, adding latency and turn cost.When the user asks "where are we?" / "what's missing?", the static bullets give a surface view; the real signal lives in the broader memory store.
Proposal
Replace (or augment) the static wake-up with an LLM-generated narrative briefing, compiled out-of-band whenever project memories change.
Flow
~/.icm/cache/wake-up-{project}.mdAuto-detect the invoker AI tool
ICM should reuse the CLI already authenticated on the user's machine. No extra API key to manage.
Detection priority: | Prio | Mechanism | Notes |
|---|---|---|
| 1 |
ICM_INVOKERenv var | Explicit override (settable by hooks) || 2 | Tool-specific env vars |
CLAUDECODE,CURSOR_TRACE_ID,GEMINI_CLI,CODEX_HOME, etc. || 3 | MCP
clientInfo.name| When ICM is invoked via MCP || 4 | Parent process walk | Fallback (
/proc/$PPID/common Linux) || 5 | Config TOML default |
fallback_provider = "claude"|Provider mapping (cheap defaults):
claude -p --model <m>claude-haiku-4-5codex exec --model <m>gemini -p --model <m>gemini-2.5-flashqwen2.5:7bWhy cheap defaults: summarization is a simple task. No need for Opus/Sonnet tier. User pays nothing extra in most cases (their own CLI quota).
Configuration (3 levels)
~/.config/icm/config.toml:CLI override:
icm wakeup --summarizer-provider ollama --summarizer-model qwen2.5:7bTUI: provider/model picker in
icminteractive menu.Per-project:
.icm/config.tomlin repo root.Suggested PR breakdown (3 sub-PRs targeting v0.11)
PR 1 — Foundation (~500 LOC)
ProvidertraitPR 2 — Cache & async hooks (~400 LOC)
PR 3 — Multi-provider + TUI (~500 LOC)
--summarizer-*Non-goals
icm_memory_recall— raw recall stays for deep divesOpen questions
Acceptance criteria
Milestone
Target: v0.11 (no milestone created yet — to be assigned).