Progress 2025-11-21: Schema/migration v3 (fts5 mirror with created_at) + rusqlite DAL; connectors implemented (Codex, Cline, Gemini, Claude; Amp/OpenCode detect-only) with Codex fixture test; index command persists to SQLite and Tantivy (agent/workspace/source_path/msg_idx/created_at/title/content) with optional watch scaffold; CLI/TUI shell on nightly; Search client supports Tantivy + SQLite-FTS fallback, agent/workspace/time filters, pagination; TUI renders live results + detail pane and status guidance.
Ultra-high-level:
Build a single Rust binary (agent-search, name TBD) that:
- Runs a slick, low-latency TUI (ratatui + crossterm) on Linux/macOS/Windows
- Auto-detects Codex CLI, Claude Code, Gemini CLI, Amp CLI, Cline, OpenCode (and is extensible to others)
- Normalizes each tool’s conversation history into a unified SQLite schema
- Builds and maintains a Tantivy index (Lucene-like, Rust-native) for sub-50ms “search as you type” over all conversations.(GitHub)
- Ships via a curl | bash installer (plus PowerShell equivalent) modeled on the Ultimate Bug Scanner installer, including --easy-mode and per-dependency prompts.(GitHub)
Speed
- “Perceived instant” search as you type (<50–80ms for moderate corpora; <200ms for huge ones)
- Initial indexing amortized via background jobs + incremental updates
Coverage
- First-class support for: Codex CLI, Claude Code, Gemini CLI, Amp CLI, Cline, OpenCode.
- Pluggable architecture to add Cursor CLI, Roo Code, etc. later.
UX
- Beautiful TUI (ratatui widgets, color themes per agent)(GitHub)
- Hotkeys to filter by time, agent, workspace, project; view full transcript; jump to original log.
Portability
- Single static(ish) binary per OS; zero runtime deps except libc.
- Works on:
  - Linux (x86_64, aarch64)
  - macOS (arm64, x86_64)
  - Windows (x86_64, possibly via WSL if some agents are Linux-only).
Non-goals
- No network calls to remote agent backends (Amp/Claude/Codex clouds). Only local artifacts (JSON/JSONL/SQLite) to avoid any auth/privacy issues.
- We don’t attempt to write back to these tools’ histories; we only read and index.
- Not a general “code search” tool; scope is chat / agent transcript search.
Context: zero legacy users. Optimize cass for AI/automation (tmux/headless), not for human muscle memory. Defaults can change freely.
- CLI is primary; TUI only on explicit cass tui (never auto when automation flags are present, or when stdout is non-TTY without tui).
- Stdout is data-only; stderr is diagnostics/progress.
- Machine error schema: {"error":{"code":int,"kind":string,"message":string,"hint":string,"retryable":bool}}.
- Exit codes: 0 ok; 2 usage; 3 missing index/db; 4 network; 5 data-corrupt; 6 incompatible-version; 7 lock/busy; 8 partial; 9 unknown.
- Deterministic defaults printed in help (data dir, db path, log path). Color and progress bars off when non-TTY unless forced.
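The error schema and exit codes above are easiest to keep consistent if they are defined in one place. That way every failure path emits the same JSON shape and the same process exit status. The following is a minimal std-only sketch under assumptions. The ErrorKind enum, its kebab-case kind strings, and the choice to reuse the exit code as the JSON "code" field are illustrative and not taken from the plan. A real build would more likely serialize with serde_json than hand-escape.

```rust
/// Hypothetical error kinds mirroring the exit-code contract (0 = success).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorKind {
    Usage,
    MissingIndex,
    Network,
    DataCorrupt,
    IncompatibleVersion,
    Busy,
    Partial,
    Unknown,
}

impl ErrorKind {
    /// Process exit code for this kind, per the table above.
    pub fn exit_code(self) -> i32 {
        match self {
            ErrorKind::Usage => 2,
            ErrorKind::MissingIndex => 3,
            ErrorKind::Network => 4,
            ErrorKind::DataCorrupt => 5,
            ErrorKind::IncompatibleVersion => 6,
            ErrorKind::Busy => 7,
            ErrorKind::Partial => 8,
            ErrorKind::Unknown => 9,
        }
    }

    /// Stable string for the "kind" field (naming is an assumption).
    pub fn as_str(self) -> &'static str {
        match self {
            ErrorKind::Usage => "usage",
            ErrorKind::MissingIndex => "missing-index",
            ErrorKind::Network => "network",
            ErrorKind::DataCorrupt => "data-corrupt",
            ErrorKind::IncompatibleVersion => "incompatible-version",
            ErrorKind::Busy => "busy",
            ErrorKind::Partial => "partial",
            ErrorKind::Unknown => "unknown",
        }
    }
}

/// Quote and escape a string as a JSON string literal.
pub fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Render the machine error object; written to stdout only when --json is active.
pub fn render_error(kind: ErrorKind, message: &str, hint: &str, retryable: bool) -> String {
    format!(
        "{{\"error\":{{\"code\":{},\"kind\":{},\"message\":{},\"hint\":{},\"retryable\":{}}}}}",
        kind.exit_code(),
        json_escape(kind.as_str()),
        json_escape(message),
        json_escape(hint),
        retryable
    )
}
```

The caller would print render_error(...) and then call std::process::exit(kind.exit_code()). This keeps the JSON "code" field and the exit status from drifting apart.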
- --robot-help: deterministic, wide guide (Summary, Commands, Defaults, Exit codes, JSON/Error schema, Examples, Env, Paths, Trace, Contracts); version header (crate + contract version). No ANSI unless --color=always.
- robot-docs <topic>; topics: commands, env, paths, schemas, exit-codes, examples, contracts, wrap.
- --json everywhere data flows (search/index/etc.).
- --color=auto|never|always; default auto (off when non-TTY).
- --progress=plain|bars|none; default bars on TTY, plain otherwise.
- --wrap <cols> and --nowrap; default: no forced wrap (wide output encouraged).
- --trace-file <path>: JSONL spans {start_ts,end_ts,duration_ms,cmd,args,exit_code,error?}; never to stdout/stderr.
- If an automation flag is present (--json, --robot-help, robot-docs, --trace-file), the TUI path is bypassed; main returns after the CLI action.
- If no subcommand and stdout is non-TTY: emit short guidance and exit 2 (don’t launch TUI).
- Search pagination: --limit, --offset, stable ordering.
- Progress bars suppressed by --quiet or --progress=none; data unaffected.
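The TUI-bypass rules above reduce to a small pure decision function, which is easy to cover in the contract tests. The following is a sketch under assumptions. The Invocation fields and Mode names are hypothetical. The handling of a bare invocation on a TTY, which prints help instead of launching the TUI, is an interpretation of “TUI only on explicit cass tui”. The TTY check uses std::io::IsTerminal (stable since Rust 1.70).

```rust
use std::io::IsTerminal;

/// Hypothetical parsed command line (only the fields the decision needs).
#[derive(Debug, Default)]
pub struct Invocation {
    pub subcommand: Option<String>,
    pub json: bool,
    pub robot_help: bool,
    pub trace_file: Option<String>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Mode {
    /// Launch the full-screen TUI.
    Tui,
    /// Run a CLI action (or print help) and return from main.
    Cli,
    /// No subcommand and stdout is not a TTY: print short guidance, exit 2.
    GuidanceExit2,
}

pub fn decide_mode(inv: &Invocation, stdout_is_tty: bool) -> Mode {
    let automation = inv.json
        || inv.robot_help
        || inv.trace_file.is_some()
        || inv.subcommand.as_deref() == Some("robot-docs");
    match inv.subcommand.as_deref() {
        // TUI only when asked for explicitly, and never alongside automation flags.
        Some("tui") if !automation => Mode::Tui,
        Some(_) => Mode::Cli,
        // Bare invocation: automation flags or an interactive terminal => CLI/help.
        None if automation || stdout_is_tty => Mode::Cli,
        None => Mode::GuidanceExit2,
    }
}

/// Real TTY probe used by main; tests pass the boolean directly.
pub fn stdout_is_tty() -> bool {
    std::io::stdout().is_terminal()
}
```

Keeping the TTY probe outside decide_mode lets the contract tests exercise the non-TTY branches without a pseudo-terminal.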
- Embed robot-help content generator; robot-docs topic renders parse-stable blocks.
- README “AI automation” section with wide examples, wrap guidance, trace usage, automation defaults, and no-legacy stance.
- Changelog + version bump; header in robot-help mirrors crate + contract versions.
- Snapshots for --robot-help and each robot-docs topic.
- Contract tests: exit codes per scenario, JSON validity, color suppression when non-TTY, wrap flags, TUI bypass, trace file writing.
- Perf sanity: ensure minimal startup overhead for robot-help/doc paths.
This section turns web research into concrete connector requirements.
Codex CLI
- What it is: Open-source terminal-native coding agent (codex CLI) that reads/edits/runs code locally.(GitHub)
- Config & state locations
  - Config: ~/.codex/config.toml (or $CODEX_HOME/config.toml)(GitHub)
  - Session logs:
- Implications for us
  - Canonical source = rollout JSONL files; each describes a session with:
    - Metadata (session id, start time, working directory)
    - User messages / agent steps / approvals / tool runs.
  - We must:
    - Discover $CODEX_HOME (env or default ~/.codex).
    - Recursively scan sessions/ for rollout-*.jsonl (files sit under sessions/YYYY/MM/DD/).
    - Parse each JSONL line as a “log event” and reconstruct conversations.
Claude Code
- What it is: Anthropic’s agentic coding tool (“Claude Code”) for terminal + editor.
- History locations (based on ecosystem tools & docs): community tools for Claude Code history refer to:
  - CLI logs: several open-source viewers take Claude Code CLI logs (JSONL) and render them as Markdown, implying the CLI writes JSONL logs; the path varies, but ~/.claude/projects is a strong default.(claude-hub.com)
- Implications
  - We need a Claude connector that:
    - Scans ~/.claude/projects/** for JSONL files (excluding non-log files).
    - Optionally scans each repo’s .claude or .claude.json for embedded transcript data.
    - Parses JSONL events into our unified schema.
Gemini CLI
- What it is: Official Google Gemini CLI for terminal-based workflows.(GitHub)
- History location
- Implications
  - Connector should:
    - Enumerate ~/.gemini/tmp/* directories.
    - Treat each directory as a project/session cluster.
    - Parse checkpoint & log JSON into conversation threads (ordered by timestamp / sequence).
Amp
- What it is: “Frontier coding agent” available as VS Code extension and CLI, built by Sourcegraph.(Amp Code)
- Local storage
  - Amp mainly stores threads on Sourcegraph servers (docs & community reports note that “all threads are stored on Sourcegraph servers”).(Reddit)
  - VS Code extension: caches thread history locally under VS Code’s globalStorage directory (extension-managed).(Amp Code)
  - Amp CLI:
    - Stores credentials in:
      - Linux/macOS: ~/.local/share/amp/secrets.json
      - Windows: %APPDATA%\amp\secrets.json(Amp Code)
    - Chat contents themselves are not guaranteed to be fully cached locally.
- Implications
  - Our Amp connector must:
    - Respect that the primary truth is remote; we only index whatever is cached locally:
      - VS Code globalStorage (same pattern as Cline / other extensions).
      - Any CLI cache directories if they exist (we’ll detect by exploring ~/.local/share/amp/ for JSON/JSONL).
    - Provide partial coverage; document clearly in the UI (e.g. label Amp as “local cache only”).
Cline
- What it is: Popular VS Code extension (with an ecosystem fork, Roo Code).
- Local storage
  - Migration docs & issues consistently point to:
    - macOS Cline data dir: ~/Library/Application Support/Code/User/globalStorage/saoudrizwan.claude-dev(Reddit)
    - Linux analog: ~/.config/Code/User/globalStorage/saoudrizwan.claude-dev (inferred from VS Code layout).
    - Windows analog: %APPDATA%\Code\User\globalStorage\saoudrizwan.claude-dev (same pattern).
  - In that directory, users mention files like:
    - taskHistory.json (index of tasks displayed in “Recent tasks”)
    - One directory per task containing: task_metadata.json, ui_messages.json, api_conversation_history.json(Stack Overflow)
- Implications
  - Cline connector must:
    - Find the VS Code globalStorage dir for the Cline extension.
    - Walk all task directories, reading:
      - task_metadata.json → title, created_at, workspace, provider, etc.
      - ui_messages.json / api_conversation_history.json → actual transcript.
    - Rebuild conversation threads from these JSON files even if taskHistory.json is corrupted (Stack Overflow questions show that this is needed).(Stack Overflow)
OpenCode
- What it is: Local coding agent CLI with MCP support; uses SQLite to persist sessions.(GitHub)
- Storage
  - Quickstart notes and blog posts describe:
    - On first run, OpenCode creates a .opencode data directory in the project root and initializes a SQLite database for conversation/sessions.(HackMD)
    - Config includes a data.directory / database_path option; default often resides in:
      - Project-local .opencode
      - Or a global ~/.config/opencode/... SQLite file (depending on config).(atalupadhyay)
- Implications
  - OpenCode connector:
    - Locates per-project .opencode directories by scanning:
      - Current git repos (via git rev-parse --show-toplevel or just walking up from CWD).
      - $HOME for .opencode when not inside a repo (optional).
    - Reads SQLite schema (already there), maps sessions, messages, etc. → our unified schema.
Per agent, we need a detection matrix (paths inferred by OS):
| Agent | Primary history roots (defaults) |
|---|---|
| Codex CLI | $CODEX_HOME/sessions/YYYY/MM/DD/rollout-*.jsonl (default CODEX_HOME=~/.codex); plus $CODEX_HOME/history.jsonl if enabled.(GitHub) |
| Claude Code | ~/.claude/projects/** JSONL logs; plus per-repo .claude / .claude.json.(GitHub) |
| Gemini CLI | ~/.gemini/tmp/<project-hash>/{chat,checkpoint}-*.json.(GitHub) |
| Amp | VS Code globalStorage cache for Amp; Amp CLI secrets & any local cache under ~/.local/share/amp or %APPDATA%\amp.(Amp Code) |
| Cline | VS Code globalStorage: Code/User/globalStorage/saoudrizwan.claude-dev/** JSON/JSONL.(Reddit) |
| OpenCode | Project-local .opencode directories with SQLite DB; global ~/.config/opencode/... if configured.(HackMD) |
CLI / entrypoint (main.rs)
- Subcommands:
  - agent-search tui (default): launch full-screen TUI.
  - agent-search index: --full: rebuild entire index from scratch; --incremental: only new or changed logs.
  - agent-search inspect <agent> <session-id>: dump normalized view of a single conversation.
Connectors layer (connectors::*)
- One module per agent: connectors::codex, connectors::claude_code, connectors::gemini, connectors::amp, connectors::cline, connectors::opencode
- Each exposes:
  - Detection: fn detect_installation(env: &Environment) -> DetectionResult;
  - Scan & normalize: fn scan_sessions(ctx: &ScanContext) -> anyhow::Result<Vec<NormalizedConversation>>;
  - Watch: fn watch_paths(ctx: &ScanContext, tx: Sender<IndexUpdate>) -> anyhow::Result<()>;
Data model & persistence (model, storage)
- model defines normalized Rust structs for: Agent, Conversation, Message, Snippet, Workspace.
- storage::sqlite: SQLite DB (rusqlite) with strongly-typed schema.(Docs.rs)
Search engine (search)
TUI / UI (ui)
- Built with Ratatui + ratatui-crossterm backend.(GitHub)
Index orchestrator (indexer)
- Coordinates:
  - Initial full scan
  - Incremental updates (filesystem watchers via notify)(GitHub)
  - Rebuilding indexes when schema changes.
Config (config)
- YAML/TOML config stored in XDG / platform-appropriate directories via the directories crate.(Crates)
Logging & error handling
- tracing + tracing-subscriber for logging.
- color-eyre or miette for pretty diagnostics in CLI mode.(The Rust Programming Language Forum)
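The per-connector contract above (detect, then scan and normalize) maps onto a trait, so the indexer can loop over all agents uniformly. The following is a simplified std-only sketch under assumptions. Environment, DetectionResult, ScanContext and NormalizedConversation are cut-down hypothetical stand-ins for the real types. Errors are plain Strings instead of anyhow::Result, and watch_paths is omitted. StubConnector exists only for tests.

```rust
use std::path::PathBuf;

// Cut-down stand-ins for the plan's types (hypothetical fields).
#[derive(Debug, Clone, Default)]
pub struct Environment { pub home: PathBuf }

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DetectionResult { pub detected: bool, pub evidence: Vec<String> }

#[derive(Debug, Clone)]
pub struct ScanContext { pub since_ms: Option<i64> }

#[derive(Debug, Clone)]
pub struct NormalizedConversation { pub agent_slug: String, pub external_id: String }

/// One implementation per module under connectors::*.
pub trait Connector {
    fn slug(&self) -> &'static str;
    fn detect_installation(&self, env: &Environment) -> DetectionResult;
    fn scan_sessions(&self, ctx: &ScanContext) -> Result<Vec<NormalizedConversation>, String>;
}

/// Scan every connector that detects an installation. A failing connector is
/// recorded as (slug, error) but does not abort the others.
pub fn scan_all(
    connectors: &[Box<dyn Connector>],
    env: &Environment,
    ctx: &ScanContext,
) -> (Vec<NormalizedConversation>, Vec<(String, String)>) {
    let mut convs = Vec::new();
    let mut errors = Vec::new();
    for c in connectors {
        if !c.detect_installation(env).detected {
            continue;
        }
        match c.scan_sessions(ctx) {
            Ok(mut found) => convs.append(&mut found),
            Err(e) => errors.push((c.slug().to_string(), e)),
        }
    }
    (convs, errors)
}

/// Test double: reports `present` and returns one conversation or an error.
pub struct StubConnector { pub slug: &'static str, pub present: bool, pub fail: bool }

impl Connector for StubConnector {
    fn slug(&self) -> &'static str { self.slug }
    fn detect_installation(&self, _env: &Environment) -> DetectionResult {
        DetectionResult { detected: self.present, evidence: vec![] }
    }
    fn scan_sessions(&self, _ctx: &ScanContext) -> Result<Vec<NormalizedConversation>, String> {
        if self.fail {
            return Err("parse error".to_string());
        }
        Ok(vec![NormalizedConversation { agent_slug: self.slug.to_string(), external_id: "s1".to_string() }])
    }
}
```

Isolating failures per connector matters because each upstream tool can change its log format independently.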
-
UI thread
- Runs the Ratatui event loop, processes user input (crossterm events).
-
Search worker pool
- Uses
rayonto parallelize search + scoring over Tantivy index.(Crates)
- Uses
-
Index worker
-
Thread that:
-
Listens for
IndexUpdatemessages from:- Connectors (full/partial scans)
- Filesystem watchers (notify)
-
Batch-writes to SQLite & Tantivy.
-
-
Communication via crossbeam::channel:
use crossbeam::channel::{Receiver, Sender};
use crossterm::event::KeyEvent;
// SearchResults and FsEvent are app-defined (search hits; a filesystem change from notify).
enum UiEvent { Key(KeyEvent), Tick, SearchResult(SearchResults) }
enum IndexCommand { FullReindex, IncrementalScan, FilesystemEvent(FsEvent) }
struct Channels {
    ui_tx: Sender<UiEvent>, ui_rx: Receiver<UiEvent>,
    index_tx: Sender<IndexCommand>, index_rx: Receiver<IndexCommand>,
}
- Agent: codex, claude_code, gemini_cli, amp, cline, opencode, …
- Workspace: root path of repo / project (if known).
- Conversation: one “thread” or “task”.
- Message: user or agent message, plus tool runs / actions.
- Snippet: optional code snippet or file section references.
We’ll create a single SQLite DB under app data dir:
- Use rusqlite with the bundled feature to ship our own SQLite build (ensures FTS5 is available across platforms).(Docs.rs)
Tables
-- Agents (tools)
CREATE TABLE agents (
id INTEGER PRIMARY KEY,
slug TEXT NOT NULL UNIQUE, -- "codex", "cline", etc.
name TEXT NOT NULL,
version TEXT,
kind TEXT NOT NULL, -- "cli", "vscode", "hybrid"
created_at INTEGER NOT NULL,
updated_at INTEGER NOT NULL
);
-- Workspaces (projects / repos)
CREATE TABLE workspaces (
id INTEGER PRIMARY KEY,
path TEXT NOT NULL, -- canonical absolute path
display_name TEXT,
UNIQUE(path)
);
-- Conversations (threads / tasks)
CREATE TABLE conversations (
id INTEGER PRIMARY KEY,
agent_id INTEGER NOT NULL REFERENCES agents(id),
workspace_id INTEGER REFERENCES workspaces(id),
external_id TEXT, -- tool's session/task ID
title TEXT,
source_path TEXT NOT NULL, -- original log / DB path
started_at INTEGER, -- unix millis
ended_at INTEGER,
approx_tokens INTEGER,
metadata_json TEXT, -- extra tool-specific info
UNIQUE(agent_id, external_id)
);
-- Messages
CREATE TABLE messages (
id INTEGER PRIMARY KEY,
conversation_id INTEGER NOT NULL REFERENCES conversations(id) ON DELETE CASCADE,
idx INTEGER NOT NULL, -- order in conversation
role TEXT NOT NULL, -- "user","agent","tool","system"
author TEXT,
created_at INTEGER, -- may be null if unknown
content TEXT NOT NULL,
extra_json TEXT
);
-- Optional per-message code snippets / file refs
CREATE TABLE snippets (
id INTEGER PRIMARY KEY,
message_id INTEGER NOT NULL REFERENCES messages(id) ON DELETE CASCADE,
file_path TEXT,
start_line INTEGER,
end_line INTEGER,
language TEXT,
snippet_text TEXT
);
-- Simple tag layer (for later)
CREATE TABLE tags (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL UNIQUE
);
CREATE TABLE conversation_tags (
conversation_id INTEGER NOT NULL REFERENCES conversations(id) ON DELETE CASCADE,
tag_id INTEGER NOT NULL REFERENCES tags(id) ON DELETE CASCADE,
PRIMARY KEY (conversation_id, tag_id)
);
On DB open, apply pragmas:
PRAGMA journal_mode = WAL;
PRAGMA synchronous = NORMAL; -- or FULL for "safe mode"
PRAGMA temp_store = MEMORY;
PRAGMA cache_size = -65536; -- negative value = size in KiB, i.e. 64 MiB
PRAGMA foreign_keys = ON;
PRAGMA mmap_size = 268435456; -- 256 MiB (tuneable)
Indexes:
CREATE INDEX idx_conversations_agent_started
ON conversations(agent_id, started_at DESC);
CREATE INDEX idx_messages_conv_idx
ON messages(conversation_id, idx);
CREATE INDEX idx_messages_created
ON messages(created_at);
To support fallback search (and some advanced filters), create an FTS5 virtual table:
CREATE VIRTUAL TABLE fts_messages
USING fts5(
content,
title,
agent_slug,
workspace,
message_id UNINDEXED,
conversation_id UNINDEXED,
created_at UNINDEXED,
tokenize = "porter"
);
We then keep fts_messages synchronized with messages via our Rust code (not triggers, to avoid performance surprises).
FTS5 gives fast text search, built-in ranking, and helps on platforms where Tantivy or its index is temporarily unavailable.(SQLite)
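One practical risk with the FTS5 fallback is that raw search-box input is parsed as FTS5 query syntax. Characters such as -, :, ( or a stray double quote can then turn into syntax errors or unintended operators. The following is a minimal sketch that sidesteps this. Each whitespace-separated token is wrapped in an FTS5 double-quoted string, with embedded quotes doubled. The last token is marked as a prefix with * for search-as-you-type. FTS5 ANDs adjacent strings implicitly. The function and constant names are hypothetical.

```rust
/// Build an FTS5 MATCH expression from raw user input. Every token becomes
/// a quoted FTS5 string, so punctuation is never read as query syntax.
/// Empty or whitespace-only input yields "" (the caller should skip the query).
pub fn fts5_match_expr(input: &str) -> String {
    let tokens: Vec<String> = input
        .split_whitespace()
        .map(|t| format!("\"{}\"", t.replace('"', "\"\"")))
        .collect();
    match tokens.split_last() {
        None => String::new(),
        Some((last, rest)) => {
            let mut parts: Vec<String> = rest.to_vec();
            // Prefix query on the token still being typed.
            parts.push(format!("{last}*"));
            parts.join(" ")
        }
    }
}

/// Fallback query against the fts_messages table above. bm25() returns lower
/// (more negative) values for better matches, so ascending order puts the best first.
/// Bind ?1 = fts5_match_expr(input), ?2 = limit.
pub const FTS_FALLBACK_SQL: &str = "SELECT message_id, conversation_id, bm25(fts_messages) AS score \
     FROM fts_messages WHERE fts_messages MATCH ?1 ORDER BY score LIMIT ?2";
```

Because the expression is bound as a parameter rather than spliced into SQL, this also avoids SQL injection. The quoting only protects the FTS5 grammar layer.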
- Tantivy is a Rust-native Lucene-like full-text search engine with high performance and a feature set comparable to Elasticsearch’s core text features.(GitHub)
- Well suited as “Rust equivalent to Lucene/Elastic” per the requirement.
Index location
- On-disk under app data dir: data_dir/index/ (per schema version, e.g. index/v1/).
Fields
- message_id (u64, stored)
- conversation_id (u64, stored)
- agent_slug (string, indexed, fast field for filters)
- workspace (string, indexed)
- created_at (i64, indexed as fast field, sortable)
- role (string, indexed)
- title (text, indexed & stored)
- content (text, indexed & stored)
Use:
- TEXT fields with a standard analyzer (tokenization, lowercasing; Tantivy’s default tokenizer does not remove stopwords, so add a stop-word filter if we want that).
- FAST fields for created_at and agent_slug to support efficient range & term filters.
When user types into search box:
- Build a Tantivy query: a BooleanQuery combining:
  - Full-text query on content & title, via a multi-field query parser with weights: title 3.0, content 1.0.
  - Agent filter(s): TermQuery on agent_slug.
  - Time filter: RangeQuery on created_at.
- Limit: top 100 hits (configurable).
- Group results by conversation for TUI display: each conversation row shows the best-scoring message snippet.
- Pre-open a Tantivy IndexReader & Searcher on startup.
- Use Tantivy’s multi-threaded search (via its internal threadpool) plus rayon for grouping and post-processing.(quickwit.io)
- Debounce keystroke-triggered searches by ~100–150ms:
  - Send SearchRequest { query, filters, timestamp } on each change.
  - Worker deduplicates by dropping stale requests.
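The debounce-and-drop-stale behaviour and the per-conversation grouping described above can both be written without any search-engine types. The following is a std-only sketch under assumptions. It uses std::sync::mpsc in place of crossbeam. SearchRequest keeps only an agent list as a stand-in for the full filters. Hit assumes higher scores are better, as with Tantivy; FTS5 bm25 values would need negating first. All names are illustrative.

```rust
use std::collections::HashMap;
use std::sync::mpsc::Receiver;
use std::time::Duration;

#[derive(Debug, Clone, PartialEq)]
pub struct SearchRequest {
    pub query: String,
    pub agents: Vec<String>, // stand-in for the full filter set
    pub timestamp_ms: u64,
}

/// Trailing-edge debounce: block for the next request, then keep replacing it
/// with newer ones until the channel has been quiet for `debounce`.
/// Returns None once all senders are gone and nothing is queued.
pub fn next_latest(rx: &Receiver<SearchRequest>, debounce: Duration) -> Option<SearchRequest> {
    let mut latest = rx.recv().ok()?;
    while let Ok(newer) = rx.recv_timeout(debounce) {
        latest = newer; // the stale request is dropped without being searched
    }
    Some(latest)
}

/// One scored message hit (higher score = better).
#[derive(Debug, Clone, PartialEq)]
pub struct Hit {
    pub conversation_id: u64,
    pub message_id: u64,
    pub score: f32,
}

/// Collapse hits to one row per conversation, keeping its best-scoring message,
/// then order rows by score (descending), breaking ties by conversation id.
pub fn group_by_conversation(hits: &[Hit]) -> Vec<Hit> {
    let mut best: HashMap<u64, Hit> = HashMap::new();
    for h in hits {
        best.entry(h.conversation_id)
            .and_modify(|cur| {
                if h.score > cur.score {
                    *cur = h.clone();
                }
            })
            .or_insert_with(|| h.clone());
    }
    let mut rows: Vec<Hit> = best.into_values().collect();
    rows.sort_by(|a, b| {
        b.score
            .total_cmp(&a.score)
            .then(a.conversation_id.cmp(&b.conversation_id))
    });
    rows
}
```

A worker loop would call next_latest(&rx, Duration::from_millis(120)), run the query, and send the grouped rows back to the UI thread.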
- TUI: ratatui for widgets/layout.(GitHub)
- Backend: ratatui-crossterm using crossterm for cross-platform terminal control.(Crates)
Main screen = 3 panes:
Top bar (1–2 rows)
- Search input: [ query here… ]
- Filter summary: Agents: Codex, Claude, Gemini | Time: Last 7 days | Workspace: all
- Right side: status (indexing progress, #docs, backend used: Tantivy/FTS)
Left pane – Results list
- Scrollable list of hit conversations/messages.
- Each row: [AGENT ICON] [REL TIME] [WORKSPACE] Title / first line snippet
- Colored by agent:
  - Codex: cyan
  - Claude: purple
  - Gemini: blue
  - Amp: magenta
  - Cline: green
  - OpenCode: yellow
Right pane – Detail
- When a row is selected, shows the full conversation:
  - Timestamped
  - Roles (“You”, “Agent”, “Tool”) with colors.
- Tabs at top: [Messages] [Code Snippets] [Raw JSON].
Bottom status line
- Hints: Enter: open | /: search | f: filters | t: time | a: agents | w: workspace | ?: help | q: quit.
Navigation:
- ↑/↓ or k/j – move selection in result list.
- PgUp/PgDn – page results.
- Tab – toggle focus between search box / results / detail.
Search:
- / – focus search input.
- Esc – clear search if input nonempty, else go back/focus results.
Filters:
- f – open filter popover.
- a – agent filter: checkbox list of agents; space toggles; enter applies.
- t – time filter modal:
  - Quick presets: 1 = last 24h, 7 = last 7 days, 3 = last 30 days, 0 = all.
  - c = custom; prompts from/to dates.
- w – workspace filter: list of detected workspaces; search-as-you-type fuzzy filter (local to this list).
Detail:
- Enter – open conversation in detail pane (if not already).
- o – open underlying log/DB in external editor ($EDITOR + path:line).
- r – toggle between “grouped by turn” vs “flat log” view.
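The time-filter presets above translate into a lower bound on created_at (unix millis, matching the schema). That bound then feeds the Tantivy RangeQuery or a SQL created_at >= ? clause. The following is a minimal sketch. The TimePreset enum and its methods are hypothetical, and the custom (c) range is left to the modal.

```rust
/// Quick presets for the `t` time-filter modal.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TimePreset {
    Last24h,
    Last7d,
    Last30d,
    All,
}

const DAY_MS: i64 = 24 * 60 * 60 * 1000;

impl TimePreset {
    /// Map the preset hotkeys (1, 7, 3, 0); `c` (custom) is handled by the modal.
    pub fn from_key(key: char) -> Option<TimePreset> {
        match key {
            '1' => Some(TimePreset::Last24h),
            '7' => Some(TimePreset::Last7d),
            '3' => Some(TimePreset::Last30d),
            '0' => Some(TimePreset::All),
            _ => None,
        }
    }

    /// Inclusive lower bound for created_at in unix millis; None means no time filter.
    pub fn since_ms(self, now_ms: i64) -> Option<i64> {
        match self {
            TimePreset::Last24h => Some(now_ms - DAY_MS),
            TimePreset::Last7d => Some(now_ms - 7 * DAY_MS),
            TimePreset::Last30d => Some(now_ms - 30 * DAY_MS),
            TimePreset::All => None,
        }
    }
}
```

Messages with a NULL created_at (allowed by the schema) fall outside any bounded preset. The UI may want to surface that rather than silently hide them.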
- Use Ratatui’s Block, List, Paragraph, Tabs widgets with:
  - Light borders, rounded corners where available.
  - Highlight style for selected row: reverse video + bold.
  - Soft accent colors rather than neon; calibrate for readability in dark mode.
- Support light/dark themes via config (theme = "dark" | "light").
- Optional “minimal mode” that disables some borders for simpler terminals.
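Define:
use std::path::PathBuf;
// Mirrors messages.role in the schema ("user", "agent", "tool", "system").
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MessageRole { User, Agent, Tool, System }
// Mirrors the snippets table.
#[derive(Debug, Clone)]
struct NormalizedSnippet {
    file_path: Option<PathBuf>,
    start_line: Option<u32>,
    end_line: Option<u32>,
    language: Option<String>,
    snippet_text: Option<String>,
}
#[derive(Debug, Clone)]
struct NormalizedConversation {
    agent_slug: String,
    external_id: String,
    title: Option<String>,
    workspace: Option<PathBuf>,
    source_path: PathBuf,
    started_at: Option<i64>, // unix millis
    ended_at: Option<i64>,
    metadata: serde_json::Value,
    messages: Vec<NormalizedMessage>,
}
#[derive(Debug, Clone)]
struct NormalizedMessage {
    role: MessageRole,
    author: Option<String>,
    created_at: Option<i64>, // unix millis; None if unknown
    content: String,
    extra: serde_json::Value,
    snippets: Vec<NormalizedSnippet>,
}
Each connector: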
- Emits NormalizedConversation objects.
- Does idempotent scans: uses source_path + external_id to avoid duplicates.
Detection
- Check:
  - Is the codex binary on PATH? Use the which crate to detect executables robustly across platforms.(Stack Overflow)
  - Or does ~/.codex exist?
Scan
- Determine $CODEX_HOME: env var CODEX_HOME or default ~/.codex.
- Enumerate sessions: $CODEX_HOME/sessions/YYYY/MM/DD/rollout-*.jsonl (walk sessions/ recursively rather than assuming a fixed depth).
- For each rollout-*.jsonl:
  - Treat the file as one “session”.
  - Parse JSONL line-by-line:
    - Identify user messages vs agent messages (look at event type: user_message, assistant_message, etc.).
    - Extract timestamps, workspace path, title (if present), approvals, tool runs.
  - Build NormalizedConversation:
    - external_id = file path or session UUID from JSON.
    - workspace = working directory from session metadata.
    - started_at = first event timestamp; ended_at = last.
- Optionally, incorporate history.jsonl as a fallback when sessions are missing; the primary source remains the session logs.
Incremental updates
- Use notify to watch the $CODEX_HOME/sessions directory for new/changed files.
- On new rollout-*.jsonl: parse, upsert in DB and update Tantivy/FTS.
Detection
- Heuristics:
  - ~/.claude directory exists.
  - VS Code extension for Claude installed (look for claude-code or similar in globalStorage directories).
- Config-driven override:
  - Allow user to specify agents.claude_code.projects_dir etc.
Scan
- Root: ~/.claude/projects.
- For each project dir:
  - List .jsonl history logs (names may vary: history-*.jsonl, session-*.jsonl).
  - Parse JSONL: each line = event. Identify conversation boundaries (session-id field).
- Map fields:
  - Title: may come from “task name” or first user message.
  - Role: map Claude’s user, assistant, tool.
  - Workspace: if path is embedded; else null.
- Additionally, check per-repo .claude / .claude.json: some setups store “project memory” or limited history there; treat as additional conversations.
Incremental
- Watch ~/.claude/projects for new/updated .jsonl files.
Detection
- gemini binary on PATH (which "gemini" or gemini-cli), or ~/.gemini directory.(GitHub)
Scan
- Root: ~/.gemini/tmp.
- For each child dir <project-hash>: enumerate JSON files: checkpoint-*.json, chat-*.json, etc.
- Reconstruction strategy (from logs-prettifier script semantics):(GitHub)
  - Checkpoints contain “current conversation state”; chat logs contain message history.
  - Prefer chat logs; if absent, fall back to checkpoints.
- Build:
  - external_id = directory name + checkpoint id.
  - title = derived from first user message or model-provided session name.
  - Timestamps = earliest / latest message timestamps.
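The “prefer chat logs, else checkpoints” rule above can be isolated as a pure function over one <project-hash> directory’s file names. The following is a minimal sketch. The function name is hypothetical, and the file-name patterns come from the detection matrix. Sorting by name is only a deterministic placeholder; message order should still come from timestamps/sequence inside the files.

```rust
/// True if `name` looks like `<prefix>*.json`.
fn has_shape(name: &str, prefix: &str) -> bool {
    name.starts_with(prefix) && name.ends_with(".json")
}

/// Pick the files to reconstruct from: chat-*.json if any exist,
/// otherwise checkpoint-*.json. Other files are ignored.
pub fn pick_gemini_sources<'a>(file_names: &[&'a str]) -> Vec<&'a str> {
    let mut picked: Vec<&'a str> = file_names
        .iter()
        .copied()
        .filter(|n| has_shape(n, "chat-"))
        .collect();
    if picked.is_empty() {
        picked = file_names
            .iter()
            .copied()
            .filter(|n| has_shape(n, "checkpoint-"))
            .collect();
    }
    picked.sort_unstable();
    picked
}
```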
Incremental
- Watch ~/.gemini/tmp for new directories / files.
Detection
- amp CLI on PATH (npm i -g @sourcegraph/amp installs it).(marketplace.visualstudio.com)
- ~/.local/share/amp or %APPDATA%\amp exists.(Amp Code)
Scan
- Local thread storage is limited (most sessions are remote).(Reddit)
- Strategy:
  - Inspect ~/.local/share/amp/ or %APPDATA%\amp: any JSON/JSONL logs? (We’ll define a naming convention once we see typical installs.)
  - Inspect VS Code globalStorage for the Amp extension (similar to Cline), e.g. Code/User/globalStorage/sourcegraph.amp/**.
- If we find JSON/JSONL per thread: map them to NormalizedConversation.
- Tag Amp conversations as partial = true in metadata.
Detection
- Check for VS Code globalStorage path: a platform-specific pattern resolving to <vscode-config>/User/globalStorage/saoudrizwan.claude-dev.(Reddit)
Scan
- In that directory, identify per-task directories or files:
  - taskHistory.json summarizing tasks (optional; may be corrupted).(Stack Overflow)
  - A directory per task UUID with: task_metadata.json, ui_messages.json, api_conversation_history.json
- If taskHistory.json exists: use it as an index for tasks (title, created_at).
- For each task:
  - Parse task_metadata.json: title, provider, workspace root, etc.
  - Parse ui_messages.json / api_conversation_history.json: build an ordered message list; unify user vs agent vs tool roles.
- external_id = task id.
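Because taskHistory.json may be missing or corrupted, task discovery should come from the directory tree itself. taskHistory.json is then used only to enrich titles. The following is a minimal std-only sketch under assumptions. A directory counts as a task if it holds ui_messages.json or api_conversation_history.json. The task id (external_id) is the directory name. Task directories are assumed to be direct children of the root passed in; if an install nests them under an intermediate folder, point root there. ClineTask and discover_tasks are hypothetical names.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Files whose presence marks a directory as a Cline task (from the list above).
const TRANSCRIPT_FILES: [&str; 2] = ["ui_messages.json", "api_conversation_history.json"];

#[derive(Debug, PartialEq)]
pub struct ClineTask {
    /// Directory name, used as external_id.
    pub id: String,
    pub dir: PathBuf,
    pub has_metadata: bool,
}

/// Enumerate task directories straight from disk so that a broken
/// taskHistory.json never hides tasks. Results are sorted by id.
pub fn discover_tasks(root: &Path) -> io::Result<Vec<ClineTask>> {
    let mut tasks = Vec::new();
    for entry in fs::read_dir(root)? {
        let dir = entry?.path();
        if !dir.is_dir() {
            continue; // e.g. taskHistory.json itself
        }
        if !TRANSCRIPT_FILES.iter().any(|f| dir.join(f).is_file()) {
            continue; // not a task directory
        }
        let id = match dir.file_name().and_then(|n| n.to_str()) {
            Some(name) => name.to_string(),
            None => continue, // skip non-UTF-8 names
        };
        let has_metadata = dir.join("task_metadata.json").is_file();
        tasks.push(ClineTask { id, dir, has_metadata });
    }
    tasks.sort_by(|a, b| a.id.cmp(&b.id));
    Ok(tasks)
}
```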
Incremental
- Watch globalStorage dir for changes.
Detection
- On startup:
  - If opencode CLI is on PATH (nice but not required).
  - Scan:
    - Current working dir upward for .opencode (project-local).
    - $HOME for .opencode or config-specified global DB.(HackMD)
Scan
- For each .opencode dir:
  - Read config (if present) to locate SQLite DB.
  - Open DB with rusqlite and introspect schema; likely tables: sessions, messages, files, etc.
- Map:
  - Each row in sessions = Conversation.
  - Each messages row = Message.
  - Additional tables (e.g., files) → Snippets or tags.
Incremental
- For SQLite DB, we can’t easily watch per-row changes, but we can:
  - Track DB mtime and last imported row id / timestamp per DB.
  - On change: query for rows newer than last imported.
- User runs agent-search (TUI command).
- App locates config dir (directories::ProjectDirs for coding_agent_search).(Crates)
- If DB / index missing:
  - Run initial detection: for each connector, call detect_installation.
  - Show small TUI dialog: “Detected: Codex, Cline, Gemini. Index now? [Yes] [Skip]”
  - Kick off full scan in a background thread, with a progress bar in the status bar: “Indexing Codex: 327/1024 sessions…”
- For log-file-based sources:
  - Use notify watchers on root dirs (~/.codex, ~/.gemini/tmp, ~/.claude/projects, VS Code globalStorage).(GitHub)
  - Debounce FS events to avoid thrashing.
- For SQLite-based sources (OpenCode):
  - Periodic polling (e.g., every 60s) for DB mtime change.
- On new/changed source:
  - Re-run the corresponding connector’s scan_sessions with since_timestamp = last import time per source file / DB.
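Debouncing the notify events mentioned above mostly means coalescing bursts per path. An agent appending to a JSONL log can fire many modify events in a second, but we only want one re-scan once the file goes quiet. The following is a std-only sketch under assumptions. The Debouncer type is hypothetical, and time is passed in explicitly so the logic is testable. A real index worker would push paths from notify events and call ready(Instant::now()) on a timer tick. Crates like notify-debouncer-mini offer a packaged alternative.

```rust
use std::collections::HashMap;
use std::path::PathBuf;
use std::time::{Duration, Instant};

/// Releases a path only after no new event has arrived for it within `window`.
pub struct Debouncer {
    window: Duration,
    last_seen: HashMap<PathBuf, Instant>,
}

impl Debouncer {
    pub fn new(window: Duration) -> Self {
        Debouncer { window, last_seen: HashMap::new() }
    }

    /// Record an event for `path` at `now`; repeated events just move the timestamp.
    pub fn push(&mut self, path: PathBuf, now: Instant) {
        self.last_seen.insert(path, now);
    }

    /// Remove and return (sorted) every path that has been quiet for at least `window`.
    pub fn ready(&mut self, now: Instant) -> Vec<PathBuf> {
        let window = self.window;
        let mut due: Vec<PathBuf> = self
            .last_seen
            .iter()
            .filter(|(_, t)| now.duration_since(**t) >= window)
            .map(|(p, _)| p.clone())
            .collect();
        for p in &due {
            self.last_seen.remove(p);
        }
        due.sort();
        due
    }
}
```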
- Maintain schema_version in a small meta table.
- On binary upgrade, if schema mismatch:
  - Run migration scripts (Rust-implemented).
  - Optionally rebuild Tantivy index from SQLite.
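Deciding which migrations to run from the stored schema_version is pure logic and worth testing on its own. The following is a minimal sketch under assumptions. Migration, MigrationPlan and plan are hypothetical names, and the SQL strings in a real table of migrations would be the actual DDL. A DB stamped by a newer binary is refused rather than downgraded. Mapping that refusal to exit code 6 (incompatible-version) is a suggestion that ties into the cass exit-code table.

```rust
/// One schema step: applying it moves the DB to `version`.
pub struct Migration {
    pub version: u32,
    pub sql: &'static str,
}

#[derive(Debug, PartialEq, Eq)]
pub enum MigrationPlan {
    UpToDate,
    /// Apply these versions in ascending order, then store the last one in meta.
    Apply(Vec<u32>),
    /// The DB was written by a newer binary than this one supports.
    TooNew { db: u32, supported: u32 },
}

pub fn plan(current: u32, migrations: &[Migration]) -> MigrationPlan {
    let supported = migrations.iter().map(|m| m.version).max().unwrap_or(0);
    if current > supported {
        return MigrationPlan::TooNew { db: current, supported };
    }
    let mut pending: Vec<u32> = migrations
        .iter()
        .map(|m| m.version)
        .filter(|&v| v > current)
        .collect();
    pending.sort_unstable();
    if pending.is_empty() {
        MigrationPlan::UpToDate
    } else {
        MigrationPlan::Apply(pending)
    }
}
```

Applying each step plus the schema_version update inside one SQLite transaction keeps a crash from leaving the DB half-migrated. A non-empty Apply plan is also a natural trigger for the optional Tantivy rebuild.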
- Single-line install inspired by Ultimate Bug Scanner:(GitHub)
curl -fsSL https://raw.githubusercontent.com/<you>/coding-agent-search/main/install.sh | bash
- Support --easy-mode to:
  - Auto-install all dependencies without prompting.
  - Auto-enable all detected agents.
1. Safety & prerequisites
- set -euo pipefail
- Check for: curl or wget, tar, uname, mktemp
- Print what it will do and ask for confirmation (unless --easy-mode).
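2. Detect OS / arch
- uname -s → Linux / Darwin.
- uname -m → x86_64 / arm64 (Linux on ARM reports aarch64 while macOS reports arm64; normalize both to the arch name used in release asset names).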
3. Download binary
- Determine latest version (GitHub releases API or static VERSION file).
- Download agent-search-<os>-<arch>.tar.gz.
- Verify checksum (SHA-256 baked into script, like UBS does for its modules).(GitHub)
4. Install location
- Default: ${HOME}/.local/bin/agent-search (or ~/bin fallback).
- Optionally /usr/local/bin if user chooses and has sudo.
5. Dependencies
We aim to build a fully self-contained binary (bundled SQLite, static linking), so external dependencies are minimal. For extra tools we might optionally use:
- sqlite3 CLI (for debug)
- less or bat (for external viewing)(GitHub)
Script logic:
- Detect package manager: apt, dnf, pacman, brew, yum, zypper.
- For each missing extra: prompt “Install sqlite3 with apt? [Y/n]” unless --easy-mode.
6. Post-install
- Add ${HOME}/.local/bin to PATH if missing (touch shell rc).
- Print quickstart:
Run: agent-search
Or: agent-search tui
- Equivalent PowerShell command:
irm https://raw.githubusercontent.com/<you>/coding-agent-search/main/install.ps1 | iex
- Steps:
  - Detect architecture via [Environment]::Is64BitOperatingSystem.
  - Download agent-search-windows-x86_64.zip.
  - Extract to %LOCALAPPDATA%\Programs\agent-search.
  - Add that directory to user PATH (via registry or setx).
- For Windows lacking proper terminal support:
  - Recommend Windows Terminal or WSL; the binary should still work with the standard console.
Use the which or pathsearch crate to reliably find executables in PATH on all OSes (handles PATHEXT on Windows).(Stack Overflow)
- Binaries to probe: codex, amp, gemini or gemini-cli, opencode.
  - For Claude Code / Cline (more VS Code-embedded), detection will lean on filesystem directories.
- Check for each tool's canonical conf/data dirs (see Section 2 table).
- If a path exists and contains the expected “signature file”:
  - Codex: ~/.codex/config.toml(GitHub)
  - Gemini: ~/.gemini/tmp with checkpoint-*.json.(GitHub)
  - Claude: ~/.claude/projects with JSONL.(GitHub)
  - Cline: VS Code globalStorage dir with taskHistory.json.(Reddit)
  - Amp: ~/.local/share/amp/secrets.json or %APPDATA%\amp\secrets.json.(Amp Code)
  - OpenCode: .opencode directories / global config.(HackMD)
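To make the PATH probing above concrete, here is a std-only stand-in for what the which crate provides. It is a sketch, not a replacement. It splits PATH with std::env::split_paths and, on Windows only, also tries each PATHEXT suffix, since npm-installed CLIs there typically resolve through .cmd shims. Unlike the which crate, it does not check the executable bit on Unix. find_in_path, first_found and PROBES are hypothetical names.

```rust
use std::env;
use std::ffi::OsString;
use std::path::PathBuf;

/// Find `name` in a PATH-style value, trying PATHEXT suffixes on Windows.
/// Call as find_in_path("codex", env::var_os("PATH")).
pub fn find_in_path(name: &str, path_var: Option<OsString>) -> Option<PathBuf> {
    let path_var = path_var?;
    let exts: Vec<String> = if cfg!(windows) {
        env::var("PATHEXT")
            .unwrap_or_else(|_| ".COM;.EXE;.BAT;.CMD".to_string())
            .split(';')
            .filter(|e| !e.is_empty())
            .map(str::to_string)
            .collect()
    } else {
        Vec::new()
    };
    for dir in env::split_paths(&path_var) {
        let plain = dir.join(name);
        if plain.is_file() {
            return Some(plain);
        }
        for ext in &exts {
            let with_ext = dir.join(format!("{name}{ext}"));
            if with_ext.is_file() {
                return Some(with_ext);
            }
        }
    }
    None
}

/// The binaries listed above; gemini has two candidate names.
pub const PROBES: [&[&str]; 4] = [&["codex"], &["amp"], &["gemini", "gemini-cli"], &["opencode"]];

/// First candidate name that resolves, in the order given.
pub fn first_found(names: &[&str], path_var: Option<OsString>) -> Option<PathBuf> {
    names.iter().find_map(|n| find_in_path(n, path_var.clone()))
}
```

The resolved path doubles as the “Evidence” column in the detection table shown on first run.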
On first run (and accessible via Settings):
- Show list:
| Agent | Detected? | Evidence | Enabled? |
|---|---|---|---|
| Codex CLI | yes/no | codex in PATH, ~/.codex/... | [x] |
| Claude Code | yes/no | ~/.claude/projects | [x] |
| Gemini CLI | yes/no | ~/.gemini/tmp | [x] |
| Amp | yes/no | amp CLI, Amp globalStorage, etc. | [ ] |
| Cline | yes/no | VSCode globalStorage dir | [x] |
| OpenCode | yes/no | .opencode dirs / global DB | [x] |
User can toggle connectors on/off; this is stored in config.
- Use directories::ProjectDirs to compute the platform-correct config directory, e.g.:
  - Linux: ~/.config/coding-agent-search/config.toml
  - macOS: ~/Library/Application Support/coding-agent-search/config.toml
  - Windows: %APPDATA%\coding-agent-search\config.toml(Crates)
  - (Exact macOS/Windows folder names depend on the qualifier and organization passed to ProjectDirs::from.)
Example config.toml:
[general]
theme = "dark"
enable_tantivy = true
max_results = 200
[sqlite]
path = "/home/user/.local/share/coding-agent-search/agent_search.db"
page_size = 4096
cache_size_mb = 64
[agents.codex]
enabled = true
home = "/home/user/.codex"
[agents.claude_code]
enabled = true
projects_dir = "/home/user/.claude/projects"
[agents.gemini_cli]
enabled = true
root = "/home/user/.gemini/tmp"
[agents.amp]
enabled = false
note = "Limited to local cache only"
[agents.cline]
enabled = true
vscode_profile = "Code" # or "Code - Insiders"
[agents.opencode]
enabled = true
search_project_roots = true
extra_db_paths = []
Additional options to consider:
- tantivy.index_path, tantivy.num_indexing_threads.
- search.default_time_range (e.g., 7d).
- search.min_query_length for search-as-you-type.
- performance.max_conversations to index; can be unlimited by default.
- For each connector:
  - Synthetic minimal log/DB sample → normalized conversations.
  - Backwards-compat as upstream tools change (guard by snapshot tests).
- For SQLite: schema migrations tested with up/down simulation.
- For search: queries returning expected conversations for various filters.
- End-to-end:
  - Spin up a temp dir as “home”.
  - Place sample logs for Codex, Cline, Gemini, etc.
  - Run agent-search index --full --config test-config.toml.
  - Run agent-search tui in non-interactive mode:
    - Feed keystrokes.
    - Assert on output (e.g., by rendering into Ratatui’s TestBackend and snapshotting the buffer).
Baseline dataset: e.g.,
- 10k conversations
- 1M messages
- Several hundred MB raw logs.
Metrics:
- Full index time with Tantivy only, SQLite only, both.
- Search latency distribution vs query length and filter complexity.
- Memory footprint vs dataset size.
Use criterion for benchmark harness.
- CLI & TUI skeleton (Ratatui + crossterm).
- SQLite storage with schema above.
- Tantivy index: simple indexing of content, title, agent_slug, created_at.
- Connectors:
  - Codex CLI (session logs)
  - Cline (VS Code globalStorage)
  - Gemini CLI (~/.gemini/tmp)
- Installer:
  - install.sh (curl | bash) for Linux/macOS.
  - Manual install instructions for Windows.
- Full connectors:
  - Claude Code (global projects & .claude files)
  - OpenCode (SQLite integration)
  - Initial Amp support (local caches only).
- notify-based incremental indexer.
- Filter UI (per-agent, time range, workspace).
- Config file + dynamic reload (r to reload config).
- Better Amp & Claude Code support as they stabilize history APIs.
- Export features: agent-search export --agent codex --format jsonl etc.
- “Session merge” features: combine related threads from different tools for the same repo.
- Optional vector-embedding index layered on top of Tantivy/FTS for semantic search.
A very granular build order to actually implement this:
- Scaffolding
  - cargo new coding-agent-search
  - Add dependencies:
- Core modules
  - Implement config with load/save and defaults.
  - Implement storage::sqlite: DB initialization, pragmas, migrations.
  - Implement search::tantivy: schema, index writer, searcher.
- Minimal TUI
  - Basic layout (search bar + list + detail).
  - Hard-coded dummy data for results.
- Codex connector
  - Env detection, path scanning.
  - Minimal JSONL parsing and mapping into DB/index.
- Cline connector
  - VS Code path resolution per OS.
  - Task directory parsing.
- Gemini connector
  - ~/.gemini/tmp scanning and JSON parsing.
- Index orchestration
  - Full index command.
  - TUI-triggered incremental reindex.
- Installer
  - Implement install.sh copying patterns from UBS (easy mode, sha256 verification, module detection).(GitHub)
  - Add GitHub workflow to build release tarballs/zips.
- Remaining connectors
  - Claude Code, OpenCode, Amp.
- Polish
  - Theming, help screen, keybinding docs.
  - Config toggle for FTS vs Tantivy.
  - Extensive tests and benchmarks.
This plan should be enough to sit down and start coding the entire system in Rust, with each piece grounded in how the underlying tools really store their histories and in current best practices for Rust TUIs, embedded search, and installer UX.