Opportunistic Background Indexing

Automatically fills file annotations and knowledge graph during idle time

Overview

IdlerGear now includes an opportunistic indexing system that automatically:

Annotates files with descriptions, tags, and components
Populates the knowledge graph with commits, symbols, and relationships
Runs in small batches (5 items) after each MCP tool completes
Never blocks or runs in parallel with active work

How It Works

User calls tool → Tool executes → Tool returns → Index 5 items → Return to idle

Key Features:

Automatic: No manual triggering needed
Opportunistic: Only runs when system is idle (after tool completion)
Small Batches: Processes 5 items at a time (fast, non-blocking)
Smart Priority: Files > Commits > Symbols (most useful first)
Progress Tracking: Always knows what's left to index

Current Status

Check what needs indexing:

from idlergear.indexing import get_indexing_status

status = get_indexing_status()
print(f"{status['file_annotations']['unannotated']} files need annotation")
print(f"{status['knowledge_graph']['commits']['remaining']} commits need indexing")
print(f"{status['summary']['estimated_batches_remaining']} batches remaining")

Example Output:

{
  "file_annotations": {
    "total_files": 142,
    "annotated": 4,
    "unannotated": 138,
    "percent_complete": 2.8
  },
  "knowledge_graph": {
    "commits": {
      "indexed": 0,
      "estimated_total": 100,
      "remaining": 100,
      "percent_complete": 0.0
    },
    "symbols": {
      "indexed": 0,
      "estimated_total": 2840,
      "remaining": 2840,
      "percent_complete": 0.0
    }
  },
  "summary": {
    "total_work_remaining": 238,
    "estimated_batches_remaining": 47
  }
}

MCP Tools

Check Status

idlergear_indexing_status()
# Returns: Full indexing status with progress percentages

Manual Batch

idlergear_index_batch(batch_size=5, target="auto")
# Processes 5 items (files, commits, or symbols)
# target: "auto", "files", "commits", "symbols"

Pause/Resume

idlergear_pause_indexing()   # Pause during performance-critical operations
idlergear_resume_indexing()  # Resume automatic indexing

Indexing Priority

When target="auto", the system prioritizes:

File Annotations (fastest, 93% token savings)
- Extracts docstrings, classes, functions from Python files
- Auto-generates descriptions and tags
- Enables semantic file search
Git Commits (historical context)
- Indexes commit messages, authors, timestamps
- Links commits to files changed
- Enables "what changed when" queries
Code Symbols (deep code understanding)
- Indexes functions, classes, methods
- Links symbols to files
- Enables "find function by name" queries

File Annotation Generation

Auto-generated from file contents:

# For src/idlergear/graph/queries.py:
{
  "description": "Common query patterns for IdlerGear knowledge graph",
  "tags": ["graph", "query"],
  "components": ["query_task_context", "query_file_context", "query_symbols_by_name"],
  "related_files": []
}

Extraction Logic:

Description: From module docstring or filename
Tags: From file path (test, api, graph, mcp, etc.)
Components: Public functions and classes (excludes _private)

Performance

Batch Processing:

File annotation: ~50ms per file (parse + extract + save)
Commit indexing: ~100ms per commit (parse + relationships)
Symbol indexing: ~200ms per file (AST parsing + graph insert)

Total for 5 items: ~500ms (half a second, imperceptible)

Non-Blocking:

Runs in finally block after tool completion
Uses global flag to prevent recursive indexing
Failures don't break normal tool operations

State Tracking

State saved to .idlergear/indexing_state.json:

{
  "paused": false,
  "last_file_index": 25,
  "last_commit_index": 15,
  "last_symbol_index": 10,
  "last_run": "2026-02-26T14:30:00",
  "total_files_indexed": 25,
  "total_commits_indexed": 15,
  "total_symbols_indexed": 150
}

Incremental Progress:

Tracks last processed index for files, commits, symbols
Skips already-indexed items (idempotent)
Resumes from where it left off across sessions

Configuration

Default: Indexing is enabled and runs automatically

To disable:

idlergear_pause_indexing()

To manually control:

# Check if work available
from idlergear.indexing import should_run_indexing
if should_run_indexing():
    # Process batch
    result = idlergear_index_batch(batch_size=10)  # Larger batch
    print(f"Indexed {result['files_annotated']} files")

Architecture

Module: src/idlergear/indexing/background.py

Key Functions:

get_indexing_status() - Check what needs indexing
index_next_batch(batch_size, target) - Process items
should_run_indexing() - Check if work available
pause_indexing() / resume_indexing() - Control

MCP Server Hook: src/idlergear/mcp_server.py

_run_opportunistic_indexing() - Called in finally block of call_tool()
Global flag _indexing_in_progress prevents recursion

State File: .idlergear/indexing_state.json

Tracks progress across sessions
Survives MCP server restarts

Benefits

For Users

No manual work: Annotations and graph fill in automatically
Always up to date: Runs after every tool call
Non-blocking: Never slows down active work
Progress visibility: Always know what's left

For AI Assistants

93% token savings: File annotations enable semantic search
95-98% token savings: Knowledge graph enables context queries
Faster responses: Pre-indexed data = instant lookups
Better context: Complete file/commit/symbol data always available

Examples

Automatic Usage (Default)

Just use IdlerGear normally - indexing happens automatically:

# User calls any tool
idlergear_task_list()

# → Tool executes
# → Returns results
# → Indexes 5 items in background
# → Returns to idle

Manual Batch Processing

Process multiple batches explicitly:

# Index 50 files (10 batches of 5)
for i in range(10):
    result = idlergear_index_batch(batch_size=5, target="files")
    print(f"Batch {i+1}: {result['files_annotated']} files annotated")

Check Progress

Monitor indexing progress:

status = idlergear_indexing_status()

print(f"Files: {status['file_annotations']['percent_complete']}% complete")
print(f"Commits: {status['knowledge_graph']['commits']['percent_complete']}% complete")
print(f"Estimated batches remaining: {status['summary']['estimated_batches_remaining']}")

Future Enhancements

Potential improvements:

Adaptive batch sizing: Larger batches when system very idle, smaller when busy
Priority hints: User can mark files/paths as high priority
Time budgets: Maximum time per batch (e.g., 500ms)
Completion callbacks: Notify when all indexing complete
Metrics tracking: Histogram of indexing times, success rates

Troubleshooting

Indexing not running?

Check if paused: idlergear_indexing_status() → "paused": true
Resume: idlergear_resume_indexing()

Indexing too slow?

Check batch size (default: 5 items)
Manually trigger larger batch: idlergear_index_batch(batch_size=10)

Want to force immediate indexing?

# Process all remaining work (blocking)
while idlergear_indexing_status()['summary']['total_work_remaining'] > 0:
    idlergear_index_batch(batch_size=10)

Files not being annotated?

Check if already annotated: idlergear_indexing_status() → "annotated": N
Check for errors: Run index_batch() and inspect "errors" field

Summary

The opportunistic indexing system ensures that IdlerGear's knowledge base is always:

✅ Up to date (runs after every tool)
✅ Complete (eventually processes all files/commits/symbols)
✅ Non-blocking (small batches, idle-time only)
✅ Transparent (automatic, visible progress)

No more manual indexing commands - just use IdlerGear normally and it fills in the gaps automatically!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Opportunistic Background Indexing

Overview

How It Works

Current Status

MCP Tools

Check Status

Manual Batch

Pause/Resume

Indexing Priority

File Annotation Generation

Performance

State Tracking

Configuration

Architecture

Benefits

For Users

For AI Assistants

Examples

Automatic Usage (Default)

Manual Batch Processing

Check Progress

Future Enhancements

Troubleshooting

Summary

FilesExpand file tree

OPPORTUNISTIC_INDEXING.md

Latest commit

History

OPPORTUNISTIC_INDEXING.md

File metadata and controls

Opportunistic Background Indexing

Overview

How It Works

Current Status

MCP Tools

Check Status

Manual Batch

Pause/Resume

Indexing Priority

File Annotation Generation

Performance

State Tracking

Configuration

Architecture

Benefits

For Users

For AI Assistants

Examples

Automatic Usage (Default)

Manual Batch Processing

Check Progress

Future Enhancements

Troubleshooting

Summary