RLM-RS Architecture

Internal architecture documentation for rlm-rs.

Overview

RLM-RS implements the Recursive Language Model (RLM) pattern from arXiv:2512.24601, enabling LLMs to process documents up to 100x larger than their context windows.

Architecture Diagram

┌─────────────────────────────────────────────────────────────────┐
│                        Claude Code                               │
│  ┌─────────────────┐    ┌─────────────────┐                     │
│  │   Root LLM      │───▶│   Sub-LLM       │                     │
│  │ (Opus/Sonnet)   │    │   (Haiku)       │                     │
│  └────────┬────────┘    └────────┬────────┘                     │
│           │                      │                               │
│           ▼                      ▼                               │
│  ┌─────────────────────────────────────────┐                    │
│  │              Bash Tool                   │                    │
│  └─────────────────┬───────────────────────┘                    │
└────────────────────┼────────────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────────────┐
│                           rlm-rs CLI                            │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                         CLI Layer                         │  │
│  │  parser.rs │ commands.rs │ output.rs                      │  │
│  └──────────────────────────┬────────────────────────────────┘  │
│                             │                                   │
│  ┌──────────────────────────┴────────────────────────────────┐  │
│  │                        Core Domain                        │  │
│  │  Buffer │ Chunk │ Context │ Variable                      │  │
│  └──────────────────────────┬────────────────────────────────┘  │
│                             │                                   │
│  ┌────────────┬─────────────┴─────┬─────────────┬────────────┐  │
│  │ Chunking   │ Storage           │ Embedding   │ I/O        │  │
│  │ ────────   │ ───────           │ ─────────   │ ───        │  │
│  │ Fixed      │ SQLite            │ BGE-M3      │ Reader     │  │
│  │ Semantic   │ FTS5 (BM25)       │ fastembed   │ (mmap)     │  │
│  │ Code       │ Hybrid Search     │ (1024d)     │ Unicode    │  │
│  │ Parallel   │ HNSW (optional)   │             │            │  │
│  └────────────┴───────────────────┴─────────────┴────────────┘  │
└─────────────────────────────────────────────────────────────────┘

Module Structure

src/
├── lib.rs           # Library entry point and public API
├── main.rs          # Binary entry point
├── error.rs         # Error types (thiserror)
│
├── core/            # Core domain types
│   ├── mod.rs
│   ├── buffer.rs    # Buffer: loaded file content
│   ├── chunk.rs     # Chunk: content segment with metadata
│   └── context.rs   # Context: variables and state
│
├── chunking/        # Chunking strategies
│   ├── mod.rs       # Strategy factory and constants
│   ├── traits.rs    # Chunker trait definition
│   ├── fixed.rs     # Fixed-size chunking
│   ├── semantic.rs  # Sentence/paragraph-aware chunking
│   ├── code.rs      # Language-aware code chunking
│   └── parallel.rs  # Multi-threaded chunking
│
├── embedding/       # Embedding generation
│   ├── mod.rs       # Embedding trait and constants
│   ├── fastembed_impl.rs  # BGE-M3 via fastembed-rs
│   └── fallback.rs  # Fallback when fastembed unavailable
│
├── storage/         # Persistence layer
│   ├── mod.rs
│   ├── traits.rs    # Storage trait definition
│   ├── schema.rs    # Database schema and migrations
│   └── sqlite.rs    # SQLite implementation
│
├── search/          # Hybrid search
│   ├── mod.rs       # Hybrid search (semantic + BM25 with RRF)
│   ├── hnsw.rs      # HNSW vector index (usearch-hnsw feature)
│   └── rrf.rs       # Reciprocal Rank Fusion scoring
│
├── io/              # File I/O
│   ├── mod.rs
│   ├── reader.rs    # File reading with mmap
│   └── unicode.rs   # Unicode/grapheme utilities
│
└── cli/             # Command-line interface
    ├── mod.rs
    ├── parser.rs    # Clap argument definitions
    ├── commands.rs  # Command implementations
    └── output.rs    # Output formatting

Core Types

Buffer

Represents a loaded file with metadata:

pub struct Buffer {
    pub id: Option<i64>,
    pub name: String,
    pub content: String,
    pub source: Option<String>,
    pub metadata: BufferMetadata,
}

pub struct BufferMetadata {
    pub size: usize,
    pub line_count: usize,
    pub hash: String,
    pub content_type: Option<String>,
    pub chunk_count: usize,
    pub created_at: Option<String>,
    pub updated_at: Option<String>,
}

Chunk

Represents a segment of buffer content:

use std::ops::Range;

pub struct Chunk {
    pub buffer_id: i64,
    pub content: String,
    pub byte_range: Range<usize>,
    pub index: usize,
    pub metadata: ChunkMetadata,
}

pub struct ChunkMetadata {
    pub token_count: Option<usize>,
    pub has_overlap: bool,
    pub strategy: Option<String>,
}

Context

Holds loaded buffers, local variables, and global variables:

use std::collections::HashMap;

pub struct Context {
    buffers: HashMap<i64, Buffer>,
    variables: HashMap<String, ContextValue>,
    globals: HashMap<String, ContextValue>,
}

pub enum ContextValue {
    String(String),
    Number(i64),
    Float(f64),
    Boolean(bool),
    List(Vec<ContextValue>),
}

Chunking System

Chunker Trait

All chunking strategies implement:

pub trait Chunker: Send + Sync {
    fn chunk(
        &self,
        buffer_id: i64,
        text: &str,
        metadata: Option<&ChunkMetadata>,
    ) -> Result<Vec<Chunk>>;

    fn name(&self) -> &'static str;
    fn supports_parallel(&self) -> bool;
    fn description(&self) -> &'static str;
    fn validate(&self, metadata: Option<&ChunkMetadata>) -> Result<()>;
}

Strategy Selection

Strategy | Algorithm | Use Case
SemanticChunker | Unicode sentence/paragraph boundaries | Markdown, prose
CodeChunker | Language-aware function/class boundaries | Source code files
FixedChunker | Character boundaries with UTF-8 safety | Logs, raw text
ParallelChunker | Rayon-parallelized wrapper for any Chunker | Texts larger than 100 KB
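A minimal sketch of how a strategy factory might map an input to these strategies, assuming selection by file extension plus the 100 KB parallel threshold from the table (taken here as 100 * 1024 bytes). The `Strategy` enum, `select_strategy` function, and extension lists are hypothetical, not the crate's actual factory API.

```rust
use std::path::Path;

/// Hypothetical strategy identifiers mirroring the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Strategy {
    Semantic,
    Code,
    Fixed,
}

/// Texts above this size are wrapped in the parallel chunker (assumed 100 KiB).
pub const PARALLEL_THRESHOLD: usize = 100 * 1024;

/// Picks a base strategy from the file extension (assumed mapping) and
/// reports whether the text is large enough to chunk in parallel.
pub fn select_strategy(path: &Path, text_len: usize) -> (Strategy, bool) {
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());
    let strategy = match ext.as_deref() {
        Some(
            "rs" | "py" | "js" | "jsx" | "ts" | "tsx" | "go" | "java" | "c" | "cpp" | "h"
            | "hpp" | "rb" | "php",
        ) => Strategy::Code,
        Some("md" | "markdown") => Strategy::Semantic,
        // Logs, unknown extensions, and extensionless files get fixed-size chunks.
        _ => Strategy::Fixed,
    };
    (strategy, text_len > PARALLEL_THRESHOLD)
}
```

Keeping "parallel or not" as a separate flag reflects the table: ParallelChunker wraps another strategy rather than replacing it.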

Code Chunker Languages

The CodeChunker uses regex-based pattern matching for multiple languages:

Language | Extensions | Boundary Detection
Rust | .rs | fn, impl, struct, enum, mod
Python | .py | def, class, async def
JavaScript/TypeScript | .js, .jsx, .ts, .tsx | function, class, const =
Go | .go | func, type
Java | .java | class, interface, method signatures
C/C++ | .c, .cpp, .h, .hpp | Function definitions
Ruby | .rb | def, class, module
PHP | .php | function, class
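The regex patterns themselves are not reproduced here. As a rough, std-only approximation of the Rust row, the hypothetical `rust_boundaries` below treats any unindented line that begins (after optional `pub` or `pub(...)` visibility) with `fn`, `impl`, `struct`, `enum`, or `mod` as a boundary, and returns the byte offsets where those lines start. It ignores `async`/`const`/`unsafe fn`, attributes, and doc comments, which the real regex-based CodeChunker may handle differently.

```rust
/// Item openers from the Rust row of the table (simplified).
const RUST_ITEM_KEYWORDS: [&str; 6] = ["fn ", "impl ", "impl<", "struct ", "enum ", "mod "];

/// Returns byte offsets of unindented lines that start a Rust item.
pub fn rust_boundaries(source: &str) -> Vec<usize> {
    let mut offsets = Vec::new();
    let mut line_start = 0;
    for line in source.split_inclusive('\n') {
        let mut rest = line;
        if let Some(after_pub) = rest.strip_prefix("pub") {
            // Skip an optional visibility scope such as `(crate)`.
            let after_scope = match after_pub.strip_prefix('(') {
                Some(s) => s.split_once(')').map_or(s, |(_, tail)| tail),
                None => after_pub,
            };
            rest = after_scope.trim_start();
        }
        if RUST_ITEM_KEYWORDS.iter().any(|kw| rest.starts_with(kw)) {
            offsets.push(line_start);
        }
        // Advance by the full line length, including its newline.
        line_start += line.len();
    }
    offsets
}
```

Chunk ranges would then run from one boundary offset to the next, so each chunk begins at an item definition.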

Default Configuration

pub const DEFAULT_CHUNK_SIZE: usize = 3_000;    // ~750 tokens
pub const DEFAULT_OVERLAP: usize = 500;          // Context continuity
pub const MAX_CHUNK_SIZE: usize = 50_000;        // Safety limit
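To show how the chunk size and overlap interact, here is a minimal sketch of fixed-size windowing, assuming sizes are measured in bytes and each window end is snapped back to a UTF-8 character boundary. The `fixed_ranges` and `floor_boundary` helpers are hypothetical and only produce byte ranges; the real FixedChunker also builds Chunk values with metadata.

```rust
use std::ops::Range;

/// Largest char boundary in `s` that is <= `pos` (clamped to the length).
fn floor_boundary(s: &str, mut pos: usize) -> usize {
    if pos >= s.len() {
        return s.len();
    }
    while !s.is_char_boundary(pos) {
        pos -= 1; // terminates: offset 0 is always a boundary
    }
    pos
}

/// Splits `text` into byte ranges of at most `size` bytes; each window after
/// the first starts roughly `overlap` bytes before the previous one ended.
pub fn fixed_ranges(text: &str, size: usize, overlap: usize) -> Vec<Range<usize>> {
    assert!(overlap < size, "overlap must be smaller than the chunk size");
    let mut ranges = Vec::new();
    let mut start = 0;
    while start < text.len() {
        let mut end = floor_boundary(text, start + size);
        if end <= start {
            // Window narrower than one character: take that whole character.
            end = start + text[start..].chars().next().map_or(1, char::len_utf8);
        }
        ranges.push(start..end);
        if end == text.len() {
            break;
        }
        // Step back by `overlap`, snap to a boundary, and always make progress.
        let next = floor_boundary(text, end.saturating_sub(overlap));
        start = if next > start { next } else { end };
    }
    ranges
}
```

With the defaults on a 7,000-byte ASCII text this yields 0..3000, 2500..5500, and 5000..7000: consecutive chunks share 500 bytes of context.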

Storage Layer

Storage Trait

pub trait Storage: Send + Sync {
    // Buffer operations
    fn add_buffer(&mut self, buffer: &Buffer) -> Result<i64>;
    fn get_buffer(&self, id: i64) -> Result<Option<Buffer>>;
    fn get_buffer_by_name(&self, name: &str) -> Result<Option<Buffer>>;
    fn update_buffer(&mut self, buffer: &Buffer) -> Result<()>;
    fn delete_buffer(&mut self, id: i64) -> Result<()>;
    fn list_buffers(&self) -> Result<Vec<Buffer>>;

    // Chunk operations
    fn add_chunks(&mut self, buffer_id: i64, chunks: &[Chunk]) -> Result<()>;
    fn get_chunks(&self, buffer_id: i64) -> Result<Vec<Chunk>>;
    fn delete_chunks(&mut self, buffer_id: i64) -> Result<()>;

    // Variable operations
    fn set_variable(&mut self, name: &str, value: &ContextValue) -> Result<()>;
    fn get_variable(&self, name: &str) -> Result<Option<ContextValue>>;
    fn delete_variable(&mut self, name: &str) -> Result<()>;

    // Global operations
    fn set_global(&mut self, name: &str, value: &ContextValue) -> Result<()>;
    fn get_global(&self, name: &str) -> Result<Option<ContextValue>>;
    fn delete_global(&mut self, name: &str) -> Result<()>;
}

SQLite Schema

-- Buffers table
CREATE TABLE buffers (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL UNIQUE,
    content TEXT NOT NULL,
    source TEXT,
    size INTEGER NOT NULL,
    line_count INTEGER NOT NULL,
    hash TEXT NOT NULL,
    content_type TEXT,
    chunk_count INTEGER DEFAULT 0,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT DEFAULT CURRENT_TIMESTAMP
);

-- Chunks table
CREATE TABLE chunks (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    buffer_id INTEGER NOT NULL REFERENCES buffers(id) ON DELETE CASCADE,
    content TEXT NOT NULL,
    byte_start INTEGER NOT NULL,
    byte_end INTEGER NOT NULL,
    chunk_index INTEGER NOT NULL,
    token_count INTEGER,
    has_overlap INTEGER DEFAULT 0,
    strategy TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);

-- Variables table
CREATE TABLE variables (
    name TEXT PRIMARY KEY,
    value TEXT NOT NULL,
    value_type TEXT NOT NULL,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT DEFAULT CURRENT_TIMESTAMP
);

-- Globals table
CREATE TABLE globals (
    name TEXT PRIMARY KEY,
    value TEXT NOT NULL,
    value_type TEXT NOT NULL,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT DEFAULT CURRENT_TIMESTAMP
);
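The variables and globals tables store each value as a pair of TEXT columns, value and value_type. A minimal sketch of that mapping for ContextValue, assuming lowercase type tags and Rust's default string formatting for numbers; the tags and the `encode`/`decode` helpers are hypothetical, and List is left unencoded here because it would need a structured format such as JSON.

```rust
/// Mirrors the ContextValue enum from the Core Types section.
#[derive(Debug, Clone, PartialEq)]
pub enum ContextValue {
    String(String),
    Number(i64),
    Float(f64),
    Boolean(bool),
    List(Vec<ContextValue>),
}

/// Encodes a scalar as (value, value_type) column text; lists return None.
pub fn encode(value: &ContextValue) -> Option<(String, &'static str)> {
    match value {
        ContextValue::String(s) => Some((s.clone(), "string")),
        ContextValue::Number(n) => Some((n.to_string(), "number")),
        // f64's Display output parses back to the identical value.
        ContextValue::Float(f) => Some((f.to_string(), "float")),
        ContextValue::Boolean(b) => Some((b.to_string(), "boolean")),
        ContextValue::List(_) => None,
    }
}

/// Decodes a stored row; unknown tags or unparsable text yield None.
pub fn decode(value: &str, value_type: &str) -> Option<ContextValue> {
    match value_type {
        "string" => Some(ContextValue::String(value.to_owned())),
        "number" => value.parse().ok().map(ContextValue::Number),
        "float" => value.parse().ok().map(ContextValue::Float),
        "boolean" => value.parse().ok().map(ContextValue::Boolean),
        _ => None,
    }
}
```

Storing the type tag separately keeps "42" the string distinct from 42 the number when a value is read back.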

I/O Layer

Memory-Mapped File Reading

For large files, read_file maps the file with memmap2, validates the bytes as UTF-8 in place, and then copies them into an owned String:

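use std::{fs::File, io, path::Path};
use memmap2::MmapOptions;

pub fn read_file(path: &Path) -> Result<String> {
    let file = File::open(path)?;
    // SAFETY: the file must not be truncated or modified while it is mapped.
    let mmap = unsafe { MmapOptions::new().map(&file)? };
    // Report invalid UTF-8 as io::Error so `?` converts it via Error::Io.
    let content = std::str::from_utf8(&mmap)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
    Ok(content.to_owned())
}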

Unicode Handling

The unicode-segmentation crate ensures proper handling of:

  • Multi-byte UTF-8 characters
  • Grapheme clusters
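  • Sentence boundaries

find_char_boundary snaps a byte offset back to the start of the UTF-8 character that contains it, so slicing at the result never splits a character: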
pub const fn find_char_boundary(s: &str, pos: usize) -> usize {
    if pos >= s.len() {
        return s.len();
    }
    let bytes = s.as_bytes();
    let mut boundary = pos;
    // UTF-8 continuation bytes start with 10xxxxxx (0x80-0xBF)
    while boundary > 0 && (bytes[boundary] & 0xC0) == 0x80 {
        boundary -= 1;
    }
    boundary
}
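For comparison, the same floor-to-boundary behaviour can be sketched with the standard library's `str::is_char_boundary`. This non-const variant (`floor_char_boundary_std` is a hypothetical name) should agree with find_char_boundary for every byte offset, including offsets past the end of the string.

```rust
/// Snaps `pos` back to the nearest UTF-8 character boundary at or before it,
/// clamping offsets past the end to the string length.
pub fn floor_char_boundary_std(s: &str, pos: usize) -> usize {
    if pos >= s.len() {
        return s.len();
    }
    // Offset 0 is always a boundary, so `find` always succeeds.
    (0..=pos).rev().find(|&i| s.is_char_boundary(i)).unwrap_or(0)
}
```

For "aé€😀" (1 + 2 + 3 + 4 bytes), offset 2 maps to 1, offset 4 maps to 3, and offset 8 maps to 6, the start of the emoji.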

Error Handling

All errors use thiserror for ergonomic error types:

#[derive(Error, Debug)]
pub enum Error {
    #[error("Storage error: {0}")]
    Storage(#[from] StorageError),

    #[error("Chunking error: {0}")]
    Chunking(#[from] ChunkingError),

    #[error("I/O error: {0}")]
    Io(#[from] std::io::Error),

    #[error("Command error: {0}")]
    Command(#[from] CommandError),
}

RLM Pattern Implementation

Concept Mapping

RLM Concept | rlm-rs Implementation
Root LLM | Claude Code main conversation (Opus/Sonnet)
Sub-LLM | Claude Code subagent (Haiku)
External Environment | rlm-rs CLI + SQLite database
Chunk | Chunk struct with byte range and metadata
Buffer | Buffer struct with full content
State | SQLite persistence + context variables

Workflow

  1. Load: Large document loaded into buffer, chunked, stored in SQLite
  2. Index: Root LLM queries chunk indices via chunk-indices
  3. Process: Sub-LLM processes individual chunks via file reads
  4. Aggregate: Results stored back via add-buffer
  5. Synthesize: Root LLM synthesizes final result
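The Process, Aggregate, and Synthesize steps form a map-reduce loop. A minimal sketch of that control flow, with plain closures standing in for the sub-LLM and root LLM (in practice these are Claude Code calls made through the Bash tool, not Rust functions); `run_rlm` and its signature are illustrative only, and the Index step is represented by the caller choosing which chunks to pass in.

```rust
/// Map-reduce over pre-selected chunks.
/// `sub_llm(query, chunk)` returns a partial result, or None if the chunk is irrelevant.
/// `root_llm(query, partials)` synthesizes the final answer.
pub fn run_rlm<S, R>(chunks: &[&str], query: &str, sub_llm: S, root_llm: R) -> String
where
    S: Fn(&str, &str) -> Option<String>,
    R: Fn(&str, &[String]) -> String,
{
    // Process: each chunk is handled independently, so only one chunk
    // needs to fit in the sub-LLM's context at a time.
    let partials: Vec<String> = chunks
        .iter()
        .filter_map(|&chunk| sub_llm(query, chunk))
        .collect();
    // Aggregate + Synthesize: the root LLM sees only the partial results,
    // never the full document.
    root_llm(query, &partials)
}
```

The key property is that neither closure ever receives the whole buffer, which is what lets the pattern scale past a single context window.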

Performance Considerations

Token Estimation

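Token counts are estimated with a 4-characters-per-token heuristic. At DEFAULT_CHUNK_SIZE (3,000) that is roughly 750 tokens per chunk, and even MAX_CHUNK_SIZE (50,000) comes to about 12,500 tokens, within Claude's 25,000-token read limit: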

impl Chunk {
    pub fn estimate_tokens(&self) -> usize {
        // Approximate: ~4 bytes of UTF-8 per token (len() counts bytes, not chars)
        self.content.len() / 4
    }
}
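Because `String::len()` counts UTF-8 bytes, the estimate for non-ASCII text differs from a `chars()`-based count, and neither is an exact tokenizer count. A small sketch contrasting the two and checking the configured sizes against the read limit; `READ_LIMIT_TOKENS` and the helper names are assumptions for illustration, with values copied from this document.

```rust
/// Read limit for a single tool read, in tokens (value from the text above).
pub const READ_LIMIT_TOKENS: usize = 25_000;
pub const DEFAULT_CHUNK_SIZE: usize = 3_000;
pub const MAX_CHUNK_SIZE: usize = 50_000;

/// Byte-based estimate, matching Chunk::estimate_tokens.
pub fn estimate_tokens_bytes(text: &str) -> usize {
    text.len() / 4
}

/// Character-based alternative, for comparison only.
pub fn estimate_tokens_chars(text: &str) -> usize {
    text.chars().count() / 4
}

/// True if a chunk of `size` bytes is estimated to fit in one read.
pub const fn fits_read_limit(size: usize) -> bool {
    size / 4 <= READ_LIMIT_TOKENS
}
```

For 200 CJK characters (600 bytes) the byte-based estimate is 150 tokens while the character-based one is 50, so budgets for non-Latin text deserve extra headroom.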

Parallel Processing

The ParallelChunker uses Rayon for multi-threaded chunking:

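impl Chunker for ParallelChunker {
    fn chunk(&self, buffer_id: i64, text: &str, metadata: Option<&ChunkMetadata>) -> Result<Vec<Chunk>> {
        // Assumed: each segment is (byte offset within `text`, segment slice).
        let segments = split_into_segments(text, self.segment_count);

        // Chunk segments in parallel; collecting into Result stops at the first error.
        let per_segment: Vec<Vec<Chunk>> = segments
            .par_iter()
            .map(|&(offset, segment)| -> Result<Vec<Chunk>> {
                let mut chunks = self.inner.chunk(buffer_id, segment, metadata)?;
                for c in &mut chunks {
                    // Rebase segment-relative byte ranges onto the whole buffer.
                    c.byte_range = (c.byte_range.start + offset)..(c.byte_range.end + offset);
                }
                Ok(chunks)
            })
            .collect::<Result<_>>()?;

        // par_iter preserves order, so flattening keeps document order; renumber indices.
        let mut chunks: Vec<Chunk> = per_segment.into_iter().flatten().collect();
        chunks.iter_mut().enumerate().for_each(|(i, c)| c.index = i);
        Ok(chunks)
    }
}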

Parallel Semantic Search

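semantic_search() loads every stored embedding and computes its cosine similarity to the query embedding, an O(n) scan that Rayon's par_iter() spreads across all CPU cores. Because each similarity is independent, the work-stealing scheduler can keep every core busy, so on large embedding collections the speedup can approach the number of cores: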

fn semantic_search(
    storage: &SqliteStorage,
    embedder: &dyn Embedder,
    query: &str,
    config: &SearchConfig,
) -> Result<Vec<(i64, f32)>> {
    use rayon::prelude::*;

    // Generate query embedding
    let query_embedding = embedder.embed(query)?;

    // Get all embeddings from storage
    let all_embeddings = storage.get_all_embeddings()?;

    if all_embeddings.is_empty() {
        return Ok(Vec::new());
    }

    // Calculate similarities in parallel (rayon data parallelism)
    let mut similarities: Vec<(i64, f32)> = all_embeddings
        .par_iter()
        .map(|(chunk_id, embedding)| {
            let sim = cosine_similarity(&query_embedding, embedding);
            (*chunk_id, sim)
        })
        .filter(|(_, sim)| *sim >= config.similarity_threshold)
        .collect();

    // Sort by similarity descending
    similarities.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));

    // Limit results
    similarities.truncate(config.top_k * 2);

    Ok(similarities)
}
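cosine_similarity is called above but not shown. A minimal sketch, assuming plain f32 slices and returning 0.0 for mismatched lengths or zero-norm vectors rather than NaN (the crate's own edge-case handling may differ):

```rust
/// Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1.0, 1.0].
/// Returns 0.0 for mismatched lengths, empty input, or zero-norm vectors.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() || a.is_empty() {
        return 0.0;
    }
    // Accumulate the dot product and both squared norms in one pass.
    let (mut dot, mut norm_a, mut norm_b) = (0.0f32, 0.0f32, 0.0f32);
    for (x, y) in a.iter().zip(b) {
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a.sqrt() * norm_b.sqrt())
}
```

If embeddings were stored L2-normalized, the denominator would be 1 and the dot product alone would give the same ranking at lower cost.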

Pre-Sized Embedding Buffers

store_embedding() and store_embeddings_batch() pre-allocate the serialization buffer before converting f32 values to little-endian bytes, avoiding repeated heap reallocations per embedding:

store_embedding() uses an explicit loop:

let mut bytes = Vec::with_capacity(embedding.len() * 4);
for f in embedding {
    bytes.extend_from_slice(&f.to_le_bytes());
}

store_embeddings_batch() uses iterator chaining:

let mut bytes = Vec::with_capacity(embedding.len() * 4);
bytes.extend(embedding.iter().flat_map(|f| f.to_le_bytes()));
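Reading an embedding back reverses this layout: each 4-byte little-endian group becomes one f32. A minimal round-trip sketch, assuming the BLOB holds only the packed values with no length header; `encode_embedding` and `decode_embedding` are hypothetical helper names.

```rust
/// Packs f32 values as little-endian bytes into a pre-sized buffer.
pub fn encode_embedding(embedding: &[f32]) -> Vec<u8> {
    let mut bytes = Vec::with_capacity(embedding.len() * 4);
    bytes.extend(embedding.iter().flat_map(|f| f.to_le_bytes()));
    bytes
}

/// Unpacks a BLOB produced by `encode_embedding`. Returns None if the byte
/// length is not a multiple of 4, which indicates a truncated or corrupt value.
pub fn decode_embedding(bytes: &[u8]) -> Option<Vec<f32>> {
    if bytes.len() % 4 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(4)
            .map(|b| f32::from_le_bytes([b[0], b[1], b[2], b[3]]))
            .collect(),
    )
}
```

A 1024-dimension BGE-M3 vector therefore occupies exactly 4,096 bytes, and fixing the byte order makes the database portable across architectures.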

Single-Query Buffer Embedding Check

buffer_fully_embedded() and its underlying SqliteStorage::all_chunks_have_embeddings() check whether every chunk in a buffer has a stored embedding using one NOT EXISTS query, regardless of chunk count:

SELECT NOT EXISTS (
    SELECT 1 FROM chunks c
    LEFT JOIN chunk_embeddings e ON e.chunk_id = c.id
    WHERE c.buffer_id = ? AND e.chunk_id IS NULL
)

This replaces the previous pattern of issuing one has_embedding() round-trip per chunk, reducing database traffic from O(n) queries to O(1) for a buffer with n chunks. The impact is most visible when checking large documents that have been split into hundreds of chunks.
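The anti-join's semantics can be mirrored in memory, which is handy as a test oracle for the SQL. In this sketch the chunk rows are hypothetical (chunk_id, buffer_id) pairs and `embedded` stands in for the chunk_embeddings table. As the query is written, a buffer with no chunks counts as fully embedded, because NOT EXISTS over an empty result is true.

```rust
use std::collections::HashSet;

/// In-memory equivalent of the NOT EXISTS query: true unless some chunk
/// belonging to `buffer_id` has no entry in `embedded`.
pub fn all_chunks_have_embeddings(
    chunks: &[(i64, i64)], // (chunk_id, buffer_id)
    embedded: &HashSet<i64>,
    buffer_id: i64,
) -> bool {
    !chunks
        .iter()
        .any(|&(chunk_id, owner)| owner == buffer_id && !embedded.contains(&chunk_id))
}
```

Like the SQL, this stops at the first missing embedding instead of checking every chunk.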

Testing Strategy

Unit Tests

Each module has #[cfg(test)] tests:

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_chunk_boundaries() {
        let chunker = SemanticChunker::with_size(100);
        let chunks = chunker.chunk(1, "Hello. World.", None).unwrap();
        assert!(!chunks.is_empty());
    }
}

Integration Tests

tests/integration_test.rs covers end-to-end workflows.

Property-Based Tests

Using proptest for invariant verification:

proptest! {
    #[test]
    fn chunk_byte_range_valid(content in ".{1,1000}") {
        let chunker = FixedChunker::with_size(100);
        let chunks = chunker.chunk(1, &content, None).unwrap();
        for chunk in chunks {
            prop_assert!(chunk.byte_range.end <= content.len());
        }
    }
}

Search System

Hybrid Search Architecture

rlm-rs implements a hybrid search system that combines multiple retrieval methods:

┌─────────────────────────────────────────────────────────────┐
│                      Search Query                            │
└─────────────────────────────────────────────────────────────┘
                              │
              ┌───────────────┼───────────────┐
              ▼               ▼               ▼
    ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
    │ Semantic Search │ │  BM25 Search    │ │  HNSW Index     │
    │  (Embeddings)   │ │  (FTS5)         │ │  (Optional)     │
    └────────┬────────┘ └────────┬────────┘ └────────┬────────┘
             │                   │                   │
             └───────────────────┼───────────────────┘
                                 ▼
                    ┌───────────────────────┐
                    │  Reciprocal Rank      │
                    │  Fusion (RRF)         │
                    └───────────────────────┘
                                 │
                                 ▼
                    ┌───────────────────────┐
                    │   Ranked Results      │
                    └───────────────────────┘
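Reciprocal Rank Fusion scores each chunk by summing 1 / (k + rank) over every ranked list it appears in, so chunks ranked well by both semantic and BM25 search rise to the top without having to normalize their incompatible raw scores. A minimal sketch over best-first lists of chunk ids, assuming 1-based ranks and the commonly used k = 60; the constant rrf.rs actually uses is not stated in this document.

```rust
use std::collections::HashMap;

/// Fuses several best-first lists of chunk ids with Reciprocal Rank Fusion.
/// Returns (chunk_id, score) sorted by descending fused score, with ties
/// broken by ascending chunk id for deterministic output.
pub fn reciprocal_rank_fusion(lists: &[Vec<i64>], k: f64) -> Vec<(i64, f64)> {
    let mut scores: HashMap<i64, f64> = HashMap::new();
    for list in lists {
        for (i, &id) in list.iter().enumerate() {
            // enumerate() is 0-based; RRF ranks start at 1.
            *scores.entry(id).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut fused: Vec<(i64, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}
```

For semantic results [7, 3, 9] and BM25 results [3, 5, 7], the fused order is 3, 7, 5, 9: chunk 3 wins because it ranks near the top of both lists.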

Embedding System

Component | Implementation | Details
Model | BGE-M3 via fastembed | 1024 dimensions
Fallback | Hash-based embedder | Used when fastembed is unavailable
Storage | SQLite BLOB | Compact binary storage
Incremental | embed_buffer_chunks_incremental | Embeds only new or changed chunks
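The table names a hash-based fallback embedder without giving its algorithm. One common way to build such an embedder is feature hashing: hash each lowercase token into one of N buckets, count occurrences, and L2-normalize. The sketch below does that with the standard library's DefaultHasher; the tokenization rule, the 1024-bucket width (chosen to match BGE-M3's dimensionality), and the `hash_embed` name are assumptions. Unlike BGE-M3, it captures only lexical overlap, not meaning.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Assumed to match BGE-M3 so both embedders share one storage format.
pub const DIMS: usize = 1024;

/// Feature-hashing embedder: lowercase alphanumeric tokens are hashed into
/// DIMS buckets, counted, and the vector is L2-normalized.
pub fn hash_embed(text: &str) -> Vec<f32> {
    let mut v = vec![0.0f32; DIMS];
    for token in text
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
    {
        let mut h = DefaultHasher::new();
        token.to_lowercase().hash(&mut h);
        v[(h.finish() % DIMS as u64) as usize] += 1.0;
    }
    // Normalize so cosine similarity reduces to a dot product; empty text stays zero.
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        v.iter_mut().for_each(|x| *x /= norm);
    }
    v
}
```

DefaultHasher's algorithm is not guaranteed to stay the same across Rust releases, so vectors persisted by a sketch like this could silently change meaning after a toolchain upgrade; a persisted fallback would want a hash with a fixed specification.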

HNSW Index (Optional)

When the usearch-hnsw feature is enabled:

  • O(log n) approximate nearest neighbor search
  • Persistent index on disk
  • Incremental updates
  • Without the feature, semantic search falls back to a brute-force scan

Future Extensions

Planned Features

  • Streaming: Process chunks as they're generated
  • Compression: Compress stored content
  • Encryption: Encrypt sensitive buffers

Extension Points

  • Chunker trait for custom chunking strategies
  • Embedder trait for alternative embedding models
  • Storage trait for alternative backends (PostgreSQL, Redis)
  • Output formatters for additional formats (YAML, TOML)

See Also