RLM-RS Architecture

Internal architecture documentation for rlm-rs.

Overview

RLM-RS implements the Recursive Language Model (RLM) pattern from arXiv:2512.24601, enabling LLMs to process documents up to 100x larger than their context windows.

Architecture Diagram

┌─────────────────────────────────────────────────────────────────┐
│                        Claude Code                               │
│  ┌─────────────────┐    ┌─────────────────┐                     │
│  │   Root LLM      │───▶│   Sub-LLM       │                     │
│  │ (Opus/Sonnet)   │    │   (Haiku)       │                     │
│  └────────┬────────┘    └────────┬────────┘                     │
│           │                      │                               │
│           ▼                      ▼                               │
│  ┌─────────────────────────────────────────┐                    │
│  │              Bash Tool                   │                    │
│  └─────────────────┬───────────────────────┘                    │
└────────────────────┼────────────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────────────┐
│                        rlm-rs CLI                                │
│  ┌─────────────────────────────────────────────────────────────┐│
│  │                      CLI Layer                               ││
│  │  parser.rs │ commands.rs │ output.rs                        ││
│  └─────────────────────────┬───────────────────────────────────┘│
│                            │                                     │
│  ┌─────────────────────────┴───────────────────────────────────┐│
│  │                    Core Domain                               ││
│  │  Buffer │ Chunk │ Context │ Variable                        ││
│  └─────────────────────────┬───────────────────────────────────┘│
│                            │                                     │
│  ┌────────────┬────────────┴────────────┬────────────┬────────────┐│
│  │  Chunking  │       Storage           │ Embedding  │    I/O     ││
│  │  ─────────  │       ───────           │ ─────────  │    ───     ││
│  │  Fixed     │       SQLite            │  BGE-M3   │   Reader   ││
│  │  Semantic  │       FTS5 (BM25)       │ fastembed │   (mmap)   ││
│  │  Code      │       Hybrid Search     │  (1024d)  │   Unicode  ││
│  │  Parallel  │       HNSW (optional)   │           │            ││
│  └────────────┴─────────────────────────┴────────────┴────────────┘│
└─────────────────────────────────────────────────────────────────────┘

Module Structure

src/
├── lib.rs           # Library entry point and public API
├── main.rs          # Binary entry point
├── error.rs         # Error types (thiserror)
│
├── core/            # Core domain types
│   ├── mod.rs
│   ├── buffer.rs    # Buffer: loaded file content
│   ├── chunk.rs     # Chunk: content segment with metadata
│   └── context.rs   # Context: variables and state
│
├── chunking/        # Chunking strategies
│   ├── mod.rs       # Strategy factory and constants
│   ├── traits.rs    # Chunker trait definition
│   ├── fixed.rs     # Fixed-size chunking
│   ├── semantic.rs  # Sentence/paragraph-aware chunking
│   ├── code.rs      # Language-aware code chunking
│   └── parallel.rs  # Multi-threaded chunking
│
├── embedding/       # Embedding generation
│   ├── mod.rs       # Embedding trait and constants
│   ├── fastembed_impl.rs  # BGE-M3 via fastembed-rs
│   └── fallback.rs  # Fallback when fastembed unavailable
│
├── storage/         # Persistence layer
│   ├── mod.rs
│   ├── traits.rs    # Storage trait definition
│   ├── schema.rs    # Database schema and migrations
│   └── sqlite.rs    # SQLite implementation
│
├── search/          # Hybrid search
│   ├── mod.rs       # Hybrid search (semantic + BM25 with RRF)
│   ├── hnsw.rs      # HNSW vector index (usearch-hnsw feature)
│   └── rrf.rs       # Reciprocal Rank Fusion scoring
│
├── io/              # File I/O
│   ├── mod.rs
│   ├── reader.rs    # File reading with mmap
│   └── unicode.rs   # Unicode/grapheme utilities
│
└── cli/             # Command-line interface
    ├── mod.rs
    ├── parser.rs    # Clap argument definitions
    ├── commands.rs  # Command implementations
    └── output.rs    # Output formatting

Core Types

Buffer

Represents a loaded file with metadata:

pub struct Buffer {
    pub id: Option<i64>,
    pub name: String,
    pub content: String,
    pub source: Option<String>,
    pub metadata: BufferMetadata,
}

pub struct BufferMetadata {
    pub size: usize,
    pub line_count: usize,
    pub hash: String,
    pub content_type: Option<String>,
    pub chunk_count: usize,
    pub created_at: Option<String>,
    pub updated_at: Option<String>,
}

Chunk

Represents a segment of buffer content:

pub struct Chunk {
    pub buffer_id: i64,
    pub content: String,
    pub byte_range: Range<usize>,
    pub index: usize,
    pub metadata: ChunkMetadata,
}

pub struct ChunkMetadata {
    pub token_count: Option<usize>,
    pub has_overlap: bool,
    pub strategy: Option<String>,
}

Context

Manages variables and state:

pub struct Context {
    buffers: HashMap<i64, Buffer>,
    variables: HashMap<String, ContextValue>,
    globals: HashMap<String, ContextValue>,
}

pub enum ContextValue {
    String(String),
    Number(i64),
    Float(f64),
    Boolean(bool),
    List(Vec<ContextValue>),
}

Chunking System

Chunker Trait

All chunking strategies implement:

pub trait Chunker: Send + Sync {
    fn chunk(
        &self,
        buffer_id: i64,
        text: &str,
        metadata: Option<&ChunkMetadata>,
    ) -> Result<Vec<Chunk>>;

    fn name(&self) -> &'static str;
    fn supports_parallel(&self) -> bool;
    fn description(&self) -> &'static str;
    fn validate(&self, metadata: Option<&ChunkMetadata>) -> Result<()>;
}

Strategy Selection

Strategy	Algorithm	Use Case
`SemanticChunker`	Unicode sentence/paragraph boundaries	Markdown, prose
`CodeChunker`	Language-aware function/class boundaries	Source code files
`FixedChunker`	Character boundaries with UTF-8 safety	Logs, raw text
`ParallelChunker`	Rayon-parallelized wrapper for any `Chunker`	Texts larger than 100 KB

Code Chunker Languages

The CodeChunker uses regex-based pattern matching for multiple languages:

Language	Extensions	Boundary Detection
Rust	.rs	`fn`, `impl`, `struct`, `enum`, `mod`
Python	.py	`def`, `class`, `async def`
JavaScript/TypeScript	.js, .jsx, .ts, .tsx	`function`, `class`, `const =`
Go	.go	`func`, `type`
Java	.java	`class`, `interface`, method signatures
C/C++	.c, .cpp, .h, .hpp	Function definitions
Ruby	.rb	`def`, `class`, `module`
PHP	.php	`function`, `class`

Default Configuration

pub const DEFAULT_CHUNK_SIZE: usize = 3_000;    // ~750 tokens
pub const DEFAULT_OVERLAP: usize = 500;          // Context continuity
pub const MAX_CHUNK_SIZE: usize = 50_000;        // Safety limit

Storage Layer

Storage Trait

pub trait Storage: Send + Sync {
    // Buffer operations
    fn add_buffer(&mut self, buffer: &Buffer) -> Result<i64>;
    fn get_buffer(&self, id: i64) -> Result<Option<Buffer>>;
    fn get_buffer_by_name(&self, name: &str) -> Result<Option<Buffer>>;
    fn update_buffer(&mut self, buffer: &Buffer) -> Result<()>;
    fn delete_buffer(&mut self, id: i64) -> Result<()>;
    fn list_buffers(&self) -> Result<Vec<Buffer>>;

    // Chunk operations
    fn add_chunks(&mut self, buffer_id: i64, chunks: &[Chunk]) -> Result<()>;
    fn get_chunks(&self, buffer_id: i64) -> Result<Vec<Chunk>>;
    fn delete_chunks(&mut self, buffer_id: i64) -> Result<()>;

    // Variable operations
    fn set_variable(&mut self, name: &str, value: &ContextValue) -> Result<()>;
    fn get_variable(&self, name: &str) -> Result<Option<ContextValue>>;
    fn delete_variable(&mut self, name: &str) -> Result<()>;

    // Global operations
    fn set_global(&mut self, name: &str, value: &ContextValue) -> Result<()>;
    fn get_global(&self, name: &str) -> Result<Option<ContextValue>>;
    fn delete_global(&mut self, name: &str) -> Result<()>;
}

SQLite Schema

-- Buffers table
CREATE TABLE buffers (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL UNIQUE,
    content TEXT NOT NULL,
    source TEXT,
    size INTEGER NOT NULL,
    line_count INTEGER NOT NULL,
    hash TEXT NOT NULL,
    content_type TEXT,
    chunk_count INTEGER DEFAULT 0,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT DEFAULT CURRENT_TIMESTAMP
);

-- Chunks table
CREATE TABLE chunks (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    buffer_id INTEGER NOT NULL REFERENCES buffers(id) ON DELETE CASCADE,
    content TEXT NOT NULL,
    byte_start INTEGER NOT NULL,
    byte_end INTEGER NOT NULL,
    chunk_index INTEGER NOT NULL,
    token_count INTEGER,
    has_overlap INTEGER DEFAULT 0,
    strategy TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);

-- Variables table
CREATE TABLE variables (
    name TEXT PRIMARY KEY,
    value TEXT NOT NULL,
    value_type TEXT NOT NULL,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT DEFAULT CURRENT_TIMESTAMP
);

-- Globals table
CREATE TABLE globals (
    name TEXT PRIMARY KEY,
    value TEXT NOT NULL,
    value_type TEXT NOT NULL,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT DEFAULT CURRENT_TIMESTAMP
);

I/O Layer

Memory-Mapped File Reading

For large files, memmap2 provides efficient reading:

pub fn read_file(path: &Path) -> Result<String> {
    let file = File::open(path)?;
    let mmap = unsafe { MmapOptions::new().map(&file)? };
    let content = std::str::from_utf8(&mmap)?;
    Ok(content.to_string())
}

Unicode Handling

The unicode-segmentation crate ensures proper handling of:

Multi-byte UTF-8 characters
Grapheme clusters
Sentence boundaries

pub const fn find_char_boundary(s: &str, pos: usize) -> usize {
    if pos >= s.len() {
        return s.len();
    }
    let bytes = s.as_bytes();
    let mut boundary = pos;
    // UTF-8 continuation bytes start with 10xxxxxx (0x80-0xBF)
    while boundary > 0 && (bytes[boundary] & 0xC0) == 0x80 {
        boundary -= 1;
    }
    boundary
}

Error Handling

All errors use thiserror for ergonomic error types:

#[derive(Error, Debug)]
pub enum Error {
    #[error("Storage error: {0}")]
    Storage(#[from] StorageError),

    #[error("Chunking error: {0}")]
    Chunking(#[from] ChunkingError),

    #[error("I/O error: {0}")]
    Io(#[from] std::io::Error),

    #[error("Command error: {0}")]
    Command(#[from] CommandError),
}

RLM Pattern Implementation

Concept Mapping

RLM Concept	rlm-rs Implementation
Root LLM	Claude Code main conversation (Opus/Sonnet)
Sub-LLM	Claude Code subagent (Haiku)
External Environment	`rlm-rs` CLI + SQLite database
Chunk	`Chunk` struct with byte range and metadata
Buffer	`Buffer` struct with full content
State	SQLite persistence + context variables

Workflow

Load: Large document loaded into buffer, chunked, stored in SQLite
Index: Root LLM queries chunk indices via chunk-indices
Process: Sub-LLM processes individual chunks via file reads
Aggregate: Results stored back via add-buffer
Synthesize: Root LLM synthesizes final result

Performance Considerations

Token Estimation

Chunks target ~10,000 tokens to fit within Claude's 25,000 token read limit:

impl Chunk {
    pub fn estimate_tokens(&self) -> usize {
        // Approximate: 4 characters per token
        self.content.len() / 4
    }
}

Parallel Processing

The ParallelChunker uses Rayon for multi-threaded chunking:

impl Chunker for ParallelChunker {
    fn chunk(&self, buffer_id: i64, text: &str, metadata: Option<&ChunkMetadata>) -> Result<Vec<Chunk>> {
        let segments = split_into_segments(text, self.segment_count);

        segments
            .par_iter()
            .enumerate()
            .flat_map(|(i, segment)| {
                self.inner.chunk(buffer_id, segment, metadata)
            })
            .collect()
    }
}

Parallel Semantic Search

semantic_search() distributes the O(n) cosine-similarity scan across all CPU cores using rayon::par_iter(). The work-stealing scheduler provides near-linear speedup as the embedding collection grows:

fn semantic_search(
    storage: &SqliteStorage,
    embedder: &dyn Embedder,
    query: &str,
    config: &SearchConfig,
) -> Result<Vec<(i64, f32)>> {
    use rayon::prelude::*;

    // Generate query embedding
    let query_embedding = embedder.embed(query)?;

    // Get all embeddings from storage
    let all_embeddings = storage.get_all_embeddings()?;

    if all_embeddings.is_empty() {
        return Ok(Vec::new());
    }

    // Calculate similarities in parallel (rayon data parallelism)
    let mut similarities: Vec<(i64, f32)> = all_embeddings
        .par_iter()
        .map(|(chunk_id, embedding)| {
            let sim = cosine_similarity(&query_embedding, embedding);
            (*chunk_id, sim)
        })
        .filter(|(_, sim)| *sim >= config.similarity_threshold)
        .collect();

    // Sort by similarity descending
    similarities.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));

    // Limit results
    similarities.truncate(config.top_k * 2);

    Ok(similarities)
}

Pre-Sized Embedding Buffers

store_embedding() and store_embeddings_batch() pre-allocate the serialization buffer before converting f32 values to little-endian bytes, avoiding repeated heap reallocations per embedding:

store_embedding() uses an explicit loop:

let mut bytes = Vec::with_capacity(embedding.len() * 4);
for f in embedding {
    bytes.extend_from_slice(&f.to_le_bytes());
}

store_embeddings_batch() uses iterator chaining:

let mut bytes = Vec::with_capacity(embedding.len() * 4);
bytes.extend(embedding.iter().flat_map(|f| f.to_le_bytes()));

Single-Query Buffer Embedding Check

buffer_fully_embedded() and its underlying SqliteStorage::all_chunks_have_embeddings() check whether every chunk in a buffer has a stored embedding using one NOT EXISTS query, regardless of chunk count:

SELECT NOT EXISTS (
    SELECT 1 FROM chunks c
    LEFT JOIN chunk_embeddings e ON e.chunk_id = c.id
    WHERE c.buffer_id = ? AND e.chunk_id IS NULL
)

This replaces the previous pattern of issuing one has_embedding() round-trip per chunk, reducing database traffic from O(n) queries to O(1) for a buffer with n chunks. The impact is most visible when checking large documents that have been split into hundreds of chunks.

Testing Strategy

Unit Tests

Each module has #[cfg(test)] tests:

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_chunk_boundaries() {
        let chunker = SemanticChunker::with_size(100);
        let chunks = chunker.chunk(1, "Hello. World.", None).unwrap();
        assert!(!chunks.is_empty());
    }
}

Integration Tests

tests/integration_test.rs covers end-to-end workflows.

Property-Based Tests

Using proptest for invariant verification:

proptest! {
    #[test]
    fn chunk_byte_range_valid(content in ".{1,1000}") {
        let chunker = FixedChunker::with_size(100);
        let chunks = chunker.chunk(1, &content, None).unwrap();
        for chunk in chunks {
            prop_assert!(chunk.byte_range.end <= content.len());
        }
    }
}

Search System

Hybrid Search Architecture

rlm-cli implements a hybrid search system combining multiple retrieval methods:

┌─────────────────────────────────────────────────────────────┐
│                      Search Query                            │
└─────────────────────────────────────────────────────────────┘
                              │
              ┌───────────────┼───────────────┐
              ▼               ▼               ▼
    ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
    │ Semantic Search │ │  BM25 Search    │ │  HNSW Index     │
    │  (Embeddings)   │ │  (FTS5)         │ │  (Optional)     │
    └────────┬────────┘ └────────┬────────┘ └────────┬────────┘
             │                   │                   │
             └───────────────────┼───────────────────┘
                                 ▼
                    ┌───────────────────────┐
                    │  Reciprocal Rank      │
                    │  Fusion (RRF)         │
                    └───────────────────────┘
                                 │
                                 ▼
                    ┌───────────────────────┐
                    │   Ranked Results      │
                    └───────────────────────┘

Embedding System

Component	Implementation	Details
Model	BGE-M3 via fastembed	1024 dimensions
Fallback	Hash-based embedder	When fastembed unavailable
Storage	SQLite BLOB	Compact binary storage
Incremental	`embed_buffer_chunks_incremental`	Only new/changed chunks

HNSW Index (Optional)

When the usearch-hnsw feature is enabled:

O(log n) approximate nearest neighbor search
Persistent index on disk
Incremental updates
Falls back to brute-force when disabled

Future Extensions

Planned Features

Streaming: Process chunks as they're generated
Compression: Compress stored content
Encryption: Encrypt sensitive buffers

Extension Points

Chunker trait for custom chunking strategies
Embedder trait for alternative embedding models
Storage trait for alternative backends (PostgreSQL, Redis)
Output formatters for additional formats (YAML, TOML)

FilesExpand file tree

architecture.md

Latest commit

History