Internal architecture documentation for rlm-rs.
RLM-RS implements the Recursive Language Model (RLM) pattern from arXiv:2512.24601, enabling LLMs to process documents up to 100x larger than their context windows.
┌─────────────────────────────────────────────────────────────────┐
│ Claude Code │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Root LLM │───▶│ Sub-LLM │ │
│ │ (Opus/Sonnet) │ │ (Haiku) │ │
│ └────────┬────────┘ └────────┬────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌─────────────────────────────────────────┐ │
│ │ Bash Tool │ │
│ └─────────────────┬───────────────────────┘ │
└────────────────────┼────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ rlm-rs CLI │
│ ┌─────────────────────────────────────────────────────────────┐│
│ │ CLI Layer ││
│ │ parser.rs │ commands.rs │ output.rs ││
│ └─────────────────────────┬───────────────────────────────────┘│
│ │ │
│ ┌─────────────────────────┴───────────────────────────────────┐│
│ │ Core Domain ││
│ │ Buffer │ Chunk │ Context │ Variable ││
│ └─────────────────────────┬───────────────────────────────────┘│
│ │ │
│ ┌────────────┬────────────┴────────────┬────────────┬────────────┐│
│ │ Chunking │ Storage │ Embedding │ I/O ││
│ │ ───────── │ ─────── │ ───────── │ ─── ││
│ │ Fixed │ SQLite │ BGE-M3 │ Reader ││
│ │ Semantic │ FTS5 (BM25) │ fastembed │ (mmap) ││
│ │ Code │ Hybrid Search │ (1024d) │ Unicode ││
│ │ Parallel │ HNSW (optional) │ │ ││
│ └────────────┴─────────────────────────┴────────────┴────────────┘│
└─────────────────────────────────────────────────────────────────────┘
src/
├── lib.rs # Library entry point and public API
├── main.rs # Binary entry point
├── error.rs # Error types (thiserror)
│
├── core/ # Core domain types
│ ├── mod.rs
│ ├── buffer.rs # Buffer: loaded file content
│ ├── chunk.rs # Chunk: content segment with metadata
│ └── context.rs # Context: variables and state
│
├── chunking/ # Chunking strategies
│ ├── mod.rs # Strategy factory and constants
│ ├── traits.rs # Chunker trait definition
│ ├── fixed.rs # Fixed-size chunking
│ ├── semantic.rs # Sentence/paragraph-aware chunking
│ ├── code.rs # Language-aware code chunking
│ └── parallel.rs # Multi-threaded chunking
│
├── embedding/ # Embedding generation
│ ├── mod.rs # Embedding trait and constants
│ ├── fastembed_impl.rs # BGE-M3 via fastembed-rs
│ └── fallback.rs # Fallback when fastembed unavailable
│
├── storage/ # Persistence layer
│ ├── mod.rs
│ ├── traits.rs # Storage trait definition
│ ├── schema.rs # Database schema and migrations
│ └── sqlite.rs # SQLite implementation
│
├── search/ # Hybrid search
│ ├── mod.rs # Hybrid search (semantic + BM25 with RRF)
│ ├── hnsw.rs # HNSW vector index (usearch-hnsw feature)
│ └── rrf.rs # Reciprocal Rank Fusion scoring
│
├── io/ # File I/O
│ ├── mod.rs
│ ├── reader.rs # File reading with mmap
│ └── unicode.rs # Unicode/grapheme utilities
│
└── cli/ # Command-line interface
├── mod.rs
├── parser.rs # Clap argument definitions
├── commands.rs # Command implementations
└── output.rs # Output formatting
Represents a loaded file with metadata:
pub struct Buffer {
pub id: Option<i64>,
pub name: String,
pub content: String,
pub source: Option<String>,
pub metadata: BufferMetadata,
}
pub struct BufferMetadata {
pub size: usize,
pub line_count: usize,
pub hash: String,
pub content_type: Option<String>,
pub chunk_count: usize,
pub created_at: Option<String>,
pub updated_at: Option<String>,
}
Represents a segment of buffer content:
pub struct Chunk {
pub buffer_id: i64,
pub content: String,
pub byte_range: Range<usize>,
pub index: usize,
pub metadata: ChunkMetadata,
}
pub struct ChunkMetadata {
pub token_count: Option<usize>,
pub has_overlap: bool,
pub strategy: Option<String>,
}
Manages variables and state:
pub struct Context {
buffers: HashMap<i64, Buffer>,
variables: HashMap<String, ContextValue>,
globals: HashMap<String, ContextValue>,
}
pub enum ContextValue {
String(String),
Number(i64),
Float(f64),
Boolean(bool),
List(Vec<ContextValue>),
}
All chunking strategies implement:
pub trait Chunker: Send + Sync {
fn chunk(
&self,
buffer_id: i64,
text: &str,
metadata: Option<&ChunkMetadata>,
) -> Result<Vec<Chunk>>;
fn name(&self) -> &'static str;
fn supports_parallel(&self) -> bool;
fn description(&self) -> &'static str;
fn validate(&self, metadata: Option<&ChunkMetadata>) -> Result<()>;
}
| Strategy | Algorithm | Use Case |
|---|---|---|
| SemanticChunker | Unicode sentence/paragraph boundaries | Markdown, prose |
| CodeChunker | Language-aware function/class boundaries | Source code files |
| FixedChunker | Character boundaries with UTF-8 safety | Logs, raw text |
| ParallelChunker | Rayon-parallelized wrapper for any Chunker | Texts larger than 100 KB |
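The UTF-8 safety of fixed-size chunking can be illustrated with a simplified, self-contained sketch (not the actual FixedChunker, which also handles overlap and metadata): candidate boundaries are snapped back to the nearest char boundary so multi-byte characters are never split.

```rust
// Simplified sketch of UTF-8-safe fixed-size chunking. The real FixedChunker
// also applies overlap and attaches metadata; this only shows boundary safety.
fn fixed_chunks(text: &str, size: usize) -> Vec<&str> {
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < text.len() {
        let mut end = (start + size).min(text.len());
        // Snap back so a multi-byte character is never split across chunks.
        while !text.is_char_boundary(end) {
            end -= 1;
        }
        chunks.push(&text[start..end]);
        start = end;
    }
    chunks
}

fn main() {
    let text = "héllo wörld, héllo agaïn"; // multi-byte UTF-8
    let chunks = fixed_chunks(text, 7);
    assert!(chunks.iter().all(|c| c.len() <= 7));
    assert_eq!(chunks.concat(), text); // lossless: chunks tile the input
}
```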
The CodeChunker uses regex-based pattern matching for multiple languages:
| Language | Extensions | Boundary Detection |
|---|---|---|
| Rust | .rs | fn, impl, struct, enum, mod |
| Python | .py | def, class, async def |
| JavaScript/TypeScript | .js, .jsx, .ts, .tsx | function, class, const = |
| Go | .go | func, type |
| Java | .java | class, interface, method signatures |
| C/C++ | .c, .cpp, .h, .hpp | Function definitions |
| Ruby | .rb | def, class, module |
| PHP | .php | function, class |
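The pattern-matching idea can be sketched without the regex machinery (a simplified illustration using line prefixes; the actual CodeChunker's per-language regex patterns are richer): scan for lines that open a top-level item and start a new chunk there.

```rust
// Simplified sketch of language-aware boundary detection for Rust source.
// The real CodeChunker uses regex patterns per language; this version just
// checks line prefixes for top-level Rust items.
fn is_rust_boundary(line: &str) -> bool {
    let trimmed = line.trim_start();
    let stripped = trimmed.strip_prefix("pub ").unwrap_or(trimmed);
    ["fn ", "impl ", "struct ", "enum ", "mod "]
        .iter()
        .any(|kw| stripped.starts_with(kw))
}

fn code_chunks(source: &str) -> Vec<String> {
    let mut chunks: Vec<String> = Vec::new();
    for line in source.lines() {
        // Open a new chunk at every item boundary (and for leading content).
        if is_rust_boundary(line) || chunks.is_empty() {
            chunks.push(String::new());
        }
        let chunk = chunks.last_mut().unwrap();
        chunk.push_str(line);
        chunk.push('\n');
    }
    chunks
}

fn main() {
    let src = "struct A;\n\nfn one() {}\n\npub fn two() {}\n";
    let chunks = code_chunks(src);
    assert_eq!(chunks.len(), 3); // struct A / fn one / pub fn two
}
```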
pub const DEFAULT_CHUNK_SIZE: usize = 3_000; // ~750 tokens
pub const DEFAULT_OVERLAP: usize = 500; // Context continuity
pub const MAX_CHUNK_SIZE: usize = 50_000; // Safety limit
pub trait Storage: Send + Sync {
// Buffer operations
fn add_buffer(&mut self, buffer: &Buffer) -> Result<i64>;
fn get_buffer(&self, id: i64) -> Result<Option<Buffer>>;
fn get_buffer_by_name(&self, name: &str) -> Result<Option<Buffer>>;
fn update_buffer(&mut self, buffer: &Buffer) -> Result<()>;
fn delete_buffer(&mut self, id: i64) -> Result<()>;
fn list_buffers(&self) -> Result<Vec<Buffer>>;
// Chunk operations
fn add_chunks(&mut self, buffer_id: i64, chunks: &[Chunk]) -> Result<()>;
fn get_chunks(&self, buffer_id: i64) -> Result<Vec<Chunk>>;
fn delete_chunks(&mut self, buffer_id: i64) -> Result<()>;
// Variable operations
fn set_variable(&mut self, name: &str, value: &ContextValue) -> Result<()>;
fn get_variable(&self, name: &str) -> Result<Option<ContextValue>>;
fn delete_variable(&mut self, name: &str) -> Result<()>;
// Global operations
fn set_global(&mut self, name: &str, value: &ContextValue) -> Result<()>;
fn get_global(&self, name: &str) -> Result<Option<ContextValue>>;
fn delete_global(&mut self, name: &str) -> Result<()>;
}
-- Buffers table
CREATE TABLE buffers (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL UNIQUE,
content TEXT NOT NULL,
source TEXT,
size INTEGER NOT NULL,
line_count INTEGER NOT NULL,
hash TEXT NOT NULL,
content_type TEXT,
chunk_count INTEGER DEFAULT 0,
created_at TEXT DEFAULT CURRENT_TIMESTAMP,
updated_at TEXT DEFAULT CURRENT_TIMESTAMP
);
-- Chunks table
CREATE TABLE chunks (
id INTEGER PRIMARY KEY AUTOINCREMENT,
buffer_id INTEGER NOT NULL REFERENCES buffers(id) ON DELETE CASCADE,
content TEXT NOT NULL,
byte_start INTEGER NOT NULL,
byte_end INTEGER NOT NULL,
chunk_index INTEGER NOT NULL,
token_count INTEGER,
has_overlap INTEGER DEFAULT 0,
strategy TEXT,
created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
-- Variables table
CREATE TABLE variables (
name TEXT PRIMARY KEY,
value TEXT NOT NULL,
value_type TEXT NOT NULL,
created_at TEXT DEFAULT CURRENT_TIMESTAMP,
updated_at TEXT DEFAULT CURRENT_TIMESTAMP
);
-- Globals table
CREATE TABLE globals (
name TEXT PRIMARY KEY,
value TEXT NOT NULL,
value_type TEXT NOT NULL,
created_at TEXT DEFAULT CURRENT_TIMESTAMP,
updated_at TEXT DEFAULT CURRENT_TIMESTAMP
);
For large files, memmap2 provides efficient reading:
pub fn read_file(path: &Path) -> Result<String> {
let file = File::open(path)?;
let mmap = unsafe { MmapOptions::new().map(&file)? };
let content = std::str::from_utf8(&mmap)?;
Ok(content.to_string())
}
The unicode-segmentation crate ensures proper handling of:
- Multi-byte UTF-8 characters
- Grapheme clusters
- Sentence boundaries
pub const fn find_char_boundary(s: &str, pos: usize) -> usize {
if pos >= s.len() {
return s.len();
}
let bytes = s.as_bytes();
let mut boundary = pos;
// UTF-8 continuation bytes start with 10xxxxxx (0x80-0xBF)
while boundary > 0 && (bytes[boundary] & 0xC0) == 0x80 {
boundary -= 1;
}
boundary
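A small usage sketch (the function reproduced from above for self-containment): snapping an arbitrary byte offset back to a valid boundary lets callers slice multi-byte text without panicking.

```rust
// find_char_boundary reproduced from io/unicode.rs for a self-contained demo.
pub const fn find_char_boundary(s: &str, pos: usize) -> usize {
    if pos >= s.len() {
        return s.len();
    }
    let bytes = s.as_bytes();
    let mut boundary = pos;
    // UTF-8 continuation bytes start with 10xxxxxx (0x80-0xBF)
    while boundary > 0 && (bytes[boundary] & 0xC0) == 0x80 {
        boundary -= 1;
    }
    boundary
}

fn main() {
    let s = "naïve"; // 'ï' occupies bytes 2..4
    let b = find_char_boundary(s, 3); // byte 3 is inside 'ï'
    assert_eq!(b, 2);
    assert_eq!(&s[..b], "na"); // slicing at the snapped offset never panics
}
```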
}
All errors use thiserror for ergonomic error types:
#[derive(Error, Debug)]
pub enum Error {
#[error("Storage error: {0}")]
Storage(#[from] StorageError),
#[error("Chunking error: {0}")]
Chunking(#[from] ChunkingError),
#[error("I/O error: {0}")]
Io(#[from] std::io::Error),
#[error("Command error: {0}")]
Command(#[from] CommandError),
}
| RLM Concept | rlm-rs Implementation |
|---|---|
| Root LLM | Claude Code main conversation (Opus/Sonnet) |
| Sub-LLM | Claude Code subagent (Haiku) |
| External Environment | rlm-rs CLI + SQLite database |
| Chunk | Chunk struct with byte range and metadata |
| Buffer | Buffer struct with full content |
| State | SQLite persistence + context variables |
- Load: Large document loaded into a buffer, chunked, and stored in SQLite
- Index: Root LLM queries chunk indices via chunk-indices
- Process: Sub-LLM processes individual chunks via file reads
- Aggregate: Results stored back via add-buffer
- Synthesize: Root LLM synthesizes the final result
Chunks target ~10,000 tokens to fit within Claude's 25,000 token read limit:
impl Chunk {
pub fn estimate_tokens(&self) -> usize {
// Approximate: 4 characters per token
self.content.len() / 4
}
}
The ParallelChunker uses Rayon for multi-threaded chunking:
impl Chunker for ParallelChunker {
    fn chunk(&self, buffer_id: i64, text: &str, metadata: Option<&ChunkMetadata>) -> Result<Vec<Chunk>> {
        let segments = split_into_segments(text, self.segment_count);
        // Chunk each segment on its own Rayon task; collecting into Result
        // propagates the first error from any segment.
        let per_segment: Vec<Vec<Chunk>> = segments
            .par_iter()
            .map(|segment| self.inner.chunk(buffer_id, segment, metadata))
            .collect::<Result<_>>()?;
        Ok(per_segment.into_iter().flatten().collect())
    }
}
semantic_search() distributes the O(n) cosine-similarity scan across all CPU cores using rayon::par_iter(). The work-stealing scheduler provides near-linear speedup as the embedding collection grows:
fn semantic_search(
storage: &SqliteStorage,
embedder: &dyn Embedder,
query: &str,
config: &SearchConfig,
) -> Result<Vec<(i64, f32)>> {
use rayon::prelude::*;
// Generate query embedding
let query_embedding = embedder.embed(query)?;
// Get all embeddings from storage
let all_embeddings = storage.get_all_embeddings()?;
if all_embeddings.is_empty() {
return Ok(Vec::new());
}
// Calculate similarities in parallel (rayon data parallelism)
let mut similarities: Vec<(i64, f32)> = all_embeddings
.par_iter()
.map(|(chunk_id, embedding)| {
let sim = cosine_similarity(&query_embedding, embedding);
(*chunk_id, sim)
})
.filter(|(_, sim)| *sim >= config.similarity_threshold)
.collect();
// Sort by similarity descending
similarities.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
// Limit results
similarities.truncate(config.top_k * 2);
Ok(similarities)
}
store_embedding() and store_embeddings_batch() pre-allocate the serialization buffer before converting f32 values to little-endian bytes, avoiding repeated heap reallocations per embedding:
store_embedding() uses an explicit loop:
let mut bytes = Vec::with_capacity(embedding.len() * 4);
for f in embedding {
bytes.extend_from_slice(&f.to_le_bytes());
}store_embeddings_batch() uses iterator chaining:
let mut bytes = Vec::with_capacity(embedding.len() * 4);
bytes.extend(embedding.iter().flat_map(|f| f.to_le_bytes()));
buffer_fully_embedded() and its underlying SqliteStorage::all_chunks_have_embeddings() check
whether every chunk in a buffer has a stored embedding using one NOT EXISTS query, regardless of
chunk count:
SELECT NOT EXISTS (
SELECT 1 FROM chunks c
LEFT JOIN chunk_embeddings e ON e.chunk_id = c.id
WHERE c.buffer_id = ? AND e.chunk_id IS NULL
)
This replaces the previous pattern of issuing one has_embedding() round-trip per chunk, reducing
database traffic from O(n) queries to O(1) for a buffer with n chunks. The impact is most visible
when checking large documents that have been split into hundreds of chunks.
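The little-endian layout described above implies a straightforward decode path. A round-trip sketch (the encode side mirrors store_embeddings_batch; the decode helper is illustrative, not necessarily the crate's actual reader):

```rust
// Round-trip sketch for the little-endian f32 layout used for stored
// embeddings. encode mirrors the store_embeddings_batch approach; decode
// is an illustrative inverse, not necessarily the crate's actual reader.
fn encode(embedding: &[f32]) -> Vec<u8> {
    // Pre-allocate once: 4 bytes per f32, no reallocation during the loop.
    let mut bytes = Vec::with_capacity(embedding.len() * 4);
    bytes.extend(embedding.iter().flat_map(|f| f.to_le_bytes()));
    bytes
}

fn decode(bytes: &[u8]) -> Vec<f32> {
    bytes
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}

fn main() {
    let v = vec![0.25_f32, -1.5, 3.0];
    let bytes = encode(&v);
    assert_eq!(bytes.len(), v.len() * 4);
    assert_eq!(decode(&bytes), v); // lossless round trip
}
```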
Each module has #[cfg(test)] tests:
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_chunk_boundaries() {
let chunker = SemanticChunker::with_size(100);
let chunks = chunker.chunk(1, "Hello. World.", None).unwrap();
assert!(!chunks.is_empty());
}
}
tests/integration_test.rs covers end-to-end workflows.
Using proptest for invariant verification:
proptest! {
#[test]
fn chunk_byte_range_valid(content in ".{1,1000}") {
let chunker = FixedChunker::with_size(100);
let chunks = chunker.chunk(1, &content, None).unwrap();
for chunk in chunks {
prop_assert!(chunk.byte_range.end <= content.len());
}
}
}
rlm-rs implements a hybrid search system combining multiple retrieval methods:
┌─────────────────────────────────────────────────────────────┐
│ Search Query │
└─────────────────────────────────────────────────────────────┘
│
┌───────────────┼───────────────┐
▼ ▼ ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Semantic Search │ │ BM25 Search │ │ HNSW Index │
│ (Embeddings) │ │ (FTS5) │ │ (Optional) │
└────────┬────────┘ └────────┬────────┘ └────────┬────────┘
│ │ │
└───────────────────┼───────────────────┘
▼
┌───────────────────────┐
│ Reciprocal Rank │
│ Fusion (RRF) │
└───────────────────────┘
│
▼
┌───────────────────────┐
│ Ranked Results │
└───────────────────────┘
| Component | Implementation | Details |
|---|---|---|
| Model | BGE-M3 via fastembed | 1024 dimensions |
| Fallback | Hash-based embedder | When fastembed unavailable |
| Storage | SQLite BLOB | Compact binary storage |
| Incremental | embed_buffer_chunks_incremental | Only new/changed chunks |
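Reciprocal Rank Fusion merges the per-method rankings with the standard formula score(d) = Σ 1/(k + rank_i(d)). A minimal sketch (k = 60 is the conventional constant; the value rlm-rs actually uses lives in rrf.rs and may differ):

```rust
use std::collections::HashMap;

// Minimal Reciprocal Rank Fusion sketch: each input is a ranked list of
// chunk ids (best first); k dampens the influence of top ranks.
fn rrf(rankings: &[Vec<i64>], k: f32) -> Vec<(i64, f32)> {
    let mut scores: HashMap<i64, f32> = HashMap::new();
    for ranking in rankings {
        for (rank, id) in ranking.iter().enumerate() {
            // Ranks are 1-based in the standard formulation.
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + rank as f32 + 1.0);
        }
    }
    let mut fused: Vec<(i64, f32)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused
}

fn main() {
    let semantic = vec![1, 2, 3]; // semantic ranking: chunk ids, best first
    let bm25 = vec![3, 1, 4]; // BM25 ranking
    let fused = rrf(&[semantic, bm25], 60.0);
    // Chunk 1 (ranks 1 and 2) edges out chunk 3 (ranks 3 and 1).
    assert_eq!(fused[0].0, 1);
}
```

Documents appearing in several rankings accumulate score, so agreement between semantic and BM25 results is rewarded without needing to calibrate their incomparable raw scores.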
When the usearch-hnsw feature is enabled:
- O(log n) approximate nearest neighbor search
- Persistent index on disk
- Incremental updates
- Falls back to brute-force when disabled
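The brute-force path scores every candidate with cosine similarity. A minimal sketch of such a helper (the general unnormalized form; the crate's actual cosine_similarity may assume pre-normalized vectors or handle edge cases differently):

```rust
// Cosine similarity between two embedding vectors: dot(a, b) / (|a| * |b|).
// General unnormalized form; an implementation over pre-normalized vectors
// could skip the norm computation entirely.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0; // Convention here: zero vectors match nothing.
    }
    dot / (norm_a * norm_b)
}

fn main() {
    let a = [1.0_f32, 0.0];
    assert!((cosine_similarity(&a, &[1.0, 0.0]) - 1.0).abs() < 1e-6); // identical
    assert!(cosine_similarity(&a, &[0.0, 1.0]).abs() < 1e-6); // orthogonal
}
```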
- Streaming: Process chunks as they're generated
- Compression: Compress stored content
- Encryption: Encrypt sensitive buffers
- Chunker trait for custom chunking strategies
- Embedder trait for alternative embedding models
- Storage trait for alternative backends (PostgreSQL, Redis)
- Output formatters for additional formats (YAML, TOML)
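As an extensibility illustration, a simplified custom chunker (trait and Chunk pared down from the definitions above for self-containment; the real Chunker trait also has name(), supports_parallel(), description(), and validate(), and chunk() returns Result):

```rust
use std::ops::Range;

// Pared-down versions of the crate's Chunk and Chunker, just enough to
// illustrate plugging in a custom strategy.
struct Chunk {
    buffer_id: i64,
    content: String,
    byte_range: Range<usize>,
    index: usize,
}

trait Chunker {
    fn chunk(&self, buffer_id: i64, text: &str) -> Vec<Chunk>;
}

// Hypothetical custom strategy: one chunk per line.
struct LineChunker;

impl Chunker for LineChunker {
    fn chunk(&self, buffer_id: i64, text: &str) -> Vec<Chunk> {
        let mut chunks = Vec::new();
        let mut start = 0;
        for (index, line) in text.lines().enumerate() {
            chunks.push(Chunk {
                buffer_id,
                content: line.to_string(),
                byte_range: start..start + line.len(),
                index,
            });
            start += line.len() + 1; // +1 for the newline
        }
        chunks
    }
}

fn main() {
    let chunks = LineChunker.chunk(1, "alpha\nbeta\n");
    assert_eq!(chunks.len(), 2);
    assert_eq!(chunks[0].buffer_id, 1);
    assert_eq!(chunks[0].index, 0);
    assert_eq!(chunks[1].content, "beta");
    assert_eq!(chunks[1].byte_range, 6..10);
}
```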
- RLM-Inspired Design - How rlm-rs builds on the RLM paper
- Plugin Integration - Claude Code plugin setup and portability
- CLI Reference - Complete command documentation
- API Reference - Rust library documentation
- README.md - Project overview
- RLM Paper - Original research paper