This document provides comprehensive configuration options for the RAG (Retrieval-Augmented Generation) processing system in NeuroLink.
The RAG processing system consists of three main components:
- Chunkers - Split documents into smaller, processable segments
- Rerankers - Re-score and re-order search results for relevance
- Hybrid Search - Combine BM25 and vector search for improved retrieval

| Strategy | Description | Best For |
|---|---|---|
| `character` | Fixed-size character splits | Simple text, logs |
| `recursive` | Paragraph/sentence-aware splits | General documents |
| `sentence` | Sentence boundary splitting | Natural language text |
| `token` | Token-based (GPT tokenizer) | LLM context optimization |
| `markdown` | Header-aware markdown parsing | Documentation, README files |
| `html` | HTML tag-aware splitting | Web content |
| `json` | JSON structure-aware | API responses, config files |
| `latex` | LaTeX section-aware | Academic papers |
| `semantic-markdown` | Semantic markdown with embeddings | Technical documentation |

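
As a rough illustration of what the `character` strategy in the table does, the sketch below splits text into fixed-size windows that share a number of overlapping characters. The function name `chunkByCharacters` is hypothetical and this is not NeuroLink's implementation; only the `maxSize` and `overlap` parameter names are taken from this document.

```typescript
// Hypothetical helper, not part of @juspay/neurolink: fixed-size character
// chunking where consecutive chunks share `overlap` characters.
function chunkByCharacters(text: string, maxSize: number, overlap: number): string[] {
  if (maxSize <= 0) throw new Error("maxSize must be positive");
  if (overlap < 0 || overlap >= maxSize) {
    throw new Error("overlap must be in the range [0, maxSize)");
  }
  const chunks: string[] = [];
  const step = maxSize - overlap; // how far the window advances each time
  for (let start = 0; start < text.length; start += step) {
    const end = Math.min(start + maxSize, text.length);
    chunks.push(text.slice(start, end));
    if (end === text.length) break; // the last window reached the end of the text
  }
  return chunks;
}

const sampleChunks = chunkByCharacters("abcdefghij", 4, 1);
console.log(sampleChunks); // [ 'abcd', 'defg', 'ghij' ]
```

Each chunk starts `maxSize - overlap` characters after the previous one, so the last `overlap` characters of one chunk reappear at the start of the next.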

Base chunker configuration:

```typescript
type ChunkerConfig = {
  // Maximum chunk size (characters or tokens)
  maxSize: number; // Default: 1000
  // Overlap between chunks (characters or tokens)
  overlap: number; // Default: 100
  // Minimum chunk size (avoid tiny chunks)
  minSize?: number; // Default: 10
  // Document ID for metadata tracking
  documentId?: string; // Default: auto-generated UUID
  // Additional metadata to attach to chunks
  metadata?: Record<string, unknown>;
  // Whether to preserve metadata from source document
  preserveMetadata?: boolean; // Default: true
};
```

`character` chunker:

```typescript
const config = {
  maxSize: 1000, // Max characters per chunk
  overlap: 100, // Character overlap between chunks
  separator: "", // No separator (split by character count)
};
```

`recursive` chunker:

```typescript
const config = {
  maxSize: 1000,
  overlap: 100,
  separators: ["\n\n", "\n", ". ", " ", ""], // Priority order
  keepSeparators: true, // Keep separators in output chunks
};
```

`sentence` chunker:

```typescript
const config = {
  maxSize: 1000, // Max characters per chunk
  overlap: 1, // Overlap in sentences (not characters)
  minSentences: 1, // Minimum sentences per chunk
  maxSentences: 10, // Maximum sentences per chunk
};
```

`token` chunker:

```typescript
const config = {
  maxSize: 512, // Max tokens per chunk
  overlap: 50, // Token overlap
  tokenizer: "cl100k_base", // OpenAI tokenizer
};
```

`markdown` chunker:

```typescript
const config = {
  maxSize: 1000,
  overlap: 100,
  preserveHeaders: true, // Include parent headers in chunks
  codeBlockHandling: "preserve", // 'preserve' | 'split' | 'remove'
};
```

`html` chunker:

```typescript
const config = {
  maxSize: 1000,
  overlap: 100,
  preserveTags: ["p", "div", "section", "article"],
  removeTags: ["script", "style", "nav", "footer"],
  extractText: true, // Strip HTML tags from output
};
```

`json` chunker:

```typescript
const config = {
  maxSize: 500,
  preserveStructure: true, // Keep valid JSON in chunks
  flattenDepth: 2, // Max nesting depth before flattening
  arrayHandling: "split", // 'split' | 'preserve'
};
```

`latex` chunker:

```typescript
const config = {
  maxSize: 1000,
  overlap: 100,
  sectionCommands: ["\\section", "\\subsection", "\\chapter"],
  preserveMath: true, // Keep math environments intact
  includeComments: false, // Strip LaTeX comments
};
```

`semantic-markdown` chunker:

```typescript
const config = {
  maxSize: 500,
  overlap: 100,
  semanticThreshold: 0.7, // Similarity threshold for merging
  embedder: "openai", // Embedding provider
};
```

Chunker usage:

```typescript
import { createChunker, getAvailableStrategies } from "@juspay/neurolink";

// List available strategies
const strategies = getAvailableStrategies();
console.log(strategies); // ['character', 'recursive', ...]

// Create a chunker with configuration
const chunker = await createChunker("recursive", {
  maxSize: 500,
  overlap: 50,
});

// Chunk a document
const chunks = await chunker.chunk(documentText, {
  maxSize: 500,
  overlap: 50,
});

// Each chunk has structure:
// {
//   id: string,
//   text: string,
//   metadata: {
//     documentId: string,
//     chunkIndex: number,
//     startOffset: number,
//     endOffset: number,
//     ...customMetadata
//   }
// }
```

| Type | Description | Requires Model | Use Case |
|---|---|---|---|
| `simple` | Position + vector score combo | No | Fast, no-cost reranking |
| `llm` | LLM semantic scoring | Yes | High-quality semantic |
| `cross-encoder` | Cross-encoder model | Yes | Accuracy-focused |
| `cohere` | Cohere Rerank API | Yes (API key) | Production-grade |
| `batch` | Batch LLM reranking | Yes | Large result sets |

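
To make the "position + vector score" idea behind the `simple` type concrete, here is a minimal sketch of one way such a combination could work. The `simpleRerank` function, the `ScoredResult` type and the position formula are assumptions made for illustration and may differ from what NeuroLink actually does; `positionWeight`, `scoreWeight` and `topK` are the option names used in this document.

```typescript
// Hypothetical types and function for illustration; not the NeuroLink API.
type ScoredResult = { id: string; text: string; score: number };

function simpleRerank(
  results: ScoredResult[],
  positionWeight: number,
  scoreWeight: number,
  topK: number,
): ScoredResult[] {
  const total = results.length;
  return results
    .map((result, index) => {
      // Assumption: earlier positions score closer to 1, later ones closer to 0.
      const positionScore = 1 - index / total;
      return {
        ...result,
        score: positionWeight * positionScore + scoreWeight * result.score,
      };
    })
    .sort((a, b) => b.score - a.score) // highest combined score first
    .slice(0, topK);
}

const rerankedSample = simpleRerank(
  [
    { id: "a", text: "first hit", score: 0.2 },
    { id: "b", text: "second hit", score: 0.9 },
    { id: "c", text: "third hit", score: 0.6 },
  ],
  0.3, // positionWeight
  0.7, // scoreWeight
  2, // topK
);
console.log(rerankedSample.map((r) => r.id)); // [ 'b', 'c' ]
```

With these weights the original vector score dominates, so a strong match in second place overtakes a weak match in first place.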

Base reranker configuration:

```typescript
type RerankerConfig = {
  // Number of top results to return
  topK: number; // Default: 10
  // Minimum score threshold
  minScore?: number; // Default: 0.0
  // Include original scores in output
  includeOriginalScores?: boolean; // Default: false
};
```

`simple` reranker:

```typescript
const config = {
  topK: 10,
  positionWeight: 0.3, // Weight for position in results
  scoreWeight: 0.7, // Weight for original vector score
};
```

`llm` reranker:

```typescript
const config = {
  topK: 5,
  model: "gpt-4",
  temperature: 0.0,
  prompt: "Rate relevance of this passage to the query (0-1):",
  batchSize: 5, // Process in batches
};
```

`cross-encoder` reranker:

```typescript
const config = {
  topK: 10,
  model: "cross-encoder/ms-marco-MiniLM-L-12-v2",
  normalize: true, // Normalize scores to 0-1
};
```

`cohere` reranker:

```typescript
const config = {
  topK: 10,
  model: "rerank-english-v2.0",
  maxChunksPerDoc: 10,
  returnDocuments: false,
};
```

`batch` reranker:

```typescript
const config = {
  topK: 20,
  batchSize: 10, // Documents per LLM call
  parallelBatches: 3, // Concurrent batches
  model: "gpt-3.5-turbo",
};
```

Reranker usage:

```typescript
import { createReranker, getAvailableRerankerTypes } from "@juspay/neurolink";

// List available types
const types = getAvailableRerankerTypes();
console.log(types); // ['simple', 'llm', 'cross-encoder', 'cohere', 'batch']

// Create a simple reranker (no model required)
const reranker = await createReranker("simple", { topK: 5 });

// Rerank search results
const reranked = await reranker.rerank(searchResults, query, { topK: 5 });

// Each result has structure:
// {
//   id: string,
//   text: string,
//   score: number,
//   originalScore?: number,
//   metadata?: Record<string, unknown>
// }
```

BM25 configuration:

```typescript
type BM25Config = {
  // BM25 parameters
  k1: number; // Default: 1.2 (term frequency saturation)
  b: number; // Default: 0.75 (document length normalization)
  // Preprocessing
  lowercase: boolean; // Default: true
  stemming: boolean; // Default: false
  stopwords: string[]; // Default: English stopwords
};
```

Reciprocal rank fusion (RRF):

```typescript
import { reciprocalRankFusion } from "@juspay/neurolink";

const fusedScores = reciprocalRankFusion(
  [vectorRankings, bm25Rankings],
  60, // k parameter (default: 60)
);
```

Linear combination:

```typescript
import { linearCombination } from "@juspay/neurolink";

const combinedScores = linearCombination(
  vectorScores, // Map<string, number>
  bm25Scores, // Map<string, number>
  0.5, // alpha: weight for vector scores (0-1)
);
```

Hybrid search usage:

```typescript
import { createHybridSearch, InMemoryBM25Index } from "@juspay/neurolink";

// Create BM25 index
const bm25Index = new InMemoryBM25Index({ k1: 1.2, b: 0.75 });

// Add documents
await bm25Index.addDocuments([
  { id: "doc1", text: "Document content...", metadata: {} },
  // ...
]);

// Create hybrid search
const hybridSearch = createHybridSearch({
  bm25Index,
  vectorStore, // Your vector store instance
  fusionMethod: "rrf", // 'rrf' | 'linear'
  alpha: 0.5, // Vector weight (for linear fusion)
  k: 60, // RRF parameter
});

// Execute hybrid search
const results = await hybridSearch.search(query, {
  topK: 10,
  filter: { category: "technical" },
});
```

The RAG system includes resilience patterns to handle failures gracefully.
Circuit breakers prevent cascading failures by stopping operations when error rates are too high.
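
The closed / open / half-open cycle of a circuit breaker can be sketched as a small state machine. The class below is a hypothetical, simplified illustration and not NeuroLink's implementation: it counts consecutive failures rather than a failure rate over a time window, and only borrows the `failureThreshold` and `resetTimeout` names from this document. The injectable `now` clock is an assumption added to make the behaviour easy to test.

```typescript
// Minimal circuit breaker sketch (hypothetical class, not the NeuroLink implementation).
type BreakerState = "closed" | "open" | "half-open";

class SketchCircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly resetTimeout: number; // in ms
  private readonly now: () => number;

  constructor(
    failureThreshold: number,
    resetTimeout: number,
    now: () => number = () => Date.now(), // injectable clock
  ) {
    this.failureThreshold = failureThreshold;
    this.resetTimeout = resetTimeout;
    this.now = now;
  }

  getState(): BreakerState {
    // Once resetTimeout has elapsed, allow a trial call.
    if (this.state === "open" && this.now() - this.openedAt >= this.resetTimeout) {
      this.state = "half-open";
    }
    return this.state;
  }

  async execute<T>(operation: () => Promise<T>): Promise<T> {
    if (this.getState() === "open") {
      throw new Error("Circuit is open; call rejected without running the operation");
    }
    try {
      const value = await operation();
      // Any success closes the circuit and clears the failure count.
      this.state = "closed";
      this.failures = 0;
      return value;
    } catch (error) {
      this.failures += 1;
      // A failed trial call, or too many failures, (re)opens the circuit.
      if (this.state === "half-open" || this.failures >= this.failureThreshold) {
        this.state = "open";
        this.openedAt = this.now();
      }
      throw error;
    }
  }
}
```

While the circuit is open, calls fail immediately instead of waiting on a dependency that is already struggling, which is what stops one failing service from dragging down its callers.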

Circuit breaker configuration:

```typescript
type RAGCircuitBreakerConfig = {
  // Number of failures before opening circuit
  failureThreshold: number; // Default: 5
  // Time in ms before attempting reset
  resetTimeout: number; // Default: 60000 (1 minute)
  // Max calls allowed in half-open state
  halfOpenMaxCalls: number; // Default: 3
  // Operation timeout in ms
  operationTimeout: number; // Default: 30000 (30 seconds)
  // Minimum calls before calculating failure rate
  minimumCallsBeforeCalculation: number; // Default: 10
  // Time window for statistics in ms
  statisticsWindowSize: number; // Default: 300000 (5 minutes)
};
```

Circuit breaker usage:

```typescript
import {
  getCircuitBreaker,
  executeWithCircuitBreaker,
} from "@juspay/neurolink";

// Create a circuit breaker for vector queries
const breaker = getCircuitBreaker("vector-queries", {
  failureThreshold: 3,
  resetTimeout: 30000,
});

// Execute operation with circuit breaker protection
const result = await breaker.execute(async () => {
  return await vectorStore.query(embedding, { topK: 10 });
}, "vector-query");

// Or use the convenience function
// (named differently from `result` above so both calls can share a scope)
const embeddingResult = await executeWithCircuitBreaker(
  "embedding-service",
  () => embeddingProvider.embed(text),
  "embedding",
  { failureThreshold: 5 },
);

// Get circuit breaker statistics
const stats = breaker.getStats();
// {
//   state: 'closed' | 'open' | 'half-open',
//   totalCalls: number,
//   failureRate: number,
//   averageLatency: number,
//   p95Latency: number,
//   ...
// }
```

Retry handlers provide automatic retries with exponential backoff for transient failures.
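
As a sketch of how an exponential backoff schedule is typically derived, the helper below computes the wait before each retry from the option names used in `RAGRetryConfig`. The `backoffDelay` function and the "full jitter" formula are assumptions for illustration; the document does not specify how NeuroLink applies jitter.

```typescript
// Hypothetical helper showing a delay schedule built from the retry options.
type BackoffOptions = {
  initialDelay: number; // ms
  maxDelay: number; // ms
  backoffMultiplier: number;
  jitter: boolean;
};

// `attempt` is zero-based: 0 is the wait before the first retry.
function backoffDelay(
  attempt: number,
  options: BackoffOptions,
  random: () => number = Math.random,
): number {
  const exponential = options.initialDelay * Math.pow(options.backoffMultiplier, attempt);
  const capped = Math.min(exponential, options.maxDelay);
  // Assumption: "full jitter", a uniform random delay in [0, capped).
  return options.jitter ? Math.floor(random() * capped) : capped;
}

const schedule = [0, 1, 2, 3, 4, 5].map((attempt) =>
  backoffDelay(attempt, {
    initialDelay: 1000,
    maxDelay: 30000,
    backoffMultiplier: 2,
    jitter: false,
  }),
);
console.log(schedule); // [ 1000, 2000, 4000, 8000, 16000, 30000 ]
```

The delay doubles on every attempt until it reaches `maxDelay`; jitter spreads retries from many clients so they do not all hit a recovering service at the same moment.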

Retry configuration:

```typescript
type RAGRetryConfig = {
  // Maximum number of retry attempts
  maxRetries: number; // Default: 3
  // Initial delay in ms
  initialDelay: number; // Default: 1000
  // Maximum delay in ms
  maxDelay: number; // Default: 30000
  // Backoff multiplier
  backoffMultiplier: number; // Default: 2
  // Whether to add jitter
  jitter: boolean; // Default: true
  // Retryable HTTP status codes
  retryableStatusCodes?: number[]; // Default: [408, 429, 500, 502, 503, 504]
};
```

Retry usage:

```typescript
import {
  withRAGRetry,
  RAGRetryHandler,
  embeddingRetryHandler,
  vectorStoreRetryHandler,
} from "@juspay/neurolink";

// Simple retry wrapper
const result = await withRAGRetry(() => embeddingProvider.embed(text), {
  maxRetries: 5,
  initialDelay: 2000,
});

// Use specialized retry handlers
const embedding = await embeddingRetryHandler.executeWithRetry(() =>
  embeddingProvider.embed(text),
);
const queryResult = await vectorStoreRetryHandler.executeWithRetry(() =>
  vectorStore.query(embedding),
);

// Batch operations with retry
const handler = new RAGRetryHandler({ maxRetries: 3 });
const results = await handler.executeBatch(
  documents,
  async (doc, index) => await processDocument(doc),
  { concurrency: 5, continueOnError: true },
);
// Returns: { successful: [...], failed: [...], successRate: number }
```

| Handler | maxRetries | initialDelay | Use Case |
|---|---|---|---|
| `embeddingRetryHandler` | 5 | 2000ms | Embedding API rate limits |
| `vectorStoreRetryHandler` | 3 | 1000ms | Vector store operations |
| `metadataExtractionRetryHandler` | 3 | 1500ms | LLM-based metadata extraction |

The RAG system supports extracting metadata from document chunks using LLMs.

| Type | Description | Output |
|---|---|---|
| `title` | Extract document title | `string` |
| `summary` | Generate chunk summary | `string` |
| `keywords` | Extract relevant keywords | `string[]` |
| `questions` | Generate Q&A pairs for retrieval | `{question, answer}[]` |
| `custom` | Custom schema extraction with Zod | `Record<string, unknown>` |

Base extractor configuration:

```typescript
type BaseExtractorConfig = {
  // Language model to use
  modelName?: string; // e.g., "gpt-4", "claude-3-sonnet"
  // Provider for the model
  provider?: string; // e.g., "openai", "anthropic"
  // Custom prompt template
  promptTemplate?: string;
  // Maximum tokens for LLM response
  maxTokens?: number;
  // Temperature for LLM generation
  temperature?: number;
};
```

Title extractor:

```typescript
const titleConfig = {
  modelName: "gpt-4",
  nodes: 5, // Number of nodes to analyze
  nodeTemplate: "Extract the main topic from: {text}",
  combineTemplate: "Combine these topics into a title: {topics}",
};
```

Summary extractor:

```typescript
const summaryConfig = {
  modelName: "gpt-3.5-turbo",
  summaryTypes: ["current", "previous", "next"], // Context-aware summaries
  maxWords: 100, // Maximum summary length
};
```

Keyword extractor:

```typescript
const keywordConfig = {
  modelName: "gpt-3.5-turbo",
  maxKeywords: 10, // Maximum keywords to extract
  minRelevance: 0.5, // Minimum relevance score (0-1)
};
```

Question extractor:

```typescript
const questionConfig = {
  modelName: "gpt-4",
  numQuestions: 5, // Number of Q&A pairs
  includeAnswers: true, // Include answers in output
  embeddingOnly: false, // Generate full questions vs embedding-optimized
};
```

Metadata extraction usage:

```typescript
import { MDocument } from "@juspay/neurolink";

const doc = new MDocument(content, { type: "markdown" });

// Chunk with metadata extraction
const chunks = await doc.chunk({
  strategy: "recursive",
  config: { maxSize: 1000, overlap: 100 },
  extract: {
    title: true,
    summary: { maxWords: 50 },
    keywords: { maxKeywords: 5 },
    questions: { numQuestions: 3 },
  },
});

// Each chunk now includes extracted metadata:
// {
//   id: string,
//   text: string,
//   metadata: {
//     title: "Extracted Title",
//     summary: "Brief summary...",
//     keywords: ["keyword1", "keyword2"],
//     ...
//   }
// }
```

Complete pipeline example:

```typescript
import {
  createChunker,
  createReranker,
  createHybridSearch,
} from "@juspay/neurolink";

// Assumes `bm25Index`, `vectorStore`, `document`, and `query` are already
// defined, as in the hybrid search example.

// 1. Configure chunker
const chunker = await createChunker("recursive", {
  maxSize: 500,
  overlap: 50,
});

// 2. Configure reranker
const reranker = await createReranker("simple", {
  topK: 5,
});

// 3. Configure hybrid search
const hybridSearch = createHybridSearch({
  bm25Index,
  vectorStore,
  fusionMethod: "rrf",
});

// 4. Process documents
const chunks = await chunker.chunk(document);

// 5. Index chunks (implementation depends on your vector store)
await vectorStore.addDocuments(chunks);
await bm25Index.addDocuments(chunks);

// 6. Search and rerank
const searchResults = await hybridSearch.search(query, { topK: 20 });
const finalResults = await reranker.rerank(searchResults, query, { topK: 5 });
```

Environment variables:

| Variable | Description | Required |
|---|---|---|
| `OPENAI_API_KEY` | For LLM/semantic reranking | Optional |
| `COHERE_API_KEY` | For Cohere reranker | Optional |
| `ANTHROPIC_API_KEY` | For Claude-based reranking | Optional |

Chunking best practices:

- Match chunk size to context window - Use the `token` chunker for LLMs
- Choose strategy by content type - `markdown` for docs, `html` for web content
- Use overlap for continuity - 10-20% overlap prevents context loss
- Preserve structure - Use format-aware chunkers when possible
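
One way to turn "match chunk size to context window" into numbers is to divide the tokens left after the prompt and answer by the number of chunks placed in each prompt. The helper name and every figure below are illustrative assumptions, not NeuroLink defaults.

```typescript
// Hypothetical budgeting helper; all numbers below are illustrative assumptions.
function maxChunkTokens(
  contextWindow: number, // total tokens the model accepts
  reservedTokens: number, // tokens kept for instructions, question and answer
  chunksPerPrompt: number, // how many retrieved chunks go into one prompt
): number {
  const available = contextWindow - reservedTokens;
  if (available <= 0 || chunksPerPrompt <= 0) return 0;
  return Math.floor(available / chunksPerPrompt);
}

// An 8192-token window, 2048 tokens reserved, 10 retrieved chunks per prompt.
const chunkBudget = maxChunkTokens(8192, 2048, 10);
console.log(chunkBudget); // 614

// 10-20% overlap for that chunk size.
const overlapRange = [Math.round(chunkBudget * 0.1), Math.round(chunkBudget * 0.2)];
console.log(overlapRange); // [ 61, 123 ]
```

The resulting values could then be used as `maxSize` and `overlap` for the `token` chunker.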

Reranking best practices:

- Start simple - The `simple` reranker is fast and often sufficient
- Use LLM reranking for quality - When accuracy matters more than speed
- Batch for efficiency - Use the `batch` reranker for large result sets
- Consider cost - API-based rerankers have per-call costs

Hybrid search best practices:

- Balance weights - Start with an `alpha` of 0.5 and tune based on results
- RRF is robust - Less sensitive to score scale differences
- Index incrementally - Update both BM25 and vector indices together
- Filter early - Apply metadata filters before fusion when possible
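
The reason RRF is less sensitive to score scales is that it only looks at ranks. The sketch below is a minimal, self-contained version of the standard formula, where each list contributes `1 / (k + rank)` per document; the function name `rrfSketch` and its input shape are assumptions and do not describe the signature of NeuroLink's `reciprocalRankFusion`.

```typescript
// Minimal RRF sketch; name and input shape are assumptions for illustration.
function rrfSketch(rankings: string[][], k = 60): Map<string, number> {
  const fused = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      fused.set(id, (fused.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return fused;
}

// Only the order matters: the raw scores (on very different scales) never enter the formula.
const vectorRanking = ["doc1", "doc2", "doc3"]; // e.g. cosine similarities 0.91, 0.90, 0.40
const bm25Ranking = ["doc2", "doc3", "doc1"]; // e.g. BM25 scores 14.2, 9.7, 1.3

const fusedSample = rrfSketch([vectorRanking, bm25Ranking]);
const fusedOrder = Array.from(fusedSample.entries())
  .sort((a, b) => b[1] - a[1])
  .map(([id]) => id);
console.log(fusedOrder); // [ 'doc2', 'doc1', 'doc3' ]
```

A linear combination of the same two lists would first need the BM25 and vector scores normalized to a common range, which is why `alpha` usually needs tuning and `k` often does not.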

Common issues:

- Empty chunks - Check if `maxSize` is too small for content
- Overlapping content - Reduce the `overlap` parameter
- Missing context - Increase chunk size or overlap
- Slow reranking - Use the `simple` reranker or reduce `topK`
- Poor search quality - Tune BM25 parameters (`k1`, `b`)
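
When tuning `k1` and `b`, it can help to see where they enter the standard BM25 formula. The function below scores a single query term against a single document; its name and argument list are hypothetical, and NeuroLink's index may compute IDF and tokenization differently.

```typescript
// Sketch of the BM25 score for one query term in one document.
// The function name is hypothetical; k1 and b mirror the BM25Config fields.
function bm25TermScore(
  termFrequency: number, // occurrences of the term in the document
  docLength: number, // document length in tokens
  avgDocLength: number, // average document length in the index
  idf: number, // inverse document frequency of the term
  k1 = 1.2,
  b = 0.75,
): number {
  // b controls how strongly longer-than-average documents are penalized.
  const lengthNorm = 1 - b + b * (docLength / avgDocLength);
  // k1 controls how quickly repeated occurrences stop adding score.
  return (idf * (termFrequency * (k1 + 1))) / (termFrequency + k1 * lengthNorm);
}

// Same term frequency, but the second document is twice the average length.
console.log(bm25TermScore(3, 100, 100, 1));
console.log(bm25TermScore(3, 200, 100, 1)); // lower, because of length normalization

// With b = 0, document length no longer affects the score.
console.log(bm25TermScore(3, 100, 100, 1, 1.2, 0) === bm25TermScore(3, 200, 100, 1, 1.2, 0)); // true
```

Raising `k1` lets repeated terms keep adding score for longer; lowering `b` reduces the penalty on long documents, which can help when chunks vary widely in size.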

```bash
# Enable verbose logging
DEBUG=neurolink:rag:* npx tsx your-script.ts
```

For complete API documentation, see the TypeScript definitions in:

- `src/lib/rag/types.ts` - Core type definitions
- `src/lib/rag/ChunkerFactory.ts` - Chunker factory API
- `src/lib/rag/reranker/RerankerFactory.ts` - Reranker factory API
- `src/lib/rag/retrieval/hybridSearch.ts` - Hybrid search API

Related documentation:

- RAG Feature Guide - Main RAG documentation with quick start and overview
- RAG Testing Guide - How to run RAG tests
- RAG API Reference - API documentation