Environment-based configuration for the Largefile MCP Server.
Largefile uses environment variables for all configuration to keep MCP tool signatures clean and simple. Set these variables before starting your MCP client.
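The sketch below shows why the variables must be in place before the client starts the server. It is a minimal Python illustration of how a server might read a few of them once at startup and fall back to the documented defaults. The helper names (env_int, env_float, env_bool, Settings, load_settings) and the accepted boolean spellings are assumptions, not Largefile's actual code.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def env_int(env: Mapping[str, str], name: str, default: int) -> int:
    """Read an integer setting; unset or blank values fall back to the default."""
    raw = env.get(name, "").strip()
    return int(raw) if raw else default


def env_float(env: Mapping[str, str], name: str, default: float) -> float:
    """Read a float setting such as a similarity threshold."""
    raw = env.get(name, "").strip()
    return float(raw) if raw else default


def env_bool(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Assumed rule: 1/true/yes/on (any case) is true, anything else is false."""
    raw = env.get(name, "").strip().lower()
    if not raw:
        return default
    return raw in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    memory_threshold_mb: int
    mmap_threshold_mb: int
    fuzzy_threshold: float
    enable_tree_sitter: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Snapshot the settings once, e.g. when the server process starts."""
    # Defaults mirror the values documented on this page.
    return Settings(
        memory_threshold_mb=env_int(env, "LARGEFILE_MEMORY_THRESHOLD_MB", 50),
        mmap_threshold_mb=env_int(env, "LARGEFILE_MMAP_THRESHOLD_MB", 500),
        fuzzy_threshold=env_float(env, "LARGEFILE_FUZZY_THRESHOLD", 0.8),
        enable_tree_sitter=env_bool(env, "LARGEFILE_ENABLE_TREE_SITTER", True),
    )
```

In a design like this, the values are captured when the process starts. Changing a variable in your shell therefore does not affect a server that is already running. Restart the MCP client, or the server it launched, to pick up new values.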
Control how files are accessed based on size:
# Memory loading threshold (default: 50MB)
LARGEFILE_MEMORY_THRESHOLD_MB=50
# Memory mapping threshold (default: 500MB)
LARGEFILE_MMAP_THRESHOLD_MB=500
File Access Strategy (with the default thresholds):
- < 50MB: Loaded into memory with Tree-sitter AST caching
- 50-500MB: Memory-mapped access with streaming search
- > 500MB: Chunk-based streaming processing
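The size-based tiering above can be sketched in Python as follows. This is an illustration under stated assumptions, not Largefile's implementation. The function names are hypothetical, "MB" is taken to mean 1024 * 1024 bytes, and a file exactly at a threshold goes into the higher tier. iter_lines_streaming shows the general chunked-read technique that LARGEFILE_STREAMING_CHUNK_SIZE controls: read fixed-size blocks and carry any partial line over into the next block.

```python
from typing import BinaryIO, Iterator

MB = 1024 * 1024  # assumption: "MB" means MiB


def choose_strategy(size_bytes: int, memory_threshold_mb: int = 50,
                    mmap_threshold_mb: int = 500) -> str:
    """Map a file size onto the three access tiers listed above."""
    if size_bytes < memory_threshold_mb * MB:
        return "memory"   # whole file in memory; AST can be cached
    if size_bytes < mmap_threshold_mb * MB:
        return "mmap"     # memory-mapped access with streaming search
    return "stream"       # chunk-based streaming


def iter_lines_streaming(f: BinaryIO, chunk_size: int = 8192) -> Iterator[str]:
    """Yield decoded lines while holding one chunk plus any unfinished line.

    A line cut off at a chunk boundary is carried into the next read, so
    lines (and multi-byte UTF-8 characters) are never split when decoding.
    """
    carry = b""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        carry += chunk
        *complete, carry = carry.split(b"\n")
        for raw in complete:
            yield raw.decode("utf-8", errors="replace")
    if carry:
        yield carry.decode("utf-8", errors="replace")
```

With the high-memory preset shown later on this page, choose_strategy(100 * MB, 200, 2000) returns "memory". With the defaults, the same file would be memory-mapped.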
Configure line truncation for very long lines:
# Trigger truncation for lines longer than this (default: 1000)
LARGEFILE_MAX_LINE_LENGTH=1000
# Display length for truncated lines (default: 500)
LARGEFILE_TRUNCATE_LENGTH=500
Behavior:
- Lines longer than MAX_LINE_LENGTH are shown truncated to TRUNCATE_LENGTH characters in overview and search results
- Original file content is never modified
- Full content is available via the read_content tool
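The truncation rule can be sketched in a few lines. This version assumes that lines strictly longer than LARGEFILE_MAX_LINE_LENGTH are cut to LARGEFILE_TRUNCATE_LENGTH characters for display. The function name and the "..." marker appended to cut lines are hypothetical, and Largefile's actual output format may differ.

```python
def truncate_for_display(line: str, max_line_length: int = 1000,
                         truncate_length: int = 500,
                         marker: str = "...") -> tuple[str, bool]:
    """Return (display_text, was_truncated) for one result line.

    Only the displayed copy is shortened; the file on disk is untouched,
    and the full line can still be fetched with read_content.
    """
    if len(line) <= max_line_length:
        return line, False
    return line[:truncate_length] + marker, True
```

Keeping the trigger length (1000) well above the display length (500) means ordinary long lines pass through intact. Only outliers such as minified code or embedded data blobs are cut.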
Control fuzzy search sensitivity and behavior:
# Minimum similarity score for fuzzy matches (default: 0.8)
LARGEFILE_FUZZY_THRESHOLD=0.8
# Maximum search results returned (default: 20)
LARGEFILE_MAX_SEARCH_RESULTS=20
# Context lines before/after matches (default: 2)
LARGEFILE_CONTEXT_LINES=2
Fuzzy Threshold Values:
- 1.0: Exact matches only
- 0.9: Very strict fuzzy matching
- 0.8: Balanced (default) - handles typos and formatting differences
- 0.7: Loose matching - more results, lower precision
- < 0.7: Not recommended - too many false positives
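The sketch below makes the threshold scale concrete. It scores each line with Python's difflib.SequenceMatcher.ratio(), which returns a similarity between 0.0 and 1.0, and then applies the threshold, result limit, and context window described above. SequenceMatcher is a stand-in chosen for illustration. Largefile's own scoring function may weigh typos and whitespace differently, so the same threshold will not necessarily select the same lines.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity in [0.0, 1.0]; 1.0 means identical strings."""
    return SequenceMatcher(None, a, b).ratio()


def fuzzy_search(lines: list[str], query: str, threshold: float = 0.8,
                 max_results: int = 20, context_lines: int = 2) -> list[dict]:
    """Score every line against the query and keep the best matches.

    Each hit carries its 1-based line number, its score, and a window of
    context_lines lines before and after the match.
    """
    hits = []
    for idx, line in enumerate(lines):
        score = similarity(query.strip(), line.strip())
        if score >= threshold:
            hits.append((score, idx))
    # Best scores first; ties broken by position in the file.
    hits.sort(key=lambda h: (-h[0], h[1]))
    results = []
    for score, idx in hits[:max_results]:
        start = max(0, idx - context_lines)
        end = min(len(lines), idx + context_lines + 1)
        results.append({"line": idx + 1, "score": round(score, 3),
                        "context": lines[start:end]})
    return results
```

In this sketch, a typo'd definition such as "def procss_data(x):" scores about 0.84 against "def process_data(items):". It is found at the default 0.8 but dropped at 0.9, which mirrors the strict-versus-balanced trade-off in the list above.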
# Chunk size for streaming large files (default: 8192 bytes)
LARGEFILE_STREAMING_CHUNK_SIZE=8192
# Enable parallel processing for multi-pattern search (default: true)
LARGEFILE_ENABLE_PARALLEL_SEARCH=true
Control semantic analysis features:
# Enable/disable Tree-sitter parsing (default: true)
LARGEFILE_ENABLE_TREE_SITTER=true
# Maximum time for AST parsing (default: 5 seconds)
LARGEFILE_TREE_SITTER_TIMEOUT=5
# Cache parsed ASTs for session reuse (default: true)
LARGEFILE_ENABLE_AST_CACHE=true
When to Disable Tree-sitter:
- Working primarily with non-code text files
- Memory constraints in containerized environments
- Parsing timeouts on very large or complex files
- Language not supported (falls back to text-based analysis)
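The fallback behavior above, together with LARGEFILE_TREE_SITTER_TIMEOUT and LARGEFILE_ENABLE_AST_CACHE, can be pictured with the following sketch. parse_fn is a hypothetical placeholder for a real Tree-sitter parse call. The cache key (absolute path, modification time, size) is an assumption. The timeout is implemented with Python's concurrent.futures rather than anything Tree-sitter itself provides.

```python
import os
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Any, Callable, Optional

# A shared pool, so a parse that overruns its timeout does not block the caller.
_EXECUTOR = ThreadPoolExecutor(max_workers=2)
# Per-session cache keyed by (absolute path, mtime in ns, size in bytes).
_AST_CACHE: dict = {}


def analyze(path: str, parse_fn: Callable[[bytes], Any], *,
            enable_tree_sitter: bool = True, timeout_s: float = 5.0,
            enable_cache: bool = True) -> tuple[str, Optional[Any]]:
    """Return ("ast", tree) on success or ("text", None) after a fallback."""
    if not enable_tree_sitter:
        return "text", None
    st = os.stat(path)
    key = (os.path.abspath(path), st.st_mtime_ns, st.st_size)
    if enable_cache and key in _AST_CACHE:
        return "ast", _AST_CACHE[key]
    with open(path, "rb") as f:
        source = f.read()
    future = _EXECUTOR.submit(parse_fn, source)
    try:
        tree = future.result(timeout=timeout_s)
    except FutureTimeout:
        # Stop waiting; the worker thread finishes in the background.
        return "text", None
    except Exception:
        # e.g. unsupported language or parser error: use text analysis.
        return "text", None
    if enable_cache:
        _AST_CACHE[key] = tree
    return "ast", tree
```

Keying the cache on modification time and size means an edited file is reparsed automatically. Disabling the cache trades repeated parse time for lower memory use, which matches the container advice above.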
Configure automatic backup behavior:
# Directory for edit backups (default: ~/.largefile/backups)
LARGEFILE_BACKUP_DIR="/path/to/backups"
# Maximum number of backups to keep per file (default: 10)
LARGEFILE_MAX_BACKUPS=10
Backup Behavior:
- A backup is created automatically before every edit operation
- Backups are named {filename}.{path_hash}.{timestamp} for uniqueness
- Old backups are cleaned up automatically based on MAX_BACKUPS
- Use the revert_edit tool to restore any backup
Backup Naming Convention:
example.py.a1b2c3d4.20240115_143022
│          │        │
│          │        └── Timestamp (YYYYMMDD_HHMMSS)
│          └── Path hash (first 8 chars of SHA-256)
└── Original filename
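Here is a sketch of the naming convention and the MAX_BACKUPS cleanup. It assumes the path hash is computed over the file's absolute path encoded as UTF-8, and that pruning deletes the oldest backups first. All function names are hypothetical.

```python
import hashlib
import os
from datetime import datetime
from pathlib import Path


def _path_hash(file_path: str) -> str:
    """First 8 hex chars of SHA-256 over the absolute path (assumed input)."""
    abs_path = os.path.abspath(file_path)
    return hashlib.sha256(abs_path.encode("utf-8")).hexdigest()[:8]


def backup_name(file_path: str, when: datetime) -> str:
    """Build {filename}.{path_hash}.{timestamp}, e.g. example.py.a1b2c3d4.20240115_143022."""
    stamp = when.strftime("%Y%m%d_%H%M%S")
    return f"{os.path.basename(file_path)}.{_path_hash(file_path)}.{stamp}"


def prune_backups(backup_dir: str, file_path: str, max_backups: int = 10) -> list[str]:
    """Delete this file's oldest backups beyond max_backups; return the removed names."""
    prefix = f"{os.path.basename(file_path)}.{_path_hash(file_path)}."
    # YYYYMMDD_HHMMSS sorts lexicographically in chronological order.
    names = sorted(p.name for p in Path(backup_dir).iterdir() if p.name.startswith(prefix))
    stale = names[:-max_backups] if max_backups > 0 else names
    for name in stale:
        (Path(backup_dir) / name).unlink()
    return stale
```

The path hash is what keeps two files with the same name in different directories (for example two config.py files) from sharing or pruning each other's backups.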
Configure enhanced error messages when edits fail:
# Maximum similar matches to show on edit failure (default: 3)
LARGEFILE_SIMILAR_MATCH_LIMIT=3
# Minimum similarity score to include in suggestions (default: 0.6)
LARGEFILE_SIMILAR_MATCH_THRESHOLD=0.6
Error Recovery Behavior:
- When edit_content fails to find a pattern, it searches for similar lines
- Returns up to SIMILAR_MATCH_LIMIT suggestions with similarity scores
- Only shows matches above SIMILAR_MATCH_THRESHOLD (0.0-1.0 scale)
- Includes an actionable suggestion such as "Did you mean X on line Y?"
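The suggestion step can be approximated as follows. difflib.SequenceMatcher stands in for whatever similarity measure Largefile actually uses. suggest_similar, its message wording, and the choice to keep scores at or above the threshold are illustrative assumptions.

```python
from difflib import SequenceMatcher


def suggest_similar(lines: list[str], pattern: str, limit: int = 3,
                    threshold: float = 0.6) -> list[str]:
    """Build 'Did you mean ...' hints for a pattern that was not found."""
    scored = []
    for number, line in enumerate(lines, start=1):
        score = SequenceMatcher(None, pattern.strip(), line.strip()).ratio()
        if score >= threshold:  # assumed: scores equal to the threshold are kept
            scored.append((score, number, line.strip()))
    # Highest similarity first; earlier lines win ties.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [
        f'Did you mean "{text}" on line {number}? (similarity {score:.2f})'
        for score, number, text in scored[:limit]
    ]
```

A low SIMILAR_MATCH_THRESHOLD yields more hints, but they are less likely to be the line you meant. SIMILAR_MATCH_LIMIT keeps the error message short either way.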
Configure batch edit limits:
# Maximum changes per batch edit call (default: 50)
LARGEFILE_MAX_BATCH_CHANGES=50
Batch Editing Behavior:
- Prevents excessively large batch operations
- All changes in a batch share a single backup
- Partial success is supported (some changes can fail)
- Per-change results include individual error details
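Partial success in a batch can be sketched as an in-order loop that records a result for each change instead of aborting on the first failure. apply_batch and ChangeResult are hypothetical names. Rejecting a pattern that matches more than once is an assumption, made so that each change targets a single location. In this picture, the shared backup would be taken once, before the loop runs.

```python
from dataclasses import dataclass


@dataclass
class ChangeResult:
    index: int
    applied: bool
    error: str = ""


def apply_batch(text: str, changes: list[tuple[str, str]],
                max_changes: int = 50) -> tuple[str, list[ChangeResult]]:
    """Apply (search, replace) pairs in order and report each outcome.

    A change that cannot be applied is skipped and reported; the rest
    still go through (partial success).
    """
    if len(changes) > max_changes:
        raise ValueError(f"batch has {len(changes)} changes; limit is {max_changes}")
    results = []
    for i, (old, new) in enumerate(changes):
        count = text.count(old)
        if count == 0:
            results.append(ChangeResult(i, False, "pattern not found"))
        elif count > 1:
            results.append(ChangeResult(i, False, f"pattern is ambiguous ({count} matches)"))
        else:
            text = text.replace(old, new, 1)
            results.append(ChangeResult(i, True))
    return text, results
```

Because changes apply in order, a later change sees the text produced by earlier ones. Keep that in mind when two changes touch neighbouring lines.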
Control logging output and debug information:
# Log level: DEBUG, INFO, WARNING, ERROR (default: INFO)
LARGEFILE_LOG_LEVEL=INFO
# Enable performance metrics logging (default: false)
LARGEFILE_ENABLE_METRICS=false
# Log file path (default: stderr)
LARGEFILE_LOG_FILE="/path/to/largefile.log"
Debug Mode:
# Enable all debug features
LARGEFILE_LOG_LEVEL=DEBUG
LARGEFILE_ENABLE_METRICS=true
LARGEFILE_TREE_SITTER_TIMEOUT=10
For fast machines with plenty of memory:
# Larger memory thresholds
LARGEFILE_MEMORY_THRESHOLD_MB=200
LARGEFILE_MMAP_THRESHOLD_MB=2000
# More aggressive search
LARGEFILE_MAX_SEARCH_RESULTS=50
LARGEFILE_FUZZY_THRESHOLD=0.7
# Larger chunks for streaming
LARGEFILE_STREAMING_CHUNK_SIZE=65536
# Extended timeouts
LARGEFILE_TREE_SITTER_TIMEOUT=10
For containers or low-memory systems:
# Conservative memory usage
LARGEFILE_MEMORY_THRESHOLD_MB=10
LARGEFILE_MMAP_THRESHOLD_MB=50
# Disable caching
LARGEFILE_ENABLE_AST_CACHE=false
# Smaller chunks
LARGEFILE_STREAMING_CHUNK_SIZE=4096
# Stricter search limits
LARGEFILE_MAX_SEARCH_RESULTS=10
LARGEFILE_CONTEXT_LINES=1
For non-code files or when Tree-sitter isn't needed:
# Disable semantic features
LARGEFILE_ENABLE_TREE_SITTER=false
LARGEFILE_ENABLE_AST_CACHE=false
# Focus on text search performance
LARGEFILE_FUZZY_THRESHOLD=0.8
LARGEFILE_MAX_SEARCH_RESULTS=30
LARGEFILE_ENABLE_PARALLEL_SEARCH=true
For debugging and development:
# Verbose logging
LARGEFILE_LOG_LEVEL=DEBUG
LARGEFILE_ENABLE_METRICS=true
# Preserve all data
LARGEFILE_MAX_BACKUPS=100
LARGEFILE_COMPRESS_BACKUPS=false
# Detailed truncation
LARGEFILE_MAX_LINE_LENGTH=500
LARGEFILE_TRUNCATE_LENGTH=200
Check your configuration:
# Verify environment variables are set
env | grep LARGEFILE_
# Test with a sample file
echo "test content" > test.txt
# Run the get_overview tool on test.txt to verify the settings are working
Common Issues:
- Memory errors: Reduce MEMORY_THRESHOLD_MB
- Slow performance: Increase chunk sizes or disable Tree-sitter
- Too many/few search results: Adjust FUZZY_THRESHOLD and MAX_SEARCH_RESULTS
- Backup failures: Check BACKUP_DIR permissions
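A small script can catch inconsistent values before you restart the client, which goes beyond eyeballing the output of env | grep LARGEFILE_. The checks below are assumptions about what makes a sensible configuration. For example, the in-memory threshold should not exceed the mmap threshold, and similarity thresholds should stay within 0.0-1.0. They are not rules the server is documented to enforce, and check_config is a hypothetical helper.

```python
from typing import Mapping


def check_config(env: Mapping[str, str]) -> list[str]:
    """Return human-readable problems found in LARGEFILE_* settings."""
    problems: list[str] = []

    def num(name: str, default: float) -> float:
        raw = env.get(name, "").strip()
        if not raw:
            return default
        try:
            return float(raw)
        except ValueError:
            problems.append(f"{name}={raw!r} is not a number")
            return default

    if num("LARGEFILE_MEMORY_THRESHOLD_MB", 50) > num("LARGEFILE_MMAP_THRESHOLD_MB", 500):
        problems.append("LARGEFILE_MEMORY_THRESHOLD_MB exceeds LARGEFILE_MMAP_THRESHOLD_MB")
    for name, default in (("LARGEFILE_FUZZY_THRESHOLD", 0.8),
                          ("LARGEFILE_SIMILAR_MATCH_THRESHOLD", 0.6)):
        if not 0.0 <= num(name, default) <= 1.0:
            problems.append(f"{name} must be between 0.0 and 1.0")
    if num("LARGEFILE_TRUNCATE_LENGTH", 500) > num("LARGEFILE_MAX_LINE_LENGTH", 1000):
        problems.append("LARGEFILE_TRUNCATE_LENGTH exceeds LARGEFILE_MAX_LINE_LENGTH")
    return problems
```

Call it as check_config(dict(os.environ)) and fix anything it reports before starting the client.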
Performance Tips:
- Profile your typical file sizes - set thresholds appropriately
- Monitor memory usage - enable metrics to track consumption
- Adjust AST caching - disable for very large codebases
- Use streaming - for files that don't need semantic analysis
- Tune fuzzy threshold - balance precision vs recall
- Limit context lines - reduce for performance, increase for clarity
- Use exact matching - when possible for fastest results
- Parallel search - enable for multi-pattern workflows
- SSD storage - dramatically improves memory-mapped performance
- Backup location - use fast storage for backup directory
- Compression - enable for backup storage efficiency
- Cleanup frequency - adjust max backups based on usage
See Performance Documentation for detailed benchmarks and optimization guides.
Control the list_directory tool behaviour:
# Maximum total entries returned by list_directory (default: 200)
LARGEFILE_MAX_DIR_ENTRIES=200
# Comma-separated directory names that are always skipped (default shown)
LARGEFILE_IGNORED_DIR_PATTERNS=__pycache__,node_modules,.git
Notes:
- LARGEFILE_MAX_DIR_ENTRIES is a hard cap across all recursion depths combined.
- LARGEFILE_IGNORED_DIR_PATTERNS matches exact directory names (not paths or globs). Hidden directories (e.g. .git) are filtered independently by include_hidden.
- Entries are always listed directories-first, then files, each group sorted alphabetically.
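The notes above can be illustrated with a small directory walker. It is a sketch, not the list_directory implementation. The max_depth parameter, the depth-first layout (each directory's children listed right after it), and treating any dot-prefixed name, file or directory, as hidden are all assumptions. ignored_from_env shows one way to parse the comma-separated LARGEFILE_IGNORED_DIR_PATTERNS value as exact names.

```python
import os
from typing import AbstractSet, Mapping

DEFAULT_IGNORED = frozenset({"__pycache__", "node_modules", ".git"})


def ignored_from_env(env: Mapping[str, str] = os.environ) -> AbstractSet[str]:
    """Parse LARGEFILE_IGNORED_DIR_PATTERNS as exact, comma-separated names."""
    raw = env.get("LARGEFILE_IGNORED_DIR_PATTERNS")
    if raw is None:
        return DEFAULT_IGNORED
    return {name.strip() for name in raw.split(",") if name.strip()}


def list_directory(root: str, max_depth: int = 1, include_hidden: bool = False,
                   ignored: AbstractSet[str] = DEFAULT_IGNORED,
                   max_entries: int = 200) -> tuple[list[str], bool]:
    """Return (relative paths, truncated); directory entries end with '/'."""
    out: list[str] = []

    def walk(path: str, rel: str, depth: int) -> bool:
        with os.scandir(path) as it:
            entries = [e for e in it if include_hidden or not e.name.startswith(".")]
        dirs = sorted((e for e in entries
                       if e.is_dir(follow_symlinks=False) and e.name not in ignored),
                      key=lambda e: e.name)
        files = sorted((e for e in entries if not e.is_dir(follow_symlinks=False)),
                       key=lambda e: e.name)
        for entry in dirs + files:
            if len(out) >= max_entries:
                return True  # one hard cap shared by every depth
            child = os.path.join(rel, entry.name) if rel else entry.name
            is_dir = entry.is_dir(follow_symlinks=False)
            out.append(child + "/" if is_dir else child)
            if is_dir and depth < max_depth and walk(entry.path, child, depth + 1):
                return True
        return False

    truncated = walk(root, "", 1)
    return out, truncated
```

Because the cap is shared across depths, a deep listing can stop partway through a subtree. The truncated flag tells the caller that more entries exist.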
Control the search_directory tool behaviour:
# Maximum total matches returned by search_directory across all files (default: 100)
LARGEFILE_MAX_DIR_SEARCH_RESULTS=100
# Maximum number of files visited by search_directory before stopping (default: 10000)
# Prevents runaway walks on very large directory trees (e.g. searching /)
LARGEFILE_MAX_DIR_SEARCH_FILES=10000
Notes:
- LARGEFILE_MAX_DIR_SEARCH_RESULTS is a hard cap on total matches (not files). When it is reached, truncated=True is set and truncated_at indicates where scanning stopped.
- LARGEFILE_MAX_DIR_SEARCH_FILES is a hard cap on files visited (not matches). It protects against very deep or wide trees; truncated=True is set when it triggers.
- LARGEFILE_IGNORED_DIR_PATTERNS is shared with list_directory, so one setting controls both tools.
- fuzzy=False is the default for search_directory; enabling it on large trees can be slow. Use include_pattern to narrow the scope first.
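Both caps can be seen in a simple exact-substring walker built on os.walk and fnmatch. Several details are assumptions for illustration: the function signature, the shape of the returned dict, counting only files that are actually opened toward the file cap, and the value stored in truncated_at (here, the path where scanning stopped).

```python
import fnmatch
import os
from typing import AbstractSet, Optional


def search_directory(root: str, pattern: str, include_pattern: Optional[str] = None,
                     ignored: AbstractSet[str] = frozenset({"__pycache__", "node_modules", ".git"}),
                     max_results: int = 100, max_files: int = 10000) -> dict:
    """Exact (non-fuzzy) substring search across a tree, with both hard caps."""
    matches = []
    files_visited = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune ignored directory names in place so os.walk never enters them.
        dirnames[:] = sorted(d for d in dirnames if d not in ignored)
        for name in sorted(filenames):
            if include_pattern and not fnmatch.fnmatch(name, include_pattern):
                continue
            path = os.path.join(dirpath, name)
            if files_visited >= max_files:
                return {"matches": matches, "truncated": True, "truncated_at": path}
            files_visited += 1
            with open(path, encoding="utf-8", errors="replace") as f:
                for lineno, line in enumerate(f, start=1):
                    if pattern in line:
                        matches.append({"file": path, "line": lineno,
                                        "text": line.rstrip("\n")})
                        if len(matches) >= max_results:
                            return {"matches": matches, "truncated": True,
                                    "truncated_at": path}
    return {"matches": matches, "truncated": False, "truncated_at": None}
```

Filtering with include_pattern before a file is opened is what makes it cheap to narrow the scope. Skipped files cost only a name comparison.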