
Commit 36aad2c

docs: simplify readme, clarify schema docs
1 parent 7e06146 commit 36aad2c

3 files changed

Lines changed: 152 additions & 271 deletions


README.md

Lines changed: 75 additions & 250 deletions
@@ -1,30 +1,20 @@
11
# Largefile MCP Server
22

3-
An MCP server that enables AI assistants to work with large files that exceed context limits.
3+
Navigate, search, and edit large codebases, logs, and data files that exceed AI context limits.
44

5-
[![CI](https://img.shields.io/github/actions/workflow/status/peteretelej/largefile/ci.yml?branch=main&logo=github)](https://github.com/peteretelej/largefile/actions/workflows/ci.yml) [![codecov](https://codecov.io/gh/peteretelej/largefile/branch/main/graph/badge.svg)](https://codecov.io/gh/peteretelej/largefile) [![PyPI version](https://img.shields.io/pypi/v/largefile.svg)](https://pypi.org/project/largefile/) [![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) [![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff) [![uv](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/uv/main/assets/badge/v0.json)](https://github.com/astral-sh/uv)
5+
[![CI](https://img.shields.io/github/actions/workflow/status/peteretelej/largefile/ci.yml?branch=main&logo=github)](https://github.com/peteretelej/largefile/actions/workflows/ci.yml) [![codecov](https://codecov.io/gh/peteretelej/largefile/branch/main/graph/badge.svg)](https://codecov.io/gh/peteretelej/largefile) [![PyPI version](https://img.shields.io/pypi/v/largefile.svg)](https://pypi.org/project/largefile/) [![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
66

7-
Navigate, search, and edit files of any size without loading entire content into memory. Largefile provides targeted access to specific lines, patterns, and sections while maintaining file integrity using research-backed search/replace editing instead of error-prone line-based operations.
7+
## Why Largefile?
88

9-
Perfect for working with large codebases, generated files, logs, and datasets that would otherwise be inaccessible due to context window limitations.
10-
11-
## MCP Tools
12-
13-
Five tools that work together for progressive file exploration:
14-
15-
| Tool | Purpose |
16-
|------|---------|
17-
| **`get_overview`** | File structure with Tree-sitter semantic analysis, binary detection, and long line stats |
18-
| **`search_content`** | Pattern search with fuzzy, regex, count-only, and invert matching modes |
19-
| **`read_content`** | Targeted reading by offset/limit, pattern, tail mode, or head mode |
20-
| **`edit_content`** | Batch search/replace editing with automatic backups and preview mode |
21-
| **`revert_edit`** | Recover from bad edits by reverting to previous backup states |
9+
- **Go beyond context limits** - Read, search, and edit files too large to fit in AI context windows
10+
- **Semantic code navigation** - Tree-sitter extracts functions/classes for Python, JS/TS, Rust, Go
11+
- **Fewer LLM errors** - Search/replace editing eliminates line number mistakes common with line-based edits
12+
- **Smart search** - Fuzzy matching, regex, case-insensitive, inverted, and count-only modes
13+
- **No size limits** - Handles multi-GB files via tiered memory strategy (RAM → mmap → streaming)
2214

2315
## Quick Start
2416

25-
**Prerequisite:** Install [uv](https://docs.astral.sh/uv/getting-started/installation/) (an extremely fast Python package manager) which provides the `uvx` command.
26-
27-
Add to your MCP configuration:
17+
**Prerequisite:** Install [uv](https://docs.astral.sh/uv/getting-started/installation/) for the `uvx` command.
2818

2919
```json
3020
{
@@ -37,275 +27,110 @@ Add to your MCP configuration:
3727
}
3828
```
3929

40-
## Usage
30+
## Tools
4131

42-
Your AI Assistant / LLM can now work with very large files that exceed its context limits. Here are some common workflows:
32+
| Tool | Use For |
33+
| ---------------- | ------------------------------------------------------ |
34+
| `get_overview` | File structure and semantic outline before diving in |
35+
| `search_content` | Finding patterns, counting occurrences, regex matching |
36+
| `read_content` | Reading specific sections; tail/head modes for logs |
37+
| `edit_content` | Safe search/replace with automatic backups |
38+
| `revert_edit` | Recovering from bad edits |
4339

44-
### Analyzing Large Code Files
40+
## When to Use Largefile
4541

46-
**AI Question:** _"Can you analyze this large Django models file and tell me about the class structure and any potential issues? It's a large file so use largefile."_
42+
**Use when:**
4743

48-
**AI Assistant workflow:**
49-
50-
1. Gets file overview to understand structure
51-
2. Searches for classes and their methods
52-
3. Looks for code issues like TODOs or long functions
53-
54-
```python
55-
# AI gets file structure
56-
overview = get_overview("/path/to/django-models.py")
57-
# Returns: 2,847 lines, 15 classes, semantic outline with Tree-sitter
44+
- File exceeds ~1000 lines or 100KB (supports multi-GB files)
45+
- Navigating large codebases with semantic structure
46+
- Analyzing log files (especially recent entries with tail mode)
47+
- Making search/replace edits across large files
48+
- Counting occurrences without loading full content
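Counting matches without loading a whole file comes down to streaming: read one line at a time and keep only a running count. Below is a minimal, hypothetical Python sketch of that idea. `count_matches` is an illustrative helper, not part of largefile's API. It mirrors the `count_only`, `regex`, and `invert` search options.

```python
import re
from typing import Iterable


def count_matches(lines: Iterable[str], pattern: str, *, regex: bool = False,
                  invert: bool = False) -> int:
    """Count lines that match (or, with invert=True, do not match) a pattern.

    Hypothetical helper: accepts any iterable of lines, so passing an open
    file handle streams the file instead of loading it into memory.
    """
    matcher = re.compile(pattern) if regex else None
    count = 0
    for line in lines:
        hit = bool(matcher.search(line)) if matcher else pattern in line
        # Keep matches normally, keep non-matches when inverted (like grep -v).
        if hit != invert:
            count += 1
    return count


# Usage with a hypothetical path:
# with open("/var/log/app.log", encoding="utf-8", errors="replace") as f:
#     print(count_matches(f, "ERROR"))
```

Memory use stays constant regardless of file size because only the current line and the counter are held at any time.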
5849

59-
# AI searches for all class definitions
60-
classes = search_content("/path/to/django-models.py", "class ", max_results=20)
61-
# Returns: Model classes with line numbers and context
50+
**Don't use for:**
6251

63-
# AI examines specific class implementation
64-
model_code = read_content("/path/to/django-models.py", pattern="class User", mode="semantic")
65-
# Returns: Complete class definition with all methods
66-
```
52+
- Small files that fit in context (AI doesn't need help with those)
53+
- Binary files (images, executables, compressed)
6754

68-
### Working with Documentation
55+
## Usage Examples
6956

70-
**AI Question:** _"Find all the installation methods mentioned in this README file and update the pip install to use uv instead."_
57+
### Large Codebase Navigation
7158

72-
**AI Assistant workflow:**
59+
```python
60+
# Get semantic structure of a large Python file
61+
overview = get_overview("/path/to/large_module.py")
62+
# Returns: 2,847 lines, 15 classes, function outline via Tree-sitter
7363
74-
1. Search for installation patterns
75-
2. Read the installation section
76-
3. Replace pip commands with uv equivalents
64+
# Find all class definitions
65+
classes = search_content("/path/to/large_module.py", "class ", fuzzy=False)
7766
78-
```python
79-
# AI finds installation instructions
80-
install_sections = search_content("/path/to/readme.md", "install", fuzzy=True, context_lines=3)
81-
82-
# AI reads the installation section
83-
install_content = read_content("/path/to/readme.md", pattern="## Installation", mode="semantic")
84-
85-
# AI replaces pip with uv
86-
edit_result = edit_content(
87-
"/path/to/readme.md",
88-
changes=[{"search": "pip install anthropic", "replace": "uv add anthropic"}],
89-
preview=True
90-
)
67+
# Read complete class with semantic chunking
68+
code = read_content("/path/to/large_module.py", pattern="class UserModel", mode="semantic")
9169
```
9270

93-
### Debugging Large Log Files
94-
95-
**AI Question:** _"Check this production log file for any critical errors in the last few thousand lines and show me the context around them. Use largefile mcp."_
96-
97-
**AI Assistant workflow:**
98-
99-
1. Get log file overview
100-
2. Read the last N lines efficiently with tail mode
101-
3. Search for error patterns in recent entries
71+
### Batch Refactoring
10272

10373
```python
104-
# AI gets log file overview
105-
overview = get_overview("/path/to/production.log")
106-
# Returns: 150,000 lines, 2.1GB file size
107-
108-
# AI reads the last 1000 lines efficiently (no need to know total line count)
109-
recent = read_content("/path/to/production.log", limit=1000, mode="tail")
110-
# Returns: Last 1000 lines without loading entire file
111-
112-
# AI counts errors efficiently
113-
error_count = search_content("/path/to/production.log", "ERROR", count_only=True, fuzzy=False)
114-
# Returns: {"count": 47, ...} without loading all content
74+
# Preview rename across file
75+
preview = edit_content("/path/to/api.py", changes=[
76+
{"search": "process_data", "replace": "transform_data"},
77+
{"search": "old_endpoint", "replace": "new_endpoint"}
78+
], preview=True)
11579

116-
# AI searches for critical errors with context
117-
errors = search_content("/path/to/production.log", "CRITICAL", fuzzy=False, max_results=10)
80+
# Apply the same changes (creates automatic backup)
81+
result = edit_content("/path/to/api.py", changes=[...], preview=False)
11882

119-
# AI examines context around each error
120-
for error in errors["results"]:
121-
context = read_content("/path/to/production.log", offset=error["line_number"], limit=20)
122-
# Shows surrounding log entries for debugging
83+
# Undo if needed
84+
revert_edit("/path/to/api.py")
12385
```
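Search/replace editing avoids line-number mistakes because each change names the exact text it targets. The following is a minimal Python sketch of the preview, apply, backup, and revert cycle. It makes several assumptions: matching is exact rather than fuzzy, every occurrence is replaced, and backups are written next to the file. `apply_changes`, `edit_file`, and `revert_file` are hypothetical helpers. largefile's own matching, backup location, and atomicity guarantees may differ.

```python
import shutil
import time
from pathlib import Path
from typing import Optional


def apply_changes(text: str, changes: list[dict]) -> tuple[str, list[dict]]:
    """Apply exact search/replace pairs in order and report a result per change."""
    results = []
    for change in changes:
        search, replace = change["search"], change["replace"]
        occurrences = text.count(search)
        if occurrences:
            # Assumption: replace every occurrence (a file-wide rename).
            text = text.replace(search, replace)
        results.append({"search": search, "success": occurrences > 0,
                        "replacements": occurrences})
    return text, results


def edit_file(path: str, changes: list[dict],
              preview: bool = True) -> tuple[list[dict], Optional[str]]:
    """Preview or apply changes; back up the original before writing."""
    target = Path(path)
    original = target.read_text(encoding="utf-8")
    updated, results = apply_changes(original, changes)
    backup = None
    if not preview and updated != original:
        # Hypothetical naming: a timestamped copy beside the file.
        backup = str(target.with_name(f"{target.name}.{time.time_ns()}.bak"))
        shutil.copy2(target, backup)
        target.write_text(updated, encoding="utf-8")
    return results, backup


def revert_file(path: str, backup: str) -> None:
    """Restore a file from a backup copy made by edit_file."""
    shutil.copy2(backup, path)
```

Changes apply in sequence, so a later `search` sees the text produced by earlier replacements. A change whose `search` text is missing is reported with `success: False` rather than silently ignored.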
12486

125-
### Refactoring Code
126-
127-
**AI Question:** _"I need to rename the function `process_data` to `transform_data` throughout this large codebase file. Can you help me do this safely?"_
128-
129-
**AI Assistant workflow:**
130-
131-
1. Find all occurrences of the function
132-
2. Preview changes to ensure accuracy
133-
3. Apply changes with automatic backup
87+
### Log Analysis
13488

13589
```python
136-
# AI finds all usages
137-
usages = search_content("/path/to/codebase.py", "process_data", fuzzy=False, max_results=50)
138-
139-
# AI previews the changes
140-
preview = edit_content(
141-
"/path/to/codebase.py",
142-
changes=[{"search": "process_data", "replace": "transform_data"}],
143-
preview=True
144-
)
145-
146-
# AI applies changes after confirmation
147-
result = edit_content(
148-
"/path/to/codebase.py",
149-
changes=[{"search": "process_data", "replace": "transform_data"}],
150-
preview=False
151-
)
152-
# Creates automatic backup before changes
153-
```
90+
# Get log file overview
91+
overview = get_overview("/var/log/app.log")
92+
# Returns: 150,000 lines, 2.1GB
15493

155-
### Batch Editing Multiple Patterns
94+
# Read last 500 lines efficiently
95+
recent = read_content("/var/log/app.log", limit=500, mode="tail")
15696

157-
**AI Question:** _"Update all the deprecated API calls in this file - there are several different ones to change."_
97+
# Count errors without loading content
98+
error_count = search_content("/var/log/app.log", "ERROR", count_only=True, fuzzy=False)
15899

159-
**AI Assistant workflow:**
160-
161-
1. Identify all deprecated patterns
162-
2. Apply multiple changes atomically in one call
163-
164-
```python
165-
# AI applies multiple changes in a single atomic operation
166-
result = edit_content(
167-
"/path/to/api_client.py",
168-
changes=[
169-
{"search": "client.get_user(", "replace": "client.fetch_user("},
170-
{"search": "client.post_data(", "replace": "client.send_data("},
171-
{"search": "client.delete_item(", "replace": "client.remove_item("},
172-
],
173-
preview=True
174-
)
175-
# Returns per-change results with success/failure status
176-
# All changes applied atomically - partial success is reported
100+
# Find errors with regex
101+
errors = search_content("/var/log/app.log", r"ERROR.*timeout", regex=True)
177102
```
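Tail mode avoids a full scan by starting at the end of the file. One common approach is to read fixed-size blocks backwards until enough newlines have been collected. The sketch below shows this in Python. It is hypothetical: it is not necessarily how largefile implements `mode="tail"`, and `tail_lines` and its `block_size` parameter are illustrative names.

```python
import os


def tail_lines(path: str, n: int, block_size: int = 8192) -> list[str]:
    """Return the last n lines of a file by reading blocks from the end."""
    if n <= 0:
        return []
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        position = f.tell()
        data = b""
        # Collect n + 1 newlines (or reach the start of the file) so the
        # first of the returned lines is guaranteed to be complete.
        while position > 0 and data.count(b"\n") <= n:
            read_size = min(block_size, position)
            position -= read_size
            f.seek(position)
            data = f.read(read_size) + data
    # Any partial (or split multi-byte) text sits in the discarded first line.
    return data.decode("utf-8", errors="replace").splitlines()[-n:]
```

The cost depends on the size of the last `n` lines, not on the size of the file, so tail reads stay fast even on multi-GB logs.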
178103

179-
### Recovering from Bad Edits
180-
181-
**AI Question:** _"That last edit broke something. Can you undo it?"_
182-
183-
**AI Assistant workflow:**
184-
185-
1. List available backups
186-
2. Revert to previous state (current state is preserved as new backup)
187-
188-
```python
189-
# AI reverts to the most recent backup
190-
result = revert_edit("/path/to/broken_file.py")
191-
# Current state saved as backup, file restored to previous version
192-
193-
# Or revert to a specific backup by ID
194-
result = revert_edit("/path/to/broken_file.py", backup_id="20240115_143022")
195-
# Returns: available_backups list for reference
196-
```
197-
198-
### Advanced Search with Regex
199-
200-
**AI Question:** _"Find all IP addresses in this server log file."_
201-
202-
**AI Assistant workflow:**
203-
204-
```python
205-
# AI uses regex mode to find IP address patterns
206-
results = search_content(
207-
"/path/to/server.log",
208-
r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}",
209-
regex=True,
210-
fuzzy=False,
211-
max_results=50
212-
)
213-
214-
# AI finds non-INFO lines (invert mode like grep -v)
215-
non_info = search_content("/path/to/app.log", "INFO", invert=True, fuzzy=False)
216-
```
217-
218-
### Exploring API Documentation
219-
220-
**AI Question:** _"What are all the available methods in this large API documentation file and can you show me examples of authentication?"_
221-
222-
**AI Assistant workflow:**
223-
224-
1. Get document structure overview
225-
2. Search for method definitions and auth patterns
226-
3. Extract relevant code examples
227-
228-
```python
229-
# AI analyzes document structure
230-
overview = get_overview("/path/to/api-docs.md")
231-
# Returns: Section outline, headings, suggested search patterns
104+
## Supported Languages
232105

233-
# AI finds API methods
234-
methods = search_content("/path/to/api-docs.md", "###", max_results=30)
235-
# Returns: All method headings with context
106+
Tree-sitter semantic analysis for: **Python**, **JavaScript/JSX**, **TypeScript/TSX**, **Rust**, **Go**
236107

237-
# AI searches for authentication examples
238-
auth_examples = search_content("/path/to/api-docs.md", "auth", fuzzy=True, context_lines=5)
239-
240-
# AI reads complete authentication section
241-
auth_section = read_content("/path/to/api-docs.md", pattern="## Authentication", mode="semantic")
242-
```
108+
Other file types use text-based analysis with graceful fallback.
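Tree-sitter is a separate parsing library. To show the kind of outline semantic analysis produces, the sketch below swaps in Python's built-in `ast` module, which covers Python only and is not what largefile uses. It lists top-level classes and functions with their start and end lines, which is the information a semantic read needs to return a complete class. `outline` is a hypothetical helper. Its empty-result fallback stands in for the text-based path.

```python
import ast


def outline(source: str) -> list[tuple[str, str, int, int]]:
    """Return (kind, name, start_line, end_line) for top-level definitions."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # Unparseable input: fall back to "no semantic outline".
        return []
    items = []
    for node in tree.body:
        if isinstance(node, ast.ClassDef):
            kind = "class"
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "function"
        else:
            continue
        # end_lineno is available on AST nodes from Python 3.8 onward.
        items.append((kind, node.name, node.lineno, node.end_lineno))
    return items
```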
243109

244110
## File Size Handling
245111

246-
- **Small files (<50MB)**: Memory loading with Tree-sitter AST caching
247-
- **Medium files (50-500MB)**: Memory-mapped access
248-
- **Large files (>500MB)**: Streaming processing
249-
- **Long lines (>1000 chars)**: Automatic truncation for display
250-
251-
## Supported Languages
252-
253-
Tree-sitter semantic analysis for:
254-
255-
- Python (.py)
256-
- JavaScript/JSX (.js, .jsx)
257-
- TypeScript/TSX (.ts, .tsx)
258-
- Rust (.rs)
259-
- Go (.go)
260-
261-
Files without Tree-sitter support use text-based analysis with graceful degradation.
112+
| Size | Strategy |
113+
| -------- | --------------------------------------- |
114+
| < 50MB | Full memory loading with AST caching |
115+
| 50-500MB | Memory-mapped access |
116+
| > 500MB | Streaming (tail/head modes recommended) |
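The size tiers amount to a dispatch on file size. The following minimal Python sketch assumes the default thresholds of 50MB and 500MB from the configuration example. It also assumes that exactly 500MB still uses mmap, because the table does not say which side the boundary falls on. `choose_strategy` and `count_lines` are hypothetical names. Counting lines stands in for any whole-file operation.

```python
import mmap
import os

# Default thresholds from the configuration example.
MEMORY_THRESHOLD_MB = 50
MMAP_THRESHOLD_MB = 500


def choose_strategy(size_bytes: int) -> str:
    """Pick an access strategy from the file size (hypothetical boundaries)."""
    size_mb = size_bytes / (1024 * 1024)
    if size_mb < MEMORY_THRESHOLD_MB:
        return "memory"
    if size_mb <= MMAP_THRESHOLD_MB:
        return "mmap"
    return "streaming"


def count_lines(path: str) -> int:
    """Count newline bytes using the strategy chosen for the file's size."""
    strategy = choose_strategy(os.path.getsize(path))
    with open(path, "rb") as f:
        if strategy == "memory":
            # Small file: read everything into RAM at once.
            return f.read().count(b"\n")
        if strategy == "mmap":
            # Medium file: the OS pages the file in on demand.
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
                count, pos = 0, m.find(b"\n")
                while pos != -1:
                    count += 1
                    pos = m.find(b"\n", pos + 1)
                return count
        # Large file: stream 1 MiB chunks so memory use stays bounded.
        return sum(chunk.count(b"\n") for chunk in iter(lambda: f.read(1 << 20), b""))
```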
262117

263118
## Configuration
264119

265-
Configure via environment variables:
120+
Environment variables for tuning:
266121

267122
```bash
268-
# File processing thresholds
269-
LARGEFILE_MEMORY_THRESHOLD_MB=50 # Memory loading limit
270-
LARGEFILE_MMAP_THRESHOLD_MB=500 # Memory mapping limit
271-
272-
# Search settings
273-
LARGEFILE_FUZZY_THRESHOLD=0.8 # Fuzzy match sensitivity (0.0-1.0)
274-
LARGEFILE_MAX_SEARCH_RESULTS=20 # Result limit per search
275-
LARGEFILE_CONTEXT_LINES=3 # Context lines around matches
276-
277-
# Error recovery
278-
LARGEFILE_SIMILAR_MATCH_LIMIT=3 # Similar matches shown on edit failure
279-
LARGEFILE_SIMILAR_MATCH_THRESHOLD=0.6 # Min similarity for suggestions
280-
281-
# Backup management
282-
LARGEFILE_BACKUP_DIR="~/.largefile/backups" # Backup location
283-
LARGEFILE_MAX_BACKUPS=10 # Backups retained per file
284-
285-
# Batch editing
286-
LARGEFILE_MAX_BATCH_CHANGES=50 # Max changes per batch call
287-
288-
# Performance
289-
LARGEFILE_ENABLE_TREE_SITTER=true # Semantic features
123+
LARGEFILE_MEMORY_THRESHOLD_MB=50 # RAM loading limit
124+
LARGEFILE_MMAP_THRESHOLD_MB=500 # Memory mapping limit
125+
LARGEFILE_FUZZY_THRESHOLD=0.8 # Match sensitivity (0.0-1.0)
126+
LARGEFILE_MAX_SEARCH_RESULTS=20 # Results per search
127+
LARGEFILE_BACKUP_DIR=~/.largefile/backups
290128
```
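`LARGEFILE_FUZZY_THRESHOLD` is a similarity cutoff between 0.0 and 1.0. The sketch below illustrates how such a cutoff works using the standard library's `difflib.SequenceMatcher.ratio()`. That scoring function is an assumption chosen for illustration, and largefile's actual fuzzy matcher may score differently. `fuzzy_find` is a hypothetical helper.

```python
from difflib import SequenceMatcher

FUZZY_THRESHOLD = 0.8  # same value as LARGEFILE_FUZZY_THRESHOLD in the example


def fuzzy_find(lines: list[str], query: str,
               threshold: float = FUZZY_THRESHOLD) -> list[tuple[int, float]]:
    """Return (1-based line number, score) for lines similar enough to query."""
    matches = []
    for number, line in enumerate(lines, start=1):
        # ratio() is 1.0 for identical strings and falls toward 0.0 as they diverge.
        score = SequenceMatcher(None, query, line.strip()).ratio()
        if score >= threshold:
            matches.append((number, round(score, 3)))
    return matches
```

Under this scoring, raising the threshold admits only closer matches. For example, a one-character typo such as `proces_data` still clears 0.8 for a line of this length.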
291129

292-
## Key Features
293-
294-
- **Search/replace editing** - Eliminates LLM line number errors with fuzzy matching
295-
- **Batch operations** - Apply multiple changes atomically in one call
296-
- **Regex & invert search** - Powerful pattern matching with grep-like features
297-
- **Count-only mode** - Efficiently count matches without loading content
298-
- **Smart error recovery** - Failed edits show similar matches with suggestions
299-
- **Backup & revert** - Automatic backups with full revert capability
300-
- **Tail & head modes** - Read file endings/beginnings without full scan
301-
- **Binary detection** - Warns when files appear binary
302-
- **Semantic awareness** - Tree-sitter integration for code structure
303-
- **Memory efficient** - Handles files of any size via tiered access strategy
304-
305130
## Documentation
306131

307132
- [API Reference](docs/API.md) - Detailed tool documentation
308-
- [Configuration Guide](docs/configuration.md) - Environment variables and tuning
309-
- [Examples](docs/examples.md) - Real-world usage examples and workflows
310-
- [Design Document](docs/design.md) - Architecture and implementation details
311-
- [Contributing](docs/CONTRIBUTING.md) - Development setup and guidelines
133+
- [Configuration Guide](docs/configuration.md) - All environment variables
134+
- [Examples](docs/examples.md) - More workflow examples
135+
- [Design Document](docs/design.md) - Architecture details
136+
- [Contributing](docs/CONTRIBUTING.md) - Development setup
