diff --git a/docs/features/file-editing.mdx b/docs/features/file-editing.mdx
index 8a52d358..0284a6e5 100644
--- a/docs/features/file-editing.mdx
+++ b/docs/features/file-editing.mdx
@@ -5,7 +5,7 @@ description: "Precise, conflict-safe find-and-replace across files, scoped to a
 icon: "file-pen"
 ---
 
-File editing tools provide secure, workspace-scoped file operations with **precise, conflict-safe** find-and-replace. Ambiguous matches fail loudly instead of editing the wrong occurrence, and a content-hash check prevents lost-update conflicts when files change between read and edit.
+File editing tools provide secure, workspace-scoped file operations with **precise, conflict-safe** find-and-replace. Ambiguous matches fail loudly instead of editing the wrong occurrence, and a content-hash check prevents lost-update conflicts when files change between read and edit. A fuzzy matching ladder makes first-try edits succeed even when `old_string` drifts from the file by whitespace, indentation, or line endings.
 
 ```mermaid
 graph LR
@@ -107,7 +107,7 @@ sequenceDiagram
     A->>E: edit_file(path, old, new, expected_hash=sha256)
     E->>F: re-read + hash
     F-->>E: current content
-    E->>E: ambiguity & hash checks
+    E->>E: fuzzy ladder + ambiguity & hash checks
     E->>F: write preserved LF/CRLF/BOM
     E-->>A: success + unified diff
 ```
@@ -118,8 +118,9 @@ sequenceDiagram
 | **read_file** *(edit_tools)* | Low | No | Read content + SHA-256 hash for staleness checks |
 | **read_file** *(file_tools)* | Low | No | Plain read returning a string |
 | **list_files** | Low | No | Directory listings |
-| **edit_file** | High | Recommended | Precise find-and-replace with ambiguity & staleness guards |
+| **edit_file** | High | Recommended | Precise find-and-replace with fuzzy ladder, ambiguity & staleness guards |
 | **write_file** | High | Recommended | Create/overwrite files |
+| **apply_patch** | High | Recommended | Atomic multi-file Add/Update/Delete with rollback |
 
 Two modules export `read_file` with different signatures:
 
@@ -132,13 +133,216 @@ Use **edit_tools** when passing `expected_hash` to `edit_file`.
 
 ---
 
+## How Fuzzy Matching Works
+
+`edit_file` walks a deterministic ladder of matching strategies and stops at the **first** strategy that produces a confident match. Exact matches always win; fuzzy strategies only engage when an exact substring is not found. Existing code keeps behaving exactly as before — only previously-failing edits now succeed.
+
+```mermaid
+graph TB
+    subgraph "Fuzzy Matching Ladder"
+        A[🤖 Agent old_string] --> S1[1. exact]
+        S1 -->|match| OK[✅ Apply]
+        S1 -->|miss| S2[2. line_trimmed]
+        S2 -->|match| OK
+        S2 -->|miss| S3[3. whitespace_normalised]
+        S3 -->|match| OK
+        S3 -->|miss| S4[4. indentation_flexible]
+        S4 -->|match| OK
+        S4 -->|miss| S5[5. block_anchor + similarity 0.7]
+        S5 -->|match| OK
+        S5 -->|miss| ERR[⚠️ String not found]
+        OK --> AMB{Single span?}
+        AMB -->|yes| W[💾 Write + diff]
+        AMB -->|no| AMBERR[⚠️ Ambiguous match]
+    end
+
+    classDef agent fill:#8B0000,stroke:#7C90A0,color:#fff
+    classDef step fill:#189AB4,stroke:#7C90A0,color:#fff
+    classDef ok fill:#10B981,stroke:#7C90A0,color:#fff
+    classDef warn fill:#F59E0B,stroke:#7C90A0,color:#fff
+
+    class A agent
+    class S1,S2,S3,S4,S5,OK,AMB step
+    class W ok
+    class ERR,AMBERR warn
+```
+
+| # | Strategy | Tolerates | Example divergence |
+|---|----------|-----------|-------------------|
+| 1 | `exact` | nothing | byte-for-byte match |
+| 2 | `line_trimmed` | leading/trailing whitespace per line | ` return 1` vs `return 1` |
+| 3 | `whitespace_normalised` | collapsed internal whitespace | `x = 1` vs `x = 1` |
+| 4 | `indentation_flexible` | tabs vs spaces, depth | `\treturn 1` vs ` return 1` |
+| 5 | `block_anchor` | structural drift (similarity ≥ 0.7) | first/last lines anchor a fuzzy block |
+
+**Confidence guards (block_anchor only):**
+- Similarity threshold: `0.7` (constant `_BLOCK_ANCHOR_THRESHOLD` in source)
+- Disproportionate-length guard: rejects blocks more than 2× or less than half the `old_string` line count
+- Tie-breaking: two equally-scored candidates are treated as **ambiguous** (not silently picked)
+
+```python
+from praisonaiagents.tools.edit_tools import edit_file
+
+# File uses tabs; old_string uses spaces — still succeeds
+# because indentation_flexible normalises both
+edit_file(
+    "src/utils.py",
+    old_string=" return value",
+    new_string=" return processed_value",
+)
+```
+
+
+**Why this matters for coding agents:** LLM-generated `old_string` values routinely drift by whitespace, indentation, or line endings. The fuzzy ladder makes the first attempt succeed when the target is unambiguous, saving retry turns and tokens.
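The ladder described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the library's implementation: `find_span`, `LADDER`, and the normaliser functions are hypothetical names, only the first three rungs are shown, and matching is done on whole lines for simplicity rather than on arbitrary substrings.

```python
import re
from typing import Callable, List, Optional, Tuple


def _exact(line: str) -> str:
    return line


def _line_trimmed(line: str) -> str:
    return line.strip()


def _whitespace_normalised(line: str) -> str:
    # Collapse every run of whitespace to a single space.
    return re.sub(r"\s+", " ", line.strip())


# Hypothetical ladder: strictest normaliser first.
LADDER: List[Tuple[str, Callable[[str], str]]] = [
    ("exact", _exact),
    ("line_trimmed", _line_trimmed),
    ("whitespace_normalised", _whitespace_normalised),
]


def find_span(content: str, old_string: str) -> Tuple[Optional[str], List[int]]:
    """Return (strategy, start line indices) for the first rung that matches."""
    file_lines = content.splitlines()
    old_lines = old_string.splitlines()
    n = len(old_lines)
    if n == 0:
        return None, []
    for name, norm in LADDER:
        target = [norm(line) for line in old_lines]
        hits = [
            i
            for i in range(len(file_lines) - n + 1)
            if [norm(line) for line in file_lines[i:i + n]] == target
        ]
        if hits:
            # Stop at the first rung with any hit; the caller decides
            # whether more than one hit counts as ambiguous.
            return name, hits
    return None, []
```

A caller would apply the edit only when exactly one start index comes back, and report an ambiguity error when there are several, mirroring the "Single span?" check in the diagram.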
+
+
+---
+
+## Multi-file Patches with `apply_patch`
+
+`apply_patch` lets an agent Add, Update, and Delete multiple files in a single atomic call — all changes succeed together or none are committed.
+
+```mermaid
+graph TB
+    Q[Need to edit files?]
+    Q --> A1{One file?}
+    A1 -->|Yes, one change| E[edit_file]
+    A1 -->|Yes, multiple changes| EM[apply_patch with one Update]
+    A1 -->|Multiple files| AP[apply_patch]
+    AP --> R[All-or-nothing atomic]
+
+    classDef q fill:#6366F1,stroke:#7C90A0,color:#fff
+    classDef tool fill:#10B981,stroke:#7C90A0,color:#fff
+    classDef note fill:#F59E0B,stroke:#7C90A0,color:#fff
+
+    class Q,A1 q
+    class E,EM,AP tool
+    class R note
+```
+
+
+
+
+```python
+from praisonaiagents import Agent
+
+agent = Agent(
+    name="Refactor Agent",
+    instructions="Refactor across files atomically. Use apply_patch for multi-file changes.",
+    tools=["read_file", "search_files", "apply_patch", "edit_file"],
+)
+
+agent.start("Rename UserService to AccountService across src/ and update its tests.")
+```
+
+
+
+
+
+```python
+from praisonaiagents.tools.edit_tools import apply_patch
+
+patch = """*** Update File: src/service.py
+@@
+class UserService:
+===
+class AccountService:
+*** Update File: tests/test_service.py
+@@
+from src.service import UserService
+===
+from src.service import AccountService
+*** Delete File: docs/old_userservice.md
+"""
+
+result = apply_patch(patch)
+print(result)  # "Success: Applied patch to 3 file(s) ... "
+```
+
+
+
+
+### Patch Format Reference
+
+| Header | Body format | Purpose |
+|--------|------------|---------|
+| `*** Add File: ` | Full file content lines until next header | Create a new file (errors if path already exists) |
+| `*** Update File: ` | One or more `@@` hunks (`\n===\n`) | Modify file using fuzzy ladder for each hunk |
+| `*** Delete File: ` | (no body) | Remove file (errors if path missing) |
+
+Optional sentinels `*** Begin Patch` / `*** End Patch` are accepted and stripped.
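To make the section layout in the table above concrete, here is a rough sketch of how a patch could be split into per-file sections. It is an assumption-laden illustration, not the parser shipped in `edit_tools`: `split_sections`, `HEADERS`, and `SENTINELS` are hypothetical names, the error text is invented for the example, and no validation of section bodies is attempted.

```python
from typing import List, Tuple

# Header prefixes taken from the format table; the mapping itself is illustrative.
HEADERS = {
    "*** Add File: ": "add",
    "*** Update File: ": "update",
    "*** Delete File: ": "delete",
}
SENTINELS = {"*** Begin Patch", "*** End Patch"}


def split_sections(patch: str) -> List[Tuple[str, str, List[str]]]:
    """Split patch text into (operation, path, body_lines) tuples."""
    sections: List[Tuple[str, str, List[str]]] = []
    for line in patch.splitlines():
        if line.strip() in SENTINELS:
            continue  # optional wrappers are accepted and dropped
        for prefix, op in HEADERS.items():
            if line.startswith(prefix):
                sections.append((op, line[len(prefix):].strip(), []))
                break
        else:
            if not sections:
                if line.strip():
                    # Body text before any header has nowhere to go.
                    raise ValueError(f"expected a section header, got: {line!r}")
                continue
            sections[-1][2].append(line)
    return sections
```

A real parser would go on to validate each section, for example rejecting a Delete section that carries body lines, before any file is touched.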
+
+**Update hunk syntax:**
+
+```
+*** Update File: path/to/file
+@@
+
+===
+
+@@
+
+===
+
+```
+
+Each `@@` hunk runs through the same fuzzy ladder as `edit_file`, so whitespace/indentation drift in the old block is tolerated.
+
+### Atomicity Guarantees
+
+```mermaid
+sequenceDiagram
+    participant A as 🤖 Agent
+    participant AP as ✏️ apply_patch
+    participant FS as 📁 Filesystem
+
+    A->>AP: apply_patch(patch_text)
+    AP->>AP: Phase 1 — parse + validate
+    Note over AP: Compute new content, check existence, resolve fuzzy hunks
+    AP->>FS: Phase 2 — commit (staged temp files)
+    alt All commits succeed
+        FS-->>AP: ✅ Each os.replace OK
+        AP-->>A: Success + combined diff
+    else Any commit fails
+        AP->>FS: Roll back (LIFO restore backups)
+        FS-->>AP: Restored
+        AP-->>A: Error: ...
+    end
+```
+
+| Behaviour | How it works |
+|-----------|-------------|
+| All-or-nothing | Phase 1 validates every operation and computes new content; Phase 2 commits with staged temp files and `os.replace` |
+| Rollback on failure | If any commit step raises, applied operations are reversed in LIFO order via backup paths |
+| BOM preservation | UTF-8 BOM detected on Update is reapplied on write |
+| Line-ending preservation | CRLF files stay CRLF, LF files stay LF (matches `edit_file`) |
+| UTF-16 rejection | Update on a UTF-16 file fails with a clear error |
+
+### `apply_patch` Error Messages
+
+| Trigger | Message |
+|---------|---------|
+| Empty / no operations | `Error: Patch contains no operations` |
+| Malformed header / orphan body | `Error: Invalid patch: Unexpected line in patch (expected a section header): ...` |
+| Add target already exists | `Error: Cannot add '': file already exists` |
+| Delete target missing | `Error: Cannot delete '': file not found` |
+| Update target missing | `Error: Cannot update '': file not found` |
+| Empty hunk old-block | `Error: Empty hunk in update for ''` |
+| Hunk not found | `Error: Hunk not found in '': ''` |
+| Ambiguous hunk | `Error: Ambiguous hunk in '': '' matches N locations` |
+| UTF-16 file | `Error: Cannot update '': UTF-16 encoding is not supported. Please convert the file to UTF-8.` |
+| Success | `Success: Applied patch to N file(s)\n\n` |
+
+---
+
 ## Configuration Options
 
 ### File Editing Functions
 
 | Function | Args | Returns | Notes |
 |----------|------|---------|-------|
-| `edit_file` | `filepath`, `old_string`, `new_string`, `replace_all=False`, `expected_hash=None` | `str` | High-risk; fails on ambiguous match unless `replace_all=True` |
+| `edit_file` | `filepath`, `old_string`, `new_string`, `replace_all=False`, `expected_hash=None` | `str` | High-risk; fuzzy ladder + fails on ambiguous match unless `replace_all=True` |
+| `apply_patch` | `patch: str` | `str` | High-risk; atomic multi-file Add/Update/Delete with rollback |
 | `read_file` *(edit_tools)* | `filepath` | `Tuple[str, str]` | `(content, sha256_hex)` for staleness checks |
 | `read_file` *(file_tools)* | `filepath`, `encoding='utf-8'` | `str` | Simple read, no hash |
 | `search_files` | `directory`, `pattern`, `file_pattern='*'` | JSON string | Case-insensitive substring search |
@@ -176,6 +380,10 @@ edit_file("config.py", "DEBUG = False", "DEBUG = True", expected_hash=h)
 | Ambiguous match | `Error: Ambiguous match - '{preview}' occurs {N} times. Please provide more surrounding context to make the match unique, or use replace_all=True to replace all occurrences.` |
 | Success | `Success: Made {N} replacement(s) in {filepath}\n\nDiff:\n{diff}` |
 
+
+An "Ambiguous match" error can also fire when **fuzzy strategies** produce more than one candidate location (e.g. whitespace-normalised matches at two places). Fix by adding more surrounding context to `old_string`.
+
+
 
 If a file contains mixed line endings, any CRLF present causes the file to be normalised to CRLF on save.
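The line-ending rule above (any CRLF present means the file is saved as CRLF) and the BOM preservation described earlier can be illustrated with a small round-trip. This is a sketch under assumptions, not the library's code: `detect_style`, `to_internal`, and `restore_style` are hypothetical helper names, and the input is assumed to be text that was read without newline translation.

```python
from typing import Tuple

BOM = "\ufeff"  # UTF-8 byte order mark as it appears after decoding


def detect_style(raw: str) -> Tuple[bool, str]:
    """Return (has_bom, newline) for text read without newline translation."""
    has_bom = raw.startswith(BOM)
    # Any CRLF at all marks the file as CRLF, even if some lines use bare LF.
    newline = "\r\n" if "\r\n" in raw else "\n"
    return has_bom, newline


def to_internal(raw: str) -> str:
    """Strip the BOM and normalise to LF so matching sees one convention."""
    body = raw[len(BOM):] if raw.startswith(BOM) else raw
    return body.replace("\r\n", "\n")


def restore_style(text: str, has_bom: bool, newline: str) -> str:
    """Reapply the original newline convention and BOM before writing."""
    if newline == "\r\n":
        text = text.replace("\n", "\r\n")
    return BOM + text if has_bom else text
```

For this to work, the file has to be read with translation disabled, for example `open(path, newline="")`, so that carriage returns are still visible to `detect_style`.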
@@ -218,6 +426,26 @@ result = edit_file("config.py", "DEBUG = False", "DEBUG = True", expected_hash=h
 # Returns stale-file error if content changed — re-read and retry
 ```
 
+### Atomic Multi-file Rename
+
+```python
+from praisonaiagents.tools.edit_tools import apply_patch
+
+result = apply_patch("""*** Begin Patch
+*** Update File: src/auth.py
+@@
+class UserService:
+===
+class AccountService:
+*** Update File: tests/test_auth.py
+@@
+from src.auth import UserService
+===
+from src.auth import AccountService
+*** End Patch
+""")
+```
+
 ---
 
 ## Best Practices
@@ -243,6 +471,14 @@ Include surrounding context so the match is unambiguous; use `replace_all=True`
 CRLF files stay CRLF, LF files stay LF, UTF-8 BOM is preserved. UTF-16 files are rejected with a clear error.
 
+
+Use `apply_patch` when changes span multiple files and must succeed/fail together (rename, refactor, dependency bump). Use `edit_file` when changing one file in one place — it returns a focused diff and avoids patch syntax overhead.
+
+
+
+The patch hunk format uses `@@` to separate hunks and `===` to separate old from new — this is not unified diff format. Add/Delete sections must not contain `@@`/`===` markers.
+
+
 
 File operations respect workspace boundaries. Paths outside the workspace are rejected to prevent directory traversal.
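The workspace-boundary check mentioned above is commonly implemented by resolving the requested path and confirming it still sits under the workspace root. The following is one possible sketch, not the check used by these tools: `resolve_in_workspace` is a hypothetical name, and raising `PermissionError` is an assumption made for the example.

```python
from pathlib import Path


def resolve_in_workspace(workspace: str, user_path: str) -> Path:
    """Resolve user_path against workspace and reject anything outside it."""
    root = Path(workspace).resolve()
    # resolve() collapses ".." segments and follows symlinks, so traversal
    # attempts are judged by where they actually land.
    candidate = (root / user_path).resolve()
    if candidate != root and root not in candidate.parents:
        raise PermissionError(f"path escapes workspace: {user_path}")
    return candidate
```

Joining with `/` means an absolute `user_path` replaces the root entirely, which is why the containment test runs on the resolved result rather than on the raw input string.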