Potential Arbitrary File Write Vulnerability due to Insufficient LLM Output Sanitization #1473

**Description:**

Upon reviewing the code related to file writing in `superagi/tools/code/write_code.py` and `superagi/resource_manager/file_manager.py`, a potential security vulnerability has been identified. The system allows Large Language Models (LLMs) to generate both filenames and file content, which are then written to the file system. While some basic sanitization is performed on the filename, it appears to be insufficient to prevent arbitrary file writes.

**Vulnerability Details:**

1.  **`superagi/tools/code/write_code.py` (lines 81-107):**
    The `WriteCodeTool` extracts filenames and code content from the LLM's output. The filename undergoes a basic sanitization step (`re.sub(r'[<>"|?*]', "", match.group(1))`) that removes certain special characters, followed by a check for leading/trailing non-alphanumeric characters. The character class in that substitution does not include `.`, `/`, or `\`, so the substitution itself leaves path traversal sequences (e.g., `../`) intact. The sanitization as a whole may therefore not prevent path traversal or other malicious filename constructions that could lead to writing files outside the intended directory.

    The `code` content (`match.group(2)`) is used directly without any apparent sanitization or validation before being written to the file.

2.  **`superagi/resource_manager/file_manager.py` (lines 48-61):**
    The `write_file` method in `FileManager` constructs the `final_path` using `ResourceHelper.get_agent_write_resource_path` or `ResourceHelper.get_resource_path`. While `get_agent_write_resource_path` (defined in `superagi/helper/resource_helper.py`, starting at line 128) handles agent-specific paths and directory creation, it ultimately concatenates the `root_dir` with the `file_name` provided by the `WriteCodeTool`.

    If the `file_name` generated by the LLM (even after its limited sanitization) contains path traversal sequences (e.g., `../../../../etc/passwd`), an attacker who can influence the LLM's output, for example through a crafted prompt, could cause the system to write arbitrary content to arbitrary locations on the server, leading to:
    *   Overwriting critical system files.
    *   Creating executable files in sensitive directories.
    *   Defacing the application.
    *   Achieving remote code execution (if combined with other vulnerabilities).
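
The interaction between the two components can be illustrated with a minimal sketch. It reproduces only the `re.sub` call quoted above (not the additional leading/trailing character check) and uses a plain path join as a stand-in for the concatenation in `ResourceHelper`. The names `ROOT_DIR`, `sanitize_like_write_code`, and `naive_final_path` are hypothetical and not part of SuperAGI; `posixpath` is used so the result is the same on any platform.

```python
import posixpath
import re

# Hypothetical sandbox root; the real value comes from SuperAGI's configuration.
ROOT_DIR = "/app/workspace/output"


def sanitize_like_write_code(raw_name: str) -> str:
    """Apply only the character substitution quoted in this issue."""
    return re.sub(r'[<>"|?*]', "", raw_name)


def naive_final_path(root_dir: str, file_name: str) -> str:
    """Stand-in for joining root_dir and file_name without a containment check."""
    return posixpath.normpath(posixpath.join(root_dir, file_name))


llm_name = "../../../../tmp/malicious_script.sh"
cleaned = sanitize_like_write_code(llm_name)
final_path = naive_final_path(ROOT_DIR, cleaned)

print(cleaned == llm_name)  # -> True: the substitution removed nothing
print(final_path)           # -> /tmp/malicious_script.sh
```

Under these assumptions, the normalized path no longer starts with `ROOT_DIR`, which is the condition the proposed fix needs to reject.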

**Impact:**
Arbitrary file write can lead to severe consequences, including remote code execution, denial of service, and data corruption.

**Proposed Solution:**

1.  **Strict Path Validation:** Implement more robust path validation in `ResourceHelper.get_agent_write_resource_path` and `ResourceHelper.get_resource_path` to explicitly disallow path traversal sequences (`..`, absolute paths, etc.) and ensure that the generated `final_path` always resides within the intended, sandboxed resource directory.
2.  **Whitelist Filename Characters:** Instead of blacklisting characters, consider whitelisting allowed characters for filenames to further restrict malicious inputs.
3.  **Content Validation (if applicable):** Depending on the expected content, consider implementing content validation or sanitization, especially if the written files are later executed or served to users.
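
Items 1 and 2 could be combined roughly as follows. This is a sketch under stated assumptions, not a patch against the actual `ResourceHelper` code: `UnsafePathError`, `contained_path`, `safe_write_path`, and the specific allowlist pattern are all hypothetical, and the allowlist shown here rejects subdirectories entirely, which may be stricter than the project needs.

```python
import re
from pathlib import Path

# Assumed allowlist: starts with a letter or digit, then letters, digits,
# dot, underscore, or hyphen. Path separators are not allowed at all.
_SAFE_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]{0,254}")


class UnsafePathError(ValueError):
    """Raised when a file name is rejected or would escape the sandbox root."""


def contained_path(root_dir: str, file_name: str) -> Path:
    """Resolve file_name under root_dir and reject results outside root_dir."""
    root = Path(root_dir).resolve()
    candidate = (root / file_name).resolve()
    try:
        # relative_to raises ValueError when candidate is not under root.
        candidate.relative_to(root)
    except ValueError:
        raise UnsafePathError(f"path escapes sandbox: {file_name!r}") from None
    return candidate


def safe_write_path(root_dir: str, file_name: str) -> Path:
    """Apply the allowlist first, then the containment check."""
    if not _SAFE_NAME.fullmatch(file_name) or ".." in file_name:
        raise UnsafePathError(f"file name not allowed: {file_name!r}")
    return contained_path(root_dir, file_name)
```

The containment check is the guard that matters most, because it also catches absolute paths and symlinks that resolve outside the root; the allowlist narrows the input further before the path is built. If generated projects need nested directories, the allowlist could instead be applied to each path segment, with the containment check left unchanged.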

**Steps to Reproduce (Conceptual):**

1.  Craft an LLM prompt that encourages the model to generate a filename containing path traversal sequences (e.g., `../../../../tmp/malicious_script.sh`).
2.  Provide a malicious script as the file content.
3.  Observe if the file is written outside the intended resource directory.

