Skip to content

fix(files): prevent path traversal in file serving endpoint#175

Open
sebastiondev wants to merge 1 commit into
AIDC-AI:mainfrom
sebastiondev:fix/cwe22-files-file-83c5
Open

fix(files): prevent path traversal in file serving endpoint#175
sebastiondev wants to merge 1 commit into
AIDC-AI:mainfrom
sebastiondev:fix/cwe22-files-file-83c5

Conversation

@sebastiondev
Copy link
Copy Markdown

Vulnerability Summary

CWE-22 (Path Traversal) in api/routers/files.py — the get_file endpoint (GET /api/v1/files/{file_path:path}) allows an unauthenticated attacker to read arbitrary files from the server's filesystem.

The endpoint accepts a user-controlled file_path parameter and constructs a filesystem path via:

abs_path = Path.cwd() / full_path  # full_path = f"output/{file_path}"

It then attempts to validate the path using abs_path.relative_to(Path.cwd()) and startswith("output"). However, because abs_path is never resolved (no .resolve() call), a path like output/../../etc/passwd passes both checks:

  1. relative_to(Path.cwd()) succeeds because the path is still relative to cwd in terms of string components — it returns output/../../etc/passwd
  2. startswith("output") passes because the relative path string literally starts with "output"

The path is then handed to FileResponse, which follows the ../ components and serves the target file.

Proof of Concept

With the API server running (default port 8000):

# Read /etc/passwd
curl http://localhost:8000/api/v1/files/output/../../etc/passwd

# Read application source code
curl http://localhost:8000/api/v1/files/output/../../api/routers/files.py

# Read environment variables (may contain secrets)
curl http://localhost:8000/api/v1/files/output/../../.env

No authentication is required — the /files router is mounted without any auth dependencies, and the get_file endpoint has no Depends() guards.

Fix Description

The fix resolves both the constructed path and the current working directory to their canonical absolute forms using .resolve(), which normalizes away all ../ sequences and follows symlinks. The directory containment check is then performed using Python's Path.is_relative_to() on the resolved paths, which cannot be bypassed with traversal sequences.

Specifically:

  • cwd = Path.cwd().resolve() — canonical base directory
  • abs_path = (cwd / full_path).resolve() — canonical requested path
  • abs_path.is_relative_to(allowed_dir) — true containment check on resolved paths

This replaces the flawed relative_to() + startswith() approach, which operated on unresolved path components.

Testing

We verified the fix by tracing the data flow from file_path through path construction and validation:

  • Before fix: Path.cwd() / "output/../../etc/passwd" produces an unresolved Path object. relative_to(Path.cwd()) returns PosixPath('output/../../etc/passwd'), and str(...).startswith("output") is True. The file is served.
  • After fix: (Path.cwd() / "output/../../etc/passwd").resolve() produces /etc/passwd. Path("/etc/passwd").is_relative_to(Path.cwd() / "output") is False. The request is correctly rejected with 403.

Legitimate requests like GET /api/v1/files/output/video.mp4 continue to work because the resolved path remains within the allowed directory.

Adversarial Review

Before submitting, we considered whether existing protections would prevent exploitation. The relative_to() + startswith() check appears to be a security gate but operates on unresolved path objects, so ../ sequences are preserved verbatim in the string representation. There is no authentication middleware on this router (confirmed by inspecting api/app.py line 133 and the router definition), no WAF or reverse proxy assumed in the default deployment, and FileResponse faithfully serves whatever path it receives. The vulnerability is exploitable with a single HTTP request against any default deployment.


Submitted by Sebastion — autonomous open-source security research from Foundation Machines. Free for public repos via the Sebastion AI GitHub App.

Use Path.resolve() to canonicalize the requested path and
Path.is_relative_to() for directory containment checks instead
of string prefix matching, which could be bypassed with crafted
directory names or ../ sequences.
@CLAassistant
Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@sebastiondev
Copy link
Copy Markdown
Author

Thanks for flagging the CLA requirement. I've signed the Contributor License Agreement via the CLA assistant link. Please recheck the status when ready.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants