fix(upload): store relative paths in uploaded_files.storage_path by tanbro · Pull Request #235 · xorbitsai/xagent

tanbro · 2026-04-01T03:46:42Z

Store Relative Paths in `uploaded_files.storage_path`

Summary

Convert uploaded_files.storage_path from storing absolute paths to relative paths (without user_{user_id} prefix).

Before: /uploads/user_1/web_task_123/output/file.txt
After: web_task_123/output/file.txt

Motivation

Storing absolute paths in the database creates portability and configuration issues:

Issue	Impact
Non-portable	Cannot migrate database to different environments
Config changes break data	Changing `XAGENT_UPLOADS_DIR` invalidates existing records
Backup/restore complex	Restoring to new path requires updating all records
Cross-platform	Windows and Unix path formats incompatible

Changes

New Utilities

src/xagent/web/utils/file.py:

to_relative_path() - Convert absolute to relative for storage
to_absolute_path() - Convert relative to absolute for file access
find_file_by_path() - Query helper handling both formats

Model Enhancement

src/xagent/web/models/uploaded_file.py:

Added absolute_path property - transparently resolves both absolute (old) and relative (new) paths

@property
def absolute_path(self) -> Path:
    stored = Path(self.storage_path)
    if stored.is_absolute():
        return stored  # Old data: return as-is
    user_root = get_uploads_dir() / f"user_{self.user_id}"
    return (user_root / self.storage_path).resolve()  # New data: resolve

Updated Storage Locations

files.py: File upload stores relative paths
workspace.py: Agent workspace registration uses relative paths
websocket.py: Output file registration uses relative paths
kb_file_service.py: Knowledge base operations use relative paths

Backward Compatibility

The absolute_path property and find_file_by_path() handle both formats seamlessly. Existing code continues to work after migration.

Migration

Automatic (Recommended for most cases)

alembic upgrade head

Migration 3da89273f616 converts all absolute paths to relative paths using current UPLOADS_DIR.

Manual (For complex scenarios)

If XAGENT_UPLOADS_DIR has changed multiple times, use the manual tool:

python scripts/migrate_uploads_file_abs_path.py migrate -d /old/uploads --confirm

See scripts/migrate_uploads_file_abs_path.README.md for details.

Testing

tests/web/test_storage_path_relative.py - 6 tests covering:

Absolute ↔ relative conversion
absolute_path property for both formats
Backward compatibility

All passing (822/823 tests; 1 unrelated pre-existing failure).

Design Notes

Single-column design: Data transformation only, no schema changes. This allows full rollback via alembic downgrade.

Reversible: Upgrade converts abs→rel, downgrade converts rel→abs.

qinxuye · 2026-04-01T11:47:30Z

Should this be a alembic migration?

I think it is no need to make an ablembic migration for records have absolute path in uploads_file table into relative path.

Because the PR just works with absolute path records. Let them be for less risks.

This is ok, but the problem of script is that there is too much explanation cost to teach users how to use it, what's your idea?

OK, I'll optimize the migration script, apply batch and transaction on it, to make it possible for large uploaded_files table.

The script has been optimized to be production-ready:

Batch processing with transactions (default batch size: 1000)

Progress bars for visibility

Subcommands (check / migrate) for better UX

The previous implementation wasn't practical for real use for large uploads fiels table.

This is ok, but the problem of script is that there is too much explanation cost to teach users how to use it, what's your idea?

I prefer do NOT make a migration in our alembic scripts.

The migration tool scripts/migrate_uploads_file_abs_path.py provide by the PR is OPTIONAL.

Users may choose to use it - the tool is production-ready with streaming/batch support, and the UX is clear and easy.

Also, users may choose to keep their uploaded_files table as they are - because the PR's backward compatibility is solid.

My concern is if compatibility is fine, what's the meaning of this script, no one would use it other than you and me.

qinxuye · 2026-04-02T15:15:11Z

I think we can keep English version only.

- Convert storage_path from absolute to relative paths for portability - Add path conversion utilities (to_relative_path, to_absolute_path, find_file_by_path) - Update file upload, workspace, and websocket to store relative paths - Add Alembic migration for data transformation (upgrade: abs->rel, downgrade: rel->abs) - Add manual migration tool for multiple uploads_dir scenarios - Add Chinese and English README for manual migration tool

Also keep scripts markdown documentations by adding !scripts/**/*.md to .gitignore to ensure markdown files in scripts directory are not ignored. fix(scripts): translate Chinese README to English for migration tool Convert the Chinese documentation in scripts/migrate_uploads_file_abs_path.README.md to English to improve internationalization and maintain consistency with the project's English-based documentation standards.

- Fix kb.py import order per isort - Remove relative path migration tool script's non-English readme

PR xorbitsai#247 changed UPLOADS_DIR from a constant to get_uploads_dir() function. The new test test_relative_path_works_with_different_upload_dir was using the old UPLOADS_DIR attribute approach, causing AttributeError. This fix aligns with the unified configuration module introduced in PR xorbitsai#247.

rogercloud

Review: Store relative paths in uploaded_files.storage_path

The overall approach is solid and well-structured. Backward compatibility via the absolute_path property and dual-format find_file_by_path is a good design. However, I found several issues that need to be addressed before merging.

Critical (not inline — design issues)

1. storage_path UNIQUE constraint will break the migration in multi-user deployments

storage_path has a global UNIQUE constraint (Column(String(2048), nullable=False, unique=True)). After migration, two users with the same relative path will collide:

User 1: /uploads/user_1/web_task_1/output/file.txt → web_task_1/output/file.txt
User 2: /uploads/user_2/web_task_1/output/file.txt → web_task_1/output/file.txt

The migration aborts mid-batch, leaving the database in a partially-migrated state.

Fix: Either (a) drop the UNIQUE constraint, (b) change it to UniqueConstraint('user_id', 'storage_path'), or (c) include user_{user_id}/ in the relative path to preserve uniqueness.

2. Telegram and Feishu channel bots still store absolute paths

src/xagent/web/channels/telegram/bot.py:238 and src/xagent/web/channels/feishu/bot.py:476 still use storage_path=str(target_path). These files weren't updated by this PR but should be converted to use to_relative_path() for consistency.

Summary

Severity	Count	Key Issues
Critical	2	UNIQUE constraint collision; missed bot conversions
Major	4	Broken dedup; wrong `find_file_by_path` usage (x2); backfill UNIQUE risk; silent fallback
Minor	4	Migration logging; downgrade POSIX; docstring; separator consistency

rogercloud · 2026-04-16T09:40:52Z

 _auto_register = contextvars.ContextVar("_auto_register", default=False)


+def _to_relative_path(file_path: Path, user_id: int) -> str:


Major: Broken deduplication in list_all_user_files

After this change, DB records have storage_path as a relative string (e.g. "web_task_123/output/file.txt"), but the filesystem scan below (lines 855-856) compares against file_path which is an absolute string (e.g. "/uploads/user_1/web_task_123/output/file.txt").

is_already_listed = any( f.get("storage_path") == file_path for f in result_files )

Before this PR, both sides were absolute, so the comparison worked. Now it never matches, causing duplicate entries — once from DB records and once from the workspace filesystem scan.

Fix: Use str(file_record.absolute_path) for the comparison, or compare using absolute_path consistently.

is_already_listed = any( f.get("storage_path") == relative_storage_path for f in result_files )

Where relative_storage_path matches the format stored in storage_path for DB records.

rogercloud · 2026-04-16T09:40:53Z

 from ..utils.db_timezone import safe_timestamp_to_unix
+from ..utils.file import find_file_by_path, to_relative_path

 logger = logging.getLogger(__name__)


Major: find_file_by_path called with relative path — won't find old absolute records

find_file_by_path is designed to accept an absolute path (see its docstring at line 81: file_path: Absolute file path to search for). Internally it tries: (1) exact match, then (2) relative conversion if file_path.is_absolute().

But here relative_storage_path is already a relative string. Inside the function:

First query matches storage_path == "web_task_123/..." — OK for new records

Second branch: file_path.is_absolute() → False — skipped, so old absolute-path records are never found

During the transition period (before migration), the DB still has absolute paths. This code would fail to find existing records and attempt to create duplicates, potentially hitting the UNIQUE constraint.

Fix: Pass the absolute resolved_path directly — the function handles conversion internally:

file_record = find_file_by_path(db, resolved_path, task_user_id)

rogercloud · 2026-04-16T09:40:53Z

            normalized_relative_path
        )

-        file_record = (


Major: Same find_file_by_path issue as above

Same problem as _normalize_file_outputs — relative_storage_path is already relative, so find_file_by_path cannot find old-style absolute records.

Fix: Pass resolved_path (absolute) instead:

file_record = find_file_by_path(db, resolved_path, owner_user_id)

Then use relative_storage_path only for creating new records.

rogercloud · 2026-04-16T09:40:53Z

 from ..models.database import get_db
 from ..models.uploaded_file import UploadedFile
 from ..models.user import User
+from ..utils.file import find_file_by_path, to_relative_path


Major: Backfill dedup fails with mixed old/new records

existing_paths collects raw storage_path values from the DB. If some records still have absolute paths (pre-migration), the set contains strings like "/uploads/user_1/web_task_123/output/file.txt". The computed relative_storage_path is "web_task_123/output/file.txt" — the in check fails, and the backfill attempts to insert a duplicate, violating the UNIQUE constraint.

Fix: Normalize existing_paths to relative format:

existing_paths = set() for row in db.query(UploadedFile.storage_path) .filter(UploadedFile.user_id.in_(target_user_ids)) .all(): p = Path(row[0]) if p.is_absolute(): existing_paths.add(to_relative_path(p, target_user_ids[0])) # normalize else: existing_paths.add(row[0])

Or simpler: use find_file_by_path to check existence instead of a set lookup.

rogercloud · 2026-04-16T09:40:53Z

+    except ImportError:
+        # Fallback for non-web contexts (e.g., tests)
+        # Store as-is absolute path
+        return str(file_path)


Major: Silent fallback stores absolute paths

If the import fails, return str(file_path) stores an absolute path — the exact problem this PR is trying to solve. No warning is logged, making it hard to diagnose.

However, this fallback is likely dead code since other imports from ..web in this file (e.g. line 227: from ..web.models.uploaded_file import UploadedFile) would also fail if the web module were unavailable.

Suggestion: Either make the import mandatory (remove the try/except), or at minimum add logger.warning(...):

except ImportError: logger.warning("Could not import web.utils.file; storing absolute path") return str(file_path)

rogercloud · 2026-04-16T09:40:53Z

+                record.storage_path = relative_path.as_posix()  # pyright: ignore[reportAttributeAccessIssue]
+            except ValueError as e:
+                # Path outside uploads_dir - keep as absolute but log the issue
+                print(


Minor: Multi-line warning printed for every unconvertible record

This 7-line print() block fires for each record that can't be converted. If there are hundreds of such records, this floods stdout with thousands of lines.

Fix: Print the detailed message once, then just count subsequent failures:

unconvertible = 0 ... except ValueError as e: if unconvertible == 0: print(f"Warning: Some paths cannot be converted...") print("See scripts/migrate_uploads_file_abs_path.README.md") unconvertible += 1 ... if unconvertible > 0: print(f"{unconvertible} records could not be converted")

rogercloud · 2026-04-16T09:40:53Z

+            # Convert relative to absolute
+            user_root = get_uploads_dir() / f"user_{record.user_id}"
+            absolute_path = user_root / path_obj
+            record.storage_path = str(absolute_path)  # pyright: ignore[reportAttributeAccessIssue]


Minor: downgrade uses str() instead of .as_posix()

The upgrade path uses .as_posix() (line 74) to ensure forward-slash separators. But downgrade uses str(absolute_path), which on Windows produces backslash paths like C:\uploads\user_1\....

While the detection code in upgrade handles both formats, the asymmetry is inconsistent. Consider:

record.storage_path = Path(absolute_path).as_posix()

This ensures roundtrip consistency: upgrade → downgrade → upgrade produces identical results on all platforms.

rogercloud · 2026-04-16T09:40:53Z

+        Relative path string with POSIX separators (/)
+
+    Raises:
+        ValueError: If path is not within UPLOADS_DIR (caught by caller)


Minor: Docstring says Raises: ValueError but the function never raises it

The docstring documents ValueError: If path is not within UPLOADS_DIR, but the actual implementation catches ValueError at line 42 and returns absolute_path.as_posix() as a fallback. The function never raises ValueError.

Fix: Update the docstring to reflect actual behavior:

Returns: Relative path with POSIX separators. If the path is outside UPLOADS_DIR, returns the absolute path as POSIX.

And remove the Raises section.

rogercloud · 2026-04-16T09:40:53Z

Minor: os.sep vs "/" inconsistency

to_relative_path() always returns POSIX paths (uses .as_posix()), so rel_prefix never contains os.sep. The check endswith(os.sep) works by coincidence on Linux (os.sep = "/") but is semantically misleading — on Windows it checks for "\\" which never matches.

Fix: Just check for "/":

if not rel_prefix.endswith("/"): rel_prefix_str = rel_prefix + "/"

XprobeBot added the bug Something isn't working label Apr 1, 2026

tanbro marked this pull request as draft April 1, 2026 06:15

tanbro marked this pull request as ready for review April 1, 2026 06:36

qinxuye reviewed Apr 1, 2026

View reviewed changes

tanbro marked this pull request as draft April 2, 2026 01:33

tanbro marked this pull request as ready for review April 2, 2026 04:05

tanbro requested a review from qinxuye April 2, 2026 05:53

tanbro marked this pull request as draft April 2, 2026 13:28

tanbro marked this pull request as ready for review April 2, 2026 14:50

qinxuye reviewed Apr 2, 2026

View reviewed changes

Comment thread scripts/migrate_uploads_file_abs_path.README.md

Copy link
Copy Markdown

Contributor

qinxuye Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can keep English version only.

This was referenced Apr 3, 2026

fix(upload): store relative paths in uploaded_files.storage_path #234

Closed

Proposal: Centralized Configuration System for Xagent #259

Open

Proposal: Centralized Configuration System for Xagent #260

Closed

tanbro marked this pull request as draft April 8, 2026 06:09

tanbro force-pushed the fix/upload_relative_path branch from 5d2ba0d to 13b177a Compare April 9, 2026 01:21

tanbro marked this pull request as ready for review April 9, 2026 01:59

tanbro added 5 commits April 9, 2026 10:06

feat(upload): store relative paths in uploaded_files.storage_path

e2c5e13

- Fix kb.py import order per isort - Remove relative path migration tool script's non-English readme

remove(db): remove wrong migration scripts

8f843c3

add(db): add correct migration script

74a28e8

rogercloud requested changes Apr 16, 2026

View reviewed changes

		_auto_register = contextvars.ContextVar("_auto_register", default=False)


		def _to_relative_path(file_path: Path, user_id: int) -> str:

Conversation

tanbro commented Apr 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Store Relative Paths in uploaded_files.storage_path

Summary

Motivation

Changes

New Utilities

Model Enhancement

Updated Storage Locations

Backward Compatibility

Migration

Automatic (Recommended for most cases)

Manual (For complex scenarios)

Testing

Design Notes

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tanbro Apr 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tanbro Apr 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rogercloud left a comment

Choose a reason for hiding this comment

Review: Store relative paths in uploaded_files.storage_path

Critical (not inline — design issues)

Summary

Uh oh!

Choose a reason for hiding this comment

Major: Broken deduplication in list_all_user_files

Uh oh!

Choose a reason for hiding this comment

Major: find_file_by_path called with relative path — won't find old absolute records

Uh oh!

Choose a reason for hiding this comment

Major: Same find_file_by_path issue as above

Uh oh!

Choose a reason for hiding this comment

Major: Backfill dedup fails with mixed old/new records

Uh oh!

Choose a reason for hiding this comment

Major: Silent fallback stores absolute paths

Uh oh!

Choose a reason for hiding this comment

Minor: Multi-line warning printed for every unconvertible record

Uh oh!

Choose a reason for hiding this comment

Minor: downgrade uses str() instead of .as_posix()

Uh oh!

Choose a reason for hiding this comment

Minor: Docstring says Raises: ValueError but the function never raises it

Uh oh!

Choose a reason for hiding this comment

Minor: os.sep vs "/" inconsistency

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

tanbro commented Apr 1, 2026 •

edited

Loading

Store Relative Paths in `uploaded_files.storage_path`

tanbro Apr 1, 2026 •

edited

Loading

tanbro Apr 2, 2026 •

edited

Loading

Major: Broken deduplication in `list_all_user_files`

Major: `find_file_by_path` called with relative path — won't find old absolute records

Major: Same `find_file_by_path` issue as above

Minor: `downgrade` uses `str()` instead of `.as_posix()`

Minor: Docstring says `Raises: ValueError` but the function never raises it

Minor: `os.sep` vs `"/"` inconsistency