Commit 13b177a

feat(upload): store relative paths in uploaded_files.storage_path
- Convert storage_path from absolute to relative paths for portability
- Add path conversion utilities (to_relative_path, to_absolute_path, find_file_by_path)
- Update file upload, workspace, and websocket to store relative paths
- Add Alembic migration for data transformation (upgrade: abs->rel, downgrade: rel->abs)
- Add manual migration tool for multiple uploads_dir scenarios
- Add Chinese and English README for manual migration tool
1 parent 20ae28c commit 13b177a

17 files changed

Lines changed: 1709 additions & 60 deletions

scripts/backfill_uploaded_files.py

Lines changed: 13 additions & 7 deletions
@@ -29,6 +29,11 @@
 from sqlalchemy import create_engine  # noqa: E402
 from sqlalchemy.orm import sessionmaker  # noqa: E402
 
+from xagent.web.models.task import Task  # noqa: E402
+from xagent.web.models.uploaded_file import UploadedFile  # noqa: E402
+from xagent.web.models.user import User  # noqa: E402
+from xagent.web.utils.file import to_relative_path  # noqa: E402
+
 # Setup logging
 logging.basicConfig(
     level=logging.INFO, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
@@ -79,10 +84,6 @@ def get_database_url():
 
 def scan_user_directory(user_root: Path, db_session) -> dict:
     """Scan a user's directory for unregistered files."""
-    from xagent.web.models.task import Task
-    from xagent.web.models.uploaded_file import UploadedFile
-    from xagent.web.models.user import User
-
     # Check if user exists
     try:
         user_id = int(user_root.name.replace("user_", "", 1))
@@ -94,6 +95,7 @@ def scan_user_directory(user_root: Path, db_session) -> dict:
         return {"error": f"User {user_id} not found in database", "created": 0}
 
     # Get existing file paths in database
+    # Note: existing_paths may contain both absolute (old) and relative (new) paths
     existing_paths = {
         row[0]
         for row in db_session.query(UploadedFile.storage_path)
@@ -119,7 +121,13 @@ def scan_user_directory(user_root: Path, db_session) -> dict:
         if "__pycache__" in file_path.parts or "node_modules" in file_path.parts:
             continue
 
-        storage_path = str(file_path)
+        # Use relative path for storage and comparison
+        try:
+            storage_path = to_relative_path(file_path, user_id)
+        except ValueError:
+            # File is outside UPLOADS_DIR, skip
+            continue
+
         if storage_path in existing_paths:
             skipped += 1
             continue
@@ -255,8 +263,6 @@ def check_backfill_completion(db_session) -> bool:
     """Check if backfill has been completed before."""
     # This could check a flag in the database or a marker file
     # For now, we'll just check if there are any files in the database
-    from xagent.web.models.uploaded_file import UploadedFile
-
     count = db_session.query(UploadedFile).count()
     return count > 0
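
The hunks above call `to_relative_path(file_path, user_id)` and treat a `ValueError` as "file is outside UPLOADS_DIR". The helper's body is not part of this file's diff, so the following is only a minimal sketch of how such a pair of helpers might behave. It assumes the uploads root is passed in explicitly: the `uploads_dir` parameter is hypothetical, since the real helper takes two arguments and presumably reads the configured directory itself.

```python
from pathlib import PurePosixPath


def to_relative_path(file_path, user_id, uploads_dir):
    """Strip `<uploads_dir>/user_<id>/` from an absolute path.

    Hypothetical sketch: `uploads_dir` is an explicit argument here.
    Raises ValueError when the file is not under the user's directory.
    """
    user_root = PurePosixPath(uploads_dir) / f"user_{user_id}"
    # relative_to() raises ValueError if file_path is not inside user_root
    return PurePosixPath(file_path).relative_to(user_root).as_posix()


def to_absolute_path(relative_path, user_id, uploads_dir):
    """Inverse of to_relative_path under the same assumptions."""
    user_root = PurePosixPath(uploads_dir) / f"user_{user_id}"
    return (user_root / relative_path).as_posix()
```

Because `relative_to` compares whole path components, a file under `user_10` is not mistaken for one under `user_1`.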

Lines changed: 236 additions & 0 deletions
@@ -0,0 +1,236 @@
# Uploaded File Path Migration Tool

## Background

The `storage_path` column of the `uploaded_files` table needs to be converted between two formats:

- **Absolute path**: `/uploads/user_1/web_task_123/output/file.txt`
- **Relative path**: `web_task_123/output/file.txt` (without the `user_{user_id}` prefix)

## Core feature: handling multiple uploads_dir values

**This tool is mainly for cases where the `XAGENT_UPLOADS_DIR` setting has been changed more than once.**

After the uploads directory has been moved, the database contains paths with several different prefixes:

```
/old/location/uploads/user_1/task_1/file.txt       ← old uploads_dir
/new/location/uploads/user_1/task_2/file.txt       ← intermediate uploads_dir
/current/uploads/user_1/task_3/file.txt            ← current uploads_dir
```

The automatic Alembic migration only handles paths under the **currently configured `UPLOADS_DIR`**. For every other path you have to specify the source directory manually.

**Use the `-d` option to specify an old uploads directory:**
```bash
python scripts/migrate_uploads_file_abs_path.py migrate -d /old/location/uploads --confirm
```

**Multiple source directories can be given** (in chronological order, oldest first):
```bash
python scripts/migrate_uploads_file_abs_path.py migrate \
  -d /oldest/uploads \
  -d /middle/uploads \
  -d /old/uploads \
  --confirm
```

**Example**: suppose your uploads directory has been moved three times:

```
/var/www/xagent/uploads      ← used in 2023
/home/user/xagent/uploads    ← moved here in 2024
/mnt/d/work/xagent/uploads   ← moved here in 2025
/current/xagent/uploads      ← current configuration
```

The database may then contain paths such as:

```
/var/www/xagent/uploads/user_1/task_1/file.txt
/home/user/xagent/uploads/user_1/task_2/file.txt
/mnt/d/work/xagent/uploads/user_1/task_3/file.txt
```

Run the following command to convert all of them to relative paths:

```bash
python scripts/migrate_uploads_file_abs_path.py migrate \
  -d /var/www/xagent/uploads \
  -d /home/user/xagent/uploads \
  -d /mnt/d/work/xagent/uploads \
  --confirm
```

After conversion they become:

```
task_1/file.txt
task_2/file.txt
task_3/file.txt
```
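
The conversion above can be pictured as trying each source directory in turn and dropping the `user_{user_id}` component once a prefix matches. The sketch below is illustrative only: `convert_with_sources` is a hypothetical name, not a function taken from the script, and it handles Unix-style paths only.

```python
from pathlib import PurePosixPath


def convert_with_sources(storage_path, source_dirs):
    """Return the relative form of storage_path, or None if no source dir matches."""
    path = PurePosixPath(storage_path)
    if not path.is_absolute():
        return storage_path  # already relative, leave untouched
    for source in source_dirs:
        try:
            inside = path.relative_to(PurePosixPath(source))
        except ValueError:
            continue  # not under this uploads dir, try the next one
        parts = inside.parts
        # Expect user_<id>/<rest>; drop the user_<id> component
        if len(parts) >= 2 and parts[0].startswith("user_"):
            return PurePosixPath(*parts[1:]).as_posix()
    return None  # caller should report this path as unconverted
```

A path that matches none of the given directories comes back as `None`, which corresponds to the case where another `-d` option is needed.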

## When to use this tool

### Case 1: the Alembic migration fails

When running `alembic upgrade` or `alembic downgrade`, you may see a warning like this:

```
Warning: Could not convert /some/path to relative path (outside UPLOADS_DIR): ...
Some paths could not be converted automatically.
Please use the manual migration tool to fix these paths
```

This means some file paths are not under the current `UPLOADS_DIR`, so the automatic conversion failed for them. Use this tool to handle those paths manually.

### Case 2: the uploads directory was moved several times

If the `XAGENT_UPLOADS_DIR` setting has been changed more than once, the database may contain paths with several different prefixes:

```
/old/uploads/user_1/task_1/file.txt
/new/uploads/user_1/task_2/file.txt
/current/uploads/user_1/task_3/file.txt
```

The automatic migration only handles the currently configured directory. For the other paths you have to specify the source directories manually.

### Case 3: mixed path formats

The database contains both absolute and relative paths, and they need to be unified into one format.

## Usage

### 1. Check the current state

```bash
python scripts/migrate_uploads_file_abs_path.py check
```

Example output:

```
Migration Status
┌────────────────────────────────────────────┬───────┬────────────┐
│ Category                                   │ Count │ Percentage │
├────────────────────────────────────────────┼───────┼────────────┤
│ Total records                              │ 36    │            │
│ Absolute paths (need migration)            │ 30    │ 83.3%      │
│ Relative paths (already migrated)          │ 6     │ 16.7%      │
└────────────────────────────────────────────┴───────┴────────────┘

⚠ 30 absolute paths need migration.
```
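
The counts in this report amount to classifying each `storage_path` as absolute or relative. As a rough sketch of that classification (the `summarize` helper and its return shape are assumptions for illustration, not the script's actual code):

```python
from pathlib import PurePosixPath, PureWindowsPath


def is_absolute(storage_path):
    """True for Unix-style (/...) and Windows-style (C:\\...) absolute paths."""
    return (
        PurePosixPath(storage_path).is_absolute()
        or PureWindowsPath(storage_path).is_absolute()
    )


def summarize(paths):
    """Return counts and percentages of absolute and relative paths."""
    total = len(paths)
    absolute = sum(1 for p in paths if is_absolute(p))
    relative = total - absolute

    def pct(n):
        return f"{100 * n / total:.1f}%" if total else "n/a"

    return {
        "total": total,
        "absolute": (absolute, pct(absolute)),
        "relative": (relative, pct(relative)),
    }
```

With 30 absolute and 6 relative paths this yields 83.3% and 16.7%, matching the sample output.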

### 2. Preview the migration (dry-run mode)

**Dry-run is the default mode. It does not modify the database** and only shows the conversions that would be made.

```bash
# Specify the old uploads directory (without --confirm this is only a preview)
python scripts/migrate_uploads_file_abs_path.py migrate -d /old/path/uploads

# Multiple source directories are supported
python scripts/migrate_uploads_file_abs_path.py migrate -d /old1/uploads -d /old2/uploads
```

Example output:

```
Migration Summary
┌──────────────────────────────┬───────┬───────┐
│ Category                     │ Count │ Status│
├──────────────────────────────┼───────┼───────┤
│ Total scanned                │ 30    │       │
│ To migrate                   │ 30    │ ✓     │
└──────────────────────────────┴───────┴───────┘

Preview (showing 10/30 records that will be migrated):
📄 c42346a0...
   Before: /old/uploads/user_1/web_task_25/output/chart.html
   After:  web_task_25/output/chart.html

ℹ️ DRY RUN MODE - No changes made to database
Use --confirm to actually migrate these records.
```

### 3. Run the migration

**The database is modified only when the `--confirm` option is added.**

```bash
# Run the migration after reviewing the preview
python scripts/migrate_uploads_file_abs_path.py migrate -d /old/path/uploads --confirm
```

Example output:

```
Migration Summary
┌──────────────────────────────┬───────┬───────┐
│ Category                     │ Count │ Status│
├──────────────────────────────┼───────┼───────┤
│ To migrate                   │ 30    │ ✓     │
└──────────────────────────────┴───────┴───────┘

✓ Successfully migrated 30 records
```

### 4. Advanced options

```bash
# Show all records (only the first 10 are shown by default)
python scripts/migrate_uploads_file_abs_path.py migrate -d /old/path/uploads -v

# Custom batch size (default 1000)
python scripts/migrate_uploads_file_abs_path.py migrate -d /old/path/uploads -b 500
```
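
The options used so far can be summarized as a small command-line definition. The script's actual CLI framework is not shown in this README, so the sketch below uses `argparse` purely for illustration. The long names `--verbose` and `--batch-size` and the `uploads_dirs` attribute are assumptions; only `-d`/`--uploads-dir`, `--confirm`, `-v`, `-b`, and the `check` and `migrate` subcommands come from the text.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="migrate_uploads_file_abs_path.py")
    sub = parser.add_subparsers(dest="command", required=True)

    sub.add_parser("check", help="report absolute vs relative path counts")

    migrate = sub.add_parser("migrate", help="convert absolute paths to relative")
    # -d may be repeated; each occurrence appends one source directory
    migrate.add_argument("-d", "--uploads-dir", dest="uploads_dirs",
                         action="append", default=[])
    # Without --confirm the run is a dry run and nothing is written
    migrate.add_argument("--confirm", action="store_true")
    migrate.add_argument("-v", "--verbose", action="store_true")
    migrate.add_argument("-b", "--batch-size", type=int, default=1000)
    return parser
```

Making `--confirm` an opt-in flag keeps the destructive path explicit: an invocation that omits it can only preview.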

## FAQ

### Q: How do I find the old uploads directories?

A: Look at the paths stored in the database:

```bash
sqlite3 xagent.db "SELECT DISTINCT substr(storage_path, 1, 50) FROM uploaded_files LIMIT 10;"
```

Alternatively, use the check command to view sample paths.
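
If the stored paths follow the `<uploads_dir>/user_<id>/...` layout described above, the candidate directories can also be derived programmatically by cutting each absolute path at its first `user_<id>` component. This is a hedged sketch: `candidate_upload_dirs` is a hypothetical helper, it covers Unix-style paths only, and it would misfire if an uploads directory itself contained a component that looks like `user_<digits>`.

```python
import re
from collections import Counter
from pathlib import PurePosixPath

USER_DIR = re.compile(r"user_\d+")


def candidate_upload_dirs(paths):
    """Count how many absolute paths sit under each apparent uploads dir."""
    counts = Counter()
    for raw in paths:
        path = PurePosixPath(raw)
        if not path.is_absolute():
            continue  # relative paths carry no uploads_dir prefix
        for i, part in enumerate(path.parts):
            if USER_DIR.fullmatch(part):
                # Everything before user_<id> is the candidate uploads dir
                counts[PurePosixPath(*path.parts[:i]).as_posix()] += 1
                break
    return counts
```

Each key of the result is a directory that could be passed to `-d`.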
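
### Q: Do I still need to run Alembic after the migration?

A: That depends on your target state:

- **Target is relative paths**: after the manual migration, run `alembic upgrade head` to update the revision
- **Target is absolute paths**: after the manual migration, run `alembic downgrade -1` to roll back the revision

### Q: Can the migration fail?

A: A path that is not under any directory given with `--uploads-dir` is skipped and counted under "Other absolute paths". Add more `-d` options until all paths are covered.

### Q: How do I roll back?

A: This script does not create a backup, so back up the database first:

```bash
cp xagent.db xagent.db.backup
```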

## Technical notes

### Path format rules

- **Relative path**: without the `user_{user_id}` prefix, e.g. `web_task_123/output/file.txt`
- **Absolute path**: the full path, e.g. `/uploads/user_1/web_task_123/output/file.txt`

### Cross-platform support

The script automatically detects Windows-style paths (`C:\...`) and Unix-style paths and handles each with the corresponding Path class.
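
One way such detection could work is to look for a drive letter or UNC prefix and then pick `PureWindowsPath` or `PurePosixPath` accordingly. The sketch below is an assumption about the approach, not the script's code. Both function names are hypothetical, and normalizing the result to forward slashes is a choice made here for illustration.

```python
import re
from pathlib import PurePosixPath, PureWindowsPath

# Drive letter followed by a slash or backslash, e.g. C:\ or d:/
_WINDOWS_DRIVE = re.compile(r"^[A-Za-z]:[\\/]")


def path_class_for(raw):
    """Pick the pure path class that matches the style of the stored path."""
    if _WINDOWS_DRIVE.match(raw) or raw.startswith("\\\\"):
        return PureWindowsPath
    return PurePosixPath


def strip_uploads_prefix(raw, uploads_dir, user_id):
    """Relative path below <uploads_dir>/user_<id>; ValueError if outside."""
    cls = path_class_for(raw)
    user_root = cls(uploads_dir) / f"user_{user_id}"
    return cls(raw).relative_to(user_root).as_posix()
```

Using the pure path classes means a Windows-style path can be processed on a Unix host, and the reverse, without touching the filesystem.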

### Performance

- Streams rows with `yield_per` to avoid running out of memory
- Commits in batches (every 1000 rows by default) for better throughput
- Supports large databases (millions of records)
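
The batching idea can be illustrated without SQLAlchemy. The sketch below swaps `yield_per` for keyset pagination over the standard-library `sqlite3` module, so it is an analogue of the technique rather than the script's implementation. It assumes an integer `id` primary key, which may not match the real `uploaded_files` schema, and `migrate_in_batches` is a hypothetical name.

```python
import sqlite3


def migrate_in_batches(conn, convert, batch_size=1000):
    """Rewrite storage_path batch by batch, committing after each batch.

    `convert` maps an old path to a new one, or returns None to skip the row.
    Assumes an integer `id` primary key (hypothetical schema detail).
    """
    last_id = 0
    migrated = 0
    while True:
        # Keyset pagination: only batch_size rows are held in memory at once
        rows = conn.execute(
            "SELECT id, storage_path FROM uploaded_files "
            "WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        updates = []
        for row_id, storage_path in rows:
            new_path = convert(storage_path)
            if new_path is not None and new_path != storage_path:
                updates.append((new_path, row_id))
        conn.executemany(
            "UPDATE uploaded_files SET storage_path = ? WHERE id = ?", updates
        )
        conn.commit()  # one commit per batch
        migrated += len(updates)
        last_id = rows[-1][0]
    return migrated
```

Committing once per batch bounds both memory use and the amount of work lost if the run is interrupted.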
