perf(site): Avoid file-tree walk when only database usage is needed#548
perf(site): Avoid file-tree walk when only database usage is needed#548balamurali27 wants to merge 3 commits into
Conversation
get_size recursively listed every entry and ran islink/isfile/isdir/ getsize on each — several stat syscalls per file. On sites with large file trees this is slow enough to time out gunicorn workers. Shelling out to `du -sb` does the same apparent-size sum in C with one stat per file. ignore_dirs is mapped to du's --exclude. Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
The /info endpoint always computes full disk usage, including the recursive file-tree walk for public/private/backups. Callers that only need the database size (Press's database-usage refresh) pay for the whole walk, which times out workers on large sites. Accept a database_only query param on the /info route and thread it to get_usage; when set, return only the (cheap) database size and skip the file walks entirely. Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
|
Tick the box to add this pull request to the merge queue (same as
|
| Shells out to `du` (C) instead of walking the tree in Python: a recursive | ||
| Python stat of every file was timing out gunicorn workers on sites with | ||
| large file trees. `du -b` reports apparent size (st_size), matching the | ||
| old behaviour. ignore_dirs is applied at the top level only, as before. | ||
| """ |
There was a problem hiding this comment.
The docstring claims
ignore_dirs is still applied "at the top level only, as before", but du --exclude=PATTERN matches the pattern against basenames at every depth in the tree — not just immediate children. In practice this doesn't matter for the current sole caller (ignore_dirs=["backups"], where "backups" only exists at the top of the private tree), but the comment documents wrong behaviour and could mislead future callers that pass names which happen to appear deeper in the tree.
| Shells out to `du` (C) instead of walking the tree in Python: a recursive | |
| Python stat of every file was timing out gunicorn workers on sites with | |
| large file trees. `du -b` reports apparent size (st_size), matching the | |
| old behaviour. ignore_dirs is applied at the top level only, as before. | |
| """ | |
| Shells out to `du` (C) instead of walking the tree in Python: a recursive | |
| Python stat of every file was timing out gunicorn workers on sites with | |
| large file trees. `du -b` reports apparent size (st_size), matching the | |
| old behaviour. Note: `du --exclude` matches basenames at every depth in | |
| the tree, not just the top level (unlike the old Python walk). | |
| """ |
Prompt To Fix With AI
This is a comment left during a code review.
Path: agent/utils.py
Line: 87-91
Comment:
The docstring claims `ignore_dirs` is still applied "at the top level only, as before", but `du --exclude=PATTERN` matches the pattern against basenames **at every depth** in the tree — not just immediate children. In practice this doesn't matter for the current sole caller (`ignore_dirs=["backups"]`, where "backups" only exists at the top of the private tree), but the comment documents wrong behaviour and could mislead future callers that pass names which happen to appear deeper in the tree.
```suggestion
Shells out to `du` (C) instead of walking the tree in Python: a recursive
Python stat of every file was timing out gunicorn workers on sites with
large file trees. `du -b` reports apparent size (st_size), matching the
old behaviour. Note: `du --exclude` matches basenames at every depth in
the tree, not just the top level (unlike the old Python walk).
"""
```
How can I resolve this? If you propose a fix, please make it concise.| output = subprocess.check_output(command, text=True) | ||
| return int(output.split(maxsplit=1)[0]) |
There was a problem hiding this comment.
subprocess.check_output without a timeout blocks indefinitely if du stalls on an unresponsive NFS/FUSE mount, holding the gunicorn worker in the same way the old Python walk did. A reasonable upper bound (e.g. 120 s) would surface the failure quickly instead of silently pinning a worker.
| output = subprocess.check_output(command, text=True) | |
| return int(output.split(maxsplit=1)[0]) | |
| output = subprocess.check_output(command, text=True, timeout=120) | |
| return int(output.split(maxsplit=1)[0]) |
Prompt To Fix With AI
This is a comment left during a code review.
Path: agent/utils.py
Line: 97-98
Comment:
`subprocess.check_output` without a `timeout` blocks indefinitely if `du` stalls on an unresponsive NFS/FUSE mount, holding the gunicorn worker in the same way the old Python walk did. A reasonable upper bound (e.g. 120 s) would surface the failure quickly instead of silently pinning a worker.
```suggestion
output = subprocess.check_output(command, text=True, timeout=120)
return int(output.split(maxsplit=1)[0])
```
How can I resolve this? If you propose a fix, please make it concise.Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Paired PR: frappe/press#6765
Problem
get_usage(served by the/infoendpoint) always computes full disk usage, including a recursive walk of the site'spublic,private, andbackupsdirectories to sum file sizes. On sites with large file trees this walk exceeds the gunicorn timeout and kills agent web workers (WORKER TIMEOUT).Press's "refresh database usage" polls
/infofrequently but only needs the database size — yet pays for the whole file walk each time.Changes
1.
du -sbinstead of a Python walk (get_size)The old
get_sizeranislink/isfile/isdir/getsizeper entry — several stat syscalls per file. Shelling todu -sbdoes the same apparent-size sum in C with one stat per file.ignore_dirsmaps todu --exclude.2.
database_onlyflag (/inforoute →fetch_site_info→get_usage)The
/inforoute accepts?database_only=1; in that modeget_usagereturns only the (cheap) database size and skips the file walks entirely.Relationship to prior work
Complements the DB-size optimizations in #399 / #447 / #364, which made the database portion cheap but left the
public/private/backupsfile walk in place. This skips that walk for callers that don't need it.Paired change
frappe/press#6765 sends
?database_only=1from the database-usage refresh path. The two can deploy in either order — an older Press simply omits the param and gets the full walk (current behaviour).🤖 Generated with Claude Code