-
Notifications
You must be signed in to change notification settings - Fork 140
perf(site): Avoid file-tree walk when only database usage is needed #548
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Changes from 2 commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -82,25 +82,20 @@ def download_file(url, prefix): | |||||||||
|
|
||||||||||
|
|
||||||||||
| def get_size(folder, ignore_dirs=None): | ||||||||||
| """Returns the size of the folder in bytes. Ignores symlinks""" | ||||||||||
| total_size = os.path.getsize(folder) | ||||||||||
| """Returns the apparent size of the folder in bytes. | ||||||||||
|
|
||||||||||
| if ignore_dirs is None: | ||||||||||
| ignore_dirs = [] | ||||||||||
|
|
||||||||||
| for item in os.listdir(folder): | ||||||||||
| itempath = os.path.join(folder, item) | ||||||||||
|
|
||||||||||
| if item in ignore_dirs: | ||||||||||
| continue | ||||||||||
|
|
||||||||||
| if not os.path.islink(itempath): | ||||||||||
| if os.path.isfile(itempath): | ||||||||||
| total_size += os.path.getsize(itempath) | ||||||||||
| elif os.path.isdir(itempath): | ||||||||||
| total_size += get_size(itempath) | ||||||||||
| Shells out to `du` (C) instead of walking the tree in Python: a recursive | ||||||||||
| Python stat of every file was timing out gunicorn workers on sites with | ||||||||||
| large file trees. `du -b` reports apparent size (st_size), matching the | ||||||||||
| old behaviour. ignore_dirs is applied at the top level only, as before. | ||||||||||
| """ | ||||||||||
| command = ["du", "-sb"] | ||||||||||
| for ignored in ignore_dirs or []: | ||||||||||
| command += ["--exclude", ignored] | ||||||||||
| command.append(folder) | ||||||||||
|
|
||||||||||
| return total_size | ||||||||||
| output = subprocess.check_output(command, text=True) | ||||||||||
| return int(output.split(maxsplit=1)[0]) | ||||||||||
|
Comment on lines
+95
to
+96
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
Prompt To Fix With AIThis is a comment left during a code review.
Path: agent/utils.py
Line: 97-98
Comment:
`subprocess.check_output` without a `timeout` blocks indefinitely if `du` stalls on an unresponsive NFS/FUSE mount, holding the gunicorn worker in the same way the old Python walk did. A reasonable upper bound (e.g. 120 s) would surface the failure quickly instead of silently pinning a worker.
```suggestion
output = subprocess.check_output(command, text=True, timeout=120)
return int(output.split(maxsplit=1)[0])
```
How can I resolve this? If you propose a fix, please make it concise. |
||||||||||
|
|
||||||||||
|
|
||||||||||
| def is_registry_healthy(url: str, username: str, password: str) -> bool: | ||||||||||
|
|
||||||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ignore_dirsis still applied "at the top level only, as before", butdu --exclude=PATTERNmatches the pattern against basenames at every depth in the tree — not just immediate children. In practice this doesn't matter for the current sole caller (ignore_dirs=["backups"], where "backups" only exists at the top of the private tree), but the comment documents wrong behaviour and could mislead future callers that pass names which happen to appear deeper in the tree.Prompt To Fix With AI