fix(database-server): Split app and archived usage on unified servers #6685
balamurali27 wants to merge 1 commit
On a unified server MariaDB and the application (benches, docker, archived benches) share one disk. The storage breakdown computed OS usage as `disk_used - total_db_usage`, so everything non-MariaDB, including the archived benches directory, was attributed to "Operating System". A server with 54 GB of archived benches showed 59 GB of OS usage.

Derive the application footprint from the paired app server's ncdu storage breakdown (`get_storage_usage`) instead of a separate ansible call, and surface two new buckets:

- app_usage: benches + docker
- unused_files: archived benches

Both are subtracted from os_usage so OS reflects just the OS.

The values fall back to 0 on app servers running an agent that doesn't yet return `archived`, so the breakdown degrades gracefully.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Confidence Score: 3/5

The Python change introduces a subtraction that can produce a negative

Two real defects in the new code path: press/press/doctype/database_server/database_server.py, specifically
| Filename | Overview |
|---|---|
| press/press/doctype/database_server/database_server.py | Adds `get_application_storage_usage` helper and threads its result into the storage breakdown; `os_usage` can go negative if stale ncdu data exceeds `df` totals, and the broad `contextlib.suppress(Exception)` silently falls back to 0 on any agent failure with no logging. |
| dashboard/src/components/server/StorageBreakdownDialog.vue | Adds `app_usage` and `unused_files` as optional keys in `databaseStorageBreakdown` with correct falsy guards; the key-to-label mapping is updated accordingly. The rest of the diff is semicolon cleanup. |
Reviews (1): Last reviewed commit: "fix(database-server): Split app and arch..."

press/press/doctype/database_server/database_server.py, line 2456:

      "database_usage": total_db_usage,
      "binlog_indexes": binlog_indexes_size,
    - "os_usage": disk_info[1] - total_db_usage,
    + "os_usage": disk_info[1] - total_db_usage - application_usage - unused_files_usage,

**`os_usage` can go negative on unified servers**

`application_usage` and `unused_files_usage` come from a live HTTP call to the agent's ncdu breakdown, while `disk_info[1]` comes from `df` run in the same ansible call. Because ncdu data can be stale, measured at a different time, or use a different accounting method than `df` (hardlinks, sparse files), the sum `total_db_usage + application_usage + unused_files_usage` can legitimately exceed `disk_info[1]`, producing a negative `os_usage`. Add a floor: `max(0, disk_info[1] - total_db_usage - application_usage - unused_files_usage)`.
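A minimal sketch of the suggested floor, assuming all four figures are already expressed in the same unit. `compute_os_usage` is a hypothetical helper name used only for illustration; in the PR the expression sits inline in the breakdown dict.

```python
def compute_os_usage(
    disk_used: int,
    total_db_usage: int,
    application_usage: int = 0,
    unused_files_usage: int = 0,
) -> int:
    """Return the residual attributed to the OS, clamped at zero.

    Hypothetical helper: the df-based total and the ncdu-based buckets are
    measured at different times, so the raw residual can dip below zero.
    """
    residual = disk_used - total_db_usage - application_usage - unused_files_usage
    return max(0, residual)
```

Clamping hides the size of the mismatch, so a caller that cares about stale ncdu data may also want to log the raw residual when it is negative.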
press/press/doctype/database_server/database_server.py, lines 2477-2483:

    with contextlib.suppress(Exception):
        breakdown = frappe.get_doc("Server", app_server).get_storage_usage()
        benches_bytes = (breakdown.get("benches") or {}).get("size", 0)
        docker_bytes = (breakdown.get("docker") or {}).get("size", 0)
        archived_bytes = (breakdown.get("archived") or {}).get("size", 0)
        return int((benches_bytes + docker_bytes) / 1024), int(archived_bytes / 1024)
    return 0, 0

**Silent broad exception suppression hides agent failures**

`contextlib.suppress(Exception)` silently discards every failure from the agent HTTP call: network errors, unexpected response shapes, `frappe.throw` inside `Server.get_storage_usage`, etc. When the call fails the method returns `(0, 0)`, which is indistinguishable from a server with no benches, so `os_usage` will silently absorb the app footprint again without any indication. At minimum log the exception (`frappe.log_error`) before suppressing it so failures are observable.
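One possible shape for the logged fallback, written as a standalone sketch rather than the actual press method. Two things here are assumptions: `fetch_breakdown` is a hypothetical zero-argument callable standing in for `frappe.get_doc("Server", app_server).get_storage_usage()`, and the standard-library `logging` module stands in for `frappe.log_error` so the snippet runs without Frappe installed.

```python
import logging

logger = logging.getLogger("storage_breakdown")


def get_application_storage_usage(fetch_breakdown) -> tuple[int, int]:
    """Return (app_usage, archived_usage), logging any failure before falling back.

    `fetch_breakdown` is a hypothetical stand-in for the agent call made in the PR.
    """
    try:
        breakdown = fetch_breakdown() or {}
        benches_bytes = (breakdown.get("benches") or {}).get("size", 0)
        docker_bytes = (breakdown.get("docker") or {}).get("size", 0)
        archived_bytes = (breakdown.get("archived") or {}).get("size", 0)
    except Exception:
        # In press this is where frappe.log_error would go; logging.exception
        # records the traceback so the (0, 0) fallback is observable.
        logger.exception("Failed to fetch application storage usage")
        return 0, 0
    # Same bytes / 1024 conversion as the code under review.
    return int((benches_bytes + docker_bytes) / 1024), int(archived_bytes / 1024)
```

Keeping the fallback but recording the traceback preserves the graceful degradation on older agents while making a real agent failure distinguishable from an empty server.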
Codecov Report

❌ Your patch status has failed because the patch coverage (6.66%) is below the target coverage (75.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

@@ Coverage Diff @@
## develop #6685 +/- ##
===========================================
+ Coverage 50.17% 50.51% +0.34%
===========================================
Files 990 992 +2
Lines 83021 83520 +499
Branches 523 526 +3
===========================================
+ Hits 41659 42194 +535
+ Misses 41330 41294 -36
Partials 32 32
### Problem

On a unified server MariaDB and the application (benches, docker, archived benches) share one disk. The Database Server storage breakdown computed OS usage as a residual, `disk_used - total_db_usage`, so everything non-MariaDB, including the archived benches directory (`/home/frappe/archived`), was attributed to "Operating System". A server with 54 GB of archived benches showed 59 GB of OS usage (`./archived` alone: 54 GB).

### Fix

Derive the application footprint from the paired app server's ncdu storage breakdown (`get_storage_usage`), a single call with no extra ansible, and surface two new buckets:

- `app_usage`: benches + docker
- `unused_files`: archived benches

Both are subtracted from `os_usage`, so OS reflects just the OS. The new `unused_files` figure matches what "Cleanup Unused Files" actually reclaims (same `du --exclude assets` measurement).

The values fall back to `0` on app servers running an agent that doesn't yet return `archived`, so the breakdown degrades gracefully.

### Companion change

Requires frappe/agent#PENDING which adds the `archived` bucket to `/server/storage-breakdown`.

🤖 Generated with Claude Code
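The bucket arithmetic described under Fix can be illustrated with a small worked example. `split_usage` is a hypothetical function, not code from the PR, and only the 59 GB and 54 GB figures come from the description; the disk total, database size, and app size are made-up fillers chosen so the old residual comes out to 59.

```python
def split_usage(disk_used, total_db_usage, app_usage=0, unused_files=0):
    """Compare the old residual with the new per-bucket split (one unit throughout).

    The zero defaults mirror the fallback for agents that do not report
    the application or archived buckets.
    """
    old_os_usage = disk_used - total_db_usage
    return {
        "old_os_usage": old_os_usage,
        "app_usage": app_usage,
        "unused_files": unused_files,
        "os_usage": old_os_usage - app_usage - unused_files,
    }


# Illustrative GB figures: 100, 41 and 3 are assumptions; 54 is from the description.
example = split_usage(disk_used=100, total_db_usage=41, app_usage=3, unused_files=54)
```

With these numbers the old calculation reports 59 GB as "Operating System", while the split attributes 54 GB to `unused_files`, 3 GB to `app_usage`, and leaves 2 GB for the OS.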