Skip to content

fix(database-server): Split app and archived usage on unified servers#6685

Open
balamurali27 wants to merge 1 commit into
developfrom
unified-storage-breakdown
Open

fix(database-server): Split app and archived usage on unified servers#6685
balamurali27 wants to merge 1 commit into
developfrom
unified-storage-breakdown

Conversation

@balamurali27

Copy link
Copy Markdown
Contributor

Problem

On a unified server MariaDB and the application (benches, docker, archived benches) share one disk. The Database Server storage breakdown computed OS usage as a residual:

"os_usage": disk_used - total_db_usage

So everything non-MariaDB — including the archived benches directory (/home/frappe/archived) — was attributed to "Operating System". A server with 54 GB of archived benches showed 59 GB of OS usage.

Reported Actual
Operating System: 59 GB ./archived alone: 54 GB

Fix

Derive the application footprint from the paired app server's ncdu storage breakdown (get_storage_usage) — a single call, no extra ansible — and surface two new buckets:

  • app_usage: benches + docker
  • unused_files: archived benches

Both are subtracted from os_usage, so OS reflects just the OS. The new unused_files figure matches what "Cleanup Unused Files" actually reclaims (same du --exclude assets measurement).

The values fall back to 0 on app servers running an agent that doesn't yet return archived, so the breakdown degrades gracefully.

Companion change

Requires frappe/agent#PENDING which adds the archived bucket to /server/storage-breakdown.

🤖 Generated with Claude Code

On a unified server MariaDB and the application (benches, docker, archived
benches) share one disk. The storage breakdown computed OS usage as
`disk_used - total_db_usage`, so everything non-MariaDB — including the
archived benches directory — was attributed to "Operating System". A server
with 54 GB of archived benches showed 59 GB of OS usage.

Derive the application footprint from the paired app server's ncdu storage
breakdown (`get_storage_usage`) instead of a separate ansible call, and
surface two new buckets:

- app_usage: benches + docker
- unused_files: archived benches

Both are subtracted from os_usage so OS reflects just the OS. The values fall
back to 0 on app servers running an agent that doesn't yet return `archived`,
so the breakdown degrades gracefully.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@greptile-apps

greptile-apps Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Confidence Score: 3/5

The Python change introduces a subtraction that can produce a negative os_usage value when the agent's ncdu data is stale or measured differently than df, and it silently falls back to 0 on any agent failure with no logging — both of which can make the breakdown misleading without any visible error.

Two real defects in the new code path: os_usage has no floor so it can go negative when live df and stale ncdu data diverge, and contextlib.suppress(Exception) discards all agent failures silently, making the fallback to 0 indistinguishable from a healthy response. The Vue side is clean.

press/press/doctype/database_server/database_server.py — specifically get_application_storage_usage and the os_usage formula in get_storage_usage

Important Files Changed

Filename Overview
press/press/doctype/database_server/database_server.py Adds get_application_storage_usage helper and threads its result into the storage breakdown; os_usage can go negative if stale ncdu data exceeds df totals, and the broad contextlib.suppress(Exception) silently falls back to 0 on any agent failure with no logging.
dashboard/src/components/server/StorageBreakdownDialog.vue Adds app_usage and unused_files as optional keys in databaseStorageBreakdown with correct falsy guards; the key-to-label mapping is updated accordingly. The rest of the diff is semicolon cleanup.
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
press/press/doctype/database_server/database_server.py:2456
**`os_usage` can go negative on unified servers**

`application_usage` and `unused_files_usage` come from a live HTTP call to the agent's ncdu breakdown, while `disk_info[1]` comes from `df` run in the same ansible call. Because ncdu data can be stale, measured at a different time, or use a different accounting method than `df` (hardlinks, sparse files), the sum `total_db_usage + application_usage + unused_files_usage` can legitimately exceed `disk_info[1]`, producing a negative `os_usage`. Add a floor: `max(0, disk_info[1] - total_db_usage - application_usage - unused_files_usage)`.

### Issue 2 of 2
press/press/doctype/database_server/database_server.py:2477-2483
**Silent broad exception suppression hides agent failures**

`contextlib.suppress(Exception)` silently discards every failure from the agent HTTP call — network errors, unexpected response shapes, `frappe.throw` inside `Server.get_storage_usage`, etc. When the call fails the method returns `(0, 0)`, which is indistinguishable from a server with no benches, so `os_usage` will silently absorb the app footprint again without any indication. At minimum log the exception (`frappe.log_error`) before suppressing it so failures are observable.

Reviews (1): Last reviewed commit: "fix(database-server): Split app and arch..." | Re-trigger Greptile

"database_usage": total_db_usage,
"binlog_indexes": binlog_indexes_size,
"os_usage": disk_info[1] - total_db_usage,
"os_usage": disk_info[1] - total_db_usage - application_usage - unused_files_usage,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 os_usage can go negative on unified servers

application_usage and unused_files_usage come from a live HTTP call to the agent's ncdu breakdown, while disk_info[1] comes from df run in the same ansible call. Because ncdu data can be stale, measured at a different time, or use a different accounting method than df (hardlinks, sparse files), the sum total_db_usage + application_usage + unused_files_usage can legitimately exceed disk_info[1], producing a negative os_usage. Add a floor: max(0, disk_info[1] - total_db_usage - application_usage - unused_files_usage).

Prompt To Fix With AI
This is a comment left during a code review.
Path: press/press/doctype/database_server/database_server.py
Line: 2456

Comment:
**`os_usage` can go negative on unified servers**

`application_usage` and `unused_files_usage` come from a live HTTP call to the agent's ncdu breakdown, while `disk_info[1]` comes from `df` run in the same ansible call. Because ncdu data can be stale, measured at a different time, or use a different accounting method than `df` (hardlinks, sparse files), the sum `total_db_usage + application_usage + unused_files_usage` can legitimately exceed `disk_info[1]`, producing a negative `os_usage`. Add a floor: `max(0, disk_info[1] - total_db_usage - application_usage - unused_files_usage)`.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +2477 to +2483
with contextlib.suppress(Exception):
breakdown = frappe.get_doc("Server", app_server).get_storage_usage()
benches_bytes = (breakdown.get("benches") or {}).get("size", 0)
docker_bytes = (breakdown.get("docker") or {}).get("size", 0)
archived_bytes = (breakdown.get("archived") or {}).get("size", 0)
return int((benches_bytes + docker_bytes) / 1024), int(archived_bytes / 1024)
return 0, 0

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Silent broad exception suppression hides agent failures

contextlib.suppress(Exception) silently discards every failure from the agent HTTP call — network errors, unexpected response shapes, frappe.throw inside Server.get_storage_usage, etc. When the call fails the method returns (0, 0), which is indistinguishable from a server with no benches, so os_usage will silently absorb the app footprint again without any indication. At minimum log the exception (frappe.log_error) before suppressing it so failures are observable.

Prompt To Fix With AI
This is a comment left during a code review.
Path: press/press/doctype/database_server/database_server.py
Line: 2477-2483

Comment:
**Silent broad exception suppression hides agent failures**

`contextlib.suppress(Exception)` silently discards every failure from the agent HTTP call — network errors, unexpected response shapes, `frappe.throw` inside `Server.get_storage_usage`, etc. When the call fails the method returns `(0, 0)`, which is indistinguishable from a server with no benches, so `os_usage` will silently absorb the app footprint again without any indication. At minimum log the exception (`frappe.log_error`) before suppressing it so failures are observable.

How can I resolve this? If you propose a fix, please make it concise.

@codecov-commenter

codecov-commenter commented Jun 11, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 6.66667% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 50.51%. Comparing base (5c0a5f4) to head (2fc6ad7).
⚠️ Report is 65 commits behind head on develop.

Files with missing lines Patch % Lines
...s/press/doctype/database_server/database_server.py 6.66% 14 Missing ⚠️

❌ Your patch status has failed because the patch coverage (6.66%) is below the target coverage (75.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #6685      +/-   ##
===========================================
+ Coverage    50.17%   50.51%   +0.34%     
===========================================
  Files          990      992       +2     
  Lines        83021    83520     +499     
  Branches       523      526       +3     
===========================================
+ Hits         41659    42194     +535     
+ Misses       41330    41294      -36     
  Partials        32       32              
Flag Coverage Δ
dashboard 62.90% <ø> (+0.10%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants