
DAOS-19033 test: test_enospace_no_aggregation never triggers DER_NOSPACE #18392

Open
knard38 wants to merge 4 commits into master from ckochhof/fix/master/daos-19033/patch-001


Conversation

@knard38

@knard38 knard38 commented Jun 1, 2026

Contributor

Description

NvmeEnospace.test_enospace_no_aggregation fails on master with "This test suppose to fail because of DER_NOSPACE but it got Passed". Confirmed in weekly run master_weekly-062 (commit bfc468ba25).

The test disables aggregation and performs two sequential SCM fills: the first is expected to succeed, the second to fail with DER_NOSPACE. Both calls use percent=40:

self.start_ior_load(storage='SCM', operation="Auto_Write", percent=40)  # should succeed
self.start_ior_load(storage='SCM', operation="Auto_Write", percent=40,  # should fail
                    log_file=log_file)

calculate_ior_block_size() computes the block size from s_total (fixed pool capacity), not from current free space. Both IOR runs therefore request the same amount of data. With a 5G × 4-rank pool (20 GB SCM total), the space budget is:

Component                  % of pool
Metadata overhead          ~12.6%
IOR 1 (40% of s_total)     +43.4%
IOR 2 (40% of s_total)     +43.4%
Total                      ≈ 99.3% (~147 MB headroom remains)

Because ~147 MB of SCM stays free, DER_NOSPACE is never returned and IOR 2 exits with code 0.
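The budget above can be reproduced with a minimal Python sketch. It assumes the rounded percentages quoted in the description are the only consumers of SCM; the function name scm_headroom_pct is hypothetical and not part of the test suite. Because the inputs are rounded, the result only approximates the ~147 MB headroom reported above.

```python
def scm_headroom_pct(metadata_pct, ior_pcts):
    """Return the percentage of SCM left free after metadata and all IOR writes."""
    return 100.0 - metadata_pct - sum(ior_pcts)


# Rounded figures quoted in the PR description (assumed to be the only consumers).
METADATA_PCT = 12.6
IOR_40_PCT = 43.4  # pool share consumed by one percent=40 IOR run

headroom = scm_headroom_pct(METADATA_PCT, [IOR_40_PCT, IOR_40_PCT])
print(f"headroom after both IOR runs: {headroom:.1f}% of SCM")

if headroom > 0:
    # Space is left over, so the second write is not expected to fail.
    print("second IOR fits: DER_NOSPACE is not expected")
else:
    print("second IOR overflows: DER_NOSPACE is expected")
```

A positive headroom means the second IOR run completes, which matches the observed exit code 0.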

To fix this, this PR raises the second start_ior_load call from percent=40 to percent=45. IOR 2 now requests 45% of 20 GB = ~9 GB when only ~8.85 GB is free, overflowing by ~183 MB and guaranteeing DER_NOSPACE:

metadata (12.6%) + IOR 1 (43.4%) + IOR 2 (45%) = 101% > 100% ✓

percent=45 is the minimum sufficient value — verified across all recent CI configurations (master_weekly-062, PR #18371, PR #18338).
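
As a rough cross-check of the chosen value, the sketch below models the second request as a percentage of the 20 GB total and compares it with the ~8.85 GB reported free. Both helper names are hypothetical, and the model deliberately ignores any per-write overhead, so it is an approximation rather than the test's actual sizing logic.

```python
import math


def would_overflow(percent, total_gb, free_gb):
    """True if a request of percent% of total_gb exceeds the free space."""
    return percent / 100 * total_gb > free_gb


def min_overflow_percent(total_gb, free_gb):
    """Smallest integer percent of total_gb that strictly exceeds free_gb."""
    return math.floor(free_gb / total_gb * 100) + 1


# Figures quoted in the PR description.
TOTAL_GB = 20.0  # 5G x 4 ranks
FREE_GB = 8.85   # approximate free SCM before the second IOR run

for pct in (40, 44, 45):
    verdict = "overflows" if would_overflow(pct, TOTAL_GB, FREE_GB) else "fits"
    print(f"percent={pct}: {verdict}")

print("minimum overflowing percent:", min_overflow_percent(TOTAL_GB, FREE_GB))
```

Under these assumptions the request only exceeds the free space from percent=45 upward, which is consistent with the value chosen in this PR.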

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

@knard38 knard38 self-assigned this Jun 1, 2026
@github-actions

github-actions Bot commented Jun 1, 2026


Ticket title is 'enospace.py: test_enospace_no_aggregation never triggers DER_NOSPACE'
Status is 'In Review'
https://daosio.atlassian.net/browse/DAOS-19033

@daosbuild3

Collaborator

Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-18392/1/execution/node/742/log

@knard38 knard38 force-pushed the ckochhof/fix/master/daos-19025/patch-001 branch 2 times, most recently from 3ee04f8 to 9ed0e9f Compare June 5, 2026 14:52
@knard38 knard38 force-pushed the ckochhof/fix/master/daos-19033/patch-001 branch from c68778f to bcdf8bc Compare June 9, 2026 13:00
Base automatically changed from ckochhof/fix/master/daos-19025/patch-001 to master June 12, 2026 17:26
calculate_ior_block_size() computes the IOR block size from s_total
(fixed pool capacity), not from current free space. Both start_ior_load
calls therefore request 40% of the same total, and together with
metadata overhead they only consume ~99.3% of SCM, leaving ~147 MB
free. DER_NOSPACE is never triggered.

Fix: raise the second start_ior_load from percent=40 to percent=45 so
that the combined write volume (40% + 45% + ~12.6% metadata > 100%)
reliably exhausts SCM and returns DER_NOSPACE. At 45%, IOR 2 requests
~9 GB when only ~8.85 GB is free, overflowing by ~183 MB.

Quick-Functional: true
Test-tag: NvmeEnospace
Test-repeat: 5
Signed-off-by: Cedric Koch-Hofer <cedric.koch-hofer@hpe.com>
@knard38 knard38 force-pushed the ckochhof/fix/master/daos-19033/patch-001 branch from bcdf8bc to 5fe170f Compare June 15, 2026 06:02
@knard38 knard38 marked this pull request as ready for review June 15, 2026 06:02
@knard38 knard38 requested review from a team as code owners June 15, 2026 06:02
daltonbohning previously approved these changes Jun 15, 2026
Comment thread src/tests/ftest/nvme/enospace.py Outdated
Comment on lines +752 to +754
# Fill 10% more to SCM ,which should Fail because no SCM space
self.start_ior_load(
-    storage='SCM', operation="Auto_Write", percent=40, log_file=log_file)
+    storage='SCM', operation="Auto_Write", percent=45, log_file=log_file)

Contributor


nit: the comment "Fill 10%" above is incorrect, though it was already incorrect before your change.

@knard38 knard38 Jun 16, 2026

Contributor Author


  • Update invalid comment

Contributor Author


Fixed with commit 9a7e5a6

knard38 added 2 commits June 16, 2026 14:07
Fix reviewers' comments:
- Update invalid comment

Quick-Functional: true
Test-tag: NvmeEnospace
Test-repeat: 5
Signed-off-by: Cedric Koch-Hofer <cedric.koch-hofer@hpe.com>
@daltonbohning daltonbohning requested a review from phender June 16, 2026 15:14

@phender phender left a comment

Contributor


The test fix looks good.

For testing, the commit pragmas need updating:

  • Quick-Functional: true cannot be used for landing functional test changes because it skips the Python bandit check; Skip-unit-tests: true and Skip-fault-injection-test: true are preferred and supported for landing functional test changes.
  • Given this specific test change, we should also verify that it still passes with PMEM. The Skip-func-hw-test-medium-vmd: false commit pragma should be used.

…/daos-19033/patch-001

Skip-unit-tests: true
Skip-fault-injection-test: true
Skip-func-hw-test-medium-vmd: false
Test-tag: NvmeEnospace
Test-repeat: 5
