Commit fe8d7e6

unskip evo2 tests (#1058)
### Description

- This PR addresses issue #1013.
- In NVIDIA-NeMo/NeMo#14515, the NeMo code was updated to reduce memory consumption.
- This PR updates the NeMo submodule to 7ccb0d4.
- This PR adjusts the memory thresholds used to skip tests in `sub-packages/bionemo-evo2/tests/bionemo/evo2/test_evo2.py`.
- This PR adds some tools for reporting torch memory usage.

<!-- Provide a detailed description of the changes in this PR -->

### Type of changes

<!-- Mark the relevant option with an [x] -->

- [x] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Refactor
- [ ] Documentation update
- [ ] Other (please describe):

### CI Pipeline Configuration

Configure CI behavior by applying the relevant labels:

- [SKIP_CI](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#skip_ci) - Skip all continuous integration tests
- [INCLUDE_NOTEBOOKS_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_notebooks_tests) - Execute notebook validation tests in pytest
- [INCLUDE_SLOW_TESTS](https://github.com/NVIDIA/bionemo-framework/blob/main/docs/docs/user-guide/contributing/contributing.md#include_slow_tests) - Execute tests labelled as slow in pytest for extensive testing

> [!NOTE]
> By default, the notebook validation tests are skipped unless explicitly enabled.

#### Authorizing CI Runs

We use [copy-pr-bot](https://docs.gha-runners.nvidia.com/apps/copy-pr-bot/#automation) to manage authorization of CI runs on NVIDIA's compute resources.

- If a pull request is opened by a trusted user and contains only trusted changes, the pull request's code will automatically be copied to a `pull-request/` prefixed branch in the source repository (e.g. `pull-request/123`).
- If a pull request is opened by an untrusted user or contains untrusted changes, an NVIDIA org member must leave an `/ok to test` comment on the pull request to trigger CI. This must be done for each new commit.

### Usage

<!--- How does a user interact with the changed code -->

```python
# TODO: Add code snippet
```

### Pre-submit Checklist

<!--- Ensure all items are completed before submitting -->

- [x] I have tested these changes locally
- [x] I have updated the documentation accordingly
- [x] I have added/updated tests as needed
- [ ] All existing tests pass successfully

### Local test runs

The slow test **test_evo2.py::test_golden_values_top_k_logits_and_cosine_similarity_7b** is broken on **main** and will be marked skip:

1. test_evo2.py::test_golden_values_top_k_logits_and_cosine_similarity_7b is broken on this commit with **NVIDIA H100 80GB HBM3**: [pytests_pr1058_unskip_evo2_tests_sub-packages-bionemo-evo2-tests-bionemo-evo2-test_evo2_20250821T0024_6e2a005d.log](https://github.com/user-attachments/files/21922792/pytests_pr1058_unskip_evo2_tests_sub-packages-bionemo-evo2-tests-bionemo-evo2-test_evo2_20250821T0024_6e2a005d.log)
2. The same test as in (1) is broken on commit *424050d2* in main with *NVIDIA H100 80GB HBM3*: [pytests_pr1058_unskip_evo2_tests_sub-packages-bionemo-evo2-tests-bionemo-evo2-test_evo2__test_golden_values_top_k_logits_and_cosine_similarity_7b_20250821T2114_main_424050d2.log](https://github.com/user-attachments/files/21926382/pytests_pr1058_unskip_evo2_tests_sub-packages-bionemo-evo2-tests-bionemo-evo2-test_evo2__test_golden_values_top_k_logits_and_cosine_similarity_7b_20250821T2114_main_424050d2.log)

The slow test **test_evo2.py::test_generate_speed** is marked skip per [this Slack thread](https://nvidia.slack.com/archives/C074Z808N05/p1755185565520729?thread_ts=1755097791.370249&cid=C074Z808N05).

---------

Signed-off-by: Brian Roland <broland@nvidia.com>
1 parent 1f65287 commit fe8d7e6

File tree

5 files changed (+115 -34 lines changed)


3rdparty/NeMo

Submodule NeMo updated from f4f22a2 to 7ccb0d4

sub-packages/bionemo-evo2/tests/bionemo/evo2/conftest.py

Lines changed: 14 additions & 6 deletions

```diff
@@ -20,22 +20,30 @@
 import pytest
 import torch
 
+from bionemo.testing.torch import get_device_and_memory_allocated
+
 
 def pytest_sessionstart(session):
     """Called at the start of the test session."""
     if torch.cuda.is_available():
         torch.cuda.reset_peak_memory_stats()
-        print(f"Starting test session. Initial GPU memory: {torch.cuda.memory_allocated() / 1024**3:.3f} GB")
+        print(
+            f"""
+            sub-packages/bionemo-evo2/tests/bionemo/evo2: Starting test session
+            {get_device_and_memory_allocated()}
+            """
+        )
 
 
 def pytest_sessionfinish(session, exitstatus):
     """Called at the end of the test session."""
     if torch.cuda.is_available():
-        peak_memory = torch.cuda.max_memory_allocated()
-        final_memory = torch.cuda.memory_allocated()
-        print("\nTest session complete:")
-        print(f"  Peak GPU memory: {peak_memory / 1024**3:.3f} GB")
-        print(f"  Final GPU memory: {final_memory / 1024**3:.3f} GB")
+        print(
+            f"""
+            sub-packages/bionemo-evo2/tests/bionemo/evo2: Test session complete
+            {get_device_and_memory_allocated()}
+            """
+        )
 
 
 @pytest.fixture(autouse=True)
```
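The hook pattern above can be sketched without GPU-specific calls. A minimal analogue follows, where the hypothetical `report()` helper stands in for `bionemo.testing.torch.get_device_and_memory_allocated` so the sketch runs without torch or a GPU:

```python
# Minimal sketch of the conftest.py session-hook pattern shown above.
# `report()` is a hypothetical stand-in for get_device_and_memory_allocated;
# the real helper builds its string from torch.cuda queries.

def report() -> str:
    # Placeholder value; the real helper reports live device statistics.
    return "memory, available on device: 0.000 GB"


def pytest_sessionstart(session):
    """Called by pytest once, before the first test of the session runs."""
    print(f"Starting test session\n{report()}")


def pytest_sessionfinish(session, exitstatus):
    """Called by pytest once, after the last test of the session finishes."""
    print(f"Test session complete\n{report()}")
```

Placing these hooks in a `conftest.py` is enough for pytest to discover and call them; no registration is needed.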

sub-packages/bionemo-evo2/tests/bionemo/evo2/test_evo2.py

Lines changed: 58 additions & 27 deletions

```diff
@@ -48,6 +48,44 @@
 logger.setLevel(logging.DEBUG)  # Capture all levels in the logger itself
 
 
+def determine_memory_requirement_and_skip_if_not_met(ckpt_name: str, flash_decode: bool | None = None) -> int:
+    """Determine the memory requirement for a given checkpoint and flash decode condition.
+
+    Args:
+        ckpt_name: The name of the checkpoint to test.
+        flash_decode: Whether to test with flash decode.
+
+    Returns:
+        The input sequence length cap for the model in the checkpoint, given certain memory requirements.
+        If the memory requirement is not met, the test is skipped.
+    """
+    if "1b" in ckpt_name:
+        model_size = "1b"
+        seq_len_cap = 6000
+        memory_needed_by_test = 17  # max reserved rounded up, for stand-alone test
+    elif "7b" in ckpt_name:
+        model_size = "7b"
+        seq_len_cap = 4000
+        memory_needed_by_test = 32  # max reserved rounded up, for stand-alone test
+    else:
+        raise ValueError(f"{ckpt_name=} is not supported for testing")
+
+    skip_condition_flash = flash_decode is None or flash_decode
+    gb_available = torch.cuda.mem_get_info()[0] / 1024**3
+    skip_condition = gb_available < memory_needed_by_test and skip_condition_flash
+
+    if skip_condition:
+        pytest.skip(
+            ", ".join(
+                [
+                    f"Inference API requires at least {memory_needed_by_test}GB of available memory for {model_size} models",
+                    f"{gb_available=}",
+                ]
+            )
+        )
+    return seq_len_cap
+
+
 def load_weights_sharded_inplace_nemo2_to_mcore(
     model: MegatronModelType,
     distributed_checkpoint_dir: str | Path,
@@ -152,6 +190,7 @@ def test_golden_values_top_k_logits_and_cosine_similarity(seq_len: int):
     assert torch.mean(torch.abs(logit_similarity - torch.ones_like(logit_similarity))) < 0.03
 
 
+@pytest.mark.skip(reason="test fails on main, not due to #1058")
 @pytest.mark.slow
 def test_golden_values_top_k_logits_and_cosine_similarity_7b(seq_len: int = 8_192):
     try:
@@ -181,6 +220,7 @@ def test_golden_values_top_k_logits_and_cosine_similarity_7b(seq_len: int = 8_19
     outputs = model(input_ids=input_ids, position_ids=position_ids, attention_mask=attention_mask)
     gold_standard_no_fp8_tensor = torch.load(gold_standard_no_fp8).to(device=outputs.device, dtype=outputs.dtype)
     is_fp8_supported, compute_capability, device_info = check_fp8_support(device.index)
+
     if is_fp8_supported and compute_capability == "9.0":
         # Most rigorous assertion for output equivalence currently works on devices that are new enough to
         # support FP8.
@@ -364,11 +404,8 @@ def check_matchrate(*, ckpt_name, matchrate, assert_matchrate=True):
 )
 def test_forward(sequences: list[str], ckpt_name: str, expected_matchpercents: list[float]):
     assert len(sequences) > 0
-    gb_available = torch.cuda.mem_get_info()[0] / 1024**3
-    if (gb_available < 38 and "1b" in ckpt_name) or (gb_available < 50 and "7b" in ckpt_name):
-        pytest.skip(
-            f"Inference API requires more than 38GB of memory for 1b models, or 50GB for 7b models. {gb_available=}"
-        )
+    seq_len_cap = determine_memory_requirement_and_skip_if_not_met(ckpt_name)
+
     is_fp8_supported, compute_capability, device_info = check_fp8_support(torch.cuda.current_device())
     skip = "evo2/1b-8k:" in ckpt_name and not is_fp8_supported
     if skip:
@@ -380,7 +417,7 @@ def test_forward(sequences: list[str], ckpt_name: str, expected_matchpercents: l
     )
     matchrates = []
     for seq in sequences:
-        seq = seq[:6000]  # TODO: artificial limit, megatron uses more memory. Vortex can process full sequences
+        seq = seq[:seq_len_cap]  # TODO: artificial limit, megatron uses more memory. Vortex can process full sequences
         with torch.no_grad():
             device = torch.cuda.current_device()
             tokens = torch.tensor([mcore_tokenizer.tokenize(seq)], device=device)
@@ -426,13 +463,11 @@ def test_forward(sequences: list[str], ckpt_name: str, expected_matchpercents: l
 )
 def test_forward_manual(sequences: list[str], ckpt_name: str, expected_matchpercents: list[float], flash_decode: bool):
     assert len(sequences) > 0
+    seq_len_cap = determine_memory_requirement_and_skip_if_not_met(ckpt_name, flash_decode)
+
     is_fp8_supported, compute_capability, device_info = check_fp8_support(torch.cuda.current_device())
     skip = "evo2/1b-8k:" in ckpt_name and not is_fp8_supported
-    gb_available = torch.cuda.mem_get_info()[0] / 1024**3
-    if (gb_available < 38 and flash_decode) or (gb_available < 50 and flash_decode and "7b" in ckpt_name):
-        pytest.skip(
-            f"Inference API requires more than 38GB of memory for 1b models, or 50GB for 7b models. {gb_available=}"
-        )
+
     vortex_style_fp8 = is_fp8_supported and "bf16" not in ckpt_name
     if skip:
         # This checkpoint is sensitive to FP8, so we skip it if it is not supported on the current device.
@@ -479,7 +514,9 @@ def test_forward_manual(sequences: list[str], ckpt_name: str, expected_matchperc
     forward_kwargs = {}
     matchrates = []
     for seq in sequences:
-        seq = seq[:6000]  # TODO: artificial limit, megatron uses more memory. Vortex can process full sequences
+        seq = seq[
+            :seq_len_cap
+        ]  # TODO: artificial limit, megatron uses more memory. Vortex can process full sequences
         with torch.no_grad():
             device = torch.cuda.current_device()
             # tokens = torch.tensor([tokenizer.tokenize(seq)], device=device)
@@ -542,12 +579,9 @@ def test_batch_generate(
     sequences: list[str], ckpt_name: str, model_tokenizer_provider: Callable, expected_matchpercents: list[float]
 ):
     assert len(sequences) > 0
+    determine_memory_requirement_and_skip_if_not_met(ckpt_name)
+
     is_fp8_supported, compute_capability, device_info = check_fp8_support(torch.cuda.current_device())
-    gb_available = torch.cuda.mem_get_info()[0] / 1024**3
-    if (gb_available < 38 and "1b" in ckpt_name) or (gb_available < 50 and "7b" in ckpt_name):
-        pytest.skip(
-            f"Inference API requires more than 38GB of memory for 1b models, or 50GB for 7b models. {gb_available=}"
-        )
     skip = "evo2/1b-8k:" in ckpt_name and not is_fp8_supported
     if skip:
         # This checkpoint is sensitive to FP8, so we skip it if it is not supported on the current device.
@@ -614,11 +648,8 @@ def test_batch_generate_coding_sequences(
     expected_matchpercents: list[float],
 ):
     assert len(coding_sequences) > 0
-    gb_available = torch.cuda.mem_get_info()[0] / 1024**3
-    if (gb_available < 38 and "1b" in ckpt_name) or (gb_available < 50 and "7b" in ckpt_name):
-        pytest.skip(
-            f"Inference API requires more than 38GB of memory for 1b models, or 50GB for 7b models. {gb_available=}"
-        )
+    determine_memory_requirement_and_skip_if_not_met(ckpt_name)
+
     is_fp8_supported, compute_capability, device_info = check_fp8_support(torch.cuda.current_device())
     skip = "evo2/1b-8k:" in ckpt_name and not is_fp8_supported
     if skip:
@@ -706,6 +737,9 @@ def test_batch_generate_coding_sequences(
 )
 
 
+@pytest.mark.skip(
+    reason="skip the test for now, and decide what to do after getting Anton's changes sorted and merged."
+)
 @pytest.mark.slow
 @pytest.mark.parametrize(
     "ckpt_name,model_tokenizer_provider,expected_tokens_sec",
@@ -723,11 +757,8 @@ def test_generate_speed(
     expected_tokens_sec: float,
 ):
     is_fp8_supported, compute_capability, device_info = check_fp8_support(torch.cuda.current_device())
-    gb_available = torch.cuda.mem_get_info()[0] / 1024**3
-    if (gb_available < 38 and "1b" in ckpt_name) or (gb_available < 50 and "7b" in ckpt_name):
-        pytest.skip(
-            f"Inference API requires more than 38GB of memory for 1b models, or 50GB for 7b models. {gb_available=}"
-        )
+    determine_memory_requirement_and_skip_if_not_met(ckpt_name)
+
     skip = "evo2/1b-8k:" in ckpt_name and not is_fp8_supported
     if skip:
         # This checkpoint is sensitive to FP8, so we skip it if it is not supported on the current device.
```

sub-packages/bionemo-testing/src/bionemo/testing/torch.py

Lines changed: 16 additions & 0 deletions

```diff
@@ -61,3 +61,19 @@ def recursive_assert_approx_equal(x, y, atol=1e-4, rtol=1e-4):
             recursive_assert_approx_equal(x[key], y[key], atol=atol, rtol=rtol)
     else:
         assert x == y
+
+
+def get_device_and_memory_allocated() -> str:
+    """Get the current device index, name, and memory usage."""
+    current_device_index = torch.cuda.current_device()
+    props = torch.cuda.get_device_properties(current_device_index)
+    message = f"""
+    current device index: {current_device_index}
+    current device uuid: {props.uuid}
+    current device name: {props.name}
+    memory, total on device: {torch.cuda.mem_get_info()[1] / 1024**3:.3f} GB
+    memory, available on device: {torch.cuda.mem_get_info()[0] / 1024**3:.3f} GB
+    memory allocated for tensors etc: {torch.cuda.memory_allocated() / 1024**3:.3f} GB
+    max memory reserved for tensors etc: {torch.cuda.max_memory_allocated() / 1024**3:.3f} GB
+    """
+    return message
```
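The GB figures in this report are raw byte counts divided by `1024**3`. A torch-free sketch of the same formatting, fed from plain integers so it runs without a GPU (the function names here are illustrative, not the PR's API):

```python
# Sketch of the byte-to-GB formatting used by get_device_and_memory_allocated,
# using plain integers in place of torch.cuda queries.

def gb(num_bytes: int) -> str:
    """Format a byte count as gibibytes, matching the report's :.3f style."""
    return f"{num_bytes / 1024**3:.3f} GB"


def memory_report(total: int, free: int, allocated: int, max_allocated: int) -> str:
    # Mirrors the field labels of the real helper above.
    return (
        f"memory, total on device: {gb(total)}\n"
        f"memory, available on device: {gb(free)}\n"
        f"memory allocated for tensors etc: {gb(allocated)}\n"
        f"max memory reserved for tensors etc: {gb(max_allocated)}\n"
    )
```

In the real helper, `total` and `free` come from `torch.cuda.mem_get_info()`, which reports device-wide figures, while `memory_allocated()` and `max_memory_allocated()` report only this process's tensor allocations.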
Lines changed: 26 additions & 0 deletions

```diff
@@ -0,0 +1,26 @@
+# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: LicenseRef-Apache2
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+from bionemo.testing.torch import get_device_and_memory_allocated
+
+
+def test_get_device_and_memory_allocated():
+    message = get_device_and_memory_allocated()
+    assert message is not None
+    assert "memory, total on device" in message
+    assert "memory, available on device" in message
+    assert "memory allocated for tensors etc" in message
+    assert "max memory reserved for tensors etc" in message
```
