[BIONEMO-2473] Added tests for Evo2 LoRA fine-tuning (#1060)
Conversation
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@ Coverage Diff @@
## main #1060 +/- ##
==========================================
- Coverage 79.93% 79.92% -0.02%
==========================================
Files 160 160
Lines 11859 11858 -1
==========================================
- Hits 9480 9478 -2
- Misses 2379 2380 +1
/ok to test f5cbdd6
Walkthrough

Updated the training entrypoint to pass synthetic dataset sizes to MockDataModule, changed the Evo2LoRA import path and the CLI argument type for LoRA checkpoints, integrated a constructed Evo2LoRA as HyenaModel's model_transform and as a callback during LoRA fine-tuning, added shared test helpers, refactored tests to use them, and added an end-to-end finetune integration test.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant CLI as CLI (train_evo2)
    participant Train as train.py
    participant Data as MockDataModule
    participant Model as HyenaModel
    participant PEFT as Evo2LoRA
    participant CB as Callbacks
    CLI->>Train: Parse args (incl. sample counts, --lora-finetune, --lora-checkpoint-path)
    Train->>Data: Init(num_train_samples, num_val_samples, num_test_samples)
    alt lora_finetune enabled
        Train->>PEFT: Create Evo2LoRA(peft_ckpt_path=args.lora_checkpoint_path)
        Train->>Model: Init(model_transform=lora_transform)
        Train->>CB: Append lora_transform to callbacks
    else
        Train->>Model: Init(model_transform=None)
    end
    Train->>Model: Fit(data=Data, callbacks=CB)
    Model-->>CLI: Emit logs, checkpoints, TFEvents
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~40 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
/ok to test e53f754
Actionable comments posted: 2
🧹 Nitpick comments (6)
sub-packages/bionemo-evo2/tests/bionemo/evo2/run/common.py (2)
24-27: Remove duplicated --limit-val-batches flag
Each helper appends --limit-val-batches twice. Keep a single occurrence to avoid confusion about which value "wins."
```diff
- "--model-size 1b_nv --num-layers 4 --hybrid-override-pattern SDH* --limit-val-batches 1 "
+ "--model-size 1b_nv --num-layers 4 --hybrid-override-pattern SDH* "
- f"--max-steps {max_steps} --warmup-steps 1 --val-check-interval {val_check} --limit-val-batches 1 "
+ f"--max-steps {max_steps} --warmup-steps 1 --val-check-interval {val_check} --limit-val-batches 1 "
```
20-29: Quote shell arguments to be path-safe
Paths (result dirs, ckpt dirs) should be shell-quoted to survive spaces/special chars.
```diff
+import shlex
@@
- f"train_evo2 --mock-data --result-dir {path} --devices {devices} "
+ f"train_evo2 --mock-data --result-dir {shlex.quote(str(path))} --devices {devices} "
@@
- f"--seq-length 16 --hidden-dropout 0.1 --attention-dropout 0.1 {additional_args}"
+ f"--seq-length 16 --hidden-dropout 0.1 --attention-dropout 0.1 {additional_args}"
@@
- f"train_evo2 --mock-data --result-dir {path} --devices {devices} "
+ f"train_evo2 --mock-data --result-dir {shlex.quote(str(path))} --devices {devices} "
@@
- f"--seq-length 16 --hidden-dropout 0.1 --attention-dropout 0.1 {additional_args} --ckpt-dir {prev_ckpt} "
+ f"--seq-length 16 --hidden-dropout 0.1 --attention-dropout 0.1 {additional_args} --ckpt-dir {shlex.quote(str(prev_ckpt))} "
```

Also applies to: 41-50
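The quoting suggestion can be sanity-checked in isolation. A minimal sketch, assuming a hypothetical `build_cmd` helper that mirrors the test-command builders:

```python
import shlex

def build_cmd(path: str, devices: int) -> str:
    # Quote path-like arguments so spaces and special characters survive shell parsing.
    return (
        f"train_evo2 --mock-data --result-dir {shlex.quote(str(path))} "
        f"--devices {devices}"
    )

# shlex.split round-trips the quoted path back to a single argv entry.
cmd = build_cmd("/tmp/run dir/results", 1)
print(shlex.split(cmd))
```

Without `shlex.quote`, the same path would be split into two argv entries at the space.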
sub-packages/bionemo-evo2/src/bionemo/evo2/run/train.py (3)
494-496: CLI type change to str is fine; add early path validation
If --lora-finetune is set and a checkpoint path is provided, fail fast when the path doesn't exist.
parser.add_argument("--lora-checkpoint-path", type=str, default=None, help="LoRA checkpoint path")Add right after args are parsed (in train or just after parse_args returns):
```diff
@@ def train(args: argparse.Namespace) -> nl.Trainer:
+    if args.lora_finetune and args.lora_checkpoint_path:
+        if not Path(args.lora_checkpoint_path).exists():
+            raise FileNotFoundError(f"LoRA checkpoint path not found: {args.lora_checkpoint_path}")
```
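The fail-fast check can be exercised standalone; `validate_lora_args` is a hypothetical name for this sketch, and the real code would inline the check in `train`:

```python
import argparse
from pathlib import Path

def validate_lora_args(args: argparse.Namespace) -> None:
    # Fail fast: a bad checkpoint path should error before training starts,
    # not deep inside model construction.
    if args.lora_finetune and args.lora_checkpoint_path:
        if not Path(args.lora_checkpoint_path).exists():
            raise FileNotFoundError(
                f"LoRA checkpoint path not found: {args.lora_checkpoint_path}"
            )

ns = argparse.Namespace(lora_finetune=True, lora_checkpoint_path="/no/such/ckpt")
try:
    validate_lora_args(ns)
except FileNotFoundError as e:
    print(e)
```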
657-659: Guard callback type and None
Appending lora_transform is correct when set; consider asserting it's not None to catch misconfigurations early.
```diff
- if args.lora_finetune:
-     callbacks.append(lora_transform)
+ if args.lora_finetune:
+     assert lora_transform is not None, "Evo2LoRA should be initialized when --lora-finetune is set."
+     callbacks.append(lora_transform)
```
665-672: Pass actual model type to FLOPs callback
Hardcoding "hyena" could skew FLOPs reporting for Mamba.

```diff
- flop_meas_callback = FLOPsMeasurementCallback(
-     model_config,
-     data_module,
-     "hyena",
- )
+ flop_meas_callback = FLOPsMeasurementCallback(model_config, data_module, model_type)
```

sub-packages/bionemo-evo2/tests/bionemo/evo2/run/test_train.py (1)
52-61: Deduplicate --limit-val-batches and quote paths in mamba helpers
Mirror the fixes from common.py to avoid flag duplication and path issues.
```diff
- f"train_evo2 --mock-data --result-dir {path} --devices {devices} "
- "--model-size hybrid_mamba_8b --num-layers 2 --hybrid-override-pattern M- --limit-val-batches 1 "
+ f"train_evo2 --mock-data --result-dir {shlex.quote(str(path))} --devices {devices} "
+ "--model-size hybrid_mamba_8b --num-layers 2 --hybrid-override-pattern M- "
@@
- f"--max-steps {max_steps} --warmup-steps 1 --val-check-interval {val_check} --limit-val-batches 1 "
+ f"--max-steps {max_steps} --warmup-steps 1 --val-check-interval {val_check} --limit-val-batches 1 "
@@
- f"train_evo2 --mock-data --result-dir {path} --devices {devices} "
- "--model-size hybrid_mamba_8b --num-layers 2 --hybrid-override-pattern M- --limit-val-batches 1 "
+ f"train_evo2 --mock-data --result-dir {shlex.quote(str(path))} --devices {devices} "
+ "--model-size hybrid_mamba_8b --num-layers 2 --hybrid-override-pattern M- "
@@
- f"--seq-length 16 --hidden-dropout 0.1 --attention-dropout 0.1 {additional_args} --ckpt-dir {prev_ckpt}"
+ f"--seq-length 16 --hidden-dropout 0.1 --attention-dropout 0.1 {additional_args} --ckpt-dir {shlex.quote(str(prev_ckpt))}"
```

Don't forget to `import shlex` at the top if applying.

Also applies to: 63-73
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (6)
- sub-packages/bionemo-evo2/src/bionemo/evo2/run/peft.py (0 hunks)
- sub-packages/bionemo-evo2/src/bionemo/evo2/run/train.py (2 hunks)
- sub-packages/bionemo-evo2/tests/bionemo/evo2/run/__init__.py (1 hunks)
- sub-packages/bionemo-evo2/tests/bionemo/evo2/run/common.py (1 hunks)
- sub-packages/bionemo-evo2/tests/bionemo/evo2/run/test_lora.py (1 hunks)
- sub-packages/bionemo-evo2/tests/bionemo/evo2/run/test_train.py (2 hunks)
💤 Files with no reviewable changes (1)
- sub-packages/bionemo-evo2/src/bionemo/evo2/run/peft.py
🧰 Additional context used
🧬 Code graph analysis (2)
sub-packages/bionemo-evo2/tests/bionemo/evo2/run/test_lora.py (2)
sub-packages/bionemo-evo2/tests/bionemo/evo2/run/common.py (2)
- small_training_cmd (20-29)
- small_training_finetune_cmd (32-50)

sub-packages/bionemo-testing/src/bionemo/testing/subprocess_utils.py (1)
- run_command_in_subprocess (108-129)
sub-packages/bionemo-evo2/tests/bionemo/evo2/run/test_train.py (1)
sub-packages/bionemo-evo2/tests/bionemo/evo2/run/common.py (2)
- small_training_cmd (20-29)
- small_training_finetune_cmd (32-50)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Analyze (rust)
🔇 Additional comments (4)
sub-packages/bionemo-evo2/tests/bionemo/evo2/run/__init__.py (1)
1-14: Package marker OK
Licensing header + package init only. No issues.
sub-packages/bionemo-evo2/tests/bionemo/evo2/run/test_train.py (1)
38-50: Capture helpers LGTM
stdout/stderr capture with distributed state context is clean.
sub-packages/bionemo-evo2/tests/bionemo/evo2/run/test_lora.py (2)
109-121: Verify LoRA resume path semantics
You pass a trainer checkpoint dir as --lora-checkpoint-path. If Evo2LoRA expects a LoRA-specific checkpoint (not a full trainer ckpt), this may silently no-op or misload. Please confirm the expected format/path and adjust to the LoRA artifact location if different.
Would you like me to scan the repo for where Evo2LoRA writes its PEFT artifacts and update this test accordingly?
24-33: Test structure and assertions look solid
End-to-end pretrain → finetune flow and basic artifact checks are clear.
Also applies to: 71-79, 101-107
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
sub-packages/bionemo-evo2/src/bionemo/evo2/run/train.py (1)
788-804: Initialize lora_transform before model branching to avoid undefined variable.
lora_transform is only set in the Hyena branch; using --lora-finetune with non‑Hyena models leads to an UnboundLocalError later.

Apply this diff:
```diff
     # Create model based on selected model type
+    lora_transform = None
     if model_type == "hyena":
         if args.model_size not in HYENA_MODEL_OPTIONS:
             raise ValueError(f"Invalid model size for Hyena: {args.model_size}")
         model_config = HYENA_MODEL_OPTIONS[args.model_size](**config_modifiers_init)
         if args.no_weight_decay_embeddings:
             # Override the default weight decay condition for Hyena with our bionemo version that also excludes
             # embeddings
             model_config.hyena_no_weight_decay_cond_fn = hyena_no_weight_decay_cond_with_embeddings
-        # Lora adaptors configuration
-        lora_transform = None
+        # LoRA adapters configuration
         if args.lora_finetune:
             lora_transform = Evo2LoRA(peft_ckpt_path=args.lora_checkpoint_path)
         model = llm.HyenaModel(model_config, tokenizer=data_module.tokenizer, model_transform=lora_transform)
```
🧹 Nitpick comments (6)
sub-packages/bionemo-evo2/src/bionemo/evo2/run/train.py (1)
666-674: Use effective steps when early_stop_on_step is set (mock data sizing).
If --early-stop-on-step is used, training samples should reflect that to avoid oversizing mock data.

Apply this diff:
```diff
     if args.mock_data:
-        data_module = MockDataModule(
+        effective_max_steps = args.early_stop_on_step or args.max_steps
+        data_module = MockDataModule(
             seq_length=args.seq_length,
             micro_batch_size=args.micro_batch_size,
             global_batch_size=global_batch_size,
-            num_train_samples=args.max_steps * global_batch_size,
+            num_train_samples=effective_max_steps * global_batch_size,
             num_val_samples=args.limit_val_batches * global_batch_size,
             num_test_samples=1,
             num_workers=args.workers,
             tokenizer=tokenizer,
         )
```

sub-packages/bionemo-evo2/tests/bionemo/evo2/run/test_finetune.py (5)
100-106: Stabilize assertion: check overall improvement instead of per‑step monotonicity.
Per‑step non‑increase is flaky on short, synthetic runs. Assert net improvement and parse all entries.
Apply this diff:
```diff
-    val_losses = extract_val_losses(stdout_pretrain, val_steps)
-
-    for i in range(1, len(val_losses)):
-        assert val_losses[i][1] <= val_losses[i - 1][1], (
-            f"Validation loss increased at step {val_losses[i][0]}: {val_losses[i][1]} > {val_losses[i - 1][1]}"
-        )
+    val_losses = extract_val_losses(stdout_pretrain, 1)
+    assert val_losses, "No validation-loss entries found in logs."
+    assert val_losses[-1][1] <= val_losses[0][1], (
+        f"Validation loss did not improve overall: first={val_losses[0][1]} last={val_losses[-1][1]}"
+    )
```
157-164: Apply the same stable assertion for finetune phase.

Apply this diff:
```diff
-    val_losses_ft = extract_val_losses(stdout_finetune, val_steps)
-
-    # Check that each validation loss is less than or equal to the previous one
-    for i in range(1, len(val_losses_ft)):
-        assert val_losses_ft[i][1] <= val_losses_ft[i - 1][1], (
-            f"Validation loss increased at step {val_losses_ft[i][0]}: {val_losses_ft[i][1]} > {val_losses_ft[i - 1][1]}"
-        )
+    val_losses_ft = extract_val_losses(stdout_finetune, 1)
+    assert val_losses_ft, "No validation-loss entries found in logs (finetune)."
+    assert val_losses_ft[-1][1] <= val_losses_ft[0][1], (
+        f"Finetune loss did not improve overall: first={val_losses_ft[0][1]} last={val_losses_ft[-1][1]}"
+    )
```
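The net-improvement check can be exercised against a toy log. This sketch is an assumption: the regex and the `step: N val_loss: X` log format are illustrative, not the real helper's parsing logic:

```python
import re

def extract_val_losses(stdout: str):
    # Hypothetical log format; the real test helper parses the trainer's actual output.
    pattern = r"step:\s*(\d+).*?val_loss:\s*([\d.]+)"
    return [(int(s), float(l)) for s, l in re.findall(pattern, stdout)]

log = "step: 5 val_loss: 2.31\nstep: 10 val_loss: 2.40\nstep: 15 val_loss: 2.05\n"
losses = extract_val_losses(log)
assert losses, "No validation-loss entries found."
# Net improvement is robust to a transient mid-run bump (step 10 here),
# which would fail a strict per-step monotonicity check.
assert losses[-1][1] <= losses[0][1]
```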
171-182: Avoid variable shadowing; keep resume stdout separate.
Reuse of stdout_finetune is confusing; use stdout_resume.

Apply this diff:
```diff
-    stdout_finetune: str = run_command_in_subprocess(command=command_resume_finetune, path=str(tmp_path))
+    stdout_resume: str = run_command_in_subprocess(command=command_resume_finetune, path=str(tmp_path))
```
196-202: Resume phase: use the renamed variable and stable assertion.

Apply this diff:
```diff
-    val_losses_ft = extract_val_losses(stdout_finetune, val_steps)
-
-    # Check that each validation loss is less than or equal to the previous one
-    for i in range(1, len(val_losses_ft)):
-        assert val_losses_ft[i][1] <= val_losses_ft[i - 1][1], (
-            f"Validation loss increased at step {val_losses_ft[i][0]}: {val_losses_ft[i][1]} > {val_losses_ft[i - 1][1]}"
-        )
+    val_losses_ft = extract_val_losses(stdout_resume, 1)
+    assert val_losses_ft, "No validation-loss entries found in logs (resume finetune)."
+    assert val_losses_ft[-1][1] <= val_losses_ft[0][1], (
+        f"Resume finetune loss did not improve overall: first={val_losses_ft[0][1]} last={val_losses_ft[-1][1]}"
+    )
```
107-112: Remove duplicate event-file existence check (already done above).

Apply this diff:
```diff
-    # Check if directory with tensorboard logs exists
-    assert tensorboard_dir.exists(), "TensorBoard logs folder does not exist."
-    # Recursively search for files with tensorboard logger
-    event_files = list(tensorboard_dir.rglob("events.out.tfevents*"))
-    assert event_files, f"No TensorBoard event files found under {tensorboard_dir}"
-    assert len(matching_subfolders) == 1, "Only one checkpoint subfolder should be found."
+    assert len(matching_subfolders) == 1, "Only one checkpoint subfolder should be found."
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- sub-packages/bionemo-evo2/src/bionemo/evo2/models/peft.py (0 hunks)
- sub-packages/bionemo-evo2/src/bionemo/evo2/run/train.py (4 hunks)
- sub-packages/bionemo-evo2/tests/bionemo/evo2/run/__init__.py (1 hunks)
- sub-packages/bionemo-evo2/tests/bionemo/evo2/run/common.py (1 hunks)
- sub-packages/bionemo-evo2/tests/bionemo/evo2/run/test_finetune.py (1 hunks)
- sub-packages/bionemo-evo2/tests/bionemo/evo2/run/test_train.py (1 hunks)
💤 Files with no reviewable changes (1)
- sub-packages/bionemo-evo2/src/bionemo/evo2/models/peft.py
✅ Files skipped from review due to trivial changes (1)
- sub-packages/bionemo-evo2/tests/bionemo/evo2/run/__init__.py
🚧 Files skipped from review as they are similar to previous changes (2)
- sub-packages/bionemo-evo2/tests/bionemo/evo2/run/test_train.py
- sub-packages/bionemo-evo2/tests/bionemo/evo2/run/common.py
🧰 Additional context used
🧬 Code graph analysis (2)
sub-packages/bionemo-evo2/src/bionemo/evo2/run/train.py (1)
sub-packages/bionemo-evo2/src/bionemo/evo2/models/peft.py (1)
Evo2LoRA(26-279)
sub-packages/bionemo-evo2/tests/bionemo/evo2/run/test_finetune.py (3)
sub-packages/bionemo-testing/src/bionemo/testing/subprocess_utils.py (1)
- run_command_in_subprocess (108-129)

sub-packages/bionemo-evo2/tests/bionemo/evo2/run/common.py (2)
- small_training_cmd (20-37)
- small_training_finetune_cmd (40-60)

sub-packages/bionemo-evo2/tests/bionemo/evo2/run/test_train.py (1)
- test_train_evo2_finetune_runs (103-171)
🔇 Additional comments (3)
sub-packages/bionemo-evo2/src/bionemo/evo2/run/train.py (3)
51-51: Correct import path for Evo2LoRA.
Importing Evo2LoRA from models.peft matches the new location.
612-614: CLI type change to str aligns with Evo2LoRA signature.
--lora-checkpoint-path as str matches the peft constructor.
827-829: Guard callback append to avoid UnboundLocalError for non‑Hyena.
Append only when the transform was constructed.
Apply this diff:
```diff
-    if args.lora_finetune:
-        callbacks.append(lora_transform)
+    if lora_transform is not None:
+        callbacks.append(lora_transform)
```
/ok to test 0f54e08
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
sub-packages/bionemo-evo2/tests/bionemo/evo2/run/test_train.py (4)
54-60: Remove duplicate --limit-val-batches 1 flag in Mamba cmd.
Flag is set twice; keep one.

```diff
- f"--max-steps {max_steps} --warmup-steps 1 --val-check-interval {val_check} --limit-val-batches 1 "
+ f"--max-steps {max_steps} --warmup-steps 1 --val-check-interval {val_check} "
```
67-73: Remove duplicate --limit-val-batches 1 flag in Mamba finetune cmd.
Same duplicate as above.

```diff
- f"--max-steps {max_steps} --warmup-steps 1 --val-check-interval {val_check} --limit-val-batches 1 "
+ f"--max-steps {max_steps} --warmup-steps 1 --val-check-interval {val_check} "
```
77-85: Remove duplicate --limit-val-batches 1 flag in Llama cmd.
Simplify to a single occurrence.

```diff
- f"--max-steps {max_steps} --warmup-steps 1 --val-check-interval {val_check} --limit-val-batches 1 "
+ f"--max-steps {max_steps} --warmup-steps 1 --val-check-interval {val_check} "
```
91-98: Remove duplicate --limit-val-batches 1 flag in Llama finetune cmd.
Drop the repeated flag.

```diff
- f"--max-steps {max_steps} --warmup-steps 1 --val-check-interval {val_check} --limit-val-batches 1 "
+ f"--max-steps {max_steps} --warmup-steps 1 --val-check-interval {val_check} "
```
🧹 Nitpick comments (1)
sub-packages/bionemo-evo2/tests/bionemo/evo2/run/test_train.py (1)
53-99: Consider moving Mamba/Llama helpers to common.py for reuse.
Unify helper builders (Mamba/Llama) in tests/bionemo/evo2/run/common.py to avoid drift with small_training_cmd variants and reduce duplication.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- sub-packages/bionemo-evo2/src/bionemo/evo2/models/peft.py (0 hunks)
- sub-packages/bionemo-evo2/src/bionemo/evo2/run/train.py (4 hunks)
- sub-packages/bionemo-evo2/tests/bionemo/evo2/run/__init__.py (1 hunks)
- sub-packages/bionemo-evo2/tests/bionemo/evo2/run/common.py (1 hunks)
- sub-packages/bionemo-evo2/tests/bionemo/evo2/run/test_finetune.py (1 hunks)
- sub-packages/bionemo-evo2/tests/bionemo/evo2/run/test_train.py (1 hunks)
💤 Files with no reviewable changes (1)
- sub-packages/bionemo-evo2/src/bionemo/evo2/models/peft.py
🚧 Files skipped from review as they are similar to previous changes (4)
- sub-packages/bionemo-evo2/tests/bionemo/evo2/run/test_finetune.py
- sub-packages/bionemo-evo2/src/bionemo/evo2/run/train.py
- sub-packages/bionemo-evo2/tests/bionemo/evo2/run/common.py
- sub-packages/bionemo-evo2/tests/bionemo/evo2/run/__init__.py
🧰 Additional context used
🧬 Code graph analysis (1)
sub-packages/bionemo-evo2/tests/bionemo/evo2/run/test_train.py (2)
sub-packages/bionemo-evo2/tests/bionemo/evo2/run/common.py (2)
- small_training_cmd (20-37)
- small_training_finetune_cmd (40-60)

sub-packages/bionemo-testing/src/bionemo/testing/torch.py (1)
- check_fp8_support (21-33)
🔇 Additional comments (2)
sub-packages/bionemo-evo2/tests/bionemo/evo2/run/test_train.py (2)
33-34: Relative import LGTM; fixes test-env packaging issues.
Good switch to package-relative import; aligns with prior feedback and avoids ModuleNotFoundError in CI.
39-41: Docstring tweak LGTM.
No functional impact; concise and clear.
/ok to test 79361ed
Actionable comments posted: 0
🧹 Nitpick comments (1)
sub-packages/bionemo-evo2/src/bionemo/evo2/run/train.py (1)
613-613: Validate LoRA checkpoint path before use.
Type change to str is fine, but add a lightweight existence check to fail fast if a bad path is provided.

Add before constructing Evo2LoRA (Hyena branch):

```python
from pathlib import Path

if args.lora_finetune and args.lora_checkpoint_path:
    ckpt_p = Path(args.lora_checkpoint_path)
    if not ckpt_p.exists():
        raise FileNotFoundError(f"LoRA checkpoint not found: {ckpt_p}")
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- sub-packages/bionemo-evo2/src/bionemo/evo2/models/peft.py (0 hunks)
- sub-packages/bionemo-evo2/src/bionemo/evo2/run/train.py (4 hunks)
- sub-packages/bionemo-evo2/tests/bionemo/evo2/run/__init__.py (1 hunks)
- sub-packages/bionemo-evo2/tests/bionemo/evo2/run/common.py (1 hunks)
- sub-packages/bionemo-evo2/tests/bionemo/evo2/run/test_finetune.py (1 hunks)
- sub-packages/bionemo-evo2/tests/bionemo/evo2/run/test_train.py (1 hunks)
💤 Files with no reviewable changes (1)
- sub-packages/bionemo-evo2/src/bionemo/evo2/models/peft.py
🚧 Files skipped from review as they are similar to previous changes (4)
- sub-packages/bionemo-evo2/tests/bionemo/evo2/run/common.py
- sub-packages/bionemo-evo2/tests/bionemo/evo2/run/test_train.py
- sub-packages/bionemo-evo2/tests/bionemo/evo2/run/test_finetune.py
- sub-packages/bionemo-evo2/tests/bionemo/evo2/run/__init__.py
🧰 Additional context used
🧬 Code graph analysis (1)
sub-packages/bionemo-evo2/src/bionemo/evo2/run/train.py (1)
sub-packages/bionemo-evo2/src/bionemo/evo2/models/peft.py (1)
Evo2LoRA(26-279)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: changed-files
- GitHub Check: pre-commit
- GitHub Check: changed-dirs
- GitHub Check: Analyze (rust)
🔇 Additional comments (3)
sub-packages/bionemo-evo2/src/bionemo/evo2/run/train.py (3)
827-829: Fix NameError: lora_transform is undefined for non‑Hyena models.
When --lora-finetune is used with mamba/llama, lora_transform isn't defined, causing a crash.

Apply this minimal fix to avoid the exception:

```diff
-    if args.lora_finetune:
-        callbacks.append(lora_transform)
+    if 'lora_transform' in locals() and lora_transform is not None:
+        callbacks.append(lora_transform)
```

Additionally, initialize lora_transform and fail fast when LoRA is requested for unsupported models (place just after determining model_type and before branching):

```python
# before: "Create model based on selected model type"
lora_transform = None
if args.lora_finetune and model_type != "hyena":
    raise ValueError("--lora-finetune is currently supported only for Hyena models.")
```
51-51: Import path update verified—no stale imports remain.
671-674: Use eval batch size (no grad accumulation) for val/test sample counts.
Validation/test don't use gradient accumulation, so divide out args.grad_acc_batches for val/test sample counts.

```diff
-            num_train_samples=args.max_steps * global_batch_size,
-            num_val_samples=args.limit_val_batches * global_batch_size,
-            num_test_samples=1,
+            num_train_samples=args.max_steps * global_batch_size,
+            num_val_samples=args.limit_val_batches * max(1, global_batch_size // max(1, args.grad_acc_batches)),
+            num_test_samples=1,
```
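The sizing arithmetic can be checked on its own. `mock_data_sizes` is a hypothetical helper wrapping the suggested expressions, not a real API:

```python
def mock_data_sizes(max_steps, limit_val_batches, global_batch_size, grad_acc_batches):
    # Training consumes one global batch per step; eval runs without
    # gradient accumulation, so divide it back out for the val split.
    eval_batch_size = max(1, global_batch_size // max(1, grad_acc_batches))
    return {
        "num_train_samples": max_steps * global_batch_size,
        "num_val_samples": limit_val_batches * eval_batch_size,
        "num_test_samples": 1,
    }

print(mock_data_sizes(max_steps=10, limit_val_batches=1, global_batch_size=8, grad_acc_batches=2))
```

The `max(1, ...)` guards keep the counts positive even for degenerate configs (e.g. accumulation larger than the global batch).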
Signed-off-by: Bruno Alvisio <[email protected]>
Actionable comments posted: 0
🧹 Nitpick comments (2)
sub-packages/bionemo-evo2/src/bionemo/evo2/run/train.py (2)
613-613: Arg type change to str is fine; add lightweight CLI validation.
Ensure the flag is only used with --lora-finetune (and optionally check local file existence).
Add post-parse validation before returning from parse_args:
```python
args_ns = parser.parse_args(args=args)
if args_ns.lora_checkpoint_path and not args_ns.lora_finetune:
    parser.error("--lora-checkpoint-path requires --lora-finetune.")
# Optional: check existence if intended to be a local path
# from pathlib import Path
# if args_ns.lora_checkpoint_path and not str(args_ns.lora_checkpoint_path).startswith(("s3://", "gs://")) and not Path(args_ns.lora_checkpoint_path).exists():
#     parser.error(f"--lora-checkpoint-path not found: {args_ns.lora_checkpoint_path}")
return args_ns
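A runnable sketch of the post-parse validation; the flag names match the PR, but the parser wiring is simplified to just the two LoRA flags:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--lora-finetune", action="store_true")
parser.add_argument("--lora-checkpoint-path", type=str, default=None)

def parse_args(argv):
    ns = parser.parse_args(argv)
    # parser.error prints usage to stderr and exits (raises SystemExit).
    if ns.lora_checkpoint_path and not ns.lora_finetune:
        parser.error("--lora-checkpoint-path requires --lora-finetune.")
    return ns

ns = parse_args(["--lora-finetune", "--lora-checkpoint-path", "ckpt/"])
print(ns.lora_checkpoint_path)
```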
671-673: Sanity-check MockDataModule sample counts; ensure test has at least one full batch.
Suggest using at least one global batch for test to avoid empty/partial-batch edge cases.
Apply this diff:
```diff
-            num_test_samples=1,
+            num_test_samples=global_batch_size,
```

Also, please confirm your NeMo MockDataModule supports the num_train_samples/num_val_samples/num_test_samples kwargs in your pinned version to avoid a TypeError at construction.
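One way to confirm kwargs support without constructing the module is to inspect the constructor signature. `MockDataModule` below is a local stand-in, not the pinned NeMo class; the `supports_kwargs` helper is an assumption:

```python
import inspect

class MockDataModule:
    """Stand-in for the pinned NeMo class; the real signature may differ."""
    def __init__(self, seq_length, micro_batch_size, num_train_samples=None):
        self.num_train_samples = num_train_samples

def supports_kwargs(cls, names):
    # Inspect __init__ rather than constructing, so the check is side-effect free.
    params = inspect.signature(cls.__init__).parameters
    accepts_var_kw = any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())
    return accepts_var_kw or all(n in params for n in names)

print(supports_kwargs(MockDataModule, ["num_train_samples"]))
```

Such a check could gate the kwargs at call time instead of failing with a TypeError mid-run.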
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- sub-packages/bionemo-evo2/src/bionemo/evo2/models/peft.py (0 hunks)
- sub-packages/bionemo-evo2/src/bionemo/evo2/run/train.py (4 hunks)
- sub-packages/bionemo-evo2/tests/bionemo/evo2/run/__init__.py (1 hunks)
- sub-packages/bionemo-evo2/tests/bionemo/evo2/run/common.py (1 hunks)
- sub-packages/bionemo-evo2/tests/bionemo/evo2/run/test_finetune.py (1 hunks)
- sub-packages/bionemo-evo2/tests/bionemo/evo2/run/test_train.py (1 hunks)
💤 Files with no reviewable changes (1)
- sub-packages/bionemo-evo2/src/bionemo/evo2/models/peft.py
🚧 Files skipped from review as they are similar to previous changes (4)
- sub-packages/bionemo-evo2/tests/bionemo/evo2/run/common.py
- sub-packages/bionemo-evo2/tests/bionemo/evo2/run/test_train.py
- sub-packages/bionemo-evo2/tests/bionemo/evo2/run/test_finetune.py
- sub-packages/bionemo-evo2/tests/bionemo/evo2/run/__init__.py
🧰 Additional context used
🧬 Code graph analysis (1)
sub-packages/bionemo-evo2/src/bionemo/evo2/run/train.py (1)
sub-packages/bionemo-evo2/src/bionemo/evo2/models/peft.py (1)
Evo2LoRA(26-279)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Analyze (rust)
🔇 Additional comments (2)
sub-packages/bionemo-evo2/src/bionemo/evo2/run/train.py (2)
51-51: Evo2LoRA import path update looks correct.
Import aligns with the class location and downstream usage.
827-829: Fix UnboundLocalError: lora_transform may be undefined for non‑Hyena models.
If --lora-finetune is passed with mamba/llama, lora_transform is not set, causing an error. Append only when it exists; initialize before branching.
Apply this diff:
```diff
-    if args.lora_finetune:
-        callbacks.append(lora_transform)
+    if 'lora_transform' in locals() and lora_transform is not None:
+        callbacks.append(lora_transform)
```

And initialize before the model-type branching:

```python
# before: if model_type == "hyena":
lora_transform = None
```
/ok to test 8027918
/ok to test 5126bea
/ok to test 2ba44da
/ok to test 540edf4
Description
Fixes Evo2LoRA and adds tests for it.
Type of changes
CI Pipeline Configuration
Configure CI behavior by applying the relevant labels:
Note
By default, the notebooks validation tests are skipped unless explicitly enabled.
Authorizing CI Runs
We use copy-pr-bot to manage authorization of CI
runs on NVIDIA's compute resources.
automatically be copied to a pull-request/ prefixed branch in the source repository (e.g. pull-request/123)
A `/ok to test` comment on the pull request triggers CI. This will need to be done for each new commit.

Usage
# TODO: Add code snippet

Pre-submit Checklist
Summary by CodeRabbit
New Features
Tests
Chores