
feat: Add Evo2 fine-tuning partial-conv benchmarking#1028

Merged
jwilber merged 17 commits into main from mvle/evo2-fine-tuning
Aug 29, 2025

Conversation

nvmvle (Collaborator) commented Aug 7, 2025

Description

This PR adds a benchmarking configuration for Evo2 fine-tuning with partial convolution support. The changes include:

  • A new benchmarking YAML configuration file for CI/CD pipeline to test Evo2 fine-tuning performance
  • Updates to the Evo2 training script to support the benchmarking workflow

Type of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Refactor
  • Documentation update
  • Other (please describe):

CI Pipeline Configuration

Configure CI behavior by applying the relevant labels:

Note

By default, the notebooks validation tests are skipped unless explicitly enabled.

Authorizing CI Runs

We use copy-pr-bot to manage authorization of CI
runs on NVIDIA's compute resources.

  • If a pull request is opened by a trusted user and contains only trusted changes, the pull request's code will
    automatically be copied to a pull-request/ prefixed branch in the source repository (e.g. pull-request/123)
  • If a pull request is opened by an untrusted user or contains untrusted changes, an NVIDIA org member must leave an
    /ok to test comment on the pull request to trigger CI. This will need to be done for each new commit.

Usage

# Example usage of the benchmarking configuration
# The benchmarking can be triggered through CI/CD pipeline
# using the new YAML configuration at:
# ci/benchmarks/partial-conv/evo2_finetuning.yaml
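As a rough illustration of the shape such a JET benchmark config takes, here is a hypothetical fragment. Every key and value below is an assumption for illustration only; the real schema and values live in the actual file:

```yaml
# Hypothetical sketch only -- not the contents of
# ci/benchmarks/partial-conv/evo2_finetuning.yaml
time_limit: 3600              # assumed wall-clock budget per run, in seconds
key_segments:                 # fields folded into (or excluded from) the run ID
  variant: true
products:                     # one entry per benchmark variant
  - variant: finetune
    lora_enabled: ""
  - variant: lora_finetune
    lora_enabled: "--lora-finetune"
script_args:
  stop_steps: 10
```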

Pre-submit Checklist

- [x] I have tested these changes locally
- [ ] I have updated the documentation accordingly
- [ ] I have added/updated tests as needed
- [ ] All existing tests pass successfully

Signed-off-by: My Le mvle@nvidia.com

Summary by CodeRabbit

  • New Features

    • Added an optional command-line flag to clean up GPU memory before validation/inference, improving training stability on CUDA devices; the mechanism to perform this cleanup is now included.
  • Chores

    • Added benchmark configurations to run Evo2 finetuning variants on partial-conv with preset training parameters and logging.
    • Fixed a pretraining benchmark URL token placeholder to use the correct syntax for artifact access.

@nvmvle nvmvle requested a review from jwilber August 7, 2025 16:02
copy-pr-bot bot commented Aug 7, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Signed-off-by: nvmvle <mvle@nvidia.com>
@nvmvle nvmvle force-pushed the mvle/evo2-fine-tuning branch from 3fdfada to 557a24f Compare August 12, 2025 07:30
@nvmvle nvmvle changed the title feat(evo2): Add comprehensive fine-tuning support for Evo2 models feat: Add Evo2 fine-tuning partial-conv benchmarking Aug 12, 2025
jwilber (Collaborator) commented Aug 15, 2025

Have we confirmed that JET has the required datasets here? I.e.
data_base_path, restore_from_checkpoint_path, dataset_config, and dataset_dir?

If so, awesome! Then just update with my comments and it looks great

jwilber (Collaborator) commented Aug 18, 2025

/ok to test 488bf5e

@jwilber jwilber marked this pull request as ready for review August 18, 2025 22:07
jwilber (Collaborator) commented Aug 18, 2025

Updated:

  • pushed data into mounted locations in eos
  • simplified yaml
  • removed train.py changes

codecov-commenter commented Aug 18, 2025

Codecov Report

❌ Patch coverage is 31.57895% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.59%. Comparing base (7c18697) to head (d3b59bd).
⚠️ Report is 297 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...s/bionemo-evo2/src/bionemo/evo2/utils/callbacks.py | 33.33% | 10 Missing ⚠️ |
| ...ackages/bionemo-evo2/src/bionemo/evo2/run/train.py | 25.00% | 3 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1028      +/-   ##
==========================================
- Coverage   80.69%   80.59%   -0.11%     
==========================================
  Files         156      157       +1     
  Lines       11060    11079      +19     
==========================================
+ Hits         8925     8929       +4     
- Misses       2135     2150      +15     
| Files with missing lines | Coverage Δ |
| --- | --- |
| ...ackages/bionemo-evo2/src/bionemo/evo2/run/train.py | 14.34% <25.00%> (+0.17%) ⬆️ |
| ...s/bionemo-evo2/src/bionemo/evo2/utils/callbacks.py | 33.33% <33.33%> (ø) |

... and 1 file with indirect coverage changes


nvmvle and others added 4 commits August 19, 2025 01:35
Signed-off-by: nvmvle <mvle@nvidia.com>
Signed-off-by: nvmvle <mvle@nvidia.com>
Signed-off-by: nvmvle <mvle@nvidia.com>
jwilber (Collaborator) commented Aug 19, 2025

/ok to test afec57a

jwilber and others added 3 commits August 19, 2025 12:15
Signed-off-by: Jared Wilber <jwilber@nvidia.com>
Signed-off-by: nvmvle <mvle@nvidia.com>
coderabbitai bot (Contributor) commented Aug 28, 2025

Walkthrough

Adds a new Evo2 finetuning benchmark config, updates a placeholder in the pretrain benchmark URL, introduces a GarbageCollectAtInferenceTime Lightning callback, and wires a new --garbage-collect-at-inference flag in the training script to optionally run CUDA/GC cleanup at validation start.

Changes

Cohort / File(s) Summary of Changes
Benchmark configs
ci/benchmarks/partial-conv/evo2_finetuning.yaml, ci/benchmarks/partial-conv/evo2_pretrain.yaml
Adds a new finetuning benchmark config with time_limit, key_segments, script_args, two products (finetune, lora_finetune) and a training script invocation. Updates artefacts_url placeholder in pretrain YAML to use ${{JET_GITLAB_TOKEN}}.
Training entrypoint
sub-packages/bionemo-evo2/src/bionemo/evo2/run/train.py
Adds CLI flag --garbage-collect-at-inference (default false) and, when set, appends GarbageCollectAtInferenceTime() to the trainer callbacks before validation.
Callbacks
sub-packages/bionemo-evo2/src/bionemo/evo2/utils/callbacks.py
Adds GarbageCollectAtInferenceTime Lightning callback implementing on_validation_start to perform CUDA empty_cache, synchronize, reset device, and run gc.collect() inside a safe try/except.
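For a concrete picture, here is a minimal, self-contained sketch of what such a callback could look like. This is an illustration inferred from the summary above, not the PR's actual code; the `lightning.pytorch.Callback` base class is omitted so the snippet runs without pytorch-lightning installed, and the method body degrades gracefully when torch or CUDA is unavailable.

```python
import gc

try:
    import torch  # optional: the sketch still runs without torch installed
    _HAS_TORCH = True
except ImportError:
    _HAS_TORCH = False


class GarbageCollectAtInferenceTime:
    """Free CUDA and Python memory when validation starts (sketch).

    In the actual PR this would subclass lightning.pytorch.Callback;
    the base class is dropped here to keep the sketch dependency-free.
    """

    def on_validation_start(self, trainer, pl_module) -> None:
        if _HAS_TORCH and torch.cuda.is_available():
            try:
                # Release cached allocator blocks and wait for pending kernels.
                torch.cuda.empty_cache()
                torch.cuda.synchronize()
                # Re-assert the current device, then synchronize again.
                torch.cuda.set_device(torch.cuda.current_device())
                torch.cuda.synchronize()
            except Exception as e:  # cleanup must never crash training
                print(f"Warning: CUDA cleanup failed: {e}")
        # Python-level garbage collection runs regardless of CUDA.
        gc.collect()
```

Wiring it up would then amount to appending `GarbageCollectAtInferenceTime()` to the trainer's callback list when `--garbage-collect-at-inference` is passed.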

Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  actor CI as CI / User
  participant CLI as train.py
  participant Trainer as Lightning Trainer
  participant CB as GarbageCollectAtInferenceTime
  participant CUDA as CUDA & Python GC
  participant Val as Validation Loop

  CI->>CLI: invoke train_${model} [--garbage-collect-at-inference]
  CLI->>Trainer: init Trainer (callbacks [..., CB?])

  alt garbage-collect flag enabled
    Trainer->>CB: on_validation_start()
    CB->>CUDA: empty_cache(), sync, set_device, sync, gc.collect()
    CUDA-->>CB: cleanup complete
  else flag disabled
    Trainer-->>Val: proceed to validation without extra cleanup
  end

  Trainer->>Val: start validation
  Val-->>Trainer: return metrics
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

I tidied the cache with a whisker’s swish,
Swept CUDA crumbs—just as you’d wish.
Benchmarks queued and flags set right,
Validation hops on through the night.
A rabbit’s patch, clean bytes, delight. 🐇✨



📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 807f311 and f8f216b.

📒 Files selected for processing (1)
  • ci/benchmarks/partial-conv/evo2_finetuning.yaml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • ci/benchmarks/partial-conv/evo2_finetuning.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (rust)

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (4)
sub-packages/bionemo-evo2/src/bionemo/evo2/utils/callbacks.py (1)

25-36: Tighten CUDA/GC cleanup ordering and logging.

  • Run gc.collect() before empty_cache() so Python frees tensors first.
  • Dropping set_device(current_device) is safe/redundant here.
  • Use a logger instead of print, and optionally log only on rank 0 to avoid spam.
```diff
-    def on_validation_start(self, trainer, pl_module) -> None:
+    def on_validation_start(self, trainer, pl_module) -> None:
         """Clean up CUDA memory before validation to prevent initialization errors."""
         if torch.cuda.is_available():
             try:
-                torch.cuda.empty_cache()
-                torch.cuda.synchronize()
-                current_device = torch.cuda.current_device()
-                torch.cuda.set_device(current_device)
-                torch.cuda.synchronize()
-                gc.collect()
+                gc.collect()
+                torch.cuda.empty_cache()
+                torch.cuda.synchronize()
             except Exception as e:
-                print(f"Warning: CUDA cleanup failed: {e}")
+                logger = getattr(pl_module, "log", None)
+                msg = f"CUDA cleanup failed: {e}"
+                if logger is not None:
+                    pl_module.log("gc_warning", msg, on_step=True, on_epoch=False, prog_bar=False, logger=True)
+                else:
+                    import logging; logging.getLogger(__name__).warning(msg)
```

Additions outside the selected range:

```python
# at top-level imports
import logging

logger = logging.getLogger(__name__)
```
sub-packages/bionemo-evo2/src/bionemo/evo2/run/train.py (1)

510-515: Flag name and help are clear; consider interplay with existing GC callback.

You already support nl_callbacks.GarbageCollectionCallback via --gc-interval. Document that both can be used together (GPU vs CPU GC) and which one to prefer in typical FP8 runs.

ci/benchmarks/partial-conv/evo2_finetuning.yaml (2)

47-47: Align max_steps and early-stop to avoid confusion.

early-stop-on-step overrides max_steps in train.py. With max_steps=10 but stop_steps=200, the run will target 200 steps. If you intend a short smoke test, set stop_steps to 10 (or drop max_steps).

```diff
-  max_steps: 10
+  max_steps: 10
@@
-    --early-stop-on-step=${stop_steps} \
+    --early-stop-on-step=10 \
```

Also applies to: 97-97


55-56: Note: precision key is metadata-only.

precision: fp8 isn’t consumed by the script; fp8 is toggled via --fp8/--fp8-wgrad in train.py. If you need FP8 here, add the flags; otherwise keep precision solely for grouping.

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 658c538 and cf3926a.

📒 Files selected for processing (5)
  • ci/benchmarks/partial-conv/evo2_finetuning.yaml (1 hunks)
  • ci/benchmarks/partial-conv/evo2_pretrain.yaml (1 hunks)
  • sub-packages/bionemo-evo2/src/bionemo/evo2/run/train.py (3 hunks)
  • sub-packages/bionemo-evo2/src/bionemo/evo2/utils/callbacks.py (1 hunks)
  • sub-packages/bionemo-evo2/src/bionemo/evo2/utils/logging/callbacks.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
sub-packages/bionemo-evo2/src/bionemo/evo2/run/train.py (1)
sub-packages/bionemo-evo2/src/bionemo/evo2/utils/callbacks.py (1)
  • GarbageCollectAtInferenceTime (22-36)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (rust)
🔇 Additional comments (5)
ci/benchmarks/partial-conv/evo2_pretrain.yaml (1)

18-18: Verify JET token interpolation inside script_args value.

Switching to ${{JET_GITLAB_TOKEN}} assumes JET expands variables within script_args before shell execution. Please confirm this resolves to a bare token at runtime; otherwise pip will receive the literal braces and auth will fail. If uncertain, prefer env expansion directly in the script (e.g., set artefacts_url from $JET_GITLAB_TOKEN in-shell) to avoid templating ambiguities and reduce secret exposure in logs.
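If in-shell expansion turns out to be the safer route, the idea is simply to assemble the URL from the environment variable at runtime. A generic sketch follows; the host, path, and dummy token are placeholders invented for illustration, not the real JET endpoint:

```shell
# Assumes JET_GITLAB_TOKEN is exported by the CI runner; values below are placeholders.
JET_GITLAB_TOKEN="dummy-token"   # stand-in for the CI-provided secret
artefacts_url="https://gitlab-ci-token:${JET_GITLAB_TOKEN}@gitlab.example.com/artifacts"
echo "${artefacts_url}"
```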

sub-packages/bionemo-evo2/src/bionemo/evo2/utils/logging/callbacks.py (1)

46-46: LGTM on spacing.

The added blank line after the docstring improves readability; no functional changes.

sub-packages/bionemo-evo2/src/bionemo/evo2/run/train.py (2)

51-51: LGTM: callback import is scoped and specific.

Importing GarbageCollectAtInferenceTime here keeps the training entrypoint self-contained.


655-656: LGTM: conditional registration of cleanup callback.

Hooking it right after TEVCallback keeps ordering predictable and isolated from LoRA transforms.

ci/benchmarks/partial-conv/evo2_finetuning.yaml (1)

58-65: Matrix overrides look correct; verify empty string expansion.

Confirm that lora_enabled: "" renders to nothing (no stray spaces) while "--lora-finetune" is injected for the second product. If templating preserves an extra space, place the flag at the end of the line or gate with a conditional.
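The concern above can be checked with a quick templating experiment. The `Template` string and flag values here are illustrative stand-ins, not the actual JET templating engine:

```python
from string import Template
import shlex

line = Template("train_evo2 --max-steps=10 ${lora_enabled}")
plain = line.substitute(lora_enabled="")
lora = line.substitute(lora_enabled="--lora-finetune")

# An empty substitution leaves a trailing space in the rendered command,
# but shell word-splitting discards it, so no stray argument is produced.
print(shlex.split(plain))  # ['train_evo2', '--max-steps=10']
print(shlex.split(lora))   # ['train_evo2', '--max-steps=10', '--lora-finetune']
```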

jwilber (Collaborator) commented Aug 28, 2025

/ok to test 7a828cf

@jwilber jwilber enabled auto-merge August 28, 2025 22:09
Signed-off-by: Jared Wilber <jwilber@nvidia.com>
jwilber (Collaborator) commented Aug 28, 2025

/ok to test 9ec1f62

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 6

♻️ Duplicate comments (1)
ci/benchmarks/partial-conv/evo2_finetuning.yaml (1)

88-88: Quote glob to prevent shell expansion.

SDH* will be expanded by the shell.

```diff
-    --hybrid-override-pattern SDH* \
+    --hybrid-override-pattern 'SDH*' \
```
🧹 Nitpick comments (2)
ci/benchmarks/partial-conv/evo2_finetuning.yaml (2)

30-30: Unused arg: workspace.

Not referenced in the script. Either use it (e.g., for result-dir) or drop to avoid drift.


31-41: Path assumptions: confirm JET availability.

Ensure /data/evo2/{preprocessed_data,checkpoints/nemo2_evo2_1b_8k,training_data_config.yaml} exist on the JET runners used by this scope.

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 9ec1f62 and 807f311.

📒 Files selected for processing (1)
  • ci/benchmarks/partial-conv/evo2_finetuning.yaml (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (rust)
🔇 Additional comments (3)
ci/benchmarks/partial-conv/evo2_finetuning.yaml (3)

3-26: LGTM on key_segments defaults.

Reasonable exclusions to keep run IDs concise.


57-66: Products overlay looks fine.

Variant and LoRA flag override pattern is clear.


98-98: Ignore --garbage-collect-at-inference concern. The flag is defined in sub-packages/bionemo-evo2/src/bionemo/evo2/run/train.py (parser.add_argument at line 511) and applied via args.garbage_collect_at_inference (line 655), so the YAML entry is valid as-is.

Likely an incorrect or invalid review comment.

jwilber (Collaborator) commented Aug 29, 2025

/ok to test 807f311

jwilber (Collaborator) commented Aug 29, 2025

/ok to test d3b59bd

@jwilber jwilber added this pull request to the merge queue Aug 29, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 29, 2025
@jwilber jwilber added this pull request to the merge queue Aug 29, 2025
Merged via the queue into main with commit a29272f Aug 29, 2025
19 checks passed
@jwilber jwilber deleted the mvle/evo2-fine-tuning branch August 29, 2025 22:17