
ESM2 changes to work with vLLM #1473

Open
gagank1 wants to merge 22 commits into main from gkaushik/esm2-vllm

Conversation


@gagank1 gagank1 commented Feb 18, 2026

Description

This PR makes the ESM2 model compatible with vLLM. The primary issues were a naming incompatibility (vLLM expects the `model.` prefix while ESM2 uses `esm.`) and that `NVEsmModel` defaulted to `add_pooling_layer=True` when loading the checkpoint, even though the checkpoint is exported without pooler weights.
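The prefix rename described above can be sketched as a simple state-dict key remap. This is a hypothetical standalone helper (the actual mapping lives in the PR's convert/export code), shown here only to illustrate the `esm.` → `model.` incompatibility:

```python
import re


def remap_esm_keys(state_dict: dict) -> dict:
    """Rename the legacy ``esm.`` prefix to the ``model.`` prefix vLLM expects.

    Hypothetical helper for illustration; key names are made up.
    """
    return {re.sub(r"^esm\.", "model.", k): v for k, v in state_dict.items()}


# A checkpoint key under the old naming scheme...
old = {"esm.encoder.layers.0.weight": 1.0}
new = remap_esm_keys(old)
assert "model.encoder.layers.0.weight" in new
assert "esm.encoder.layers.0.weight" not in new
```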

Usage

Run `python test_esm2_golden_values.py` from inside the container; instructions for building and running the container are provided.

Type of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Refactor
  • Documentation update
  • Other (please describe):

Triggering Code Rabbit AI Review

To trigger a code review from CodeRabbit, comment on the pull request with one of its review commands.

See https://docs.coderabbit.ai/reference/review-commands for a full list of commands.

Pre-submit Checklist

  • I have tested these changes locally
  • I have updated the documentation accordingly
  • I have added/updated tests as needed
  • All existing tests pass successfully

@gagank1 gagank1 self-assigned this Feb 18, 2026

copy-pr-bot bot commented Feb 18, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


coderabbitai bot commented Feb 18, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 83a067b3-7344-4cb7-9e30-06103417d7c9

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Walkthrough

The ESM2 model architecture was refactored to rename the base model namespace from "esm" to "model" across core components, model classes, and related tests. A new add_pooling_layer configuration option was introduced to control pooler presence. Additionally, vLLM integration infrastructure was added, with Docker setup and validation tooling.

Changes

Cohort / File(s) Summary
Core Model Architecture Refactoring
bionemo-recipes/models/esm2/convert.py, export.py, modeling_esm_te.py
Namespace renaming from "esm" to "model" in weight mappings, base model references, and state dict handling. Added add_pooling_layer configuration option (defaults to True) wired through NVEsmConfig and NVEsmModel. Updated base_model_prefix and tied weights references. Removed quantized_model_init wrapper in favor of fp8_model_init context. Changed state_dict filtering to exclude both _extra_state and .inv_freq keys for vLLM compatibility.
ESM2 Test Suite Updates
bionemo-recipes/models/esm2/tests/test_cp_bshd.py, test_cp_thd.py, test_distributed_fp8.py, test_distributed_strategies.py, test_modeling_esm_te.py
Updated model layer access paths from model.esm.encoder.layers to model.model.encoder.layers across distributed testing, context-parallel gradient validation, and state_dict verification. No functional logic changes, only attribute path corrections.
Recipe Implementation Propagation
bionemo-recipes/recipes/esm2_accelerate_te/..., esm2_native_te/..., esm2_peft_te/.../esm_nv.py
Applied namespace refactoring, pooling layer configuration, and weight initialization updates from core model to recipe implementations. Updated NVEsmPreTrainedModel, NVEsmForMaskedLM, NVEsmForTokenClassification, and NVEsmLMHead across all recipe variants. Replaced direct "esm" references with "model" and removed legacy TE quantization contexts.
Training Script Updates
bionemo-recipes/recipes/esm2_native_te/tests/test_stop_and_go.py, train_ddp.py, train_ddp_cp.py, train_fsdp2.py, train_fsdp2_cp.py
Updated contact_head deletion and encoder layer access paths from model.esm.* to model.model.* to align with new model structure. No logic changes, only path corrections for distributed training configurations (DDP, FSDP2, context parallelism).
vLLM Integration
bionemo-recipes/vllm/Dockerfile, README.md, launch.sh, test_esm2_golden_values.py
New vLLM inference infrastructure added: Dockerfile extends vLLM base with CUDA 13.1 tooling and transformer_engine from source; launch.sh provides Docker run wrapper with optional mounts and headless mode; test_esm2_golden_values.py validates ESM2 embedding consistency across vLLM, HuggingFace, and hub models with numerical comparison metrics.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 Hop, hop! The model's now refined,
From "esm" paths, we've left behind,
A "model" namespace, clean and bright,
With pooling layers—flexible might!
And vLLM joins the warren's flight, 🚀

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Description check ❓ Inconclusive The description covers the basic what (making ESM2 compatible with vLLM) and usage instructions, but is missing detailed explanation of the changes, their purpose, and how they achieve vLLM compatibility. Add more comprehensive details about the specific changes made, the rationale behind the namespace refactoring from 'esm' to 'model', the pooling layer changes, and how these changes enable vLLM compatibility.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'ESM2 changes to work with vLLM' clearly describes the main objective of the PR - making ESM2 compatible with vLLM.
Docstring Coverage ✅ Passed Docstring coverage is 96.30% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@broland-hat (Collaborator):

@gagank1 : Are you working on a readme?


gagank1 commented Feb 23, 2026

/ok to test


copy-pr-bot bot commented Feb 23, 2026

/ok to test

@gagank1, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

@gagank1 gagank1 marked this pull request as ready for review February 23, 2026 17:06

gagank1 commented Feb 23, 2026

/ok to test


copy-pr-bot bot commented Feb 23, 2026

/ok to test

@gagank1, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/


gagank1 commented Feb 23, 2026

/ok to test c34c09b


gagank1 commented Feb 23, 2026

@coderabbitai review


coderabbitai bot commented Feb 23, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@gagank1 gagank1 requested a review from broland-hat February 23, 2026 17:17

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

♻️ Duplicate comments (3)
bionemo-recipes/recipes/esm2_peft_te/example_8m_checkpoint/esm_nv.py (1)

405-407: Same _tied_weights_keys dict type concern.

See comment on modeling_esm_te.py.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/esm2_peft_te/example_8m_checkpoint/esm_nv.py` around
lines 405 - 407, The class variable _tied_weights_keys uses the modern built-in
generic dict[...] typing which may be incompatible with the rest of the
codebase; change its annotation to use typing.Dict[str, str] (and ensure Dict is
imported) or use typing.Mapping if immutability is desired, mirroring the fix
applied in modeling_esm_te.py so the declaration becomes ClassVar[Dict[str,
str]] with the same key/value entries retained.
bionemo-recipes/recipes/esm2_native_te/example_8m_checkpoint/esm_nv.py (1)

405-407: Same _tied_weights_keys dict type concern as in modeling_esm_te.py.

See comment on the canonical file — HF expects list[str], not dict[str, str].

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/esm2_native_te/example_8m_checkpoint/esm_nv.py`
around lines 405 - 407, The _tied_weights_keys typed as ClassVar[dict[str, str]]
should be changed to ClassVar[list[str]] to match HF expectations (same fix as
in modeling_esm_te.py); replace the dict literal with a list of the relevant
parameter names (e.g. ["lm_head.decoder.weight",
"model.embeddings.word_embeddings.weight"]) and update any usages that assume
dict semantics to use the list order or explicit pairing where needed.
bionemo-recipes/recipes/esm2_accelerate_te/example_8m_checkpoint/esm_nv.py (1)

405-407: Same _tied_weights_keys dict type concern.

See comment on modeling_esm_te.py.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/recipes/esm2_accelerate_te/example_8m_checkpoint/esm_nv.py`
around lines 405 - 407, _tied_weights_keys is annotated as a mutable dict
ClassVar which repeats the same typing concern as in modeling_esm_te.py; change
the annotation to an immutable mapping type (e.g., ClassVar[Mapping[str, str]]
from typing) and, to avoid accidental mutation, assign a read-only view (e.g.,
types.MappingProxyType({"lm_head.decoder.weight":
"model.embeddings.word_embeddings.weight"})); update the import list to include
typing.Mapping and types if not present and mirror the same pattern used/fixed
in modeling_esm_te.py.
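The three duplicate comments above propose slightly different annotations for `_tied_weights_keys` (typing.Dict, list[str], MappingProxyType). A minimal sketch of the list-typed shape suggested in the esm2_native_te comment, using the parameter names quoted there (the class itself is illustrative, not the real model class):

```python
from typing import ClassVar


class NVEsmForMaskedLMSketch:
    """Illustrative only: the list-typed annotation one review comment proposes.

    Per that comment, HF's weight-tying machinery expects a list of
    parameter names rather than a dict mapping.
    """

    _tied_weights_keys: ClassVar[list[str]] = [
        "lm_head.decoder.weight",
        "model.embeddings.word_embeddings.weight",
    ]


assert isinstance(NVEsmForMaskedLMSketch._tied_weights_keys, list)
```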
🧹 Nitpick comments (4)
bionemo-recipes/vllm/launch.sh (1)

50-50: exec $DOCKER_CMD is unquoted — word splitting will break paths with spaces.

If PROJECT_ROOT contains spaces (e.g., /home/user/my projects/...), the -v argument will be incorrectly split into multiple tokens. Use a Bash array to avoid this:

🔧 Proposed fix (array-based approach)

Replace the string-based DOCKER_CMD with an array throughout the script:

-DOCKER_CMD="docker run -itd ..."
+DOCKER_CMD=("docker" "run" "-itd" "--gpus" "all" "--network" "host" "--ipc=host" "-e" "HF_TOKEN" "--rm" "--name" "${CONTAINER}_dev")
 
 if [ "$MOUNT_DIR" = true ]; then
     PROJECT_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/../.." && pwd)"
-    DOCKER_CMD="$DOCKER_CMD -v ${PROJECT_ROOT}:/workspace/bionemo-framework"
+    DOCKER_CMD+=("-v" "${PROJECT_ROOT}:/workspace/bionemo-framework")
 fi
 
-DOCKER_CMD="$DOCKER_CMD $CONTAINER /bin/bash"
+DOCKER_CMD+=("$CONTAINER" "/bin/bash")
 
-exec $DOCKER_CMD
+exec "${DOCKER_CMD[@]}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/vllm/launch.sh` at line 50, The exec $DOCKER_CMD call uses an
unquoted string which allows word-splitting and breaks mount paths with spaces
(e.g., PROJECT_ROOT used in -v). Convert the string DOCKER_CMD into a Bash array
(e.g., DOCKER_CMD=(docker run ...)) and update all places that build/append to
DOCKER_CMD so they push elements into the array, then replace exec $DOCKER_CMD
with exec "${DOCKER_CMD[@]}" so each argument (including the -v PROJECT_ROOT
value) is preserved; update any helper code that concatenates DOCKER_CMD to use
array operations instead.
bionemo-recipes/vllm/Dockerfile (2)

2-3: Base image is hosted on an internal NVIDIA GitLab registry — not pullable outside NVIDIA.

gitlab-master.nvidia.com:5005/dl/dgx/vllm:main-py3.43005406-devel requires internal network/credentials access. The commented-out nvcr.io/nvidia/vllm:26.01-py3 alternative on line 1 is the publicly accessible equivalent. Once an NGC release with vLLM ≥ 0.14 becomes available, switching to the public image will make this recipe usable by external contributors without additional setup.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/vllm/Dockerfile` around lines 2 - 3, The Dockerfile's FROM
line currently uses an internal image tag
"gitlab-master.nvidia.com:5005/dl/dgx/vllm:main-py3.43005406-devel" which is not
pullable externally; replace that base image with the public equivalent
"nvcr.io/nvidia/vllm:26.01-py3" (or parametrize the base via a build ARG) in the
FROM instruction so external contributors can build without internal
credentials, and retain a short comment noting the vLLM>=0.14 requirement and to
switch back when an official public image with the needed version is available.

30-30: Pin the transformer_engine version for reproducible builds.

pip install --no-build-isolation transformer_engine[pytorch] with no version specifier will install whichever version is latest at build time. TE releases frequently and has had breaking API changes between major versions (e.g., 1.x → 2.x). A silent version bump can break the integration without any change to this file.

🔧 Proposed fix
-RUN pip install --no-build-isolation transformer_engine[pytorch]
+RUN pip install --no-build-isolation "transformer_engine[pytorch]==<verified_version>"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/vllm/Dockerfile` at line 30, The Dockerfile currently
installs transformer_engine without a version pin (the RUN pip install
--no-build-isolation transformer_engine[pytorch] line); change that to install a
specific, tested TE release by updating that RUN to include an exact version
specifier (for example: RUN pip install --no-build-isolation
transformer_engine==<MAJOR.MINOR.PATCH>[pytorch]) so builds are reproducible and
won’t break on upstream major/minor bumps; optionally add a short comment noting
the chosen compatible version.
bionemo-recipes/vllm/test_esm2_golden_values.py (1)

46-63: sys.path.insert + os.chdir is fragile for test infrastructure.

sys.path.insert(0, ...) at module level (line 46) and os.chdir inside fresh_export make this script sensitive to working directory and import order. This is acceptable for a standalone validation script run manually inside a container, but consider adding a note that this is not designed to run as part of a pytest suite.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/vllm/test_esm2_golden_values.py` around lines 46 - 63, The
test module mutates import paths and working directory (sys.path.insert(0, ...),
os.chdir(...) inside fresh_export) which is fragile for pytest; update the file
to document this by adding a clear module-level comment or docstring near
sys.path.insert and a brief note on fresh_export explaining it intentionally
changes cwd for export_hf_checkpoint and that the script is not intended to be
run under pytest/parallel test runners (referencing sys.path.insert,
ESM2_MODEL_DIR, and fresh_export by name), or alternatively guard execution with
a main-check so pytest won't import/run it implicitly.
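The main-guard alternative mentioned at the end of the comment above can be sketched like this. Paths and function contents are hypothetical; the point is only that the `sys.path`/`os.chdir` mutations run solely on direct execution, so pytest collection never triggers them:

```python
import os
import sys


def main() -> int:
    # Path mutation deferred until explicit execution; the relative path
    # below is a placeholder, not the script's real layout.
    sys.path.insert(
        0, os.path.join(os.path.dirname(os.path.abspath(__file__)), "..", "models", "esm2")
    )
    # ... export checkpoint, run vLLM / HF comparisons ...
    return 0


if __name__ == "__main__":
    sys.exit(main())
```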
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bionemo-recipes/vllm/Dockerfile`:
- Around line 1-36: Add a non-root user and switch to it in the Dockerfile so
the container does not run as root; create a user/group (e.g., bionemo), chown
the application directory (/workspace/bionemo) and any cache/venv paths to that
user, and add a final USER bionemo line. Locate the Dockerfile sections around
WORKDIR /workspace/bionemo and COPY . . and insert user creation
(useradd/addgroup or groupadd) and chown before switching context, then add USER
bionemo at the end so subsequent runtime processes run unprivileged.

In `@bionemo-recipes/vllm/launch.sh`:
- Around line 36-40: The script hardcodes "--name vllm_dev" in DOCKER_CMD which
prevents multiple instances and misrepresents the positional $CONTAINER
argument; update the usage comment to clarify that the positional argument is
the image name (or image_name) and change the DOCKER_CMD assignments in the
HEADLESS branch to use a container name derived from $CONTAINER (for example
--name "$CONTAINER" or --name "${CONTAINER}_dev") instead of "vllm_dev" so the
image argument is also used as the container name and avoids name collisions.

In `@bionemo-recipes/vllm/README.md`:
- Around line 21-23: Change the fenced code block that currently uses the wrong
language tag; locate the block containing the shell command "python
test_esm2_golden_values.py" in the README and replace the opening fence language
identifier from ```python to ```bash so the command is treated as a shell
snippet and highlighted correctly.
- Line 17: The README's "or use `launch.sh`" is misleading because `launch.sh
--mount_dir` mounts the repo to `/workspace/bionemo-framework` while the manual
`docker run` mounts to `/workspace/bionemo`, causing `python
test_esm2_golden_values.py` to fail when run from WORKDIR; fix by either (A)
update `launch.sh` to mount the project root into `/workspace/bionemo` instead
of `/workspace/bionemo-framework` (adjust mount target and any downstream path
references in launch.sh), or (B) update README.md to explicitly document that
`launch.sh --mount_dir` mounts at `/workspace/bionemo-framework` and instruct
users to cd into the correct subdirectory (where `test_esm2_golden_values.py`
lives) before running the test; mention the affected files `launch.sh`,
`README.md`, and `test_esm2_golden_values.py` so reviewers can locate the
changes.

In `@bionemo-recipes/vllm/test_esm2_golden_values.py`:
- Around line 147-210: The script currently only prints comparisons and never
fails; add assertions that enforce the golden-value tolerances using RTOL and
ATOL: for each pair in pairs (refer to the pairs list and variables a, b),
assert np.allclose(a, b, rtol=RTOL, atol=ATOL) (or equivalently assert
(np.abs(a-b) <= ATOL + RTOL * np.abs(b)).all()) and fail the test if not, and
also assert cosine_sim(a, b) exceeds a sensible threshold or that exact is True
when ATOL/RTOL are zero; add per-sequence assertions inside the per-sequence
loop to ensure each sequence max-diff <= ATOL + RTOL * max(|b_i|) so the test
fails on unacceptable drift.
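A sketch of the assertion shape the comment above asks for, with placeholder tolerance values (the script's actual RTOL/ATOL should be substituted):

```python
import numpy as np

RTOL, ATOL = 1e-3, 1e-5  # hypothetical tolerances for illustration


def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def assert_close(a: np.ndarray, b: np.ndarray, min_cos: float = 0.999) -> None:
    """Fail (rather than just print) when embeddings drift beyond tolerance."""
    assert np.allclose(a, b, rtol=RTOL, atol=ATOL), (
        f"max abs diff {np.max(np.abs(a - b)):.3e} exceeds tolerance"
    )
    assert cosine_sim(a, b) >= min_cos


a = np.array([1.0, 2.0, 3.0])
assert_close(a, a + 1e-6)  # tiny drift stays within tolerance
```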

---

Duplicate comments:
In `@bionemo-recipes/recipes/esm2_accelerate_te/example_8m_checkpoint/esm_nv.py`:
- Around line 405-407: _tied_weights_keys is annotated as a mutable dict
ClassVar which repeats the same typing concern as in modeling_esm_te.py; change
the annotation to an immutable mapping type (e.g., ClassVar[Mapping[str, str]]
from typing) and, to avoid accidental mutation, assign a read-only view (e.g.,
types.MappingProxyType({"lm_head.decoder.weight":
"model.embeddings.word_embeddings.weight"})); update the import list to include
typing.Mapping and types if not present and mirror the same pattern used/fixed
in modeling_esm_te.py.

In `@bionemo-recipes/recipes/esm2_native_te/example_8m_checkpoint/esm_nv.py`:
- Around line 405-407: The _tied_weights_keys typed as ClassVar[dict[str, str]]
should be changed to ClassVar[list[str]] to match HF expectations (same fix as
in modeling_esm_te.py); replace the dict literal with a list of the relevant
parameter names (e.g. ["lm_head.decoder.weight",
"model.embeddings.word_embeddings.weight"]) and update any usages that assume
dict semantics to use the list order or explicit pairing where needed.

In `@bionemo-recipes/recipes/esm2_peft_te/example_8m_checkpoint/esm_nv.py`:
- Around line 405-407: The class variable _tied_weights_keys uses the modern
built-in generic dict[...] typing which may be incompatible with the rest of the
codebase; change its annotation to use typing.Dict[str, str] (and ensure Dict is
imported) or use typing.Mapping if immutability is desired, mirroring the fix
applied in modeling_esm_te.py so the declaration becomes ClassVar[Dict[str,
str]] with the same key/value entries retained.

---

Nitpick comments:
In `@bionemo-recipes/vllm/Dockerfile`:
- Around line 2-3: The Dockerfile's FROM line currently uses an internal image
tag "gitlab-master.nvidia.com:5005/dl/dgx/vllm:main-py3.43005406-devel" which is
not pullable externally; replace that base image with the public equivalent
"nvcr.io/nvidia/vllm:26.01-py3" (or parametrize the base via a build ARG) in the
FROM instruction so external contributors can build without internal
credentials, and retain a short comment noting the vLLM>=0.14 requirement and to
switch back when an official public image with the needed version is available.
- Line 30: The Dockerfile currently installs transformer_engine without a
version pin (the RUN pip install --no-build-isolation
transformer_engine[pytorch] line); change that to install a specific, tested TE
release by updating that RUN to include an exact version specifier (for example:
RUN pip install --no-build-isolation
transformer_engine==<MAJOR.MINOR.PATCH>[pytorch]) so builds are reproducible and
won’t break on upstream major/minor bumps; optionally add a short comment noting
the chosen compatible version.

In `@bionemo-recipes/vllm/launch.sh`:
- Line 50: The exec $DOCKER_CMD call uses an unquoted string which allows
word-splitting and breaks mount paths with spaces (e.g., PROJECT_ROOT used in
-v). Convert the string DOCKER_CMD into a Bash array (e.g., DOCKER_CMD=(docker
run ...)) and update all places that build/append to DOCKER_CMD so they push
elements into the array, then replace exec $DOCKER_CMD with exec
"${DOCKER_CMD[@]}" so each argument (including the -v PROJECT_ROOT value) is
preserved; update any helper code that concatenates DOCKER_CMD to use array
operations instead.

In `@bionemo-recipes/vllm/test_esm2_golden_values.py`:
- Around line 46-63: The test module mutates import paths and working directory
(sys.path.insert(0, ...), os.chdir(...) inside fresh_export) which is fragile
for pytest; update the file to document this by adding a clear module-level
comment or docstring near sys.path.insert and a brief note on fresh_export
explaining it intentionally changes cwd for export_hf_checkpoint and that the
script is not intended to be run under pytest/parallel test runners (referencing
sys.path.insert, ESM2_MODEL_DIR, and fresh_export by name), or alternatively
guard execution with a main-check so pytest won't import/run it implicitly.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 73cd13d and c34c09b.

📒 Files selected for processing (20)
  • bionemo-recipes/models/esm2/convert.py
  • bionemo-recipes/models/esm2/export.py
  • bionemo-recipes/models/esm2/modeling_esm_te.py
  • bionemo-recipes/models/esm2/tests/test_cp_bshd.py
  • bionemo-recipes/models/esm2/tests/test_cp_thd.py
  • bionemo-recipes/models/esm2/tests/test_distributed_fp8.py
  • bionemo-recipes/models/esm2/tests/test_distributed_strategies.py
  • bionemo-recipes/models/esm2/tests/test_modeling_esm_te.py
  • bionemo-recipes/recipes/esm2_accelerate_te/example_8m_checkpoint/esm_nv.py
  • bionemo-recipes/recipes/esm2_native_te/example_8m_checkpoint/esm_nv.py
  • bionemo-recipes/recipes/esm2_native_te/tests/test_stop_and_go.py
  • bionemo-recipes/recipes/esm2_native_te/train_ddp.py
  • bionemo-recipes/recipes/esm2_native_te/train_ddp_cp.py
  • bionemo-recipes/recipes/esm2_native_te/train_fsdp2.py
  • bionemo-recipes/recipes/esm2_native_te/train_fsdp2_cp.py
  • bionemo-recipes/recipes/esm2_peft_te/example_8m_checkpoint/esm_nv.py
  • bionemo-recipes/vllm/Dockerfile
  • bionemo-recipes/vllm/README.md
  • bionemo-recipes/vllm/launch.sh
  • bionemo-recipes/vllm/test_esm2_golden_values.py

Comment on lines +1 to +36
# FROM nvcr.io/nvidia/vllm:26.01-py3
FROM gitlab-master.nvidia.com:5005/dl/dgx/vllm:main-py3.43005406-devel
# using this because we need vllm >= 0.14 to work with Transformers v5. no released nvidia version with this yet.

# The vLLM image has CUDA 13.1 runtime and nvcc, but missing dev headers (cusparse.h, nvtx, etc.)
# Install cuda-keyring to add NVIDIA's apt repo, then install the dev headers for transformer_engine
RUN apt-get update && apt-get install -y --no-install-recommends wget && \
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb && \
dpkg -i cuda-keyring_1.1-1_all.deb && \
rm cuda-keyring_1.1-1_all.deb && \
apt-get update && apt-get install -y --no-install-recommends \
cuda-nvtx-13-1 \
cuda-cupti-dev-13-1 \
cuda-nvml-dev-13-1 \
libcusparse-dev-13-1 \
libcusolver-dev-13-1 \
libcufft-dev-13-1 \
libnvjitlink-dev-13-1 \
libnvjpeg-dev-13-1 \
libcublasmp0-dev-cuda-13 \
libcudnn9-cuda-13 \
&& rm -rf /var/lib/apt/lists/*

# Install remaining dependencies
RUN --mount=type=cache,target=/root/.cache/pip \
--mount=type=bind,source=requirements.txt,target=/requirements.txt \
pip install -r /requirements.txt

# Install transformer_engine from source (force build for CUDA 13.1, not pre-built cu12 wheel)
RUN pip install --no-build-isolation transformer_engine[pytorch]

RUN pip install transformers[torch]==5.0.0


WORKDIR /workspace/bionemo
COPY . .

⚠️ Potential issue | 🟠 Major

Container runs as root — add a USER directive.

No USER command is present, so all processes run as root inside the container. This is flagged by Trivy (DS-0002) and violates the principle of least privilege.

🔒 Proposed fix
 WORKDIR /workspace/bionemo
 COPY . .
+
+RUN useradd -m -u 1000 appuser && chown -R appuser /workspace/bionemo
+USER appuser
🧰 Tools: Trivy (0.69.1) — [error] 1-1: Image user should not be 'root'. Specify at least one USER command in the Dockerfile with a non-root user as argument (Rule DS-0002, IaC/Dockerfile).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/vllm/Dockerfile` around lines 1 - 36, Add a non-root user and
switch to it in the Dockerfile so the container does not run as root; create a
user/group (e.g., bionemo), chown the application directory (/workspace/bionemo)
and any cache/venv paths to that user, and add a final USER bionemo line. Locate
the Dockerfile sections around WORKDIR /workspace/bionemo and COPY . . and
insert user creation (useradd/addgroup or groupadd) and chown before switching
context, then add USER bionemo at the end so subsequent runtime processes run
unprivileged.

Comment on lines +36 to +40
if [ "$HEADLESS" = true ]; then
DOCKER_CMD="docker run -itd --gpus all --network host --ipc=host -e HF_TOKEN --rm --name vllm_dev"
else
DOCKER_CMD="docker run -it --gpus all --network host --ipc=host -e HF_TOKEN --rm --name vllm_dev"
fi

⚠️ Potential issue | 🟡 Minor

Hardcoded --name vllm_dev prevents running multiple containers and shadows the $CONTAINER argument's meaning.

The script's positional argument ($CONTAINER) is the Docker image name, but the running container name is always vllm_dev. This has two problems:

  1. Starting a second instance (e.g., a second headless container) will fail with docker: Error response from daemon: Conflict. The container name "/vllm_dev" is already in use.
  2. The usage comment says <container_name> but it's really <image_name>, which is confusing.

Consider using $CONTAINER as both the image name and the container name (or derive the container name from it):

🔧 Proposed fix
-DOCKER_CMD="docker run -itd --gpus all --network host --ipc=host -e HF_TOKEN --rm --name vllm_dev"
+DOCKER_CMD="docker run -itd --gpus all --network host --ipc=host -e HF_TOKEN --rm --name ${CONTAINER}_dev"
-DOCKER_CMD="docker run -it --gpus all --network host --ipc=host -e HF_TOKEN --rm --name vllm_dev"
+DOCKER_CMD="docker run -it --gpus all --network host --ipc=host -e HF_TOKEN --rm --name ${CONTAINER}_dev"

Also update the usage comment on line 4:

-# Usage: ./launch.sh <container_name> [--mount_dir] [--headless]
+# Usage: ./launch.sh <image_name> [--mount_dir] [--headless]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if [ "$HEADLESS" = true ]; then
    DOCKER_CMD="docker run -itd --gpus all --network host --ipc=host -e HF_TOKEN --rm --name vllm_dev"
else
    DOCKER_CMD="docker run -it --gpus all --network host --ipc=host -e HF_TOKEN --rm --name vllm_dev"
fi
if [ "$HEADLESS" = true ]; then
    DOCKER_CMD="docker run -itd --gpus all --network host --ipc=host -e HF_TOKEN --rm --name ${CONTAINER}_dev"
else
    DOCKER_CMD="docker run -it --gpus all --network host --ipc=host -e HF_TOKEN --rm --name ${CONTAINER}_dev"
fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/vllm/launch.sh` around lines 36 - 40, The script hardcodes
"--name vllm_dev" in DOCKER_CMD which prevents multiple instances and
misrepresents the positional $CONTAINER argument; update the usage comment to
clarify that the positional argument is the image name (or image_name) and
change the DOCKER_CMD assignments in the HEADLESS branch to use a container name
derived from $CONTAINER (for example --name "$CONTAINER" or --name
"${CONTAINER}_dev") instead of "vllm_dev" so the image argument is also used as
the container name and avoids name collisions.
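The agent prompt above boils down to deriving the container name from the image argument. A standalone sketch (variable names mirror launch.sh but are assumptions, not the script itself):

```bash
# Positional argument is the image name; default is only for this sketch.
CONTAINER="${1:-vllm}"
# Derive the running container's name from the image name to avoid collisions.
CONTAINER_NAME="${CONTAINER}_dev"

if [ "${HEADLESS:-false}" = true ]; then
    DOCKER_CMD="docker run -itd --gpus all --network host --ipc=host -e HF_TOKEN --rm --name ${CONTAINER_NAME}"
else
    DOCKER_CMD="docker run -it --gpus all --network host --ipc=host -e HF_TOKEN --rm --name ${CONTAINER_NAME}"
fi
echo "$DOCKER_CMD"
```

With this shape, two instances launched from different images get distinct container names, and the positional argument keeps a single meaning.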

docker run -it --gpus all --network host --ipc=host -e HF_TOKEN --rm -v ${PWD}:/workspace/bionemo vllm /bin/bash
```

or use `launch.sh`.
Contributor

⚠️ Potential issue | 🟡 Minor

launch.sh mounts to a different container path than the manual docker run command — the README's test instruction will break after using launch.sh --mount_dir.

The manual command (line 14) mounts ${PWD}/workspace/bionemo, which is the container's WORKDIR, so python test_esm2_golden_values.py works directly. However, launch.sh --mount_dir mounts the project root (two levels up) to /workspace/bionemo-framework, leaving /workspace/bionemo populated only by the image's COPY . . layer. Running python test_esm2_golden_values.py from WORKDIR after using --mount_dir will fail unless the user navigates to the correct subdirectory. The "or use launch.sh" phrasing implies equivalence — either clarify the path difference or update launch.sh to mount to /workspace/bionemo instead of /workspace/bionemo-framework.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/vllm/README.md` at line 17, The README's "or use `launch.sh`"
is misleading because `launch.sh --mount_dir` mounts the repo to
`/workspace/bionemo-framework` while the manual `docker run` mounts to
`/workspace/bionemo`, causing `python test_esm2_golden_values.py` to fail when
run from WORKDIR; fix by either (A) update `launch.sh` to mount the project root
into `/workspace/bionemo` instead of `/workspace/bionemo-framework` (adjust
mount target and any downstream path references in launch.sh), or (B) update
README.md to explicitly document that `launch.sh --mount_dir` mounts at
`/workspace/bionemo-framework` and instruct users to cd into the correct
subdirectory (where `test_esm2_golden_values.py` lives) before running the test;
mention the affected files `launch.sh`, `README.md`, and
`test_esm2_golden_values.py` so reviewers can locate the changes.
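Option (A) from the prompt is a one-line change to the mount target. A sketch (the target path comes from the thread; the remaining flags mirror the README's manual command):

```bash
# Mount the recipe directory at the image's WORKDIR so the README's
# `python test_esm2_golden_values.py` works unchanged after `launch.sh --mount_dir`.
MOUNT_TARGET=/workspace/bionemo
DOCKER_CMD="docker run -it --gpus all --network host --ipc=host -e HF_TOKEN --rm -v ${PWD}:${MOUNT_TARGET} vllm /bin/bash"
echo "$DOCKER_CMD"
```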

Comment on lines +21 to +23
```python
python test_esm2_golden_values.py
```
Contributor

⚠️ Potential issue | 🟡 Minor

Wrong code fence language identifier — should be bash, not python.

The python test_esm2_golden_values.py command is a shell invocation, not Python source code. Using ```python causes syntax highlighters to misparse it.

📝 Proposed fix
-```python
+```bash
 python test_esm2_golden_values.py

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In @bionemo-recipes/vllm/README.md around lines 21 - 23, Change the fenced code
block that currently uses the wrong language tag; locate the block containing
the shell command "python test_esm2_golden_values.py" in the README and replace
the opening fence language identifier from python to bash so the command
is treated as a shell snippet and highlighted correctly.



Comment on lines +147 to +210
if __name__ == "__main__":
    print(f"GPUs: {torch.cuda.device_count()}")

    # Step 0: fresh export (facebook HF -> our TE format)
    print("\n[0/3] Exporting checkpoint ...")
    MODEL_ID = fresh_export()

    print(f"MODEL_ID: {MODEL_ID}")
    print(f"REFERENCE_MODEL_ID: {REFERENCE_MODEL_ID}")
    print(f"Sequences: {len(SEQUENCES)}")

    # 1) vLLM on exported model
    print("\n[1/3] vLLM inference (exported model) ...")
    emb_vllm = vllm_embed(MODEL_ID, SEQUENCES)

    # 2) HuggingFace on exported model
    print("\n[2/3] HuggingFace inference (exported model) ...")
    emb_hf_exported = hf_embed(MODEL_ID, SEQUENCES)

    # 3) HuggingFace on reference Hub model
    print("\n[3/3] HuggingFace inference (reference model) ...")
    emb_hf_reference = hf_embed(REFERENCE_MODEL_ID, SEQUENCES)

    # ---- Pairwise comparisons ----
    pairs = [
        ("vLLM (exported)", "HF (exported)", emb_vllm, emb_hf_exported),
        ("vLLM (exported)", "HF (reference)", emb_vllm, emb_hf_reference),
        ("HF (exported)", "HF (reference)", emb_hf_exported, emb_hf_reference),
    ]

    # ---- Summary table ----
    header = f"{'Pair':<35} {'max |diff|':>14} {'mean |diff|':>14} {'cos sim':>12} {'exact':>7}"
    sep = "-" * len(header)
    print(f"\n{sep}")
    print(header)
    print(sep)

    for name_a, name_b, a, b in pairs:
        diffs = np.abs(a.astype(np.float64) - b.astype(np.float64))
        label = f"{name_a} vs {name_b}"
        exact = np.array_equal(a, b)
        print(
            f"{label:<35} {diffs.max():>14.8e} {diffs.mean():>14.8e} "
            f"{cosine_sim(a, b):>12.10f} {'YES' if exact else 'NO':>7}"
        )

    print(sep)
    print(f"Tolerance: rtol={RTOL}, atol={ATOL} (0 = exact match required)")

    # Per-sequence breakdown
    short = {"vLLM (exported)": "vllm", "HF (exported)": "hf_exp", "HF (reference)": "hf_ref"}
    print("\nPer-sequence max |diff|:")
    for i in range(len(SEQUENCES)):
        row = f" seq {i}:"
        for name_a, name_b, a, b in pairs:
            d = float(np.abs(a[i].astype(np.float64) - b[i].astype(np.float64)).max())
            row += f" {short[name_a]}_vs_{short[name_b]}={d:.8e}"
        print(row)

    print(sep)

    # Cleanup
    if torch.distributed.is_initialized():
        torch.distributed.destroy_process_group()
Contributor

⚠️ Potential issue | 🟠 Major

Test script has no assertions — it will never fail.

This "test" prints a comparison table but never asserts that results are within tolerance. RTOL and ATOL (line 74) are defined but unused. A silent pass regardless of output drift defeats the purpose of a golden-value test.

Add assertions after the comparison loop, for example:

Suggested assertion block
+    all_passed = True
     for name_a, name_b, a, b in pairs:
         diffs = np.abs(a.astype(np.float64) - b.astype(np.float64))
         label = f"{name_a}  vs  {name_b}"
         exact = np.array_equal(a, b)
         print(
             f"{label:<35} {diffs.max():>14.8e} {diffs.mean():>14.8e} "
             f"{cosine_sim(a, b):>12.10f} {'YES' if exact else 'NO':>7}"
         )
+        if not np.allclose(a, b, rtol=RTOL, atol=ATOL):
+            all_passed = False
 
     print(sep)
-    print(f"Tolerance: rtol={RTOL}, atol={ATOL} (0 = exact match required)")
+    print(f"Tolerance: rtol={RTOL}, atol={ATOL}")
+
+    assert all_passed, "Golden value comparison failed — see table above for details."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bionemo-recipes/vllm/test_esm2_golden_values.py` around lines 147 - 210, The
script currently only prints comparisons and never fails; add assertions that
enforce the golden-value tolerances using RTOL and ATOL: for each pair in pairs
(refer to the pairs list and variables a, b), assert np.allclose(a, b,
rtol=RTOL, atol=ATOL) (or equivalently assert (np.abs(a-b) <= ATOL + RTOL *
np.abs(b)).all()) and fail the test if not, and also assert cosine_sim(a, b)
exceeds a sensible threshold or that exact is True when ATOL/RTOL are zero; add
per-sequence assertions inside the per-sequence loop to ensure each sequence
max-diff <= ATOL + RTOL * max(|b_i|) so the test fails on unacceptable drift.
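The assertion pattern the prompt asks for can be exercised standalone. This is a sketch, not the script's code: `assert_golden` is a hypothetical helper, and the tolerance values are placeholders rather than the script's actual `RTOL`/`ATOL`.

```python
import numpy as np

RTOL, ATOL = 1e-4, 1e-5  # placeholder tolerances for this sketch


def assert_golden(pairs, rtol=RTOL, atol=ATOL):
    """Fail if any embedding pair drifts beyond tolerance."""
    failures = []
    for name_a, name_b, a, b in pairs:
        if not np.allclose(a, b, rtol=rtol, atol=atol):
            max_diff = np.abs(a.astype(np.float64) - b.astype(np.float64)).max()
            failures.append(f"{name_a} vs {name_b}: max |diff| = {max_diff:.3e}")
    assert not failures, "Golden value comparison failed:\n" + "\n".join(failures)


# Toy usage: an identical copy passes, a perturbed copy fails.
a = np.random.default_rng(0).standard_normal((2, 8)).astype(np.float32)
assert_golden([("vLLM", "HF", a, a.copy())])  # passes silently
try:
    assert_golden([("vLLM", "HF", a, a + 1.0)])
except AssertionError:
    print("caught drift")
```

Collecting all failing pairs before asserting keeps the full comparison table visible while still making the script exit nonzero on drift.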

Comment on lines +77 to +78
# only needed at runtime for FP8 training efficiency; users who train with FP8 pass
# padded_vocab_size explicitly. Keeping vocab_size-sized weights in the checkpoint
Collaborator

users who train with FP8 pass padded_vocab_size explicitly

i'm not sure that's a great assumption 😆. CC @jomitchellnv

Collaborator

yea that's not true, users who train with FP8 don't pass this

@gagank1
Collaborator Author

gagank1 commented Feb 25, 2026

/ok to test 36cdbb2

@gagank1 gagank1 added the ciflow:all-recipes Run tests for all recipes (under bionemo-recipes) or enforce running all tests for recipes. label Feb 25, 2026
@gagank1
Collaborator Author

gagank1 commented Feb 25, 2026

/ok to test 36cdbb2

Collaborator

@broland-hat broland-hat left a comment

Appears to be changes due to (1) layer name changes and (2) toggling the inclusion of a 'pooling layer'. Let me know when the README is updated with a docker build-and-test workflow that I can use to verify the added tests.

# Export without vocab padding so the checkpoint stores embeddings at the real
# vocab_size. This avoids shape-mismatch errors in vLLM's VocabParallelEmbedding,
# which expects vocab_size-shaped weights.
model_te = convert_esm_hf_to_te(model_hf_masked_lm, padded_vocab_size=None)
Collaborator

[not blocking]: Okay made some changes to convert_esm_hf_to_te
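The padded-vs-unpadded shape issue described in that code comment can be illustrated standalone. The sizes below are assumptions for the sketch (ESM2's tokenizer has 33 tokens; padding to a multiple of 64 is just an example of FP8-friendly padding), not values read from the code:

```python
import numpy as np

vocab_size = 33          # real vocab size (assumed for this sketch)
padded_vocab_size = 64   # example padded size
hidden_size = 8          # toy hidden dimension

# A checkpoint exported with padding stores extra, all-zero embedding rows ...
padded_weight = np.zeros((padded_vocab_size, hidden_size), dtype=np.float32)
print(padded_weight.shape == (vocab_size, hidden_size))  # False -> mismatch on load

# ... while exporting with padded_vocab_size=None keeps only the real rows,
# matching what a loader expecting vocab_size-shaped weights will accept.
unpadded_weight = padded_weight[:vocab_size]
print(unpadded_weight.shape == (vocab_size, hidden_size))  # True
```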

## Runtime Requirements

We recommend using the latest [NVIDIA PyTorch container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch)
for optimal performance and compatibility. Refer to the provided Dockerfile for details.
Collaborator

output = model(**inputs)
```

### Running inference with vLLM
Collaborator

@broland-hat broland-hat Mar 2, 2026

README.md

  • HW requirements.
    • In the 'Runtime requirements' section, please state the HW requirements at a coarse level of detail, or link/refer to elsewhere in the project where the info can be found.
      • On which HW has the code been tested (pytests) so far?
      • E.g., are both amd64/x86_64 and arm64 expected to work?
      • Which generations of NVIDIA devices are expected to work?
  • Build and test, user to verify environment

- The top-level Dockerfile WORKDIR is /workspace/bionemo2; recommend the same for this Dockerfile for consistency.

torchao!=0.14.0
transformer_engine[pytorch]
transformers
vllm
Collaborator

[not blocking]: Better to pin, this dep may update

Collaborator

Actually, I think it is a recipes design choice to avoid pinning unless absolutely necessary to maximize compatibility when installing into external environments. @pstjohn thoughts?

Collaborator

yes, please don't pin. When we get uv working, we can have a uv.lock file with a last-known good configuration and let a dependabot script update them. But we want to know if a vllm update breaks our stuff -- we'll pin (temporarily), fix it here or upstream, and then un-pin

@gagank1
Collaborator Author

gagank1 commented Mar 9, 2026

/ok to test 6779320

Signed-off-by: Gagan Kaushik <gkaushik@nvidia.com>
@gagank1 gagank1 removed the ciflow:all-recipes Run tests for all recipes (under bionemo-recipes) or enforce running all tests for recipes. label Mar 9, 2026
@gagank1
Collaborator Author

gagank1 commented Mar 9, 2026

/ok to test 6280742

@coderabbitai
Contributor

coderabbitai bot commented Mar 9, 2026

Caution

Failed to replace (edit) comment. This is likely due to insufficient permissions or the comment being deleted.

Error details
{"name":"HttpError","status":500,"request":{"method":"PATCH","url":"https://api.github.com/repos/NVIDIA/bionemo-framework/issues/comments/3923853930","headers":{"accept":"application/vnd.github.v3+json","user-agent":"octokit.js/0.0.0-development octokit-core.js/7.0.6 Node.js/24","authorization":"token [REDACTED]","content-type":"application/json; charset=utf-8"}}}
swl7/qQbJroKzQ38tcs7XYsnK0ondcaJ72IAcuRIMYM2GJoagvIHQ3/hyQgEEsEeuA6oAaY82PCUtiV11YRuTWcxpN1b1gCBsdhbvW4y0x894BbHIXsPvta67Alh2cfe60n3wIIlnk2fduBTDxMGAi+u7srzoGPdV91CF3RQZbDuymqrzL2aYssPTNpQwaWyOEN9moAPS+EaSLobdzPA9AkAZpqD1NYmdx2mYSdpZ0z2K8Y+tneHkjpCNfNFeRvFHpKryQuGZWjuOsGbVGqBcK8xvA9gBJxkPBJmucr2Q4eXanMlmkyXgsiLg5gwpYoZAma4dImMBbIwB15PCkbUFV/cZyP7AUaXBFH5ma4VdspJ0HPE00Y+TEdgc5bl7E2VegVkgVr3Yb75jevrltGEOWE5uYhzno4R56SG4B/ehAZtzkOoDR9B3TAUcaN1T71Dr6DqQvUQAL6ggS+ww47plXO719LAt3bHrd4vtUkTixzJ8nwqDIXwbySSL7qf1crjeweqhTFqx7Mh40T4bks2tC7OYicuwJYoCanoP9XeSEd/hSBAUXc0o6AK/OwEvZJHNEDcK2gfumx1t99jmDZK53U4WH8uMMTzs3k1LM78Ipe5DY8noCaEgjGAOWmgGf4Ig3+PxxVXMMBMcGR6vou+bnowMMASePedQjNAzKbQcj5XPRWSxlbqsxAmrcY9AVQ2V7WuBB0idFmIOdd0CDenrmYdYncL2dzwLRXgQHnnZOujBjEooCPXLGPF1hWwgt3WPc8YBWxvvV40QHC9kBI+gIiceUMghVD4S03ZoauM3G4mdx4NivseNr7cmWSgwFvrB5dk5SjzMGEKYIp8AATiQdNtH0H4iV0t9/SJC2BzhCQuCt7Kut4neaImeQs8XYmiYJkYnDmqK3E2JANCSEPJ+Oqo9GBMPbVKhfEcQq3C5ywHU2hJ9xVlkFNUnUolpJbdI11JUAB4L5RmMkgbySgUstkfDPHo/11g4onBl6oJiTU4mWT/+yAmWClP4tnA8Y5RH53v3yDSV/AwRn0BOA/RhI/sP5plEyn5dISDXMvTqYr3hoDTNe40xRK664bzTix8gG6d6z9ZPTHPLnsa19Nmt+eA+wM/sf8WhnDd4Z1cPiDjim6olFYYgbEvIFjxEl1A/Q7QMTMMDkzJh5467vdgQjXVZeA4TNFtB6T/oLMujacAhlMbOiIKIjdUQe0U1kapk3omhEoW0bPh0xWiAnIzQHFaiAUvjSsdak1a/ampHjWpac3AoDloU+EoiS3yakrwzO/9Naf0tWRo5lw8S0jS7IvDYwJkfSdCOEmQAfpPNP6YCLxq2lRVVyDwAghQDjx3SbFjE7aVoiwa4GtK95QSLblUbq1O2lUZdRy18661SFMQluKOYiza2oW9zeFtJKq11GhBJBRCZQXxa0FBdG/qVKwXlWeVbK5oK6KCFqqXzYOk3jb3lPpk/zHIggCjJyD3bf65og6Y9pKEANsMQioUqQpuWxaSxUR3zRrMs2ATY0BipjONRuXIiNweAMQuSu7nC7rydYT7r1cpAnAB4B+R8RNbjXWCJdp2h/edoq2nYIaTC86qwr2vW1rr2oMOS9C1q87a134gXRewO1qj3rHgT61zqsDjLMTfAMNUQCpnF5ERy6wlmezrj4Y+L8l9E14NoUFCeA1M2KC0DWZoEeTVMZHaja+GoBxJ5KZPqn0kpnr/l1Hb66dlu3PbZIr262vqt4pGqeJglBIRQEKEvkNiLyEHTTdfMXtCaulYNcdD7Wg6y8R2x+FGN+05yrD8cf9DXIoDJiY5t4ZwMRMjTdEDtApXarDpR0IgbD7nW1YLi8CKXqkc+KQFaXhS6Tob4EXWzeS4D03mAlw8y7/t6tZndIcWIQVtCOZyWvhl8GGOEtDQeJyl/6TGV3PMMOzo4+U6y8fV2rrWuzDafIzWjKS4BeTAqrm8yJWX/jiUlEDoJedl1ZYXkc6oO3JAHoMr6RKExPutaRFiCIb282iA
gzOap640qoNtYAg2UvhAAOATAhAAuASXXCR4EU1rGvp3w4gymDQwkkUeJko/qOBiY7y31PV6iDX2KC6acwILHtElp9RcZK4ATgFN9lh6b2WZBLo0MS4ygCuO4uCXOl6w5BDgPgQpz7ACoPGnlFIDtg8oNkFoNuD7CNIF6ssagFwE6zBZ/heUK/H60/v5QjQtAdPK/fAcwOq4TcbdP2jWCUgb74kqE6dGAgzp5csATw99N0280x4hmraFCbQBcY0cR5KcfTCNAEO/hUeYh75dIcI5iYLuQXPg48uEP/hjDrPE6FKBf3cA24Mqlun/r4gKFL9rPHlECr0xIHAd9y+g86LMgpJMkuEKwSPCl3nykACcKJaIDqPJwChsfScYn1EWlwkSrjNEpIFkD4lFA6i8ksPBfwYDya0XIjn3u6Xdlr9xEno5pGtptHDk9jbVrMxeOJwBjsM0Y5UMmOSL5jsizEqsdsQElVAux7IAnAOObZTjni1YQnByOvpDGjwEeBpBmT+Ak4HR0eEuggpCgA8hx1dI4kc6wBB9zjIpse2LhlNqm56To802lL6NnlvTf9KYdAyvHwC3x5je1BBOQn+FsJxGYidmOLHFF6x1RYSdXZkndE58m5MXWSz47GTmOvZvGviygnT1yWkeH3EeIsrUQCcLVbi31WEtFso8Cc9KszW0FFVgLS6JGc8dFDxxw3cY84ymPPJpFtpjM7ic2P5n9jpZ8wRWdTR2A5kBG1xInCbPRZ2zhfEeDrriCO+Z9WJHIwUbWglGCNsNFTcxQ0q3lZoF5IM+22VqAgiz5sBzQaRZoYGLwhCACnWf734Qi4U+n/epITgOgE4HDky8pcQ5ahCLsrn2HfZxqQUE4NzcKA83IIjwGkxmCOsvv4uswZo14qklKeB195LGCcMys1TPQyXu16NVDrUXsTH5pBTxq2zuLyBE+My3RXHZOfA29H6WjlxduYB6OIaRLiteZpeu6uryE4Qmi65/l5b0rIWd17vS9cfYhbbVggvvbEoBAJKwoI8PMrFXwS5RQEmW3zB4Zs3Ax7F/kjeSIy6VYx00HsGsOMKZcEhuWSIYAjFZS27cTdowB0iFkeXCb98pufwh6kaL8IzxVupnd/MsNUVfEQCNQyHhoSpAPBQhRQHea8ATK75C8B4GWXLJFpow8ljvq4g/FacYkc+Le33O6SxWTp7g9Yr4OmMxurWdrCAKo5t6NdTjb02hZ71+nVuXjOZD4210hn9uhu0uEMAmA6rKAD8GIqkqMOMWXszF97Dko4ECFIAr7ySO+9MaCwEQnF2aHFH7ykRli8xVECoz1Le3DSuFL5PmeQASAgwyyc0LUAmDTAfDlQDQDcAIAfCeCAAaSoyvcEwWvF8odBDV9rRgkwGYMB4QYAANNOFkBSrGsVGWfNEO3lkC7NqrX1Q6FiBmhQ6+MBxLEJiIyLSkouuH/D+IU7Rp8sTcbveGBCUCdoOI78MTHNHUh+ArA0wGwH4B8D8hsAOnaAOBWgBsfwKAQNIGZ4WQLJLPjAbAPDLSDTA/AZniYLkFyCM5XE0SFK8x4I+boaqMC1E4NszTvgDo5wE1MLDw8EfigRH4oJX1mhtxuAcHl1fTLIBbhgMuOWyGXifhxQ62cbYyhvE6G4AvZMyvuPjcMQxgC41O6IL7jk+7BIgGzfM8uGIo44n4VyB4OnV95xf8P7oJL2AD4HMR03kXkRiO1kCOQRxgkUiG+GhnqRigIQL0oIgVFWBcGeZv473iiDrmxCQYDQPaEO9EZerbiWsUVloh+Aw8BcVSEis5H/5YkAHbKH308m678ObcaHgJGAgiAWMoYoCWPD1MoBEPivMY/hJAuTHl70xjDZBaw2kGYL5BuCyQAQvcXk4lQCAa6HEOZmPGOxmQ0gN8LyGXnwS59x0AShJQUokpPrIGwd1JmsmTF1MxvqNxjnoUpP5KD4Ap/iEbwGVQU7Egm8otEYYkMGFBr7CkKcOukI5vDJHC5ASAGpVKipEK
RYqi4IH3MG/AW8fgvw5sYKCpEWSLhoAHNcjz4FyAiYSQwEQzMJBUmngP4XSssly2N37tfg4kewFGBjBNIFqOYlRj85rBOLF4RcJOMeAQWl7gB2rWcyj59BDZ7QmP01tj68bwbogOF0Z/ruJ9wQyOGgf4J++p8MXafv7+ny8YMCM/0I0KeGfBBSDXzow0BoWPH6CKJ/7fn8K6BB6AggQSkj4KQDVmsESrEud9AXzB4bDAe33yp58D7nKMl/nfVpOj++kGOtnBtOEx2QGmeJp9V3hxdMaV6A/+7bwfACzN8LBjIQr0SVEiCSwuADGeUQ76lqr7HNqRlf/PrABr95jw4EGVgWoMfLizMhKGXfXANhQp5jArA5oewH5wp56YsA2v3X319DfMgGN8FgIzCVAbNSxRbBAECACWAA0UThmJnYcxToBKkDRmapWWOs39gfvbWATxp4aPVuJBEWBW9l/cJ3zL9eYOAP+9p4R72woXvOYBedVgeFGwpvwUCDBgV/IhCHcYA0KELJ5/VAH9gSQYmFQCL/W8geIVzKQARYJCLUAz0skB8Fz0uQE4G+JgsYcGtsBEbs3+YQuR31L9woYSkZlWAsrn79iUX9WUQ30HDHMFi/EAjEIplNgMQ0FdMH0NMl7YiSh8yJGHzr15jWC20R+DdJ33sk/cSBT9YAZ5zQEifCEA6BvAmvz8DIAQABQCYpQVYuMUQDwBBtQvyugh/aAzr93bQcDEBniE3FiDOSOKCV8XwdXw8hulXnCn9o4CgMJV6AOfxfYq3CAh3djGDwP3cJuQ9yD81dZOCDB7QMPwj9luKQxx9B9WQ3vcCfAIJONifb1gn11wTcDIAdwBSF+hJkWiy/cHjTP0U5s/FiwA8hwZf2aovAd8my9WRbLzOBu7Hhl3A78OrCSClbPgF3ALfd+jGwiwKDVA1tGcDT0ZB/UgMlh33WWDHNDyK6AnwngHGniBWRDZBIZ8scOF98tzCCEv8COI11qEIJQQQLQSAO0lnhsvUNAdxo9T81+hCRBMGbAoNfrx3VxwIhEKdN/IOFgA5aZ5SkD/ZK4mlBmxMgGaRQQSFgIwXVaki50R8TmEoh3gwiEWQ8YNehh4SDVKmbBFkbEIBhSFWAD6AOQgwEo8kEZqhaV2gZsAWRcQte1Pk0vcIRxwZlEhnkJZnevBO8VqOIBexzQFTkbFnETUFANMyTc01IgdXqy9RVGWmR3Zn4KUIRBDQuID0ZOQpZBgYGvLMkyhpgUmDmVeYZ0IIBPxEXC5FtSWJnvopyU1l0Qm6WhkZxNyKgCuUguGkDWRbqJPjN5lkNIBCBKQ6hiepZBXB049AATAIHIFtHsNgIX83DUoAKDXtDKuQHiLh+8DQG790QiG0xDKAGmnQ536asifhmXPABWVagcQFuCg8egiQIQNdAPfpUQHjBUAArR9hPM4VGmlbxQuCbG5B7yf+m6JqjREOsE2+JAFwkf0E/EyD1IG4SPIkQ9Ojr9xwOswWgoQv3zhVU2QyAvAW6FnBMCmBLEBYYAECGl7Caia0DLIr8Ta3fIhwhNh4AXwIUF9RCaJgFHC6cN1EnDqcN8O0ApuZEiiVHeVML38iAEwRp5Zw9aF9IWRd8NWc+wh8PbAnwpCEJ18YOLEohyXLhyfI7wsXxgwxOUPgCocwkAn/cl3HVghEHgMvHdsBMOEMXVNSSYBUhoIkpm6ICAafBgiY9R8grc3zM2BXgMAMX3JUbA4CzsD8DFexmNnAuYzIMJWdLD9sNTKyFx523UnXZCHiVuSCM+ASyFE80g+4UXxHkGFCPVuiVAxNx/SVnHZCnTF4hqdCuRIwKQmfPWDeRxPZm1C4HglxXNtkfWxiCBigNoIx9z3ejkj8VuTC0Wxb3I1GH0WETBBEwRPUmG/cYqDoTIREYNAEoRndWhEjR6ETQARgMEX+HOglwU8AToTYS7mMCmEYKLSiGAG6DuhHoSoEqB7oO6A9BygIIDQB7QcoFqjSgAQFoBSgNAF
KBKgAQHKAPQEgCCAPQD0F+ZKo/SDyjWEKAHNgKaTKMXBsonVF5hUoiAA4xgSSgFIA4lNiATo3kU+W/gDAaWXHI8oJAFsB4ZT+ikgPYHlHBsaGKLURN5AzaKEYdgWgF2idVWwBOic8Lok2ikADmgHhowUyAwA7ojfAejsgTOFPAbAMSBA9MQP/kQBzQGvCi1tSL6PygaYWgD+iMAdwAzsSAEGLYgwYxR02ioYmGIjAjg7SERjJIZGI0FNozQikgxgBCCmDAYqLSkRGQfGLnlsY+ICJAHAILEQAotKPHHJsgDaOyA2Y/KAUJcgeFDJi4YmsAUJP7ZmLZi8oScj+BcYiGPZiYSY/DCUFgHmO+NPAZ4GVxffOuiiAmAyDQbBlcHCOcA5aEUCd9CgOs2hMYEbAPCV8sRmFsBmyHMEyglY7YUboawNWJYUBY9mO+iBEMmMvIE9IgAdjHYvKG4QKIq0A+izox2PygvEMfAUhqYrmLYAyY7Ai8Ao6bIB1kKY9mNZjPYzmO5jIYDGIeCxCfmLjjPYkWIZjhhe6MziJY2vkwAQsWWP1NMYucCYB1zMGBzI4hMKDSQ0AOeEgArYisC0gWGO2ICB4+N33hUPbGAgHNUAFvEyghiSQmWopYsvFGVHMAgkM5H0bYCDQW0K33KDSTC8HwwMhOnk48J1FuN2E9GD2Ilig4R8O0gyYo3H8QOdZqitBjwxCGttB47kD4gcYLNH9wGPTAxlZUQJQFB5SJUyl3VqYdTBJRJ8Q6y8R74DxGVxgQDWKiBlcZJGVxQeHdUtIJ45ZHnMhIPHkDCawK4Oti1AW4I0At4oWOdjIYV2MohUE76O9jcjX2K4BwY/OKFig4nxA8BQ45OMhjIwTQL3jBY2OMFiE4iWKTjw4lOIrsJtW2FOtSAbBPyhs4sWKITvowuOlj3olhPG053SdlzFkAe6CdANASoHtAAAUiihToFIFQAHAebFOgUOTQHKNvoN8F6JdwoRhEZUAD0CO97QORJQS+E4TRdjnAN2K4S8oEhJaxyE5hMhjWEoGOjjIAHWXHIgRSmJpBbAVOOoSZYyGBajygNAGDABAYoAEAngZYDcj7oD0HcinQBgCdBigWgCCBlgNqOKBygBgFKAHQD0FKjaAGRPaj+o2gHtAgwFQH0g0ASoGtBlgcoCdAnQRJMLIPYuWTnlbAXmJIAyY9JN6jUk0oHtAOk5bz4cUwcoAEBkkjpOWB0k0oFoAliNACCAckuJLiSkktAHKjigEgFCT7oIIGWT4k50HtAPQfpPtAckzpNKA6k2d0oh3IDhJIBh3FrAroaAKLQYT8oCv0O5BEY7gdMLvacAuTBY76IIAsQDwAWRg8UcCi1igMxLyg6CDDn7A1OI0BA9K7HOJ8NaEohLAdCfE4yl53WERDBwxEK7CeSA4vKFeSFID5I7Cvku0F+T/khgiBTYAEFIm0wUrvXZi6EiWOuTJeV1ml5TuC3XO5CIwVCu4aAG7mmDZAZFM9i0U95M+T+wb5JxSuUxAHxTCU5xK4Bw/UlMhSKU7ZQiYzdETjl5PhBXiRSuAS5JeTJQdFL5SotSoF5TMUwFOSBBU2VDVSIUwWKhSBgsZ3ecVDF4TwFzjD3xidFogHDmdTsJJXlTIARVKgdlUzlM1SwU5bw1SAU/lO1SnE3VJKASUtmLJShY8VIItIzafW+conX51icbU68FscHUp1NRSXUjFK9TvkoIE9S8Un1JES/UnwwDSY4sVOhTjU/XQn1bfaMxpTYmfJ3iZgoW3XjTnk51LeTk0hgh5Ta0v5L5SBU31IsQotEVMDT80o1P11Q0ktJn1LjbQxrSUUjlIbSYCJtJRTcUmAjbSs0jtJKB9U8lILS+08Z1XAB0jQyHSF9eM0eSFU5tLHTVUkoHTSZ0zNNBTvkxdODTl0oJVXSw084xjNlga424BbjHdMdS90pNIPSfDI9NHBZ009IXTRUg1JDTr09dIuM0QWM0fTt01lN3TR0t9LdTJ0z2
OnSv0k9KJSz0v9KXTe0yvyCDhglQ1GCtwCYL3AWUtlIlj90mDK4AnQT9K1TgU9tL2QotKARQyL0tDKUNr0yZwjTpnaNMUJbUqFhosIMl9Kgz6099I9Tm0+DPIyCUyjPdTc01xPHI3EtxIMAQowSDmj30W5AUIE6bYOmjSKUmH7QkHe3hWiEQNaNc0Gk34H+YbZIkGZg6AA6PUB3IANFVl7QKTJCinKdTNzJ7eZTOYQgAA=== -->\n\n<!-- internal state end -->"},"request":{"retryCount":3,"retries":3,"retryAfter":16}},"response":{"url":"https://api.github.com/repos/NVIDIA/bionemo-framework/issues/comments/3923853930","status":500,"headers":{"access-control-allow-origin":"*","access-control-expose-headers":"ETag, Link, Location, Retry-After, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Used, X-RateLimit-Resource, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval, X-GitHub-Media-Type, X-GitHub-SSO, X-GitHub-Request-Id, Deprecation, Sunset","content-length":"0","content-security-policy":"default-src 'none'","content-type":"application/json; charset=utf-8","date":"Mon, 09 Mar 2026 15:05:55 GMT","referrer-policy":"origin-when-cross-origin, strict-origin-when-cross-origin","server":"github.com","strict-transport-security":"max-age=31536000; includeSubdomains; preload","vary":"Accept-Encoding, Accept, X-Requested-With","x-accepted-github-permissions":"issues=write; pull_requests=write","x-content-type-options":"nosniff","x-frame-options":"deny","x-github-api-version-selected":"2022-11-28","x-github-media-type":"github.v3; format=json","x-github-request-id":"581C:6FA3A:18528D0:68B3E61:69AEE1D0","x-ratelimit-limit":"15000","x-ratelimit-remaining":"14549","x-ratelimit-reset":"1773070068","x-ratelimit-resource":"core","x-ratelimit-used":"451","x-xss-protection":"0"},"data":""}}

gagank1 added 4 commits March 9, 2026 15:52
Signed-off-by: Gagan Kaushik <gkaushik@nvidia.com>
Signed-off-by: Gagan Kaushik <gkaushik@nvidia.com>
Signed-off-by: Gagan Kaushik <gkaushik@nvidia.com>
Signed-off-by: Gagan Kaushik <gkaushik@nvidia.com>
@gagank1
Collaborator Author

gagank1 commented Mar 9, 2026

/ok to test c31c3e8

gagank1 added 2 commits March 9, 2026 17:43
Signed-off-by: Gagan Kaushik <gkaushik@nvidia.com>
Signed-off-by: Gagan Kaushik <gkaushik@nvidia.com>
@gagank1
Collaborator Author

gagank1 commented Mar 9, 2026

/ok to test ef612a3

Collaborator

wait, what? why doesn't this work by just adding vllm to requirements.txt?

Collaborator Author

To link it against an existing PyTorch install (i.e., the one in our base image), you have to build it from source: https://docs.vllm.ai/en/latest/getting_started/installation/gpu/#use-an-existing-pytorch-installation

Collaborator Author

@gagank1 gagank1 Mar 9, 2026

Btw, that script is optional. The way I thought about it: someone might want to quickly build the image for training without waiting for vLLM to install, and then later decide they need it, in which case they can run the script inside the container. Alternatively, they can pass the build args to the Dockerfile to install it on the first build.
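For reference, the opt-in build-arg pattern could look something like the Dockerfile fragment below. This is a hypothetical sketch, not the exact change in this PR: the `INSTALL_VLLM` arg name and the build steps are assumptions, loosely following vLLM's documented flow for building against an existing PyTorch installation (linked above).

```dockerfile
# Hypothetical build arg; the real name in this PR may differ.
ARG INSTALL_VLLM=false

# Optionally build vLLM against the PyTorch already present in the
# base image, per vLLM's "use an existing PyTorch installation" docs.
RUN if [ "$INSTALL_VLLM" = "true" ]; then \
        git clone https://github.com/vllm-project/vllm.git /tmp/vllm && \
        cd /tmp/vllm && \
        python use_existing_torch.py && \
        pip install -r requirements/build.txt && \
        pip install -e . --no-build-isolation; \
    fi
```

Enabling it would then be `docker build --build-arg INSTALL_VLLM=true ...`, while the default `false` keeps training-only image builds fast.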

Signed-off-by: Gagan Kaushik <gkaushik@nvidia.com>
@gagank1
Collaborator Author

gagank1 commented Mar 9, 2026

/ok to test 60043cd

Signed-off-by: Gagan Kaushik <gkaushik@nvidia.com>
@gagank1
Collaborator Author

gagank1 commented Mar 9, 2026

/ok to test c2d2351

@codecov

codecov bot commented Mar 9, 2026

❌ 1 Tests Failed:

Tests completed: 1089 | Failed: 1 | Passed: 1088 | Skipped: 45
View the top 1 failed test(s) by shortest run time
sub-packages/bionemo-moco/tests/bionemo/moco/interpolants/discrete_time/discrete/test_d3pm_parallel.py::test_d3pm_parallel_interpolate[1]
Stack Traces | 4.92s run time
d3pm = <bionemo.moco.interpolants.discrete_time.discrete.d3pm.D3PM object at 0x7dd3bd590140>
world_size = 1, device_type = 'cuda'

    @pytest.mark.parametrize("world_size", [1, 2])
    def test_d3pm_parallel_interpolate(
        d3pm,
        world_size,
        device_type: str = "cuda",
    ):
        if not torch.cuda.is_available():
            pytest.skip("CUDA is not available")
    
        # Check if world_size number of devices are visible
        visible_devices = torch.cuda.device_count() if device_type == "cuda" else 1  # assume 1 for non-CUDA (e.g., CPU)
        if world_size > visible_devices:
            pytest.skip(f"Insufficient devices: {world_size} devices requested, but only {visible_devices} are visible")
    
>       torch.multiprocessing.spawn(  # type: ignore
            fn=d3pm_parallel_interpolate,
            args=(
                d3pm,
                world_size,
                device_type,
            ),
            nprocs=world_size,
        )

.../discrete_time/discrete/test_d3pm_parallel.py:76: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
.../local/lib/python3.12.../torch/multiprocessing/spawn.py:340: in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method="spawn")
.../local/lib/python3.12.../torch/multiprocessing/spawn.py:296: in start_processes
    while not context.join():
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <torch.multiprocessing.spawn.ProcessContext object at 0x7dd33856a330>
timeout = None, grace_period = None

    def join(
        self, timeout: Optional[float] = None, grace_period: Optional[float] = None
    ):
        r"""Join one or more processes within spawn context.
    
        Attempt to join one or more processes in this spawn context.
        If one of them exited with a non-zero exit status, this function
        kills the remaining processes (optionally with a grace period)
        and raises an exception with the cause of the first process exiting.
    
        Returns ``True`` if all processes have been joined successfully,
        ``False`` if there are more processes that need to be joined.
    
        Args:
            timeout (float): Wait this long (in seconds) before giving up on waiting.
            grace_period (float): When any processes fail, wait this long (in seconds)
                for others to shutdown gracefully before terminating them. If they
                still don't exit, wait another grace period before killing them.
        """
        # Ensure this function can be called even when we're done.
        if len(self.sentinels) == 0:
            return True
    
        # Wait for any process to fail or all of them to succeed.
        ready = multiprocessing.connection.wait(
            self.sentinels.keys(),
            timeout=timeout,
        )
    
        error_index = None
        for sentinel in ready:
            index = self.sentinels.pop(sentinel)
            process = self.processes[index]
            process.join()
            if process.exitcode != 0:
                error_index = index
                break
    
        # Return if there was no error.
        if error_index is None:
            # Return whether or not all processes have been joined.
            return len(self.sentinels) == 0
        # An error occurred. Clean-up all processes before returning.
        # First, allow a grace period for processes to shutdown themselves.
        if grace_period is not None:
            self._join_procs_with_timeout(grace_period)
        # Then, terminate processes that are still alive. Try SIGTERM first.
        for process in self.processes:
            if process.is_alive():
                log.warning("Terminating process %s via signal SIGTERM", process.pid)
                process.terminate()
    
        # Try SIGKILL if the process isn't going down after another grace_period.
        # The reason is related to python signal handling is limited
        # to main thread and if that is in c/c++ land and stuck it won't
        # to handle it. We have seen processes getting stuck not handling
        # SIGTERM for the above reason.
        self._join_procs_with_timeout(30 if grace_period is None else grace_period)
        for process in self.processes:
            if process.is_alive():
                log.warning(
                    "Unable to shutdown process %s via SIGTERM , forcefully exiting via SIGKILL",
                    process.pid,
                )
                process.kill()
            process.join()
    
        # The file will only be created if the process crashed.
        failed_process = self.processes[error_index]
        if not os.access(self.error_files[error_index], os.R_OK):
            exitcode = self.processes[error_index].exitcode
            if exitcode < 0:
                try:
                    name = signal.Signals(-exitcode).name
                except ValueError:
                    name = f"<Unknown signal {-exitcode}>"
                raise ProcessExitedException(
                    f"process {error_index:d} terminated with signal {name}",
                    error_index=error_index,
                    error_pid=failed_process.pid,
                    exit_code=exitcode,
                    signal_name=name,
                )
            else:
                raise ProcessExitedException(
                    f"process {error_index:d} terminated with exit code {exitcode:d}",
                    error_index=error_index,
                    error_pid=failed_process.pid,
                    exit_code=exitcode,
                )
    
        with open(self.error_files[error_index], "rb") as fh:
            original_trace = pickle.load(fh)
        msg = f"\n\n-- Process {error_index:d} terminated with the following error:\n"
        msg += original_trace
>       raise ProcessRaisedException(msg, error_index, failed_process.pid)
E       torch.multiprocessing.spawn.ProcessRaisedException: 
E       
E       -- Process 0 terminated with the following error:
E       Traceback (most recent call last):
E         File ".../local/lib/python3.12.../torch/multiprocessing/spawn.py", line 90, in _wrap
E           fn(i, *args)
E         File "/__w/bionemo-framework/bionemo-framework/.../discrete_time/discrete/test_d3pm_parallel.py", line 54, in d3pm_parallel_interpolate
E           with parallel_context(rank=rank, world_size=world_size):
E         File ".../usr/lib/python3.12/contextlib.py", line 137, in __enter__
E           return next(self.gen)
E                  ^^^^^^^^^^^^^^
E         File ".../local/lib/python3.12.../moco/testing/parallel_test_utils.py", line 70, in parallel_context
E           dist.init_process_group(backend="nccl", world_size=world_size)
E         File ".../local/lib/python3.12.../torch/distributed/c10d_logger.py", line 81, in wrapper
E           return func(*args, **kwargs)
E                  ^^^^^^^^^^^^^^^^^^^^^
E         File ".../local/lib/python3.12.../torch/distributed/c10d_logger.py", line 95, in wrapper
E           func_return = func(*args, **kwargs)
E                         ^^^^^^^^^^^^^^^^^^^^^
E         File ".../local/lib/python3.12.../torch/distributed/distributed_c10d.py", line 1717, in init_process_group
E           store, rank, world_size = next(rendezvous_iterator)
E                                     ^^^^^^^^^^^^^^^^^^^^^^^^^
E         File ".../local/lib/python3.12.../torch/distributed/rendezvous.py", line 278, in _env_rendezvous_handler
E           store = _create_c10d_store(
E                   ^^^^^^^^^^^^^^^^^^^
E         File ".../local/lib/python3.12.../torch/distributed/rendezvous.py", line 198, in _create_c10d_store
E           return TCPStore(
E                  ^^^^^^^^^
E       torch.distributed.DistNetworkError: The server socket has failed to listen on any local network address. port: 37217, useIpv6: false, code: -98, name: EADDRINUSE, message: address already in use

.../local/lib/python3.12.../torch/multiprocessing/spawn.py:215: ProcessRaisedException

To view more test analytics, go to the Test Analytics Dashboard
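The `EADDRINUSE` failure above happens when two test runs race for the same hard-coded rendezvous port before `dist.init_process_group`. One common mitigation (a sketch of the general technique, not necessarily the fix applied here) is to ask the OS for an unused port and export it as `MASTER_PORT` before rendezvous:

```python
import os
import socket


def find_free_port() -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


# Export the address/port before torch.distributed rendezvous so each
# test run gets its own port instead of colliding on a fixed one.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ["MASTER_PORT"] = str(find_free_port())
```

Note there is still a small race window between closing the probe socket and the `TCPStore` binding it; retrying `init_process_group` on `EADDRINUSE` closes that gap.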

Collaborator

@pstjohn pstjohn Mar 10, 2026

Can we flatten this directory, e.g. drop the esm2/ folder? We could put modeling_esm_te.py and modeling_llama_te.py right next to each other.

Collaborator Author

We could, but I have a number of files for llama3 as well, so it might get pretty messy: https://github.com/NVIDIA/bionemo-framework/pull/1501/changes

