
Commit ada3460

Merge branch 'master' into hf-pt-2-7-tr4-55-0-training

2 parents: d62505f + 7dcbba3

File tree

8 files changed: +71 / -20 lines


available_images.md

Lines changed: 4 additions & 0 deletions
@@ -366,8 +366,12 @@ Note: Starting from Neuron SDK 2.17.0, Dockerfiles for PyTorch Neuron Containers
 
 | Framework | Neuron Package | Neuron SDK Version | Job Type | Supported EC2 Instance Types | Python Version Options | Example URL |
 |-----------|----------------|--------------------|----------|------------------------------|------------------------|-------------|
+| [PyTorch 2.8.0](https://github.com/aws-neuron/deep-learning-containers/blob/2.26.0/docker/pytorch/inference/2.8.0/Dockerfile.neuronx) | torch-neuronx, neuronx_distributed, neuronx_distributed_inference | Neuron 2.26.0 | inference | trn1,trn2,inf2 | 3.11 (py311) | 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference-neuronx:2.8.0-neuronx-py311-sdk2.26.0-ubuntu22.04 |
+| [PyTorch 2.8.0](https://github.com/aws-neuron/deep-learning-containers/blob/2.26.0/docker/pytorch/training/2.8.0/Dockerfile.neuronx) | torch-neuronx, neuronx_distributed, neuronx_distributed_training | Neuron 2.26.0 | training | trn1,trn2,inf2 | 3.11 (py311) | 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-training-neuronx:2.8.0-neuronx-py311-sdk2.26.0-ubuntu22.04 |
 | [PyTorch 2.7.0](https://github.com/aws-neuron/deep-learning-containers/blob/2.25.0/docker/pytorch/inference/2.7.0/Dockerfile.neuronx) | torch-neuronx, transformers-neuronx, neuronx_distributed, neuronx_distributed_inference | Neuron 2.25.0 | inference | trn1,trn2,inf2 | 3.10 (py310) | 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference-neuronx:2.7.0-neuronx-py310-sdk2.25.0-ubuntu22.04 |
 | [PyTorch 2.7.0](https://github.com/aws-neuron/deep-learning-containers/blob/2.25.0/docker/pytorch/training/2.7.0/Dockerfile.neuronx) | torch-neuronx, transformers-neuronx, neuronx_distributed, neuronx_distributed_training | Neuron 2.25.0 | training | trn1,trn2,inf2 | 3.10 (py310) | 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-training-neuronx:2.7.0-neuronx-py310-sdk2.25.0-ubuntu22.04 |
+| [PyTorch 2.7.0](https://github.com/aws-neuron/deep-learning-containers/blob/2.24.1/docker/pytorch/inference/2.7.0/Dockerfile.neuronx) | torch-neuronx, transformers-neuronx, neuronx_distributed, neuronx_distributed_inference | Neuron 2.24.1 | inference | trn1,trn2,inf2 | 3.10 (py310) | 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference-neuronx:2.7.0-neuronx-py310-sdk2.24.1-ubuntu22.04 |
+| [PyTorch 2.7.0](https://github.com/aws-neuron/deep-learning-containers/blob/2.24.1/docker/pytorch/training/2.7.0/Dockerfile.neuronx) | torch-neuronx, transformers-neuronx, neuronx_distributed, neuronx_distributed_training | Neuron 2.24.1 | training | trn1,trn2,inf2 | 3.10 (py310) | 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-training-neuronx:2.7.0-neuronx-py310-sdk2.24.1-ubuntu22.04 |
 | [PyTorch 2.6.0](https://github.com/aws-neuron/deep-learning-containers/blob/2.23.0/docker/pytorch/inference/2.6.0/Dockerfile.neuronx) | torch-neuronx, transformers-neuronx, neuronx_distributed, neuronx_distributed_inference | Neuron 2.23.0 | inference | trn1,trn2,inf2 | 3.10 (py310) | 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference-neuronx:2.6.0-neuronx-py310-sdk2.23.0-ubuntu22.04 |
 | [PyTorch 2.6.0](https://github.com/aws-neuron/deep-learning-containers/blob/2.23.0/docker/pytorch/training/2.6.0/Dockerfile.neuronx) | torch-neuronx, transformers-neuronx, neuronx_distributed, neuronx_distributed_training | Neuron 2.23.0 | training | trn1,trn2,inf2 | 3.10 (py310) | 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-training-neuronx:2.6.0-neuronx-py310-sdk2.23.0-ubuntu22.04 |
 | [PyTorch 2.5.1](https://github.com/aws-neuron/deep-learning-containers/blob/2.22.0/docker/pytorch/inference/2.5.1/Dockerfile.neuronx) | torch-neuronx, transformers-neuronx, neuronx_distributed, neuronx_distributed_inference | Neuron 2.22.0 | inference | trn1,trn2,inf2 | 3.10 (py310) | 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference-neuronx:2.5.1-neuronx-py310-sdk2.22.0-ubuntu22.04 |
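
For reference, pulling one of the newly listed images follows the usual AWS DLC flow: authenticate Docker against the DLC ECR registry, then pull by the tag shown in the table. This is a minimal sketch, assuming AWS CLI credentials with ECR read access in us-west-2; swap in any other URI from the table as needed.

```
# Authenticate Docker to the AWS Deep Learning Containers registry (requires AWS CLI credentials)
aws ecr get-login-password --region us-west-2 \
  | docker login --username AWS --password-stdin 763104351884.dkr.ecr.us-west-2.amazonaws.com

# Pull the newly added PyTorch 2.8.0 Neuron training image (Neuron SDK 2.26.0, py311)
docker pull 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-training-neuronx:2.8.0-neuronx-py311-sdk2.26.0-ubuntu22.04
```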

huggingface/pytorch/training/docker/2.1/py3/sdk2.20.0/Dockerfile.neuronx.os_scan_allowlist.json

Lines changed: 29 additions & 0 deletions
@@ -845,6 +845,35 @@
         }
     ],
     "linux-libc-dev": [
+        {
+            "description": "In the Linux kernel, the following vulnerability has been resolved: of: module: add buffer overflow check in of_modalias(). In of_modalias(), if the buffer happens to be too small even for the 1st snprintf() call, the len parameter will become negative and str parameter (if not NULL initially) will point beyond the buffer's end. Add the buffer overflow check after the 1st snprintf() call and fix such check after the strlen() call (accounting for the terminating NUL char).",
+            "vulnerability_id": "CVE-2024-38541",
+            "name": "CVE-2024-38541",
+            "package_name": "linux-libc-dev",
+            "package_details": {
+                "file_path": null,
+                "name": "linux-libc-dev",
+                "package_manager": "OS",
+                "version": "5.4.0",
+                "release": "192.212"
+            },
+            "remediation": {
+                "recommendation": {
+                    "text": "None Provided"
+                }
+            },
+            "cvss_v3_score": 7.8,
+            "cvss_v30_score": 0.0,
+            "cvss_v31_score": 7.8,
+            "cvss_v2_score": 0.0,
+            "cvss_v3_severity": "HIGH",
+            "source_url": "https://ubuntu.com/security/CVE-2024-38541",
+            "source": "UBUNTU_CVE",
+            "severity": "CRITICAL",
+            "status": "ACTIVE",
+            "title": "CVE-2024-38541 - linux-libc-dev",
+            "reason_to_ignore": "N/A"
+        },
         {
             "description": "In the Linux kernel, the following vulnerability has been resolved: greybus: Fix use-after-free bug in gb_interface_release due to race condition. In gb_interface_create, &intf->mode_switch_completion is bound with gb_interface_mode_switch_work. Then it will be started by gb_interface_request_mode_switch. Here is the relevant code. if (!queue_work(system_long_wq, &intf->mode_switch_work)) { ... } If we call gb_interface_release to make cleanup, there may be an unfinished work. This function will call kfree to free the object \"intf\". However, if gb_interface_mode_switch_work is scheduled to run after kfree, it may cause use-after-free error as gb_interface_mode_switch_work will use the object \"intf\". The possible execution flow that may lead to the issue is as follows: CPU0 CPU1 | gb_interface_create | gb_interface_request_mode_switch gb_interface_release | kfree(intf) (free) | | gb_interface_mode_switch_work | mutex_lock(&intf->mutex) (use) Fix it by canceling the work before kfree.",
             "vulnerability_id": "CVE-2024-39495",
Lines changed: 5 additions & 1 deletion

@@ -1,3 +1,7 @@
 {
-    "70612": "In Jinja2, the from_string function is prone to Server Side Template Injection (SSTI) where it takes the \"source\" parameter as a template object, renders it, and then returns it. The attacker can exploit it with {{INJECTION COMMANDS}} in a URI. \r\nNOTE: The maintainer and multiple third parties believe that this vulnerability isn't valid because users shouldn't use untrusted templates without sandboxing."
+    "70612": "In Jinja2, the from_string function is prone to Server Side Template Injection (SSTI) where it takes the \"source\" parameter as a template object, renders it, and then returns it. The attacker can exploit it with {{INJECTION COMMANDS}} in a URI. \r\nNOTE: The maintainer and multiple third parties believe that this vulnerability isn't valid because users shouldn't use untrusted templates without sandboxing.",
+    "79077": "Affected versions of the h2 package are vulnerable to HTTP Request Smuggling due to improper validation of illegal characters in HTTP headers. The package allows CRLF characters to be injected into header names and values without proper sanitisation, which can cause request boundary manipulation when HTTP/2 requests are downgraded to HTTP/1.1 by downstream servers.",
+    "78828": "Affected versions of the PyTorch package are vulnerable to Denial of Service (DoS) due to improper handling in the MKLDNN pooling implementation. The torch.mkldnn_max_pool2d function fails to properly validate input parameters, allowing crafted inputs to trigger resource exhaustion or crashes in the underlying MKLDNN library. An attacker with local access can exploit this vulnerability by passing specially crafted tensor dimensions or parameters to the max pooling function, causing the application to become unresponsive or crash.",
+    "77744": "urllib3 is a user-friendly HTTP client library for Python. Prior to 2.5.0, it is possible to disable redirects for all requests by instantiating a PoolManager and specifying retries in a way that disable redirects. By default, requests and botocore users are not affected. An application attempting to mitigate SSRF or open redirect vulnerabilities by disabling redirects at the PoolManager level will remain vulnerable. This issue has been patched in version 2.5.0.",
+    "77745": "Urllib3 is a user-friendly HTTP client library for Python. Starting in version 2.2.0 and before 2.5.0, urllib3 does not control redirects in browsers and Node.js. urllib3 supports being used in a Pyodide runtime, utilizing the JavaScript Fetch API or falling back on XMLHttpRequest. This means Python libraries can be used to make HTTP requests from a browser or Node.js. Additionally, urllib3 provides a mechanism to control redirects, but the retries and redirect parameters are ignored with Pyodide; the runtime itself determines redirect behaviour. This issue has been patched in version 2.5.0."
 }

pytorch/inference/docker/2.6/py3/Dockerfile.arm64.cpu

Lines changed: 1 addition & 1 deletion
@@ -189,8 +189,8 @@ RUN chmod +x /usr/local/bin/dockerd-entrypoint.py
 
 # add telemetry
 COPY deep_learning_container.py /usr/local/bin/deep_learning_container.py
-COPY sitecustomize.py /usr/local/lib/${PYTHON_SHORT_VERSION}/sitecustomize.py
 RUN chmod +x /usr/local/bin/deep_learning_container.py
+# COPY sitecustomize.py /usr/local/lib/${PYTHON_SHORT_VERSION}/sitecustomize.py
 
 RUN HOME_DIR=/root \
  && curl -o ${HOME_DIR}/oss_compliance.zip https://aws-dlinfra-utilities.s3.amazonaws.com/oss_compliance.zip \

release_images_general.yml

Lines changed: 3 additions & 3 deletions
@@ -44,14 +44,14 @@ release_images:
       public_registry: True
   4:
     framework: "vllm"
-    version: "0.10.1"
+    version: "0.10.2"
     arch_type: "x86"
     customer_type: "ec2"
    general:
      device_types: [ "gpu" ]
      python_versions: [ "py312" ]
      os_version: "ubuntu22.04"
-      cuda_version: "cu128"
+      cuda_version: "cu129"
      example: False
      disable_sm_tag: False
      force_release: False
@@ -69,4 +69,4 @@ release_images:
      example: False
      disable_sm_tag: False
      force_release: False
-      public_registry: False
+      public_registry: True

vllm/CHANGELOG.md

Lines changed: 26 additions & 12 deletions
@@ -2,14 +2,28 @@
 
 All notable changes to vLLM Deep Learning Containers will be documented in this file.
 
+## [0.10.2] - 2025-09-18
+### Updated
+- vllm/vllm-openai version `v0.10.2`, see [release note](https://github.com/vllm-project/vllm/releases/tag/v0.10.2) for details.
+
+### Added
+- Introducing vLLM ARM64 support for AWS Graviton (g5g) with NVIDIA T4 GPUs, using XFormers/FlashInfer as attention backend and V0 engine for Turing architecture compatibility - [release tag](https://github.com/aws/deep-learning-containers/releases/tag/v1.1-vllm-arm64-ec2-0.10.2-gpu-py312)
+
+### Sample ECR URI
+```
+763104351884.dkr.ecr.us-west-2.amazonaws.com/vllm-arm64:0.10.2-gpu-py312-cu129-ubuntu22.04-ec2-v1.1
+763104351884.dkr.ecr.us-west-2.amazonaws.com/vllm:0.10.2-gpu-py312-cu129-ubuntu22.04-ec2-v1.0
+763104351884.dkr.ecr.us-east-1.amazonaws.com/vllm:0.10.2-gpu-py312-cu129-ubuntu22.04-ec2
+```
+
 ## [0.10.1] - 2025-08-25
 ### Updated
 - vllm/vllm-openai version `v0.10.1.1`, see [release note](https://github.com/vllm-project/vllm/releases/tag/v0.10.1.1) for details.
 - EFA installer version `1.43.2`
 ### Sample ECR URI
 ```
-763104351884.dkr.ecr.us-east-1.amazonaws.com/0.10-gpu-py312-ec2
-763104351884.dkr.ecr.us-east-1.amazonaws.com/0.10.1-gpu-py312-cu128-ubuntu22.04-ec2
+763104351884.dkr.ecr.us-east-1.amazonaws.com/vllm:0.10-gpu-py312-ec2
+763104351884.dkr.ecr.us-east-1.amazonaws.com/vllm:0.10.1-gpu-py312-cu128-ubuntu22.04-ec2
 ```
 
 ## [0.10.0] - 2025-08-04
@@ -18,17 +32,17 @@ All notable changes to vLLM Deep Learning Containers will be documented in this
 - EFA installer version `1.43.1`
 ### Sample ECR URI
 ```
-763104351884.dkr.ecr.us-east-1.amazonaws.com/0.10-gpu-py312-ec2
-763104351884.dkr.ecr.us-east-1.amazonaws.com/0.10.0-gpu-py312-cu128-ubuntu22.04-ec2
+763104351884.dkr.ecr.us-east-1.amazonaws.com/vllm:0.10-gpu-py312-ec2
+763104351884.dkr.ecr.us-east-1.amazonaws.com/vllm:0.10.0-gpu-py312-cu128-ubuntu22.04-ec2
 ```
 
 ## [0.9.2] - 2025-07-15
 ### Updated
 - vllm/vllm-openai version `v0.9.2`, see [release note](https://github.com/vllm-project/vllm/releases/tag/v0.9.2) for details.
 ### Sample ECR URI
 ```
-763104351884.dkr.ecr.us-east-1.amazonaws.com/0.9-gpu-py312-ec2
-763104351884.dkr.ecr.us-east-1.amazonaws.com/0.9.2-gpu-py312-cu128-ubuntu22.04-ec2
+763104351884.dkr.ecr.us-east-1.amazonaws.com/vllm:0.9-gpu-py312-ec2
+763104351884.dkr.ecr.us-east-1.amazonaws.com/vllm:0.9.2-gpu-py312-cu128-ubuntu22.04-ec2
 ```
 
 ## [0.9.1] - 2025-06-13
@@ -37,8 +51,8 @@ All notable changes to vLLM Deep Learning Containers will be documented in this
 - EFA installer version `1.42.0`
 ### Sample ECR URI
 ```
-763104351884.dkr.ecr.us-east-1.amazonaws.com/0.9-gpu-py312-ec2
-763104351884.dkr.ecr.us-east-1.amazonaws.com/0.9.1-gpu-py312-cu128-ubuntu22.04-ec2
+763104351884.dkr.ecr.us-east-1.amazonaws.com/vllm:0.9-gpu-py312-ec2
+763104351884.dkr.ecr.us-east-1.amazonaws.com/vllm:0.9.1-gpu-py312-cu128-ubuntu22.04-ec2
 ```
 
 
@@ -48,8 +62,8 @@ All notable changes to vLLM Deep Learning Containers will be documented in this
 - EFA installer version `1.41.0`
 ### Sample ECR URI
 ```
-763104351884.dkr.ecr.us-east-1.amazonaws.com/0.9-gpu-py312-ec2
-763104351884.dkr.ecr.us-east-1.amazonaws.com/0.9.0-gpu-py312-cu128-ubuntu22.04-ec2
+763104351884.dkr.ecr.us-east-1.amazonaws.com/vllm:0.9-gpu-py312-ec2
+763104351884.dkr.ecr.us-east-1.amazonaws.com/vllm:0.9.0-gpu-py312-cu128-ubuntu22.04-ec2
 ```
 
 ## [0.8.5] - 2025-06-02
@@ -59,6 +73,6 @@ All notable changes to vLLM Deep Learning Containers will be documented in this
 - EFA installer version `1.40.0`
 ### Sample ECR URI
 ```
-763104351884.dkr.ecr.us-east-1.amazonaws.com/0.8-gpu-py312-ec2
-763104351884.dkr.ecr.us-east-1.amazonaws.com/0.8.5-gpu-py312-cu128-ubuntu22.04-ec2
+763104351884.dkr.ecr.us-east-1.amazonaws.com/vllm:0.8-gpu-py312-ec2
+763104351884.dkr.ecr.us-east-1.amazonaws.com/vllm:0.8.5-gpu-py312-cu128-ubuntu22.04-ec2
 ```
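
As a usage sketch for the new 0.10.2 entry, and assuming the DLC keeps the OpenAI-compatible server entrypoint of its vllm/vllm-openai base image (see the vllm/x86_64/gpu/Dockerfile change below), the EC2 image can be launched as follows; the model name is only an illustrative placeholder, and the host needs NVIDIA GPUs plus the NVIDIA Container Toolkit:

```
# Start an OpenAI-compatible vLLM server from the 0.10.2 EC2 image (model name is an example)
docker run --rm --gpus all --ipc=host -p 8000:8000 \
  763104351884.dkr.ecr.us-east-1.amazonaws.com/vllm:0.10.2-gpu-py312-cu129-ubuntu22.04-ec2 \
  --model Qwen/Qwen2.5-0.5B-Instruct
```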

vllm/buildspec.yml

Lines changed: 2 additions & 2 deletions
@@ -2,7 +2,7 @@ account_id: &ACCOUNT_ID <set-$ACCOUNT_ID-in-environment>
 prod_account_id: &PROD_ACCOUNT_ID 763104351884
 region: &REGION <set-$REGION-in-environment>
 framework: &FRAMEWORK vllm
-version: &VERSION "0.10.1"
+version: &VERSION "0.10.2"
 short_version: &SHORT_VERSION "0.10"
 arch_type: &ARCH_TYPE x86_64
 autopatch_build: "False"
@@ -35,7 +35,7 @@ images:
     <<: *BUILD_CONTEXT
     image_size_baseline: 20000
     device_type: &DEVICE_TYPE gpu
-    cuda_version: &CUDA_VERSION cu128
+    cuda_version: &CUDA_VERSION cu129
     python_version: &DOCKER_PYTHON_VERSION py3
     tag_python_version: &TAG_PYTHON_VERSION py312
     os_version: &OS_VERSION ubuntu22.04

vllm/x86_64/gpu/Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-FROM docker.io/vllm/vllm-openai:v0.10.1.1 as final
+FROM docker.io/vllm/vllm-openai:v0.10.2 as final
 ARG PYTHON="python3"
 ARG EFA_VERSION="1.43.2"
 LABEL maintainer="Amazon AI"
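
For a local build of this stage (a sketch only; the published images come from the buildspec-driven release pipeline rather than a plain docker build), the repository root can serve as the build context and the local tag is arbitrary:

```
# Build the x86_64 GPU vLLM image from the updated Dockerfile (local tag is hypothetical)
docker build -f vllm/x86_64/gpu/Dockerfile -t vllm-dlc:0.10.2-local .
```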
