
Commit 885eb0e

Merge branch 'master' into run-pt-2.7-tests
2 parents 5a8bb5f + ef47afc

File tree

17 files changed (+161, −38 lines)

available_images.md

Lines changed: 2 additions & 0 deletions
@@ -368,6 +368,8 @@ Note: Starting from Neuron SDK 2.17.0, Dockerfiles for PyTorch Neuron Containers
 |----------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------|--------------------|-----------|------------------------------|------------------------|----------------------------------------------------------------------------------------------------------------------|
 | [PyTorch 2.7.0](https://github.com/aws-neuron/deep-learning-containers/blob/2.25.0/docker/pytorch/inference/2.7.0/Dockerfile.neuronx) | torch-neuronx, transformers-neuronx, neuronx_distributed, neuronx_distributed_inference | Neuron 2.25.0 | inference | trn1,trn2,inf2 | 3.10 (py310) | 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference-neuronx:2.7.0-neuronx-py310-sdk2.25.0-ubuntu22.04 |
 | [PyTorch 2.7.0](https://github.com/aws-neuron/deep-learning-containers/blob/2.25.0/docker/pytorch/training/2.7.0/Dockerfile.neuronx) | torch-neuronx, transformers-neuronx, neuronx_distributed, neuronx_distributed_training | Neuron 2.25.0 | training | trn1,trn2,inf2 | 3.10 (py310) | 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-training-neuronx:2.7.0-neuronx-py310-sdk2.25.0-ubuntu22.04 |
+| [PyTorch 2.7.0](https://github.com/aws-neuron/deep-learning-containers/blob/2.24.1/docker/pytorch/inference/2.7.0/Dockerfile.neuronx) | torch-neuronx, transformers-neuronx, neuronx_distributed, neuronx_distributed_inference | Neuron 2.24.1 | inference | trn1,trn2,inf2 | 3.10 (py310) | 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference-neuronx:2.7.0-neuronx-py310-sdk2.24.1-ubuntu22.04 |
+| [PyTorch 2.7.0](https://github.com/aws-neuron/deep-learning-containers/blob/2.24.1/docker/pytorch/training/2.7.0/Dockerfile.neuronx) | torch-neuronx, transformers-neuronx, neuronx_distributed, neuronx_distributed_training | Neuron 2.24.1 | training | trn1,trn2,inf2 | 3.10 (py310) | 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-training-neuronx:2.7.0-neuronx-py310-sdk2.24.1-ubuntu22.04 |
 | [PyTorch 2.6.0](https://github.com/aws-neuron/deep-learning-containers/blob/2.23.0/docker/pytorch/inference/2.6.0/Dockerfile.neuronx) | torch-neuronx, transformers-neuronx, neuronx_distributed, neuronx_distributed_inference | Neuron 2.23.0 | inference | trn1,trn2,inf2 | 3.10 (py310) | 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference-neuronx:2.6.0-neuronx-py310-sdk2.23.0-ubuntu22.04 |
 | [PyTorch 2.6.0](https://github.com/aws-neuron/deep-learning-containers/blob/2.23.0/docker/pytorch/training/2.6.0/Dockerfile.neuronx) | torch-neuronx, transformers-neuronx, neuronx_distributed, neuronx_distributed_training | Neuron 2.23.0 | training | trn1,trn2,inf2 | 3.10 (py310) | 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-training-neuronx:2.6.0-neuronx-py310-sdk2.23.0-ubuntu22.04 |
 | [PyTorch 2.5.1](https://github.com/aws-neuron/deep-learning-containers/blob/2.22.0/docker/pytorch/inference/2.5.1/Dockerfile.neuronx) | torch-neuronx, transformers-neuronx, neuronx_distributed, neuronx_distributed_inference | Neuron 2.22.0 | inference | trn1,trn2,inf2 | 3.10 (py310) | 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference-neuronx:2.5.1-neuronx-py310-sdk2.22.0-ubuntu22.04 |
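The image URIs in the table above follow a regular tag scheme (`<framework-version>-neuronx-py<py>-sdk<sdk>-ubuntu<os>`). A minimal sketch of that scheme; the helper name and parameters are ours, not part of the repository:

```python
# Sketch of the Neuron DLC image-URI scheme visible in the table above.
# `neuron_image_uri` and its parameters are illustrative, not repo code.
def neuron_image_uri(job, fw_version, py, sdk, os_version):
    registry = "763104351884.dkr.ecr.us-west-2.amazonaws.com"
    repo = f"pytorch-{job}-neuronx"  # job is "inference" or "training"
    tag = f"{fw_version}-neuronx-py{py}-sdk{sdk}-ubuntu{os_version}"
    return f"{registry}/{repo}:{tag}"

print(neuron_image_uri("inference", "2.7.0", "310", "2.25.0", "22.04"))
```

This reproduces the Neuron 2.25.0 inference row's URI exactly; the older rows differ only in the version fields.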

data/ignore_ids_safety_scan.json

Lines changed: 15 additions & 2 deletions
@@ -1399,7 +1399,11 @@
       "77714": "Transformers version upgrade needs to be handled in a separate image",
       "77149": "Transformers version upgrade needs to be handled in a separate image",
       "77985": "Transformers version upgrade needs to be handled in a separate image",
-      "77988": "Transformers version upgrade needs to be handled in a separate image"
+      "77988": "Transformers version upgrade needs to be handled in a separate image",
+      "78688": "Transformers version upgrade needs to be handled in a separate image",
+      "78828": "Pytorch version upgrade needs to be handled in a separate image",
+      "79595": "Transformers version upgrade needs to be handled in a separate image",
+      "79596": "Transformers version upgrade needs to be handled in a separate image"
     }
   },
   "inference": {
@@ -1433,7 +1437,16 @@
       "71601": "Transformers version upgrade needs to be handled in a separate image",
       "71670": "Pytorch version upgrade needs to be handled in a separate image",
       "71671": "Pytorch version upgrade needs to be handled in a separate image",
-      "71672": "Pytorch version upgrade needs to be handled in a separate image"
+      "71672": "Pytorch version upgrade needs to be handled in a separate image",
+      "77740": "Affected versions of this package are vulnerable to a potential Denial of Service (DoS) attack due to unbounded recursion when parsing untrusted Protocol Buffers data. The pure-Python implementation fails to enforce recursion depth limits when processing recursive groups, recursive messages, or a series of SGROUP tags, leading to stack overflow conditions that can crash the application by exceeding Python's recursion limit.",
+      "78828": "Affected versions of the PyTorch package are vulnerable to Denial of Service (DoS) due to improper handling in the MKLDNN pooling implementation. The torch.mkldnn_max_pool2d function fails to properly validate input parameters, allowing crafted inputs to trigger resource exhaustion or crashes in the underlying MKLDNN library. An attacker with local access can exploit this vulnerability by passing specially crafted tensor dimensions or parameters to the max pooling function, causing the application to become unresponsive or crash.",
+      "78153": "A Regular Expression Denial of Service (ReDoS) vulnerability was discovered in the Hugging Face Transformers library, specifically within the DonutProcessor class's token2json() method. This vulnerability affects versions 4.51.3 and earlier, and is fixed in version 4.52.1. The issue arises from the regex pattern <s_(.*?)> which can be exploited to cause excessive CPU consumption through crafted input strings due to catastrophic backtracking. This vulnerability can lead to service disruption, resource exhaustion, and potential API service vulnerabilities, impacting document processing tasks using the Donut model.",
+      "77986": "Hugging Face Transformers versions up to 4.49.0 are affected by an improper input validation vulnerability in the image_utils.py file. The vulnerability arises from insecure URL validation using the startswith() method, which can be bypassed through URL username injection. This allows attackers to craft URLs that appear to be from YouTube but resolve to malicious domains, potentially leading to phishing attacks, malware distribution, or data exfiltration. The issue is fixed in version 4.52.1.",
+      "78688": "Affected versions of the Hugging Face Transformers package are vulnerable to Regular Expression Denial of Service (ReDoS) due to an inefficient regex pattern in weight name conversion. The convert_tf_weight_name_to_pt_weight_name() function uses the regular expression pattern /[^/]___([^/])/, which is susceptible to catastrophic backtracking when processing specially crafted TensorFlow weight names. An attacker can exploit this vulnerability by providing malicious weight names during model conversion between TensorFlow and PyTorch formats, causing excessive CPU consumption and potentially rendering the service unresponsive.",
+      "77744": "urllib3 is a user-friendly HTTP client library for Python. Prior to 2.5.0, it is possible to disable redirects for all requests by instantiating a PoolManager and specifying retries in a way that disable redirects. By default, requests and botocore users are not affected. An application attempting to mitigate SSRF or open redirect vulnerabilities by disabling redirects at the PoolManager level will remain vulnerable. This issue has been patched in version 2.5.0.",
+      "79077": "Affected versions of the h2 package are vulnerable to HTTP Request Smuggling due to improper validation of illegal characters in HTTP headers. The package allows CRLF characters to be injected into header names and values without proper sanitisation, which can cause request boundary manipulation when HTTP/2 requests are downgraded to HTTP/1.1 by downstream servers.",
+      "79595": "Affected versions of the transformers package are vulnerable to Regular Expression Denial of Service (ReDoS) due to inefficient regular expressions in the EnglishNormalizer.normalize_numbers() method",
+      "79596": "Affected versions of the transformers package are vulnerable to Regular Expression Denial of Service (ReDoS) due to inefficient regular expressions in the MarianTokenizer.remove_language_code() method"
     }
   },
   "inference-neuron": {
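Several of the newly ignored findings quote "fixed in version X" advisories (e.g. 78153 is fixed in transformers 4.52.1). A quick way to check whether an installed version predates a fix, as a sketch that assumes plain dotted numeric versions only (production tooling should use `packaging.version`, which also handles pre-releases and epochs):

```python
# Minimal version comparison for plain dotted versions such as "4.51.3".
# Illustrative only; not part of the safety-scan tooling.
def parse(version):
    return tuple(int(part) for part in version.split("."))

def predates_fix(installed, fixed):
    return parse(installed) < parse(fixed)

# Advisory 78153 (DonutProcessor ReDoS) affects 4.51.3 and earlier,
# and is fixed in 4.52.1, per the entry above.
print(predates_fix("4.51.3", "4.52.1"))
```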

huggingface/pytorch/inference/docker/2.1/py3/sdk2.20.0/Dockerfile.neuronx.os_scan_allowlist.json

Lines changed: 58 additions & 0 deletions
@@ -1541,6 +1541,64 @@
       "status": "ACTIVE",
       "title": "CVE-2023-52760 - linux-libc-dev",
       "reason_to_ignore": "N/A"
+    },
+    {
+      "description": "In the Linux kernel, the following vulnerability has been resolved: of: module: add buffer overflow check in of_modalias(). In of_modalias(), if the buffer happens to be too small even for the 1st snprintf() call, the len parameter will become negative and str parameter (if not NULL initially) will point beyond the buffer's end. Add the buffer overflow check after the 1st snprintf() call and fix such check after the strlen() call (accounting for the terminating NUL char).",
+      "vulnerability_id": "CVE-2024-38541",
+      "name": "CVE-2024-38541",
+      "package_name": "linux-libc-dev",
+      "package_details": {
+        "file_path": null,
+        "name": "linux-libc-dev",
+        "package_manager": "OS",
+        "version": "5.4.0",
+        "release": "192.212"
+      },
+      "remediation": {
+        "recommendation": {
+          "text": "None Provided"
+        }
+      },
+      "cvss_v3_score": 7.8,
+      "cvss_v30_score": 0.0,
+      "cvss_v31_score": 7.8,
+      "cvss_v2_score": 0.0,
+      "cvss_v3_severity": "HIGH",
+      "source_url": "https://ubuntu.com/security/CVE-2024-38541",
+      "source": "UBUNTU_CVE",
+      "severity": "CRITICAL",
+      "status": "ACTIVE",
+      "title": "CVE-2024-38541 - linux-libc-dev",
+      "reason_to_ignore": "N/A"
+    },
+    {
+      "description": "A use-after-free vulnerability was found in libxml2. This issue occurs when parsing XPath elements under certain circumstances when the XML schematron has the <sch:name path=”...”/> schema elements. This flaw allows a malicious actor to craft a malicious XML document used as input for libxml, resulting in the program’s crash using libxml or other possible undefined behaviors.",
+      "vulnerability_id": "CVE-2025-49794",
+      "name": "CVE-2025-49794",
+      "package_name": "linux-libc-dev",
+      "package_details": {
+        "file_path": null,
+        "name": "linux-libc-dev",
+        "package_manager": "OS",
+        "version": "5.4.0",
+        "release": "192.212"
+      },
+      "remediation": {
+        "recommendation": {
+          "text": "None Provided"
+        }
+      },
+      "cvss_v3_score": 7.8,
+      "cvss_v30_score": 0.0,
+      "cvss_v31_score": 7.8,
+      "cvss_v2_score": 0.0,
+      "cvss_v3_severity": "HIGH",
+      "source_url": "https://ubuntu.com/security/CVE-2025-49794",
+      "source": "UBUNTU_CVE",
+      "severity": "CRITICAL",
+      "status": "ACTIVE",
+      "title": "CVE-2025-49794 - linux-libc-dev",
+      "reason_to_ignore": "N/A"
     }],
   "postgresql-12": [
     {

huggingface/pytorch/inference/docker/2.6/py3/Dockerfile.cpu

Lines changed: 5 additions & 4 deletions
@@ -139,7 +139,7 @@ RUN pip install --upgrade pip --trusted-host pypi.org --trusted-host files.pytho
 
 # Install Common python packages
 RUN pip install --no-cache-dir --extra-index-url https://download.pytorch.org/whl/cpu -U \
-    opencv-python \
+    "opencv-python==4.11.0.86" \
     "pyopenssl>=24.0.0" \
     "cryptography>=42.0.5" \
     "ipython>=8.10.0,<9.0" \
@@ -231,7 +231,8 @@ ENV HF_HUB_USER_AGENT_ORIGIN="aws:sagemaker:cpu:inference:regular"
 
 # IPEx installation installs the numpy==1.25.1. That causes a pip check failure due to incompatibility with numba.
 # Re-installing numpy after IPEx installation to get the appropriate numpy version and fix pip checks.
-# RUN pip install --no-cache-dir \
+RUN pip install --no-cache-dir \
+    "opencv-python==4.11.0.86"
 # "numpy<1.25" \
 # "pyyaml>=5.4"
 
@@ -244,7 +245,7 @@ RUN HOME_DIR=/root \
     && ${HOME_DIR}/oss_compliance/generate_oss_compliance.sh ${HOME_DIR} ${PYTHON} \
     && rm -rf ${HOME_DIR}/oss_compliance*
 
-RUN curl -o /license.txt https://aws-dlc-licenses.s3.amazonaws.com/pytorch-2.3/license.txt
+RUN curl -o /license.txt https://aws-dlc-licenses.s3.amazonaws.com/pytorch-2.6/license.txt
 
 ## Cleanup ##
 RUN pip cache purge \
@@ -255,4 +256,4 @@ RUN pip cache purge \
 
 EXPOSE 8080 8081
 ENTRYPOINT ["python", "/usr/local/bin/dockerd-entrypoint.py"]
-CMD ["serve"]
+CMD ["serve"]

huggingface/pytorch/inference/docker/2.6/py3/cu124/Dockerfile.gpu

Lines changed: 8 additions & 6 deletions
@@ -119,6 +119,8 @@ RUN apt-get update \
     tk-dev \
     libffi-dev \
     ffmpeg \
+    libxml2 \
+    linux-libc-dev \
     && apt-get autoremove -y \
     && rm -rf /var/lib/apt/lists/* \
     && apt-get clean
@@ -162,8 +164,6 @@ RUN /opt/conda/bin/conda install -y -c conda-forge \
     "mkl<2024.1.0" \
     mkl-include \
     parso \
-    scipy \
-    numpy \
     pandas \
     pyarrow \
     typing \
@@ -190,7 +190,8 @@ RUN pip install --upgrade pip --no-cache-dir --trusted-host pypi.org --trusted-h
 
 # Install Common python packages
 RUN pip install --no-cache-dir -U \
-    opencv-python \
+    "opencv-python==4.11.0.86" \
+    scipy \
     # "nvgpu" is a dependency of TS but is disabled in SM DLC build,
     # via ENV Variable "TS_DISABLE_SYSTEM_METRICS=true" in the SM section of this file.
     # due to incompatibility with SM hosts
@@ -265,7 +266,8 @@ RUN pip install --no-cache-dir \
     diffusers==${DIFFUSERS_VERSION} \
     peft==${PEFT_VERSION} \
     accelerate==${ACCELERATE_VERSION} \
-    sagemaker-huggingface-inference-toolkit==${SAGEMAKER_HF_INFERENCE_VERSION}
+    sagemaker-huggingface-inference-toolkit==${SAGEMAKER_HF_INFERENCE_VERSION} \
+    "opencv-python==4.11.0.86"
 
 # hf_transfer will be a built-in feature, remove the env variavle then
 ENV HF_HUB_ENABLE_HF_TRANSFER="1"
@@ -280,7 +282,7 @@ RUN HOME_DIR=/root \
     && ${HOME_DIR}/oss_compliance/generate_oss_compliance.sh ${HOME_DIR} ${PYTHON} \
     && rm -rf ${HOME_DIR}/oss_compliance*
 
-RUN curl -o /license.txt https://aws-dlc-licenses.s3.amazonaws.com/pytorch-2.3/license.txt
+RUN curl -o /license.txt https://aws-dlc-licenses.s3.amazonaws.com/pytorch-2.6/license.txt
 
 ## Cleanup ##
 RUN pip cache purge \
@@ -289,4 +291,4 @@ RUN pip cache purge \
 
 EXPOSE 8080 8081
 ENTRYPOINT ["python", "/usr/local/bin/dockerd-entrypoint.py"]
-CMD ["serve"]
+CMD ["serve"]

huggingface/pytorch/training/docker/2.1/py3/sdk2.20.0/Dockerfile.neuronx.os_scan_allowlist.json

Lines changed: 29 additions & 0 deletions
@@ -845,6 +845,35 @@
     }
   ],
   "linux-libc-dev": [
+    {
+      "description": "In the Linux kernel, the following vulnerability has been resolved: of: module: add buffer overflow check in of_modalias(). In of_modalias(), if the buffer happens to be too small even for the 1st snprintf() call, the len parameter will become negative and str parameter (if not NULL initially) will point beyond the buffer's end. Add the buffer overflow check after the 1st snprintf() call and fix such check after the strlen() call (accounting for the terminating NUL char).",
+      "vulnerability_id": "CVE-2024-38541",
+      "name": "CVE-2024-38541",
+      "package_name": "linux-libc-dev",
+      "package_details": {
+        "file_path": null,
+        "name": "linux-libc-dev",
+        "package_manager": "OS",
+        "version": "5.4.0",
+        "release": "192.212"
+      },
+      "remediation": {
+        "recommendation": {
+          "text": "None Provided"
+        }
+      },
+      "cvss_v3_score": 7.8,
+      "cvss_v30_score": 0.0,
+      "cvss_v31_score": 7.8,
+      "cvss_v2_score": 0.0,
+      "cvss_v3_severity": "HIGH",
+      "source_url": "https://ubuntu.com/security/CVE-2024-38541",
+      "source": "UBUNTU_CVE",
+      "severity": "CRITICAL",
+      "status": "ACTIVE",
+      "title": "CVE-2024-38541 - linux-libc-dev",
+      "reason_to_ignore": "N/A"
+    },
     {
       "description":"In the Linux kernel, the following vulnerability has been resolved: greybus: Fix use-after-free bug in gb_interface_release due to race condition. In gb_interface_create, &intf->mode_switch_completion is bound with gb_interface_mode_switch_work. Then it will be started by gb_interface_request_mode_switch. Here is the relevant code. if (!queue_work(system_long_wq, &intf->mode_switch_work)) { ... } If we call gb_interface_release to make cleanup, there may be an unfinished work. This function will call kfree to free the object \"intf\". However, if gb_interface_mode_switch_work is scheduled to run after kfree, it may cause use-after-free error as gb_interface_mode_switch_work will use the object \"intf\". The possible execution flow that may lead to the issue is as follows: CPU0 CPU1 | gb_interface_create | gb_interface_request_mode_switch gb_interface_release | kfree(intf) (free) | | gb_interface_mode_switch_work | mutex_lock(&intf->mutex) (use) Fix it by canceling the work before kfree.",
       "vulnerability_id":"CVE-2024-39495",

huggingface/pytorch/training/docker/2.5/py3/cu124/Dockerfile.gpu

Lines changed: 1 addition & 1 deletion
@@ -64,7 +64,7 @@ ENV HF_HUB_ENABLE_HF_TRANSFER="1"
 RUN apt-get update \
     # TODO: Remove upgrade statements once packages are updated in base image
     && apt-get -y upgrade --only-upgrade systemd openssl cryptsetup libkrb5-3 linux-libc-dev libsqlite3-0 \
-    && apt-get install -y git git-lfs wget tar \
+    && apt-get install -y git git-lfs wget tar libxml2 \
     && wget https://go.dev/dl/go1.24.2.linux-amd64.tar.gz \
     && rm -rf /usr/local/go \
     && tar -C /usr/local -xzf go1.24.2.linux-amd64.tar.gz \

release_images_general.yml

Lines changed: 16 additions & 2 deletions
@@ -44,14 +44,28 @@ release_images:
       public_registry: True
   4:
     framework: "vllm"
-    version: "0.10.1"
+    version: "0.10.2"
     arch_type: "x86"
     customer_type: "ec2"
     general:
       device_types: [ "gpu" ]
       python_versions: [ "py312" ]
       os_version: "ubuntu22.04"
-      cuda_version: "cu128"
+      cuda_version: "cu129"
+      example: False
+      disable_sm_tag: False
+      force_release: False
+      public_registry: True
+  5:
+    framework: "vllm"
+    version: "0.10.2"
+    arch_type: "arm64"
+    customer_type: "ec2"
+    general:
+      device_types: [ "gpu" ]
+      python_versions: [ "py312" ]
+      os_version: "ubuntu22.04"
+      cuda_version: "cu129"
       example: False
       disable_sm_tag: False
       force_release: False
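Each release entry above carries the same set of fields. A hypothetical sanity check for such entries, mirroring the field names in release_images_general.yml; the validator itself is our sketch, not part of the repo's tooling:

```python
# Sketch of a sanity check for release entries like the vLLM ones above.
# Field names mirror release_images_general.yml; `validate` is illustrative.
REQUIRED = {"framework", "version", "arch_type", "customer_type", "general"}

def validate(entry):
    missing = REQUIRED - entry.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return True

entry = {
    "framework": "vllm",
    "version": "0.10.2",
    "arch_type": "arm64",
    "customer_type": "ec2",
    "general": {
        "device_types": ["gpu"],
        "python_versions": ["py312"],
        "os_version": "ubuntu22.04",
        "cuda_version": "cu129",
    },
}
print(validate(entry))
```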

test/dlc_tests/sanity/test_boottime_container_security.py

Lines changed: 5 additions & 4 deletions
@@ -7,6 +7,10 @@
 @pytest.mark.model("N/A")
 @pytest.mark.canary("Run security test regularly on production images")
 def test_security(image):
+    if "vllm" in image:
+        pytest.skip(
+            "vLLM images do not require pip check as they are managed by vLLM devs. Skipping test."
+        )
     repo_name, image_tag = image.split("/")[-1].split(":")
     container_name = f"{repo_name}-{image_tag}-security"
 
@@ -20,10 +24,7 @@ def test_security(image):
     )
     try:
         docker_exec_cmd = f"docker exec -i {container_name}"
-        if "vllm" in image:
-            run_command = f"python3 /test/bin/security_checks.py"
-        else:
-            run_command = f"python /test/bin/security_checks.py"
+        run_command = f"python /test/bin/security_checks.py"
 
         run(f"{docker_exec_cmd} {run_command} --image_uri {image}", hide=True)
     finally: