
Commit a66e749

DevakiBolleneni and DevakiBolleneni authored
v1: sglang ec2 (#5595)
* sglang ec2
* add ec2 in docker file
* add ec2 entrypoint
* add buildspec ec2
* add tests
* modify tests
* fix formatting
* run
* fix cuda compat
* add sglang framework
* fix
* skip telemetry
* fix tests
* fix tests
* fix
* fix tests
* fix upstream tests
* fix hf token
* fix
* fix
* fix
* fix
* fix
* fix
* rerun sglang ec2 tests
* skip telemetry tests
* skip telemetry tests
* skip telemetry tests
* skip telemetry tests
* skip telemetry tests
* skip telemetry tests
* skip telemetry tests
* skip telemetry tests
* skip telemetry tests
* skip telemetry tests
* rerun to debug
* skip telemetry tests
* skip logic for telemetry tests
* skip logic for telemetry tests
* change skip logic
* change skip logic
* change skip logic
* add pytest filter to exclude telemetry marked tests in ec2 test suite
* add pytest filter to exclude telemetry marked tests in ec2 test suite
* raise error for hf_token if not found
* update skip logic to handle specified tests
* revert toml file

---------

Co-authored-by: DevakiBolleneni <devakib@amazon.com>
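The last entries in the commit message mention a pytest filter that excludes telemetry-marked tests from the EC2 suite and an error raised when the Hugging Face token is missing. The test code itself is not shown on this page, so what follows is only a minimal sketch of both ideas. The marker name `telemetry`, the `HF_TOKEN` environment variable and the helper `get_hf_token` are assumptions. `pytest_collection_modifyitems`, `Item.get_closest_marker` and the `pytest_deselected` hook are standard pytest APIs.

```python
import os


def get_hf_token(env=None):
    """Return the Hugging Face token or fail loudly.

    Assumption: the token lives in an HF_TOKEN environment variable; the
    variable name used by the real SGLang EC2 tests is not shown here.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; gated model downloads will fail")
    return token


def pytest_collection_modifyitems(config, items):
    """Deselect tests carrying the (assumed) `telemetry` marker.

    This has the same effect as running `pytest -m "not telemetry"`.
    """
    kept, deselected = [], []
    for item in items:
        if item.get_closest_marker("telemetry") is not None:
            deselected.append(item)
        else:
            kept.append(item)
    if deselected:
        # Tell pytest which items were dropped so its summary reports them.
        config.hook.pytest_deselected(items=deselected)
        # The hook must mutate the collected list in place.
        items[:] = kept
```

Placed in a `conftest.py`, the hook runs once after collection. Running `pytest -m "not telemetry"` from the command line gives the same filtering without any code.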
1 parent bb0d379 commit a66e749


8 files changed, with 416 additions and 1 deletion

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+#!/usr/bin/env bash
+# Check if telemetry file exists before executing
+# Execute telemetry script if it exists, suppress errors
+bash /usr/local/bin/bash_telemetry.sh >/dev/null 2>&1 || true
+
+python3 -m sglang.launch_server "$@"
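The entrypoint runs the telemetry script on a best-effort basis and then forwards every container argument (`"$@"`) to `python3 -m sglang.launch_server`. A rough Python equivalent, for illustration only, is sketched below. It adds the explicit existence check that the script's comment describes (the shell code instead relies on `|| true` to swallow the error when the file is missing). Unlike the shell script, which runs `python3` as a child of `bash`, it hands off with `os.execvp` so the server replaces the wrapper process. The helper names are hypothetical.

```python
import os
import subprocess
import sys

# Path taken from the entrypoint script.
TELEMETRY_SCRIPT = "/usr/local/bin/bash_telemetry.sh"


def run_telemetry_best_effort(script=TELEMETRY_SCRIPT):
    """Run the telemetry script if it exists; never let it fail startup."""
    if not os.path.isfile(script):
        return False
    try:
        # Discard output and ignore the exit status, like `>/dev/null 2>&1 || true`.
        subprocess.run(["bash", script], stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
    except OSError:
        return False
    return True


def build_launch_command(args):
    """Mirror `python3 -m sglang.launch_server "$@"`: forward args unchanged."""
    return ["python3", "-m", "sglang.launch_server", *args]


def main(argv=None):
    """Hypothetical entrypoint: telemetry first, then replace this process."""
    argv = sys.argv[1:] if argv is None else argv
    run_telemetry_best_effort()
    cmd = build_launch_command(argv)
    os.execvp(cmd[0], cmd)
```

With an exec-form `ENTRYPOINT`, arguments given after the image name in `docker run` reach the script as `"$@"`. Server flags such as `--model-path <model> --port 30000` are therefore passed through to `sglang.launch_server` unchanged.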

sglang/buildspec-ec2.yml

Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
+account_id: &ACCOUNT_ID <set-$ACCOUNT_ID-in-environment>
+prod_account_id: &PROD_ACCOUNT_ID 763104351884
+region: &REGION <set-$REGION-in-environment>
+framework: &FRAMEWORK sglang
+version: &VERSION "0.5.6"
+short_version: &SHORT_VERSION "0.5"
+arch_type: &ARCH_TYPE x86_64
+autopatch_build: "False"
+
+repository_info:
+  build_repository: &BUILD_REPOSITORY
+    image_type: &IMAGE_TYPE gpu
+    root: .
+    repository_name: &REPOSITORY_NAME !join [ pr, "-", *FRAMEWORK ]
+    repository: &REPOSITORY !join [ *ACCOUNT_ID, .dkr.ecr., *REGION, .amazonaws.com/, *REPOSITORY_NAME ]
+    release_repository_name: &RELEASE_REPOSITORY_NAME !join [ *FRAMEWORK ]
+    release_repository: &RELEASE_REPOSITORY !join [ *PROD_ACCOUNT_ID, .dkr.ecr., *REGION, .amazonaws.com/, *RELEASE_REPOSITORY_NAME ]
+
+context:
+  build_context: &BUILD_CONTEXT
+    deep_learning_container:
+      source: src/deep_learning_container.py
+      target: deep_learning_container.py
+    install_efa:
+      source: scripts/install_efa.sh
+      target: install_efa.sh
+    start_cuda_compat:
+      source: sglang/build_artifacts/start_cuda_compat.sh
+      target: start_cuda_compat.sh
+    sagemaker_entrypoint:
+      source: sglang/build_artifacts/dockerd_entrypoint.sh
+      target: dockerd_entrypoint.sh
+
+images:
+  sglang_ec2:
+    <<: *BUILD_REPOSITORY
+    context:
+      <<: *BUILD_CONTEXT
+    image_size_baseline: 26000
+    device_type: &DEVICE_TYPE gpu
+    cuda_version: &CUDA_VERSION cu129
+    python_version: &DOCKER_PYTHON_VERSION py3
+    tag_python_version: &TAG_PYTHON_VERSION py312
+    os_version: &OS_VERSION ubuntu22.04
+    tag: !join [ *VERSION, "-", *DEVICE_TYPE, "-", *TAG_PYTHON_VERSION, "-", *CUDA_VERSION, "-", *OS_VERSION, "-ec2" ]
+    latest_release_tag: !join [ *VERSION, "-", *DEVICE_TYPE, "-", *TAG_PYTHON_VERSION, "-", *CUDA_VERSION, "-", *OS_VERSION, "-ec2" ]
+    skip_build: "False"
+    docker_file: !join [ *FRAMEWORK, /, *ARCH_TYPE, /, *DEVICE_TYPE, /Dockerfile ]
+    target: sglang-ec2
+    build: true
+    enable_common_stage_build: false
+    test_configs:
+      test_platforms:
+        - sanity
+        - security
+        - ec2
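The buildspec builds image names from YAML anchors (`&NAME` / `*NAME`) and the `!join` tag. Assume `!join` simply concatenates its list items as strings, which is consistent with `docker_file` resolving to the `sglang/x86_64/gpu/Dockerfile` path touched by this commit. Under that assumption, the values for the `sglang_ec2` image can be worked out by hand, as in the sketch below. The function names are hypothetical. `111122223333` and `us-west-2` stand in for the `<set-$ACCOUNT_ID-in-environment>` and `<set-$REGION-in-environment>` placeholders.

```python
def join(*parts):
    """Assumed semantics of the buildspec's `!join` tag: plain concatenation."""
    return "".join(str(p) for p in parts)


def resolve_sglang_ec2(account_id, region):
    """Resolve the anchored values for images.sglang_ec2 in buildspec-ec2.yml."""
    # Anchored scalars, copied from the buildspec.
    framework = "sglang"
    version = "0.5.6"
    arch_type = "x86_64"
    prod_account_id = "763104351884"
    device_type = "gpu"
    cuda_version = "cu129"
    tag_python_version = "py312"
    os_version = "ubuntu22.04"

    repository_name = join("pr", "-", framework)
    release_repository_name = join(framework)
    tag = join(version, "-", device_type, "-", tag_python_version, "-",
               cuda_version, "-", os_version, "-ec2")
    return {
        "repository": join(account_id, ".dkr.ecr.", region,
                           ".amazonaws.com/", repository_name),
        "release_repository": join(prod_account_id, ".dkr.ecr.", region,
                                   ".amazonaws.com/", release_repository_name),
        "tag": tag,
        # tag and latest_release_tag use the same !join expression.
        "latest_release_tag": tag,
        "docker_file": join(framework, "/", arch_type, "/", device_type, "/Dockerfile"),
    }
```

`resolve_sglang_ec2("111122223333", "us-west-2")["tag"]` evaluates to `0.5.6-gpu-py312-cu129-ubuntu22.04-ec2`. Combining it with the repository as `repository:tag` gives `111122223333.dkr.ecr.us-west-2.amazonaws.com/pr-sglang:0.5.6-gpu-py312-cu129-ubuntu22.04-ec2`. The `<<:` merge keys copy every field of `build_repository` and `build_context` into the image entry. Fields written directly under `sglang_ec2` take precedence, roughly like `{**build_repository, **image_fields}` in Python.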

sglang/x86_64/gpu/Dockerfile

Lines changed: 18 additions & 0 deletions
@@ -74,6 +74,24 @@ RUN echo 'source /usr/local/bin/bash_telemetry.sh' >> /etc/bash.bashrc \
     && rm -rf /var/lib/apt/lists/* \
     && rm -rf /root/.cache | true
 
+# =======================================================
+# ====================== EC2 ============================
+# =======================================================
+
+FROM base AS sglang-ec2
+
+RUN dpkg -l | grep -E "cuda|nvidia|libnv" | awk '{print $2}' | xargs apt-mark hold \
+    && apt-get update \
+    && apt-get upgrade -y \
+    && apt-get clean
+
+RUN rm -rf /tmp/*
+
+COPY dockerd_entrypoint.sh /usr/local/bin/dockerd_entrypoint.sh
+RUN chmod +x /usr/local/bin/dockerd_entrypoint.sh
+
+ENTRYPOINT ["/usr/local/bin/dockerd_entrypoint.sh"]
+
 # =======================================================
 # ====================== sagemaker ======================
 # =======================================================
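The first `RUN` in the new `sglang-ec2` stage pins CUDA and NVIDIA packages with `apt-mark hold` before `apt-get upgrade -y`, so the OS upgrade cannot move the GPU stack. The selection step (`dpkg -l | grep -E ... | awk '{print $2}'`) can be mirrored in Python as below. The function name and the sample listing are hypothetical. Note that `grep` matches the whole `dpkg -l` line, including version and description, and is case-sensitive. As a result, `libnccl2` in the sample is selected because its version string contains `cuda`, even though its name does not.

```python
import re

# Same extended regex as the Dockerfile's `grep -E "cuda|nvidia|libnv"`.
HOLD_PATTERN = re.compile(r"cuda|nvidia|libnv")


def packages_to_hold(dpkg_list_output):
    """Return the package names the hold pipeline would pass to apt-mark.

    Like grep, the pattern is tested against the whole line (name, version,
    architecture and description); like awk '{print $2}', the second
    whitespace-separated field is taken as the package name.
    """
    names = []
    for line in dpkg_list_output.splitlines():
        if HOLD_PATTERN.search(line):
            fields = line.split()
            if len(fields) >= 2:
                names.append(fields[1])
    return names


# Hypothetical `dpkg -l` excerpt (two header lines plus a few packages).
SAMPLE = """\
||/ Name               Version            Architecture Description
+++-==================-==================-============-=========================
ii  bash               5.1-6ubuntu1       amd64        GNU Bourne Again SHell
ii  cuda-cudart-12-9   12.9.79-1          amd64        CUDA Runtime native Libraries
ii  libnccl2           2.27.3-1+cuda12.9  amd64        NVIDIA Collective Communication Library
ii  libnvjitlink-12-9  12.9.86-1          amd64        NVJitLink native runtime libraries
"""
```

`packages_to_hold(SAMPLE)` returns `['cuda-cudart-12-9', 'libnccl2', 'libnvjitlink-12-9']`. The uppercase `NVIDIA` in a description alone would not match.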

test/dlc_tests/conftest.py

Lines changed: 2 additions & 0 deletions
@@ -109,6 +109,8 @@
     "pytorch_trcomp_training",
     # Autogluon
     "autogluon_training",
+    # SGLang
+    "sglang",
     # Processor fixtures
     "gpu",
     "cpu",

test/dlc_tests/ec2/sglang/__init__.py

Whitespace-only changes.

test/dlc_tests/ec2/sglang/ec2_tests/__init__.py

Whitespace-only changes.
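The `conftest.py` change only registers `"sglang"` alongside the other framework fixture names. The code that consumes this list is outside the diff. As a hypothetical, simplified model of what such a fixture name could select, the sketch below matches a fixture name against the repository part of image URIs and, optionally, against a processor token in the tag. The function name and the matching rules are assumptions, not the repository's actual logic.

```python
def images_for_fixture(fixture_name, image_uris, processor=None):
    """Hypothetical filter: keep images whose repository name contains the
    framework named by the fixture, optionally restricted to a processor.
    """
    # Assumption: fixture names use underscores, repository names use hyphens.
    framework = fixture_name.replace("_", "-")
    selected = []
    for uri in image_uris:
        # "<registry>/<repository>:<tag>" -> ("<repository>", "<tag>")
        repository, _, tag = uri.rsplit("/", 1)[-1].partition(":")
        if framework not in repository:
            continue
        # Pad with hyphens so "gpu" only matches a whole "-gpu-" token.
        if processor is not None and f"-{processor}-" not in f"-{tag}-":
            continue
        selected.append(uri)
    return selected
```

Given a URI built from the buildspec, such as `111122223333.dkr.ecr.us-west-2.amazonaws.com/pr-sglang:0.5.6-gpu-py312-cu129-ubuntu22.04-ec2`, `images_for_fixture("sglang", uris, processor="gpu")` would keep that image and drop images from other repositories.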
