Commit d581cbc

Upgrade VM image (#325)
1 parent 3459caa commit d581cbc

4 files changed: 76 additions, 9 deletions

pyproject.toml

Lines changed: 2 additions & 1 deletion
@@ -95,7 +95,8 @@ torch-stable = [
 "transformers>=4.55.0",
 # vLLM 0.11.1 requires PyTorch 2.9.0, which is incompatible with flash-attn
 # https://github.com/Dao-AILab/flash-attention/issues/1967
-"vllm>=0.10.2,!=0.11.1",
+# Similar issues with vLLM 0.11.2
+"vllm>=0.10.2,!=0.11.1,!=0.11.2",
 # LiteLLM can then be upgraded with new vLLM
 "litellm[proxy]>=1.78",
 ]
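
To make the effect of the new constraint concrete, here is a minimal sketch of which vLLM releases the specifier `vllm>=0.10.2,!=0.11.1,!=0.11.2` admits. It is not how pip or uv resolve versions: the helper names `parse_version` and `vllm_allowed` are hypothetical, and the comparison only handles plain `X.Y.Z` releases (no pre-releases, no PEP 440 zero padding).

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a plain dotted release such as "0.11.2" into (0, 11, 2)."""
    return tuple(int(part) for part in text.split("."))


def vllm_allowed(version: str) -> bool:
    """Approximate "vllm>=0.10.2,!=0.11.1,!=0.11.2" for plain X.Y.Z releases.

    Simplified stand-in for a real PEP 440 specifier check; hypothetical helper.
    """
    minimum = parse_version("0.10.2")
    excluded = {parse_version("0.11.1"), parse_version("0.11.2")}
    candidate = parse_version(version)
    # Lower bound first, then the two explicitly excluded releases.
    return candidate >= minimum and candidate not in excluded
```

Under these assumptions, `vllm_allowed("0.11.0")` is accepted while `vllm_allowed("0.11.2")` is rejected, which matches the intent of the added comment about flash-attn incompatibility.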

scripts/build_vm_image.sh

Lines changed: 69 additions & 3 deletions
@@ -25,7 +25,12 @@ sudo apt-get install -y \
 tmux \
 vim \
 git-lfs \
-nodejs
+nodejs \
+gnupg2 \
+apt-transport-https \
+ca-certificates \
+gnupg \
+lsb-release

 git lfs install

@@ -37,9 +42,9 @@ sudo reboot now

 # Install CUDA Toolkit
 wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
-sudo dpkg -i cuda-keyring_1.1-1_all.deb
+sudo dpkg -i cuda-keyring_1.1-1_all.deb && rm cuda-keyring_1.1-1_all.deb
 sudo apt-get update
-sudo apt-get -y install cuda-toolkit-12-8
+sudo apt-get -y install cuda-toolkit
 sudo reboot now

 # Add paths globally
@@ -49,6 +54,67 @@ export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
 EOF
 sudo chmod +x /etc/profile.d/cuda.sh

+# Add Docker's official GPG key
+sudo install -m 0755 -d /etc/apt/keyrings
+sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
+sudo chmod a+r /etc/apt/keyrings/docker.asc
+
+# Add the repository to Apt sources:
+sudo tee /etc/apt/sources.list.d/docker.sources <<EOF
+Types: deb
+URIs: https://download.docker.com/linux/ubuntu
+Suites: $(. /etc/os-release && echo "${UBUNTU_CODENAME:-$VERSION_CODENAME}")
+Components: stable
+Signed-By: /etc/apt/keyrings/docker.asc
+EOF
+
+sudo apt -y update
+
+# Install the Docker packages
+sudo apt -y install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
+
+# Create docker group only if it doesn't exist
+# sudo groupadd docker
+
+# Add current user to docker group if not already a member
+sudo usermod -aG docker "$USER"
+# A hack to add cloudtest user to docker group as well
+sudo sed -i '/^docker:/ s/$/,cloudtest/' /etc/group
+# This shouldn't be run on CI
+# newgrp docker
+
+# Install NVIDIA Container Toolkit
+curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
+&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
+sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
+sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
+
+sudo apt-get update
+export NVIDIA_CONTAINER_TOOLKIT_VERSION=1.18.0-1
+sudo apt-get install -y \
+nvidia-container-toolkit=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
+nvidia-container-toolkit-base=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
+libnvidia-container-tools=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
+libnvidia-container1=${NVIDIA_CONTAINER_TOOLKIT_VERSION}
+
+# Configure the NVIDIA Container Toolkit
+sudo nvidia-ctk runtime configure --runtime=docker
+sudo systemctl restart docker
+
+# Install Azure CLI
+curl -sLS https://packages.microsoft.com/keys/microsoft.asc |
+gpg --dearmor | sudo tee /etc/apt/keyrings/microsoft.gpg > /dev/null
+sudo chmod go+r /etc/apt/keyrings/microsoft.gpg
+AZ_DIST=$(lsb_release -cs)
+echo "Types: deb
+URIs: https://packages.microsoft.com/repos/azure-cli/
+Suites: ${AZ_DIST}
+Components: main
+Architectures: $(dpkg --print-architecture)
+Signed-by: /etc/apt/keyrings/microsoft.gpg" | sudo tee /etc/apt/sources.list.d/azure-cli.sources
+sudo apt-get update
+sudo apt-get install -y azure-cli
+
 # Disable the periodical apt-get upgrade.
 # Sometimes, unattended upgrade blocks apt-get install
 sudo sed -i -e "s/Update-Package-Lists \"1\"/Update-Package-Lists \"0\"/g" /etc/apt/apt.conf.d/10periodic
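
The script now installs several independent toolchains (CUDA, Docker, the NVIDIA Container Toolkit, Azure CLI), so a quick post-build check can catch a step that failed silently. The following is a rough sketch, not part of the commit: the list `EXPECTED_COMMANDS` and the helper `find_missing` are hypothetical, and the command names are assumptions about what each package puts on `PATH` (for example, `nvcc` is only visible once `/etc/profile.d/cuda.sh` has been sourced).

```python
import shutil
from typing import Callable, Iterable, Optional

# Assumed executable names for the components installed by build_vm_image.sh.
EXPECTED_COMMANDS = ["git-lfs", "nvcc", "docker", "nvidia-ctk", "az"]


def find_missing(
    commands: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the commands that cannot be resolved on PATH.

    `which` is injectable so the check can be exercised without a real VM.
    """
    return [name for name in commands if which(name) is None]
```

On a freshly built image, `find_missing(EXPECTED_COMMANDS)` should return an empty list under these assumptions; any name it does return points to the install step worth re-checking.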

scripts/litellm_sanity_check.py

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@


 def main() -> None:
-    client = openai.OpenAI()
+    client = openai.OpenAI(timeout=30.0)
     models = client.models.list()
     print("Available models:", models)

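
The change bounds the sanity check so that a hung proxy fails the run instead of stalling it. As an illustration of the same idea without the `openai` package, here is a stdlib-only sketch that lists models from an OpenAI-compatible endpoint with an explicit timeout. The helper names, the `/v1/models` path, and the base URL and key in the usage note are assumptions for illustration, not taken from the commit.

```python
import json
import urllib.request


def build_models_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for the model list of an OpenAI-compatible server.

    The "/v1/models" path is an assumption about the server's route layout.
    """
    url = base_url.rstrip("/") + "/v1/models"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_key}"}
    )


def list_models(base_url: str, api_key: str, timeout: float = 30.0) -> dict:
    """Fetch the model list, giving up if a blocking socket operation exceeds `timeout` seconds."""
    request = build_models_request(base_url, api_key)
    # urlopen raises (URLError or TimeoutError) rather than hanging once the timeout elapses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A call such as `list_models("http://localhost:4000", "sk-placeholder")` would then either return the parsed JSON or raise once the timeout elapses; both arguments here are placeholders.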
1414

uv.lock

Lines changed: 4 additions & 4 deletions
Generated file; its diff is not rendered by default.
