5 changes: 5 additions & 0 deletions packer/cpu/buildkite-cpu-ami.pkr.hcl
@@ -124,6 +124,11 @@ build {
    source = "vllm-cache-source"
  }

  # Configure network settings for high-throughput operations
  provisioner "shell" {
    script = "scripts/configure-network.sh"
  }

  # Install BuildKit as standalone systemd service (runs as ec2-user with sudo)
  provisioner "shell" {
    script = "scripts/install-build-tools.sh"
50 changes: 50 additions & 0 deletions packer/cpu/scripts/configure-network.sh
@@ -0,0 +1,50 @@
#!/bin/bash
set -eu -o pipefail

# Network tuning for high-throughput container image operations
# Optimized for r6in instances with 100Gbps networking

echo "=== Configuring network sysctl settings ==="

cat <<'EOF' | sudo tee /etc/sysctl.d/99-vllm-network.conf
# Network tuning for high-throughput Docker builds
# Reference: https://docs.aws.amazon.com/datatransferterminal/latest/userguide/tech-requirements.html

Copilot AI Jan 10, 2026

The comment cites https://docs.aws.amazon.com/datatransferterminal/latest/userguide/tech-requirements.html as its reference, but that URL appears to document AWS Data Transfer Terminal, which may not be the most relevant source for general Docker/ECR network tuning. Consider pointing to a more appropriate AWS documentation page on network optimization, or removing the reference if it is not directly relevant.

Suggested change
# Reference: https://docs.aws.amazon.com/datatransferterminal/latest/userguide/tech-requirements.html

# BBR congestion control - helps sustained ECR transfers
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr

# Avoid slow start after idle - helps frequent connections
net.ipv4.tcp_slow_start_after_idle = 0

# Reasonable buffers (enough for ECR rate)
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 1048576 16777216
net.ipv4.tcp_wmem = 4096 1048576 16777216
EOF

# Apply sysctl settings
sudo sysctl -p /etc/sysctl.d/99-vllm-network.conf

# -----------------------------------------------------------------------------
# Docker daemon configuration for high-throughput registry operations
# -----------------------------------------------------------------------------
echo "=== Configuring Docker daemon ==="

# Update Docker daemon config to increase concurrent downloads/uploads
# Use jq to merge with existing config, or create new if doesn't exist
if [[ -f /etc/docker/daemon.json ]]; then
  # Merge with existing config
  sudo jq '. + {"max-concurrent-downloads": 16, "max-concurrent-uploads": 16}' /etc/docker/daemon.json | sudo tee /etc/docker/daemon.json.tmp
  sudo mv /etc/docker/daemon.json.tmp /etc/docker/daemon.json
Comment on lines +38 to +40
Copilot AI Jan 10, 2026

The jq command reads /etc/docker/daemon.json and writes its output to a temporary file. If jq fails for any reason (e.g., invalid JSON in the original file), the temporary file might be left behind, or the original file could end up in an inconsistent state. Consider adding error handling so that the update is atomic and the temporary file is cleaned up.

Suggested change
# Merge with existing config
sudo jq '. + {"max-concurrent-downloads": 16, "max-concurrent-uploads": 16}' /etc/docker/daemon.json | sudo tee /etc/docker/daemon.json.tmp
sudo mv /etc/docker/daemon.json.tmp /etc/docker/daemon.json
# Merge with existing config using a temporary file for atomic update
tmpfile="$(mktemp /tmp/daemon.json.XXXXXX)"
trap 'rm -f "$tmpfile"' EXIT
if sudo jq '. + {"max-concurrent-downloads": 16, "max-concurrent-uploads": 16}' /etc/docker/daemon.json >"$tmpfile"; then
  sudo mv "$tmpfile" /etc/docker/daemon.json
fi
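
A possible variant of the suggestion above, offered as a sketch only. Creating the temporary file next to the target keeps the final `mv` a rename within one filesystem, which a temp file under `/tmp` does not guarantee, and a failed merge returns non-zero instead of being skipped silently. The function name `merge_docker_concurrency`, its path parameter, and the `0644` mode are assumptions for illustration and are not part of this PR.

```shell
# Hypothetical helper: merge the concurrency settings into a daemon.json-style file.
# The target path is a parameter so the function can be tried on a scratch file.
merge_docker_concurrency() {
  local target="$1" tmpfile
  # Temp file lives beside the target, so the final mv is a same-filesystem rename.
  tmpfile="$(mktemp "${target}.XXXXXX")"
  if jq '. + {"max-concurrent-downloads": 16, "max-concurrent-uploads": 16}' "$target" >"$tmpfile"; then
    # mktemp creates the file with mode 0600; 0644 is an assumed mode for daemon.json.
    chmod 0644 "$tmpfile"
    mv "$tmpfile" "$target"
  else
    # jq failed (for example, invalid JSON): clean up and leave the target untouched.
    rm -f "$tmpfile"
    echo "jq merge failed; leaving $target unchanged" >&2
    return 1
  fi
}
```

In the provisioner this would have to run as root to write under `/etc/docker`, for example from a root shell that calls `merge_docker_concurrency /etc/docker/daemon.json`.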

else
  # Create new config
  echo '{"max-concurrent-downloads": 16, "max-concurrent-uploads": 16}' | sudo tee /etc/docker/daemon.json
fi

# Restart Docker to apply new config
sudo systemctl restart docker
echo "Docker daemon configured and restarted"

echo "=== Network configuration complete ==="
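
One way to confirm that the sysctl drop-in took effect is to compare each value in the file against the running kernel. The following is a minimal sketch, assuming it is sourced after `configure-network.sh` has run. The helper names `sysctl_conf_pairs` and `verify_sysctl_conf` are hypothetical and not part of this PR. Whitespace is normalized on both sides because `sysctl -n` prints multi-value keys such as `net.ipv4.tcp_rmem` separated by tabs.

```shell
# Hypothetical helper: print "key=value" for each setting in a sysctl.d-style file,
# skipping blank lines and comments and collapsing runs of whitespace.
sysctl_conf_pairs() {
  local conf="$1"
  sed -E -e 's/^[[:space:]]+//' -e 's/[[:space:]]+$//' \
         -e '/^([#;].*)?$/d' \
         -e 's/[[:space:]]*=[[:space:]]*/=/' \
         -e 's/[[:space:]]+/ /g' "$conf"
}

# Hypothetical helper: compare every setting in the file with the live kernel value.
# Returns 0 if all match, 1 on any mismatch, 2 if the file is unreadable.
verify_sysctl_conf() {
  local conf="$1" key want got status=0
  [ -r "$conf" ] || return 2
  while IFS='=' read -r key want; do
    [ -n "$key" ] || continue
    # Normalize tabs and newlines in the kernel's output to single spaces.
    got="$(sysctl -n "$key" 2>/dev/null | tr -s '[:space:]' ' ' | sed -E 's/ $//')" || got=""
    if [ "$got" != "$want" ]; then
      echo "MISMATCH $key: want '$want', got '$got'" >&2
      status=1
    fi
  done <<EOF
$(sysctl_conf_pairs "$conf")
EOF
  return "$status"
}
```

A call such as `verify_sysctl_conf /etc/sysctl.d/99-vllm-network.conf` could then run as a final provisioner step, so that a kernel without BBR support would surface as a mismatch rather than pass unnoticed.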