PyLet supports TOML configuration files for defining instances. See `configs/` for examples.
```bash
# Submit using config file
pylet submit --config configs/inference.toml

# Override config values with CLI args
pylet submit --config configs/inference.toml --gpu-units 0  # CPU-only for testing

# Config file with additional command
pylet submit --config configs/simple.toml echo "extra args"
```

```toml
# job.toml - Example config
name = "my-instance"                # Optional, defaults to filename
command = ["python", "train.py"]    # Array format (recommended)
# command = "python train.py"       # String format also works

[resources]
gpus = 1            # GPU count (auto-allocated)
cpus = 4            # CPU cores
memory = "16Gi"     # Memory (Gi, Mi, Ki units)

[env]
HF_TOKEN = "${HF_TOKEN}"        # Interpolated from the shell environment
STATIC_VAR = "fixed_value"      # Static value

[labels]
type = "inference"  # Custom metadata
```

When the same setting is specified in multiple places, the highest priority wins:
| Priority | Source | Example |
|---|---|---|
| 1 (highest) | CLI arguments | `--gpu-units 0` |
| 2 | Config file | `gpus = 1` in `job.toml` |
| 3 (lowest) | Defaults | `gpus = 0` |
See `start_vllm.py` for a complete example of using PyLet from Python:

```python
import asyncio

from pylet.client import PyletClient

async def main():
    client = PyletClient("http://localhost:8000")

    # Submit instance
    instance_id = await client.submit_instance(
        command="vllm serve Qwen/Qwen2.5-1.5B-Instruct --port $PORT",
        resource_requirements={"cpu_cores": 1, "gpu_units": 1, "memory_mb": 4096},
        name="my-vllm",
    )

    # Wait for RUNNING, then get endpoint
    endpoint = await client.get_instance_endpoint(instance_id)

    # Send requests to http://{endpoint}/v1/completions
    # ...

    # Cleanup
    await client.cancel_instance(instance_id)
    await client.close()

if __name__ == "__main__":
    asyncio.run(main())
```

Prerequisites:

- vLLM installed: `pip install vllm`
- A machine with GPU(s)
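In the Python example above, waiting for RUNNING appears only as a comment. One way to write that wait is a polling loop like the sketch below; the `get_instance` method, its `state` field, and the state names are assumptions rather than confirmed PyLet API, and a stub client is included so the snippet runs on its own:

```python
import asyncio

async def wait_for_running(client, instance_id, timeout=60.0, interval=2.0):
    # Hypothetical helper: get_instance and the state names are assumptions.
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while loop.time() < deadline:
        info = await client.get_instance(instance_id)
        if info["state"] == "RUNNING":
            return info
        if info["state"] in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"instance ended in state {info['state']}")
        await asyncio.sleep(interval)
    raise TimeoutError(f"{instance_id} not RUNNING after {timeout}s")

class StubClient:
    """Stands in for PyletClient so the sketch is self-contained."""
    def __init__(self):
        self.polls = 0
    async def get_instance(self, instance_id):
        self.polls += 1
        return {"state": "RUNNING" if self.polls >= 3 else "PENDING"}

info = asyncio.run(wait_for_running(StubClient(), "inst-1", interval=0.01))
print(info["state"])  # RUNNING
```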
```bash
# Terminal 1: Start head node
pylet start

# Terminal 2: Start worker with GPU(s)
pylet start --head localhost:8000 --gpu-units 1

# Terminal 3: Submit vLLM instance
# Use $PORT so vLLM binds to the worker-allocated port
pylet submit 'vllm serve Qwen/Qwen2.5-1.5B-Instruct --port $PORT' \
  --gpu-units 1 --name vllm-test

# Check instance status
pylet get-instance --name vllm-test

# Get endpoint (wait for RUNNING status)
pylet get-endpoint --name vllm-test
# Output: 192.168.1.10:15600

# Test inference
curl http://<endpoint>/v1/models
curl http://<endpoint>/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"Qwen/Qwen2.5-1.5B-Instruct","prompt":"Hello","max_tokens":50}'

# View logs (streams in real-time)
pylet logs <instance-id>
pylet logs <instance-id> --follow

# Cancel instance (graceful shutdown)
pylet cancel <instance-id>
```

| Feature | How it works |
|---|---|
| Port | Worker sets the `PORT` env var (15600-15700). Use `--port $PORT` in your command. |
| GPU | Worker sets `CUDA_VISIBLE_DEVICES` based on allocated GPUs. |
| Endpoint | `pylet get-endpoint` returns `worker_ip:port` for client access. |
| Logs | Captured via a sidecar, available in real-time via `pylet logs`. |
| Cancel | Sends SIGTERM, waits a 30s grace period, then SIGKILL if needed. |
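The Port and Cancel rows can be illustrated with a plain `subprocess` sketch. This mirrors the described behavior but is not PyLet's actual worker code; the port value is a made-up example from the documented range:

```python
import os
import signal
import subprocess

# Illustrative sketch, not PyLet's worker implementation.
ALLOCATED_PORT = 15600  # worker picks a port in 15600-15700

env = {**os.environ, "PORT": str(ALLOCATED_PORT)}
# The submitted command runs through a shell, so $PORT expands:
out = subprocess.run(
    "echo listening on $PORT", shell=True, env=env,
    capture_output=True, text=True,
).stdout.strip()
print(out)  # listening on 15600

# Cancel: SIGTERM first, SIGKILL only if the grace period expires.
proc = subprocess.Popen(["sleep", "60"])
proc.send_signal(signal.SIGTERM)
try:
    proc.wait(timeout=30)  # 30s grace period
except subprocess.TimeoutExpired:
    proc.kill()
print(proc.returncode)  # -15: terminated by SIGTERM
```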
```bash
pylet submit 'python -m sglang.launch_server --model meta-llama/Llama-3.1-8B-Instruct --port $PORT' \
  --gpu-units 1 --name sglang-test
```

```bash
# Worker with 4 GPUs
pylet start --head localhost:8000 --gpu-units 4

# Request 2 GPUs for tensor parallelism
pylet submit 'vllm serve meta-llama/Llama-3.1-70B-Instruct --port $PORT --tensor-parallel-size 2' \
  --gpu-units 2 --name llama70b
```
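The worker-side bookkeeping implied here, mapping a 2-GPU request onto a 4-GPU worker and exporting `CUDA_VISIBLE_DEVICES`, might look like the free-list sketch below. `GpuPool` is a hypothetical illustration, not PyLet's actual scheduler:

```python
# Hypothetical free-list GPU allocator, sketching the
# CUDA_VISIBLE_DEVICES behavior described above.
class GpuPool:
    def __init__(self, total: int):
        self.free = list(range(total))

    def allocate(self, n: int) -> list[int]:
        if n > len(self.free):
            raise RuntimeError(f"requested {n} GPUs, only {len(self.free)} free")
        taken, self.free = self.free[:n], self.free[n:]
        return taken

    def release(self, gpus: list[int]) -> None:
        self.free = sorted(self.free + gpus)

pool = GpuPool(total=4)        # pylet start ... --gpu-units 4
allocated = pool.allocate(2)   # pylet submit ... --gpu-units 2
env = {"CUDA_VISIBLE_DEVICES": ",".join(map(str, allocated))}
print(env["CUDA_VISIBLE_DEVICES"])  # 0,1
```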