All the code for the hands-on exercises can be found in this repository.
To request an account on Zaratan, please join the Slack workspace at the link above and fill out this Google form.
We have pre-built the dependencies required for this tutorial on Zaratan. They are activated automatically when you run the bash scripts.
Model weights and the training dataset have been downloaded to /scratch/zt1/project/sc25/shared/.
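To confirm that you can see these shared files, you can list the directory (path taken from above):

```bash
ls /scratch/zt1/project/sc25/shared/
```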
```bash
CONFIG_FILE=configs/single_gpu.json sbatch train_single.sh
```

Open configs/single_gpu.json and change precision to bf16-mixed.
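If you prefer to make the change from the command line, here is a minimal sketch; it assumes the setting lives under a top-level "precision" key in the JSON file.

```bash
# Flip the precision field in place (assumes a top-level "precision" key).
sed -i 's/"precision": *"[^"]*"/"precision": "bf16-mixed"/' configs/single_gpu.json
```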
Then run:

```bash
CONFIG_FILE=configs/single_gpu.json sbatch train_single.sh
```

Multi-GPU training with DDP:

```bash
CONFIG_FILE=configs/ddp.json sbatch train_multi.sh
```

Multi-GPU training with FSDP:

```bash
CONFIG_FILE=configs/fsdp.json sbatch train_multi.sh
```

Multi-GPU training with AxoNN:

```bash
CONFIG_FILE=configs/axonn.json sbatch train_multi.sh
```

Add more prompts to data/inference/prompts.txt if you want (see the sketch below).
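Assuming prompts.txt holds one prompt per line, appending is enough; for example:

```bash
# Append an extra prompt (assumes one prompt per line in prompts.txt).
echo "The capital of France is" >> data/inference/prompts.txt
```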
Then run:

```bash
CONFIG_FILE=configs/inference_yalis.json sbatch infer_single.sh
```

Open infer.sh and change YALIS_DISABLE_COMPILE from 1 to 0.
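The exact line in infer.sh may differ, but assuming the flag is set as an environment variable there, the edit would look roughly like:

```bash
# Sketch: 0 enables torch compile for the inference engine.
export YALIS_DISABLE_COMPILE=0
```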
Then run:

```bash
CONFIG_FILE=configs/inference_yalis.json sbatch infer_single.sh
```

Open infer.sh and change YALIS_DISABLE_DECODE_CUDAGRAPHS from 1 to 0 (make sure torch compile is also enabled).
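Again assuming both flags are plain environment variables in infer.sh, the edited lines would look roughly like:

```bash
# Sketch: keep torch compile enabled and also enable CUDA graphs for decode.
export YALIS_DISABLE_COMPILE=0
export YALIS_DISABLE_DECODE_CUDAGRAPHS=0
```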
Then run:

```bash
CONFIG_FILE=configs/inference_yalis.json sbatch infer_single.sh
```

Multi-GPU inference:

```bash
CONFIG_FILE=configs/inference_yalis.json sbatch infer_multi.sh
```

Query the vLLM server we set up as follows:
```bash
# Usage: ./llm_request.sh <server_ip> "<prompt>" [max_tokens]
./llm_request.sh <vLLM Server IP> "San Francisco is a" 64
```

Change the prompt and the max_tokens argument to play around with the command.
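If you want to query the server without the helper script, vLLM also exposes an OpenAI-compatible HTTP API; a rough equivalent of the request above, assuming the server listens on the default port 8000, is:

```bash
# Direct request to the vLLM OpenAI-compatible completions endpoint.
# <vLLM Server IP> and <served model name> are placeholders to fill in.
curl http://<vLLM Server IP>:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "<served model name>", "prompt": "San Francisco is a", "max_tokens": 64}'
```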