
Thank you for your interest in Grape 🍇.

Installation

Please note that since part of Grape 🍇 communicates with the NVIDIA GPU kernel module via the /proc filesystem (which, to the best of our knowledge, cannot easily be handled inside Docker), all of the following steps are performed natively (i.e., outside the Docker environment). We also assume that the OS is either Ubuntu 20.04 or 22.04.
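For reference, the stock NVIDIA kernel module exposes its /proc interface under /proc/driver/nvidia (whether Grape 🍇 adds further entries there is beyond the scope of this page); a quick way to confirm that this interface is reachable from your shell is:

# Reports the loaded NVIDIA driver version via the driver's /proc interface.
cat /proc/driver/nvidia/version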

  1. Check out Grape's 🍇 source code and enter the repository (the subsequent scripts are run from the repository root):

    git clone https://github.com/UofT-EcoSystem/Grape-MICRO56-Artifact
    cd Grape-MICRO56-Artifact
  2. Make sure that common software dependencies are installed properly:

    ./scripts/Installation/0-install_build_essentials.sh
  3. Install our customized NVIDIA GPU driver and then reboot the machine:

    ./scripts/Installation/1-install_NVIDIA_GPU_driver.sh
    sudo reboot

    After the machine reboots, make sure that the message NVRM: loading customized kernel module from Grape appears in the output of sudo dmesg; a one-liner for this check is given below. If it does not appear, reinstall the GPU driver and then reboot again:

    # Note the `--reinstall` option.
    ./scripts/Installation/1-install_NVIDIA_GPU_driver.sh --reinstall
    sudo reboot
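    Rather than scanning the full dmesg output by hand, the exact message text from above can be grepped for directly:

    # Prints "OK" if the customized Grape kernel module was loaded at boot.
    sudo dmesg | grep -q "NVRM: loading customized kernel module from Grape" \
        && echo "OK" || echo "not loaded -- please reinstall"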
  4. Install CUDA:

    ./scripts/Installation/2-install_CUDA.sh
  5. Build PyTorch:

    ./scripts/Installation/3-build_PyTorch.sh
  6. Check out the HuggingFace Transformers submodule (no building or installation is required):

    git submodule update --init submodules/transformers
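    To confirm that the submodule was fetched at the pinned commit, you can inspect its status (a leading - in the output would indicate that it is not yet initialized):

    # Shows the checked-out commit hash of the Transformers submodule.
    git submodule status submodules/transformers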
  7. Finally, source the activate script to set up the required environment variables:

    source scripts/Installation/activate
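    As a quick sanity check of the activated environment (our suggestion, not a documented step of the artifact), you can verify that Python resolves to the locally built PyTorch and that the GPU is visible:

    # The exact version string depends on the local build.
    python -c "import torch; print(torch.__version__, torch.cuda.is_available())"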

Experiment Workflow

Metadata Compression (Figure 13)

The script

./scripts/Experiment_Workflow/1-test_metadata_compression.sh

runs the experiments that compress the CUDA graphs' memory regions and calculate the compression ratios for the different models. At the end of the experiments, the results are dumped into a CSV file named "metadata_compression.csv" and visualized as follows (a sketch for inspecting this file is given after the table):

Model    Original Size Compressed Size
GPT-2    ___           ___
GPT-J    ___           ___
Wav2Vec2 ___           ___
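As a minimal sketch for inspecting the dumped file, assuming it is comma-separated with columns in the order shown above (model name, original size, compressed size; the exact header names are an assumption):

# Pretty-print the CSV and derive a per-model compression ratio with awk.
column -s, -t metadata_compression.csv
awk -F, 'NR > 1 { printf "%-10s ratio = %.2fx\n", $1, $2 / $3 }' metadata_compression.csv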

Runtime Performance (Figure 11)

The script

./scripts/Experiment_Workflow/2-test_runtime_performance.sh --model=gpt2
./scripts/Experiment_Workflow/2-test_runtime_performance.sh --model=gptj
./scripts/Experiment_Workflow/2-test_runtime_performance.sh --model=wav2vec2

(invoked once per model) runs the experiments that measure the runtime performance of the different models under three settings, namely Baseline, PtGraph, and Grape, as described in the paper. At the end of the experiments, the results are dumped into a CSV file named "speedometer.csv" and visualized as follows (a sketch for inspecting this file is given after the table):

Name     Attrs              Avg Std Min Median Max
Baseline {"Model": "GPT-2"} ___ ___ ___ ___    ___
PtGraph  {"Model": "GPT-2"} ___ ___ ___ ___    ___
Grape    {"Model": "GPT-2"} ___ ___ ___ ___    ___
...
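Analogously, a minimal sketch for inspecting this file, assuming a comma-separated layout matching the columns shown above (and that the Attrs JSON contains no embedded commas):

# Pretty-print the per-setting runtime statistics.
column -s, -t speedometer.csv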

Runtime Breakdown (Figure 12)
