- Overview
- Features
- Install Platform and App SDK
- Enable Root Access
- Activate the qeff Environment
- Download the Pre-compiled Llama 3.3 70B Model
- Extract the Model
- Download demo.py from This Repository
- Demo
- Example
This guide outlines the steps to implement a chatbot using the Llama 3.3 70B model on the AIC100 Ultra platform.
- Pre-compiled Llama 3.3 70B model with 8k ctx_len.
- Python API `QEfficient.generation.text_generation_inference.cloud_ai_100_exec_kv`.
- Streaming enabled for smooth, incremental inference output.
- A while loop that keeps the model loaded on the AIC100 Ultra card, avoiding a reload for every prompt.
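The last feature above can be sketched as follows. This is a minimal, hypothetical illustration of the chat-loop structure, not the actual demo.py: the executor function is injected so the loop can be shown without AIC100 hardware, and the real demo would pass `cloud_ai_100_exec_kv` (whose exact parameters are an assumption here) in its place.

```python
# Hypothetical sketch of the demo's chat loop. In the real demo.py the
# executor would be QEfficient.generation.text_generation_inference.
# cloud_ai_100_exec_kv; here it is injected so the structure runs anywhere.

def chat_loop(execute, prompts):
    """Feed one prompt at a time to an already-loaded model.

    The compiled QPC is loaded onto the AIC100 Ultra card once; the
    while loop then reuses it for every prompt instead of reloading
    the model on each turn.
    """
    outputs = []
    queue = list(prompts)
    while queue:  # keep the session alive between prompts
        prompt = queue.pop(0)
        if prompt.strip().lower() == "exit":
            break  # user ends the chat; remaining input is ignored
        outputs.append(execute(prompt))
    return outputs

# Stand-in executor for illustration only; the real call might look like
# cloud_ai_100_exec_kv(tokenizer, qpc_path=..., prompt=[prompt]) (assumed).
def fake_executor(prompt):
    return f"echo: {prompt}"

print(chat_loop(fake_executor, ["hello", "how are you?", "exit", "ignored"]))
```

In an interactive demo the prompt queue would be replaced by `input()` calls; the design point is simply that model load happens outside the loop.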
Follow the official installation guide from Efficient Transformers:
👉 https://quic.github.io/efficient-transformers/source/installation.html
Enable root access:

```shell
sudo -i
```

Activate the qeff environment:

```shell
source /opt/qti-aic/dev/python/qeff/bin/activate
```

Extract the tarball to your desired directory:

```shell
tar -xzvf qpc_16cores_128pl_8192cl_1fbs_4devices_mxfp6_mxint8.tar.gz -C /your/target/folder
```

Download or directly copy demo.py from this repository.
```shell
python demo.py
```