1 change: 1 addition & 0 deletions examples/rkllm/.gitignore
@@ -0,0 +1 @@
!librkllmrt.so
135 changes: 135 additions & 0 deletions examples/rkllm/README.md
@@ -0,0 +1,135 @@
# RKLLM Workflow (Conversion + C++/Python Inference on Axon)

This `rkllm/` folder contains:
- `convert.py`: converts a Hugging Face model directory to `.rkllm`
- `inference.cpp`: RKLLM C++ runtime inference app
- `inference.py`: RKLLM Python runtime inference app (ctypes wrapper)
- `dataset.json`: optional calibration dataset
- `rkllm.h` + `librkllmrt.so`: RKLLM C++/Python runtime build dependencies

Recommended flow:
1. Convert once on the host machine
2. Copy the generated `.rkllm` model to Axon
3. Run inference on Axon using either C++ or Python

## 0) Get Started

```bash
git clone https://github.com/vicharak-in/Axon-NPU-Guide.git
cd Axon-NPU-Guide/rkllm
```

`rkllm/` is the working folder for this guide and includes the required runtime files.

## 1) Common Conversion

### 1.1 Create environment + get toolkit

```bash
python3 -m venv venv-rkllm
source venv-rkllm/bin/activate

git clone https://github.com/airockchip/rknn-llm.git
```

If using Python 3.12:

```bash
export BUILD_CUDA_EXT=0
pip install rknn-llm/rkllm-toolkit/packages/rkllm_toolkit-1.2.3-cp312-cp312-linux_x86_64.whl
```
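
For other Python versions, the wheel has to match the interpreter's CPython tag (`cp312` for Python 3.12). The sketch below lists wheels in the cloned `packages/` directory that match the running interpreter. The helper name `find_matching_wheels` is hypothetical, and the glob pattern assumes every toolkit wheel follows the same naming scheme as the 3.12 file above; which versions are actually shipped is not stated here.

```python
import glob
import os
import sys


def find_matching_wheels(packages_dir):
    """Return toolkit wheels in packages_dir built for the running CPython version."""
    # CPython wheel tags look like cp312 for Python 3.12.
    tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    # Assumed naming scheme: rkllm_toolkit-<version>-<tag>-<tag>-<platform>.whl
    pattern = os.path.join(packages_dir, f"rkllm_toolkit-*-{tag}-{tag}-*.whl")
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    wheels = find_matching_wheels("rknn-llm/rkllm-toolkit/packages")
    if wheels:
        print("pip install", wheels[-1])
    else:
        print("No wheel found for this Python version")
```

If nothing is printed for your interpreter, creating the venv with a Python version that has a matching wheel is likely the simplest fix.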

If you hit `No module named pkg_resources`:

```bash
pip install "setuptools==68.0.0"
```

### 1.2 Download model from Hugging Face

```bash
sudo apt install -y git-lfs
git lfs install

git clone https://huggingface.co/Qwen/Qwen3-0.6B
# Example alternative:
# git clone https://huggingface.co/Qwen/Qwen2-1.5B
```

### 1.3 Convert to RKLLM

Qwen3-0.6B example:

```bash
python3 convert.py -i ./Qwen3-0.6B -o <output-file-name.rkllm> --device cpu --dtype float32 --quantized-dtype w8a8 --quantized-algorithm normal --optimization-level 1 --num-npu-core 3 --target-platform rk3588 --max-context 4096
```

Notes:
- Add `--dataset <path/to/dataset.json>` (for example, `--dataset dataset.json`) to quantize with a calibration dataset.
- `--max-context` must be greater than `0`, at most `16384`, and a multiple of `32`.
- `--quantized-algorithm grq` and `--quantized-algorithm gdq` require `--device cuda` in `convert.py`.
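
The constraints in the notes can be checked before starting a long conversion. The following is a minimal sketch, not code from `convert.py`: the function names `check_convert_args` and `check_dataset` are hypothetical, and the dataset check assumes that each calibration record needs the `input` and `target` string fields used by the bundled `dataset.json`.

```python
def check_convert_args(max_context, quantized_algorithm, device):
    """Return a list of problems with the given convert.py arguments."""
    problems = []
    # --max-context: greater than 0, at most 16384, and a multiple of 32.
    if not (0 < max_context <= 16384) or max_context % 32 != 0:
        problems.append("--max-context must be >0, <=16384, and a multiple of 32")
    # grq/gdq quantization is only available with --device cuda.
    if quantized_algorithm in ("grq", "gdq") and device != "cuda":
        problems.append(
            f"--quantized-algorithm {quantized_algorithm} requires --device cuda"
        )
    return problems


def check_dataset(records):
    """Return a list of problems with parsed calibration records."""
    if not isinstance(records, list) or not records:
        return ["dataset must be a non-empty JSON array"]
    problems = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append(f"entry {i} is not an object")
            continue
        # Assumed schema: the same keys as the bundled dataset.json.
        for key in ("input", "target"):
            if not isinstance(rec.get(key), str) or not rec[key].strip():
                problems.append(f"entry {i}: '{key}' must be a non-empty string")
    return problems
```

For example, `check_convert_args(4096, "normal", "cpu")` returns an empty list for the Qwen3-0.6B command above, and `check_dataset(json.load(open("dataset.json")))` can be used to inspect a custom calibration file.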

After conversion, copy only the generated `.rkllm` model file to your Axon `rkllm/` folder.

---

## 2) C++ Inference on Axon

### 2.1 Compile

```bash
g++ -O2 -std=c++17 -I. inference.cpp -L. -lrkllmrt -Wl,-rpath,'$ORIGIN' -o inference
```
> Note: this command expects `librkllmrt.so` and `rkllm.h` to be in the same directory as `inference.cpp`.

### 2.2 Run

```bash
./inference --model <path/to/model.rkllm> --target-platform rk3588 --stream --print-perf --keep-history
```

Useful behavior:
- If `--prompt` is not passed, it starts interactive mode.
- Interactive commands:
  - `exit`
  - `clear` (clears KV cache)
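
To make the interactive behavior concrete, here is a minimal sketch of such a command loop. It is not the code from `inference.cpp` or `inference.py`: `chat_loop`, `generate`, and `clear_cache` are hypothetical names standing in for the real prompt handling and KV-cache reset.

```python
def chat_loop(lines, generate, clear_cache):
    """Process input lines until `exit`; `clear` resets the cache."""
    replies = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # ignore empty input
        if text == "exit":
            break  # leave interactive mode
        if text == "clear":
            clear_cache()  # stand-in for clearing the KV cache
            continue
        replies.append(generate(text))
    return replies
```

Passing `sys.stdin` as `lines` would give a terminal session, while a plain list of strings is convenient for testing the command handling without a model.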

---

## 3) Python Inference on Axon (with venv)

`inference.py` uses only Python stdlib + `librkllmrt.so`, so a lightweight venv is enough.

### 3.1 Create env

```bash
python3 -m venv venv-rkllm
source venv-rkllm/bin/activate
```

### 3.2 Inference

Single-shot prompt:

```bash
python3 inference.py -m <path/to/model.rkllm> --target-platform rk3588 --stream --print-perf --prompt "Hello"
```

Keep chat history across turns (interactive mode):

```bash
python3 inference.py -m <path/to/model.rkllm> --target-platform rk3588 --stream --print-perf --keep-history
```

---

## 4) Troubleshooting

- `ModuleNotFoundError: No module named 'pkg_resources'`
  - Run: `pip install "setuptools==68.0.0"`

- `OSError: librkllmrt.so: cannot open shared object file`
  - Pass `--runtime-lib /full/path/librkllmrt.so` or set `LD_LIBRARY_PATH`.
  - Confirm you are using the Linux `aarch64` runtime on Axon.
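
To narrow down a library loading failure, it can help to try loading the runtime directly with `ctypes`, the same mechanism `inference.py` relies on. This is a minimal sketch; the helper name `probe_runtime` is hypothetical and is not part of `inference.py`. A missing file points to a path problem, while a file that exists but fails to load usually suggests an architecture mismatch or a missing dependency.

```python
import ctypes
import os
import platform


def probe_runtime(lib_path):
    """Return (ok, message) describing whether the runtime library can be loaded."""
    lib_path = os.path.abspath(lib_path)
    if not os.path.isfile(lib_path):
        return False, f"not found: {lib_path}"
    try:
        # An explicit path bypasses the LD_LIBRARY_PATH search.
        ctypes.CDLL(lib_path)
    except OSError as exc:
        return False, f"found but could not load on {platform.machine()}: {exc}"
    return True, f"loaded {lib_path} on {platform.machine()}"


if __name__ == "__main__":
    print(probe_runtime("./librkllmrt.so"))
```
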
---
42 changes: 42 additions & 0 deletions examples/rkllm/dataset.json
@@ -0,0 +1,42 @@
[
{"input":"Explain gravity in simple terms.","target":"Gravity is the force that attracts objects with mass toward each other."},
{"input":"What is artificial intelligence?","target":"Artificial intelligence is the field of creating machines that can perform tasks requiring human-like intelligence."},
{"input":"Describe how photosynthesis works.","target":"Photosynthesis is the process plants use to convert sunlight, water, and carbon dioxide into glucose and oxygen."},
{"input":"What causes earthquakes?","target":"Earthquakes occur when tectonic plates suddenly shift along faults in the Earth's crust."},
{"input":"Explain the purpose of the Internet.","target":"The Internet connects computers worldwide to share information, services, and communication."},
{"input":"What is machine learning?","target":"Machine learning is a branch of AI that enables computers to learn patterns from data without explicit programming."},
{"input":"Define climate change.","target":"Climate change refers to long-term shifts in temperature and weather patterns caused largely by human activity."},
{"input":"Explain the concept of black holes.","target":"Black holes are extremely dense regions of space where gravity is so strong that nothing can escape."},
{"input":"What is a computer algorithm?","target":"An algorithm is a step-by-step procedure used to solve a problem or perform a computation."},
{"input":"Describe the water cycle.","target":"The water cycle describes how water evaporates, condenses into clouds, and returns to Earth as precipitation."},
{"input":"What is the purpose of education?","target":"Education helps people gain knowledge, develop skills, and understand the world around them."},
{"input":"Explain renewable energy.","target":"Renewable energy comes from sources like sunlight, wind, and water that naturally replenish."},
{"input":"What is quantum computing?","target":"Quantum computing uses quantum mechanics principles like superposition and entanglement to process information."},
{"input":"Describe neural networks.","target":"Neural networks are machine learning models inspired by the brain that learn patterns through layers of interconnected nodes."},
{"input":"What is data science?","target":"Data science combines statistics, computing, and domain knowledge to extract insights from data."},
{"input":"Explain natural language processing.","target":"Natural language processing allows computers to understand, interpret, and generate human language."},
{"input":"What is the role of satellites?","target":"Satellites orbit Earth to provide communication, navigation, weather monitoring, and scientific observation."},
{"input":"Describe the solar system.","target":"The solar system consists of the Sun and the celestial bodies that orbit it, including planets and asteroids."},
{"input":"What is cybersecurity?","target":"Cybersecurity protects computer systems and networks from attacks, theft, and damage."},
{"input":"Explain blockchain technology.","target":"Blockchain is a decentralized digital ledger that securely records transactions across many computers."},
{"input":"What is robotics?","target":"Robotics is the engineering field focused on designing and building machines that perform automated tasks."},
{"input":"Describe cloud computing.","target":"Cloud computing provides computing resources like servers and storage over the internet."},
{"input":"What is big data?","target":"Big data refers to extremely large datasets that require advanced tools for processing and analysis."},
{"input":"Explain computer vision.","target":"Computer vision enables machines to interpret and understand visual information from images and videos."},
{"input":"What is deep learning?","target":"Deep learning is a subset of machine learning using multi-layer neural networks to model complex patterns."},
{"input":"Describe autonomous vehicles.","target":"Autonomous vehicles use sensors, AI, and control systems to navigate without human drivers."},
{"input":"What is edge computing?","target":"Edge computing processes data closer to where it is generated to reduce latency and bandwidth usage."},
{"input":"Explain the concept of sustainability.","target":"Sustainability involves using resources responsibly so future generations can also meet their needs."},
{"input":"What is genetic engineering?","target":"Genetic engineering involves modifying the DNA of organisms to achieve specific traits."},
{"input":"Describe the function of DNA.","target":"DNA stores genetic instructions used in the growth and functioning of living organisms."},
{"input":"What is a database?","target":"A database is an organized collection of structured information that can be easily accessed and managed."},
{"input":"Explain operating systems.","target":"An operating system manages hardware resources and provides services for computer programs."},
{"input":"What is virtualization?","target":"Virtualization allows multiple virtual machines to run on a single physical computer."},
{"input":"Describe the purpose of APIs.","target":"APIs allow software systems to communicate and exchange data with each other."},
{"input":"What is an embedded system?","target":"An embedded system is a specialized computer designed to perform dedicated functions within larger devices."},
{"input":"Explain sensor technology.","target":"Sensors detect physical signals like temperature or light and convert them into measurable data."},
{"input":"What is satellite imaging?","target":"Satellite imaging captures images of Earth using sensors mounted on orbiting satellites."},
{"input":"Describe machine perception.","target":"Machine perception enables computers to interpret sensory data such as images, sound, or motion."},
{"input":"What is distributed computing?","target":"Distributed computing uses multiple computers working together to solve large computational problems."},
{"input":"Explain artificial neural networks.","target":"Artificial neural networks are computational systems inspired by biological neural networks."}
]