Releases: airockchip/rknn-llm

release-v1.2.0

08 Apr 10:05
  • Supports custom model conversion.
  • Supports chat_template configuration.
  • Enables multi-turn dialogue interactions.
  • Implements automatic prompt cache reuse for improved inference efficiency.
  • Expands maximum context length to 16K.
  • Supports embedding flash storage to reduce memory usage.
  • Introduces the GRQ Int4 quantization algorithm (see the conversion sketch after this list).
  • Supports GPTQ-Int8 model conversion.
  • Adds compatibility with the RK3562 platform.
  • Adds support for visual multimodal models such as InternVL2, Janus, and Qwen2.5-VL.
  • Supports CPU core configuration.
  • Adds support for Gemma3.
  • Adds support for Python 3.9/3.11/3.12.
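
A minimal conversion sketch using the rkllm-toolkit Python API. The load_huggingface/build/export_rkllm flow follows the toolkit's published examples, but the model path is illustrative and the quantized_algorithm='grq' value is an assumption inferred from these notes; check the documentation shipped with v1.2.0 for the exact parameter names.

```python
# Sketch: custom model conversion with GRQ Int4 quantization (rkllm-toolkit).
# NOTE: the model path is illustrative, and quantized_algorithm='grq' is an
# assumption based on the release notes; verify against the v1.2.0 docs.
from rkllm.api import RKLLM

llm = RKLLM()

# Load the original HuggingFace checkpoint.
ret = llm.load_huggingface(model='./Qwen2.5-1.5B-Instruct')
assert ret == 0, 'load failed'

# Quantize to 4-bit weights with the GRQ algorithm introduced in this release.
ret = llm.build(do_quantization=True,
                quantized_dtype='w4a16',
                quantized_algorithm='grq',
                target_platform='rk3588')
assert ret == 0, 'build failed'

# Export the .rkllm artifact for on-device inference.
ret = llm.export_rkllm('./qwen2.5-1.5b-w4a16.rkllm')
assert ret == 0, 'export failed'
```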

release-v1.1.4

11 Dec 11:39
  • Add support for converting HuggingFace GPTQ-int4 models (requires group_size to be 32, 64, or 128, and desc_act set to false; see the check sketched after this list).
  • Add support for TeleChat/TeleChat2/MiniCPM-S models.
  • Add support for exporting the LLM component of Qwen2VL.
  • Resolve issues with LoRA inference.
  • Fix an import error related to IPython.
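
The GPTQ-int4 constraints above can be verified before conversion by reading the checkpoint's quantize_config.json, the standard metadata file in AutoGPTQ exports. A small pre-flight sketch; the model directory is illustrative.

```python
# Sketch: verify a HuggingFace GPTQ-int4 checkpoint meets the toolkit's
# requirements (group_size in {32, 64, 128}, desc_act == false) before
# attempting conversion. quantize_config.json is the standard AutoGPTQ
# metadata file; the model directory below is illustrative.
import json
from pathlib import Path

def check_gptq_config(model_dir: str) -> None:
    cfg = json.loads(Path(model_dir, 'quantize_config.json').read_text())
    group_size = cfg.get('group_size')
    if group_size not in (32, 64, 128):
        raise ValueError(f'unsupported group_size: {group_size}')
    if cfg.get('desc_act'):
        raise ValueError('desc_act must be false for rkllm conversion')

check_gptq_config('./Qwen2-1.5B-Instruct-GPTQ-Int4')
```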

release-v1.1.2

05 Nov 07:41
  • Fix an inference error in the ChatGLM3 model.
  • Fix an inference issue with embedding input.
  • Add support for exporting the LLM component of MiniCPM-V.

release-v1.1.1

18 Oct 10:17
  • Fixed the inference error in the MiniCPM3 model.
  • Fixed the runtime error in rkllm_server_demo.
  • Added the rkllm-toolkit installation package for Python 3.10.
  • Supported gguf model conversion when tie_word_embeddings is set to true.

release-v1.1.0

11 Oct 08:53
  • Added support for grouped quantization (w4a16 group sizes of 32/64/128, w8a8 group sizes of 128/256/512); see the sketch after this list.
  • Added the GDQ algorithm to improve 4-bit quantization accuracy.
  • Added hybrid quantization algorithm, supporting a combination of grouped and non-grouped quantization based on specified ratios.
  • Added support for Llama3, Gemma2, and Minicpm3 models.
  • Added support for gguf model conversion (currently supports q4_0 and fp16 only).
  • Added support for LoRA models.
  • Added storage and loading of the prompt cache.
  • Added PC-side emulation accuracy testing and inference interface support for rkllm-toolkit.
  • Fixed catastrophic forgetting issue when the token count exceeds max_context.
  • Optimized prefill speed.
  • Optimized generate speed.
  • Optimized model initialization time.
  • Added support for four input interfaces: prompt, embedding, token, and multimodal.
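
A grouped-quantization sketch under the same toolkit flow. The dtype string 'w4a16_g64' is an assumption derived from the group sizes listed above, and the model path is illustrative; consult the v1.1.0 toolkit documentation for the exact spelling.

```python
# Sketch: grouped w4a16 quantization with group size 64. The 'w4a16_g64'
# dtype string is an assumption inferred from the group sizes in the notes
# above; w8a8 variants would use group sizes of 128/256/512.
from rkllm.api import RKLLM

llm = RKLLM()
ret = llm.load_huggingface(model='./Llama-3-8B-Instruct')  # illustrative path
assert ret == 0, 'load failed'

ret = llm.build(do_quantization=True,
                quantized_dtype='w4a16_g64',
                target_platform='rk3588')
assert ret == 0, 'build failed'

ret = llm.export_rkllm('./llama3-8b-w4a16-g64.rkllm')
assert ret == 0, 'export failed'
```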

release-v1.0.1

09 May 09:37
  • Optimize memory usage during model conversion.
  • Optimize memory usage during inference.
  • Increase prefill speed.
  • Reduce initialization time.
  • Improve quantization accuracy.
  • Add support for Gemma, ChatGLM3, MiniCPM, InternLM2, and Phi-3.
  • Add server invocation.
  • Add an inference interruption interface.
  • Add logprob and token_id to the return value.