Include vision capabilities for llama.cpp integration. Better error handling. Better documentation in code. Change notation for config.ini for GGUF models.
README.md: 42 additions & 39 deletions
@@ -1,6 +1,6 @@
# RKLLama: LLM Server and Client for Rockchip 3588/3576
-### [Version: 0.0.69](#New-Version)
+### [Version: 0.0.70](#New-Version)
Video demo ( version 0.0.1 ):
@@ -533,10 +533,12 @@ The structure of a GGUF model is similar to rkllm models. You need a folder with
└── qwen3.5-4b:q8_0
└── model.gguf (can have any name but must end in .gguf)
└── config.ini (optional)
+└── mmproj.gguf (optional - can have any name, but including the substring 'mmproj' in the name is recommended. Must end in .gguf. Only applies to multimodal models with vision capabilities)
+
```
-The contents of the config.ini are llama.cpp environment vars for RKNPU inference explained by the author of the fork: https://github.com/invisiofficial/rk-llama.cpp/tree/rknpu2/ggml/src/ggml-rknpu2 (RKNPU_DOMAINS variable is skipped because rkllama handles it) and llama.cpp argument for the llama-server process: https://github.com/invisiofficial/rk-llama.cpp/blob/rknpu2/tools/server/README.md
+The contents of the config.ini are llama.cpp environment vars for RKNPU inference explained by the author of the fork: https://github.com/invisiofficial/rk-llama.cpp/tree/rknpu2/ggml/src/ggml-rknpu2 (the RKNPU_DOMAINS variable is skipped because rkllama handles it) and llama.cpp arguments for the llama-server process: https://github.com/invisiofficial/rk-llama.cpp/blob/rknpu2/tools/server/README.md (You are only allowed to use arguments that start with '--')
Some examples of config.ini files:
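The README hunks above describe a GGUF model folder: one model `.gguf` file with any name, an optional `mmproj` `.gguf` projector for vision, and an optional `config.ini` holding RKNPU environment variables and `--`-prefixed `llama-server` arguments, with `RKNPU_DOMAINS` left to rkllama. The sketch below shows one way such a folder could be loaded. It is not rkllama's loader: the helper names `find_gguf_files` and `read_config` and the `[env]` / `[args]` section names are hypothetical, because the actual config.ini notation appears only in README examples that are cut off here.

```python
import configparser
from pathlib import Path

# Hypothetical section names; the real rkllama config.ini notation may differ.
ENV_SECTION = "env"
ARGS_SECTION = "args"


def find_gguf_files(model_dir):
    """Return (model_path, mmproj_path_or_None) for a GGUF model folder.

    Assumption: a .gguf file with 'mmproj' in its name is the vision
    projector and the first other .gguf file is the language model.
    """
    model, mmproj = None, None
    for path in sorted(Path(model_dir).glob("*.gguf")):
        if "mmproj" in path.name.lower():
            mmproj = path
        elif model is None:
            model = path
    if model is None:
        raise FileNotFoundError(f"no model .gguf file in {model_dir}")
    return model, mmproj


def read_config(model_dir):
    """Return (env_vars, server_args) from an optional config.ini."""
    env, args = {}, []
    ini_path = Path(model_dir) / "config.ini"
    if not ini_path.is_file():
        return env, args  # config.ini is optional

    # allow_no_value lets bare flags appear without '= value';
    # interpolation=None keeps '%' in values literal
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser.optionxform = str  # keep variable and flag names case-sensitive
    parser.read(ini_path)

    if parser.has_section(ENV_SECTION):
        for key, value in parser.items(ENV_SECTION):
            if key == "RKNPU_DOMAINS":
                continue  # rkllama manages this variable itself
            env[key] = value

    if parser.has_section(ARGS_SECTION):
        for flag, value in parser.items(ARGS_SECTION):
            if not flag.startswith("--"):
                raise ValueError(f"only '--' arguments are allowed, got {flag!r}")
            args.append(flag)
            if value:  # bare flags such as --no-mmap carry no value
                args.append(value)
    return env, args
```

Rejecting keys without the `--` prefix at load time turns the README's restriction into an explicit error instead of a failed `llama-server` launch.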
@@ -549,20 +551,20 @@ The structure of a GGUF model is similar to rkllm models. You need a folder with
interval (float): Seconds to wait between retry attempts.
@@ -671,8 +685,8 @@ def wait_for_service(
stdout, _ = process.communicate()
# Kill the process
-server_process.kill()
-server_process.wait(timeout=5)
+process.kill()
+process.wait(timeout=5)
# Check if insufficient memory in the current domain
if "RKNPU ERROR: Out of memory in allowed IOMMU domains" in stdout:
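The hunk above switches the kill/wait calls to `process`, the handle whose output was just read with `communicate()`, and then scans that output for the RKNPU out-of-memory message. A minimal, self-contained sketch of the same idea follows. The function name `run_and_check_npu_oom` is hypothetical, merging stderr into stdout is an assumption (the hunk does not show how the pipes are set up), and the timeout branch uses the kill-then-`communicate()` pattern from the `subprocess` documentation rather than rkllama's exact code.

```python
import subprocess

# Marker string copied from the hunk above
NPU_OOM_MARKER = "RKNPU ERROR: Out of memory in allowed IOMMU domains"


def run_and_check_npu_oom(cmd, timeout=30.0):
    """Run cmd, collect its output, and report whether it hit the NPU OOM error.

    Returns (out_of_memory, output_text).
    """
    process = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # assumption: the marker may go to either stream
        text=True,
    )
    try:
        stdout, _ = process.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Kill the same handle that was started, then drain its pipe
        process.kill()
        stdout, _ = process.communicate()
    return NPU_OOM_MARKER in stdout, stdout
```

Calling `communicate()` again after `kill()` both reaps the child and returns any output buffered before the timeout, so no log lines are lost when deciding whether to retry in another memory domain.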
@@ -687,6 +701,11 @@ def wait_for_service(
# requests.get() waits for the server response unless a timeout is set [InlineCitation-1-Guide to Handling Python Requests Timeout](https://oxylabs.io/blog/python-requests-timeout)
response = requests.get(url, timeout=timeout)
if response.status_code == expected_status:
+
+# Wait for warm up subprocess to prevent error:
+# requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
+logger.debug(f"Waiting to finish warmup subprocess for llama-server...")
+time.sleep(5)
return True, None
except requests.RequestException:
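The hunk above polls the server with `requests.get(url, timeout=timeout)` and, once the expected status arrives, sleeps 5 seconds so llama-server's warm-up finishes before the first real request. The sketch below reproduces that retry loop with the standard library's `urllib.request` in place of `requests`, so it runs without extra packages. The function name `wait_for_http_service` and its `retries`, `interval` and `warmup_delay` parameters are assumptions for illustration (the docstring fragment above describes `interval` as seconds between retry attempts), not rkllama's actual signature.

```python
import time
import urllib.error
import urllib.request


def wait_for_http_service(url, expected_status=200, retries=30,
                          interval=1.0, timeout=5.0, warmup_delay=0.0):
    """Poll url until it answers with expected_status.

    Returns (True, None) on success, or (False, reason) once retries run out.
    """
    last_error = "no attempt made"
    for _ in range(retries):
        try:
            # Always set a timeout: without one a stalled server blocks forever
            with urllib.request.urlopen(url, timeout=timeout) as response:
                status = response.status
        except urllib.error.HTTPError as exc:
            status = exc.code  # a non-2xx reply still means the server is up
        except (urllib.error.URLError, OSError) as exc:
            last_error = f"connection failed: {exc}"
            time.sleep(interval)
            continue

        if status == expected_status:
            # Mirror the hunk's fixed pause for llama-server warm-up
            time.sleep(warmup_delay)
            return True, None
        last_error = f"unexpected status {status}"
        time.sleep(interval)

    return False, f"{url} not ready after {retries} attempts: {last_error}"
```

A fixed pause is the simplest fix for the `RemoteDisconnected` error quoted in the hunk. Polling a readiness endpoint until it reports ready (llama.cpp's server documents `/health` for this) may avoid guessing how long warm-up takes.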
@@ -697,8 +716,8 @@ def wait_for_service(
logger.error(f"Timeout waiting for llama-server process to start....")