When performing sequential requests to the /api/embed endpoint on an RK3588 NPU, only the first request returns a valid vector. Subsequent requests return vectors filled with zeros or consistent garbage values, even though the HTTP response code is 200 OK.
The issue persists until the model is explicitly unloaded from memory.
Steps to Reproduce:
Start the server with an embedding model (e.g., qwen3-embedding:0.6b-rk3588).
Send a POST request to /api/embed.
Result: Valid vector returned at index [0].
Immediately send a second POST request to /api/embed with different text.
Result: The returned vector at index [0] is invalid (contains zeros or static noise artifacts).
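The steps above could be scripted roughly as follows. This is a minimal sketch, not a confirmed reproduction: the base URL and port, the `input` request field, and the Ollama-style `{"embeddings": [[...]]}` response shape are assumptions, and `embed`, `looks_invalid`, and `reproduce` are hypothetical helper names introduced here for illustration.

```python
import json
import math
import urllib.request


def embed(base_url, model, text, timeout=60):
    """POST one text to /api/embed and return the vector at index [0]."""
    # Assumption: the request body uses Ollama-style "model" and "input" fields.
    payload = json.dumps({"model": model, "input": text}).encode("utf-8")
    req = urllib.request.Request(
        base_url.rstrip("/") + "/api/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    # Assumption: Ollama-style response shape {"embeddings": [[...], ...]}.
    return body["embeddings"][0]


def looks_invalid(vec, eps=1e-12):
    """True if the vector is empty, all (near) zero, or has NaN/inf entries."""
    if not vec:
        return True
    if any(not math.isfinite(x) for x in vec):
        return True
    return all(abs(x) <= eps for x in vec)


def reproduce(base_url, model):
    """Send two back-to-back requests with different text and compare them."""
    first = embed(base_url, model, "first sentence")
    second = embed(base_url, model, "a completely different second sentence")
    return {
        "first_invalid": looks_invalid(first),
        "second_invalid": looks_invalid(second),
        # Identical vectors for different inputs would suggest static output.
        "identical": first == second,
    }
```

Calling `reproduce("http://localhost:8080", "qwen3-embedding:0.6b-rk3588")` (adjust the URL to wherever the server listens) should report `second_invalid` as true if the bug is present. The zero check does not catch every kind of garbage output, so the `identical` flag and a manual look at the values are still worth having.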
Observed Workaround: The issue is resolved if the model is forced to unload between requests. Passing "keep_alive": 0 directly to the /api/embed endpoint appears to be ignored in this build, so the unload has to be triggered another way.
Hardware: Rockchip RK3588
Endpoint: /api/embed
Model: qwen3-embedding (.rkllm)
My container is called ollama, but it's executing rkllama.
Execute again without unload:
