Sequential embedding requests return invalid/zeroed vectors on RK3588 (NPU) #117

@advillalba

Description

When performing sequential requests to the /api/embed endpoint on an RK3588 NPU, only the first request returns a valid vector. Subsequent requests return vectors filled with zeros or consistent garbage values, even though the HTTP response code is 200 OK.

The issue persists until the model is explicitly unloaded from memory.

Steps to Reproduce:

1. Start the server with an embedding model (e.g., qwen3-embedding:0.6b-rk3588).

2. Send a POST request to /api/embed.

   Result: Valid vector returned at index [0].

3. Immediately send a second POST request to /api/embed with different text.

   Result: The returned vector at index [0] is invalid (contains zeros or static noise artifacts).
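The steps above can be scripted roughly as follows. This is a minimal sketch, not a verified test case: it assumes the server accepts the Ollama-style request body (`model` and `input`) and returns the vectors under an `embeddings` key, and the `BASE_URL` value is a placeholder for whatever host and port the container exposes. The helper names (`embed`, `looks_invalid`, `reproduce`) are made up for illustration. Note that `looks_invalid` only catches empty, non-finite, or all-zero vectors; the "static noise" case would need a comparison against a known-good vector.

```python
import json
import math
import urllib.request

BASE_URL = "http://localhost:8080"  # assumption: adjust to your server's host/port
MODEL = "qwen3-embedding:0.6b-rk3588"


def embed(text, base_url=BASE_URL, model=MODEL):
    """POST one text to /api/embed and return the vector at index [0]."""
    # Assumed Ollama-style payload and response shape.
    payload = json.dumps({"model": model, "input": text}).encode("utf-8")
    req = urllib.request.Request(
        base_url + "/api/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        body = json.load(resp)
    return body["embeddings"][0]


def looks_invalid(vec, eps=1e-9):
    """True if the vector is empty, has non-finite values, or is all (near) zero."""
    if not vec:
        return True
    if any(not math.isfinite(x) for x in vec):
        return True
    return all(abs(x) < eps for x in vec)


def reproduce(texts=("first sentence", "a completely different sentence"),
              embed_fn=embed):
    """Embed each text in order and report which results look invalid."""
    return [(text, looks_invalid(embed_fn(text))) for text in texts]
```

Calling `reproduce()` against a running server returns one `(text, invalid)` pair per request; with the behavior described in this issue, the first pair would be expected to report `False` and the second `True` when the second vector comes back zeroed.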

Observed Workaround: The issue is resolved if the model is forced to unload between requests. Passing "keep_alive": 0 directly to the /api/embed endpoint appears to be ignored in this build, so the unload has to be triggered explicitly.
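As a rough illustration of the workaround, the loop below forces an unload after every embedding request. It is only a sketch: `embed_fn` and `unload_fn` are hypothetical callables supplied by the caller, since the exact unload mechanism is not specified here, and unloading after each request will add model reload time to every call.

```python
def embed_with_unload(texts, embed_fn, unload_fn):
    """Embed texts one at a time, forcing a model unload after every request.

    embed_fn(text) -> vector and unload_fn() -> None are caller-supplied
    (hypothetical) hooks; this function only enforces the ordering.
    """
    vectors = []
    for text in texts:
        try:
            vectors.append(embed_fn(text))
        finally:
            # Unload even if the request failed, so the next one starts clean.
            unload_fn()
    return vectors
```

For example, `embed_with_unload(["a", "b"], my_embed, my_unload)` would call `my_embed("a")`, `my_unload()`, `my_embed("b")`, `my_unload()` in that order.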

Hardware: Rockchip RK3588
Endpoint: /api/embed
Model: qwen3-embedding (.rkllm)

My container is called ollama, but it's executing rkllama.

[Image]

Execute again without unload:

[Image]
