openclaw-strix-embed

Local, GPU-accelerated, OpenAI-compatible /v1/embeddings API on AMD Strix Halo. BAAI/bge-m3 by default, no API keys, no fees, no data leaving your network.

What this is

Local, GPU-accelerated, OpenAI-compatible embeddings API. A drop-in replacement for OpenAI's /v1/embeddings endpoint: no API keys, no usage fees, no data leaving your network.

Built for AMD Strix Halo (RDNA 3.5 / gfx1151) with ROCm, but falls back to CPU if no GPU is available.
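
Because the endpoint follows the OpenAI contract, existing SDKs only need their base URL overridden. A minimal sketch with the official openai Python client, assuming the default port from .env.template; the api_key value is an arbitrary placeholder since the server does no authentication:

from openai import OpenAI

# Point the stock OpenAI client at the local server; no real key needed.
client = OpenAI(base_url="http://localhost:8484/v1", api_key="unused")

resp = client.embeddings.create(model="BAAI/bge-m3", input="Hello world")
print(len(resp.data[0].embedding))  # 1024 dimensions for bge-m3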

Why

Every vector database and RAG pipeline needs an embeddings API. The standard options (OpenAI text-embedding-3-small, Google text-embedding-004) charge per token and send your data to external servers. This project serves the same API contract locally, for free, on your own hardware.

Project Structure

.
├── .gitignore              # Ignores .env and data/
├── .env.template           # Template, copy to .env
├── README.md               # This file
├── llm.txt                 # Complete technical reference
├── Dockerfile              # Ubuntu Rolling + ROCm PyTorch + FastAPI
├── docker-compose.yml      # Service definition with GPU passthrough
├── entrypoint.sh           # GPU check + uvicorn start
├── server.py               # OpenAI-compatible FastAPI server
└── data/                   # Persistent model cache (git-ignored)
    └── models/             # HuggingFace model files

Prerequisites

  • Docker with compose plugin
  • AMD Strix Halo (or any RDNA 3.5 GPU) for GPU mode
  • ~7 GB disk for the model + Docker image

Quick Start

cp .env.template .env
# Edit .env if you want to change the model or port
docker compose up -d --build

First start downloads the model (~4.3 GB). Subsequent starts load from cache in seconds.

Verify:

# Check GPU detection
docker logs openclaw-embeddings

# Test the API
curl http://localhost:8484/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model":"BAAI/bge-m3","input":"Hello world"}'

API

POST /v1/embeddings

OpenAI-compatible. Works with any client that speaks the OpenAI embeddings format.

Request:

{
  "model": "BAAI/bge-m3",
  "input": "text to embed"
}

input accepts a single string or an array of strings for batch embedding.

Response:

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0123, -0.044, ...]
    }
  ],
  "model": "BAAI/bge-m3",
  "usage": {"prompt_tokens": 3, "total_tokens": 3}
}
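
Since input also accepts an array, a batch call needs no SDK at all. A sketch using Python's requests library (the example texts are placeholders):

import requests

# One POST embeds several texts; the response carries one item per input.
payload = {
    "model": "BAAI/bge-m3",
    "input": ["first document", "second document", "third document"],
}
r = requests.post("http://localhost:8484/v1/embeddings", json=payload, timeout=60)
r.raise_for_status()
for item in r.json()["data"]:
    # "index" matches the position of the corresponding input string.
    print(item["index"], len(item["embedding"]))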

GET /v1/models

Lists available models and their dimensions.

GET /health

Returns {"status": "ok", "model_loaded": true} when ready.
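
Together these make a simple readiness probe before wiring the service into a pipeline. A sketch, assuming the default port:

import time
import requests

BASE = "http://localhost:8484"

# Poll /health until the model is loaded (a first start may still be
# downloading weights), then list the available models.
for _ in range(30):
    try:
        if requests.get(f"{BASE}/health", timeout=5).json().get("model_loaded"):
            break
    except requests.ConnectionError:
        pass
    time.sleep(2)

print(requests.get(f"{BASE}/v1/models", timeout=5).json())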

Configuration

All configurable via .env:

Variable          Default       Description
EMBEDDING_MODEL   BAAI/bge-m3   HuggingFace model ID
EMBEDDING_PORT    8484          Host port for the API
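
A .env matching the defaults above (assuming .env.template mirrors this table):

EMBEDDING_MODEL=BAAI/bge-m3
EMBEDDING_PORT=8484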

Model

Default: BAAI/bge-m3

Property     Value
Dimensions   1024
Languages    100+ (excellent EN + ES)
Max tokens   8192
Size         ~2.2 GB (weights)

You can swap it for any sentence-transformers compatible model by changing EMBEDDING_MODEL in .env.
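
Before switching, it is worth confirming the candidate loads and noting its output dimension, since a different dimension means rebuilding any existing vector indexes. A sketch using the sentence-transformers library the server runs on; the alternative model named here is purely illustrative:

from sentence_transformers import SentenceTransformer

# Hypothetical replacement model -- substitute your own candidate.
candidate = "intfloat/multilingual-e5-large"
model = SentenceTransformer(candidate)
print(candidate, "->", model.get_sentence_embedding_dimension(), "dims")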

Using with Multipass VMs

If OpenClaw runs inside a Multipass VM and this embeddings service runs on the host, localhost won't work: inside the VM it points at the VM itself, not the host.

Find the host IP on the Multipass bridge:

# On the host
ip addr show mpqemubr0 | grep 'inet '
# Example output: inet 10.5.162.1/24 ...

Test from inside the VM:

multipass exec <vm-name> -- curl http://10.5.162.1:8484/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model":"BAAI/bge-m3","input":"connectivity test"}'

Configure OpenClaw with:

Setting    Value
Base URL   http://<host-bridge-ip>:8484/v1
Model      BAAI/bge-m3
Auth       None

Performance

Tested on AMD Ryzen AI Max (Strix Halo) with Radeon 8060S iGPU:

Metric                          Value
Latency (single text)           ~47 ms
Latency (first request, cold)   ~270 ms
GPU VRAM used                   ~1.5 GB
Model load time (from cache)    ~10 s
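
Numbers vary with text length and batch size. The warm single-text figure can be reproduced with a sketch like this (one throwaway request first, so the cold-start cost is excluded):

import time
import requests

URL = "http://localhost:8484/v1/embeddings"
body = {"model": "BAAI/bge-m3", "input": "latency probe"}

requests.post(URL, json=body, timeout=60)  # warm-up: absorbs the cold first request
t0 = time.perf_counter()
requests.post(URL, json=body, timeout=60)
print(f"{(time.perf_counter() - t0) * 1000:.1f} ms")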

GPU Details

This container uses the same ROCm setup as rocm-strix-docker:

  • HSA_OVERRIDE_GFX_VERSION=11.5.1 (required for ROCm to recognize Strix Halo)
  • privileged: true (grants /dev/kfd and /dev/dri access for GPU compute)
  • ipc: host (shared memory for PyTorch)
  • PyTorch wheels from https://rocm.prereleases.amd.com/whl/gfx1151/ (ROCm 7.11 prerelease)
  • UV manages Python 3.12 and all packages (no pip)
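
The entrypoint's GPU check reduces to asking PyTorch whether a HIP device is visible. A minimal sketch; note that ROCm builds of PyTorch expose the GPU through the CUDA API, which is why the log below reports Device: cuda:

import torch

# ROCm devices surface through torch.cuda; torch.version.hip is set on ROCm builds.
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
    print("ROCm/HIP:", torch.version.hip)
    device = "cuda"
else:
    device = "cpu"  # CPU fallback when no GPU is available
print("Device:", device)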

License

MIT for original code in this repository (FastAPI server, Dockerfile, Compose configs, scripts). Third-party model weights (BAAI/bge-m3) and runtimes (sentence-transformers, transformers, PyTorch ROCm) retain their own upstream licenses; this repository does not redistribute them.

Verified Output

[embeddings] ========================================
[embeddings] Model: BAAI/bge-m3
[embeddings] ========================================
[embeddings] Checking GPU...
[embeddings] GPU: Radeon 8060S Graphics
[embeddings] VRAM: 124.6 GB
[embeddings] ROCm/HIP: 7.2.53150-7b886380f9
[embeddings] Device: cuda
[embeddings] Starting server on port 80...
[embeddings] Loading BAAI/bge-m3 on cuda
[embeddings] GPU: Radeon 8060S Graphics
[embeddings] Model loaded. Dimension: 1024
