This guide covers the sprint-owned local and self-hosted provider paths:
- POST /v1/providers/ollama/register
- POST /v1/providers/llamacpp/register
- POST /v1/providers/vllm/register
- POST /v1/providers/test
- POST /v1/runtime/invoke
- GET /v1/providers
- GET /v1/providers/{provider_id}
Scope note: this page documents Ollama, llama.cpp / llama-server, and self-hosted vLLM.
- Start Alice API and data services.
- Authenticate and obtain a hosted session bearer token.
- Have a thread ID available for runtime invoke.
- Run at least one local model backend:
  - Ollama server (default http://127.0.0.1:11434)
  - llama.cpp server (default http://127.0.0.1:8080)
  - vLLM server (recommended http://127.0.0.1:8001)
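Before registering a provider, it can help to confirm that something is listening on the backend address at all. The following is a minimal sketch, not part of the Alice tooling: it only attempts a TCP connection to the base URLs listed above and does not verify that the listener is actually Ollama, llama.cpp, or vLLM. The names `DEFAULT_BACKENDS`, `host_port`, and `is_listening` are hypothetical.

```python
import socket
from urllib.parse import urlsplit

# Base URLs copied from the prerequisites list; adjust if your servers differ.
DEFAULT_BACKENDS = {
    "ollama": "http://127.0.0.1:11434",
    "llamacpp": "http://127.0.0.1:8080",
    "vllm": "http://127.0.0.1:8001",
}


def host_port(base_url: str) -> tuple[str, int]:
    """Split a base URL into (host, port), defaulting the port by scheme."""
    parts = urlsplit(base_url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname or "127.0.0.1", port


def is_listening(base_url: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the backend's host and port succeeds."""
    try:
        with socket.create_connection(host_port(base_url), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, url in DEFAULT_BACKENDS.items():
        state = "up" if is_listening(url) else "down"
        print(f"{name:9s} {url:26s} {state}")
```

A successful connection only shows that the port is open; the POST /v1/providers/test call remains the real connectivity check.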
curl -sS -X POST "http://127.0.0.1:8000/v1/providers/ollama/register" \
-H "Authorization: Bearer $SESSION_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"display_name": "Ollama Local",
"base_url": "http://127.0.0.1:11434",
"default_model": "llama3.2:latest"
}'

curl -sS -X POST "http://127.0.0.1:8000/v1/providers/llamacpp/register" \
-H "Authorization: Bearer $SESSION_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"display_name": "llama.cpp Local",
"base_url": "http://127.0.0.1:8080",
"default_model": "Meta-Llama-3.1-8B-Instruct"
}'

curl -sS -X POST "http://127.0.0.1:8000/v1/providers/vllm/register" \
-H "Authorization: Bearer $SESSION_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"display_name": "vLLM Self-Hosted",
"base_url": "http://127.0.0.1:8001",
"default_model": "mistral-small-instruct"
}'

curl -sS -X POST "http://127.0.0.1:8000/v1/providers/test" \
-H "Authorization: Bearer $SESSION_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"provider_id": "'"$PROVIDER_ID"'",
"prompt": "Reply with one sentence confirming local connectivity."
}'

Capability snapshots include deterministic local model enumeration and health posture fields:
- health_status
- health_endpoint
- models_endpoint
- invoke_endpoint
- model_count
- models
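A quick sanity check on a capability snapshot might look like the sketch below. Only the six field names come from this guide; their types and values are assumptions (in particular, that `models` is a list and `model_count` is its length), and `check_snapshot` is a hypothetical helper, not part of the API.

```python
# Field names come from the list above; their types are assumptions.
SNAPSHOT_FIELDS = (
    "health_status",
    "health_endpoint",
    "models_endpoint",
    "invoke_endpoint",
    "model_count",
    "models",
)


def check_snapshot(snapshot: dict) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = [
        f"missing field: {name}" for name in SNAPSHOT_FIELDS if name not in snapshot
    ]
    models = snapshot.get("models")
    count = snapshot.get("model_count")
    # Assumption: `models` is a list and `model_count` equals its length.
    if isinstance(models, list) and isinstance(count, int) and count != len(models):
        problems.append(
            f"model_count is {count} but models has {len(models)} entries"
        )
    return problems


if __name__ == "__main__":
    # Placeholder values for illustration only; real snapshots may differ.
    example = {
        "health_status": "<status>",
        "health_endpoint": "<health-url>",
        "models_endpoint": "<models-url>",
        "invoke_endpoint": "<invoke-url>",
        "model_count": 1,
        "models": ["llama3.2:latest"],
    }
    print(check_snapshot(example) or "snapshot looks consistent")
```

The check deliberately treats `health_status` as opaque, since this guide does not list its possible values.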
curl -sS -X POST "http://127.0.0.1:8000/v1/runtime/invoke" \
-H "Authorization: Bearer $SESSION_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"provider_id": "'"$PROVIDER_ID"'",
"thread_id": "'"$THREAD_ID"'",
"message": "Summarize current runtime status in one sentence."
}'

Use the sprint helper script for a full register/test/invoke flow:
./scripts/run_phase11_local_provider_e2e.py \
--session-token "$SESSION_TOKEN" \
--thread-id "$THREAD_ID" \
--provider ollama \
--model llama3.2:latest

Or for llama.cpp:
./scripts/run_phase11_local_provider_e2e.py \
--session-token "$SESSION_TOKEN" \
--thread-id "$THREAD_ID" \
--provider llamacpp \
--model Meta-Llama-3.1-8B-Instruct

Or for vLLM:
./scripts/run_phase11_local_provider_e2e.py \
--session-token "$SESSION_TOKEN" \
--thread-id "$THREAD_ID" \
--provider vllm \
--model mistral-small-instruct
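
For callers who would rather issue the same requests from Python than from curl, the three request shapes used in this guide can be sketched with the standard library. This is an illustration under assumptions, not the helper script's implementation: the paths and JSON keys mirror the curl examples, while the function names are hypothetical and the `send` helper assumes the API replies with a JSON body.

```python
import json
import urllib.request

API_BASE = "http://127.0.0.1:8000"  # Alice API address used in the curl examples
PROVIDERS = ("ollama", "llamacpp", "vllm")


def build_request(path: str, token: str, payload: dict) -> urllib.request.Request:
    """Build an authenticated JSON POST request without sending it."""
    return urllib.request.Request(
        API_BASE + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def register_request(token, provider, display_name, base_url, default_model):
    """Request for POST /v1/providers/{provider}/register."""
    if provider not in PROVIDERS:
        raise ValueError(f"unsupported provider: {provider}")
    payload = {
        "display_name": display_name,
        "base_url": base_url,
        "default_model": default_model,
    }
    return build_request(f"/v1/providers/{provider}/register", token, payload)


def provider_test_request(token, provider_id, prompt):
    """Request for POST /v1/providers/test."""
    return build_request(
        "/v1/providers/test", token, {"provider_id": provider_id, "prompt": prompt}
    )


def invoke_request(token, provider_id, thread_id, message):
    """Request for POST /v1/runtime/invoke."""
    payload = {"provider_id": provider_id, "thread_id": thread_id, "message": message}
    return build_request("/v1/runtime/invoke", token, payload)


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send a request and decode the reply; assumes the API returns JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Pass the result of any builder to `send` to execute it. The provider ID needed by the test and invoke calls has to be read from the registration response, whose exact shape is not described in this guide.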