Sync assistant docs with the multi-provider model

claude · frousselet · commit 61dd994c02da · 2026-06-14T19:52:23.000+02:00
The feature/overview docs still described Ask Cairn as an Ollama-sidecar,
local-only feature. The Ollama compose sidecar (the `ai` profile) no longer
exists and the default backend has been Mistral since 0.27.0, so the
installation steps were broken and the descriptions inaccurate.

Rewrite docs/installation.md around the pluggable provider model (Mistral
default, plus openai / anthropic / self-hosted ollama), and make the assistant
references in docs/api.md, docs/features.md, docs/mcp-server.md and
docs/modules/README.md provider-neutral. The canonical per-provider setup
stays in docs/modules/assistant/README.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
diff --git a/docs/api.md b/docs/api.md
@@ -47,10 +47,10 @@ GET  /api/v1/auth/me/        # current user profile
 
 ## Assistant (Ask Cairn)
 
-`POST /api/v1/assistant/ask/` answers a simple natural-language question using the optional local AI assistant (see [docs/modules/assistant/](modules/assistant/README.md)).
+`POST /api/v1/assistant/ask/` answers a simple natural-language question using the optional AI assistant (pluggable LLM provider; see [docs/modules/assistant/](modules/assistant/README.md)).
 
 Request body: `{"q": "Quelles décisions ont été prises lors de la dernière revue de direction ?", "language": "fr"}` (`language` optional, defaults to the request language).
 
 Response `200`: `{"summary": "...", "language": "fr", "degraded": false, "refused_tools": [], "results": [{"tool": "list_management_review_decisions", "label": "Decisions", "error": null, "records": [{"title": "DECS-1 ...", "subtitle": "pending", "url": "/reports/decisions/<uuid>/", "icon": "bi-check2-square"}]}]}`. Records are real database objects the caller is allowed to read; the summary sentence is AI-generated and must be verified against them.
 
-Errors: `400` on invalid `q`; `503` with a stable code (`assistant_disabled`, `assistant_unreachable`, `model_missing`, `model_error`) when the assistant or its Ollama sidecar is unavailable.
+Errors: `400` on invalid `q`; `503` with a stable code (`assistant_disabled`, `assistant_unreachable`, `model_missing`, `model_error`) when the assistant is disabled or its configured LLM provider is unavailable.
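Review note on the docs/api.md hunk above: the request and error contract it describes can be sketched from the client side as follows. This is a minimal illustration, not the project's client. The `base_url` and `token` parameters, the Bearer authorization scheme, and the assumption that the stable `503` code arrives under a `"code"` key in the JSON body are all hypothetical; only the endpoint path, the `q` / `language` fields, the response fields, and the four code names come from the documentation.

```python
import json
import urllib.error
import urllib.request
from typing import Optional

# Stable 503 codes listed in docs/api.md.
UNAVAILABLE_CODES = {
    "assistant_disabled",
    "assistant_unreachable",
    "model_missing",
    "model_error",
}


def interpret_ask_response(status: int, body: dict) -> dict:
    """Normalize an assistant response into a small dict a UI could act on."""
    if status == 200:
        # Keep only records from tools that did not report an error.
        records = [
            record
            for result in body.get("results", [])
            if result.get("error") is None
            for record in result.get("records", [])
        ]
        return {"ok": True, "summary": body.get("summary", ""), "records": records}
    if status == 503:
        # ASSUMPTION: the stable code is exposed under a "code" key.
        code = body.get("code")
        known = code if code in UNAVAILABLE_CODES else None
        return {"ok": False, "fallback": True, "code": known}
    # 400 (invalid q) and anything else: not a provider outage.
    return {"ok": False, "fallback": False, "code": None}


def ask(base_url: str, token: str, question: str,
        language: Optional[str] = None) -> dict:
    """POST a question. base_url, token and Bearer auth are assumptions."""
    payload = {"q": question}
    if language:
        payload["language"] = language  # optional per the docs
    request = urllib.request.Request(
        f"{base_url}/api/v1/assistant/ask/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as response:
            return interpret_ask_response(response.status, json.load(response))
    except urllib.error.HTTPError as error:
        try:
            body = json.load(error)
        except ValueError:
            body = {}
        return interpret_ask_response(error.code, body)
```

Separating `interpret_ask_response` from the network call keeps the fallback decision testable without a running instance: a `503` maps to "fall back to the regular palette search" whichever provider is configured, which matches the provider-neutral wording introduced by this change.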
diff --git a/docs/features.md b/docs/features.md
@@ -107,7 +107,7 @@ Detailed feature reference for Cairn. For module-level specifications (business
 | Real-Time Dashboard | WebSocket-powered live statistics via Django Channels with animated counters and auto-reconnect |
 | Calendar & iCal | Unified calendar view across all modules with iCal subscription feed and per-user tokens |
 | Global Search | Multi-category search across all domain objects |
-| Ask Cairn (optional) | Natural-language questions in the command palette, answered by a small local AI model (Ollama sidecar, `ai` compose profile) routing to read-only data tools with the caller's permissions; answers cite real records under an AI-labeled summary, and data never leaves the host. See [docs/modules/assistant/](modules/assistant/README.md) |
+| Ask Cairn (optional) | Natural-language questions in the command palette, answered by a pluggable LLM provider (Mistral AI by default; OpenAI / any OpenAI-compatible endpoint; Claude; self-hosted Ollama) routing to read-only data tools with the caller's permissions; answers cite real records under an AI-labeled summary. Off by default. See [docs/modules/assistant/](modules/assistant/README.md) |
 | Reports | Configurable report generation (SoA PDF, Audit report PDF, Management review PPTX/DOCX) with status tracking |
 | Management reviews | Persistent ISO 27001:2022 clause 9.3 workflow with life cycle, decisions, ISMS changes, participants, snapshot-based auditability, and retrochaining to action plans, treatment plans, and objectives |
 | Stakeholder feedback | Formal feedback channel (clause 9.3.2.e) with sentiment, severity, and traceability to issues and expectations |
diff --git a/docs/installation.md b/docs/installation.md
@@ -103,22 +103,19 @@ Then sign in with `elise.moreau@voltara.example` / `VoltaraDemo!2026` (superuser
 
 ## AI assistant (optional)
 
-"Ask Cairn" answers simple natural-language questions from the command palette (Ctrl+K), e.g. *"Quelles décisions ont été prises lors de la dernière revue de direction ?"*. It runs entirely on your host through an [Ollama](https://ollama.com/) sidecar: no data leaves the machine, and every data access enforces the asking user's permissions.
+"Ask Cairn" answers simple natural-language questions from the command palette (Ctrl+K), e.g. *"Quelles décisions ont été prises lors de la dernière revue de direction ?"*. Every data access enforces the asking user's permissions, and the answer cites the real matching records. The LLM backend is a **pluggable provider** (`AI_ASSISTANT_PROVIDER`); the feature is **off by default** and the palette works unchanged when it is disabled or the backend is unreachable.
 
-```bash
-# 1. Start the sidecar (the `ai` profile is opt-in)
-docker compose --profile ai up -d
-
-# 2. Pull the model once (kept in the ollama_models volume)
-docker compose exec ollama ollama pull qwen3:1.7b
+Default (Mistral AI, third-party EU-hosted API): no sidecar, no model download, no GPU.
 
-# 3. Enable the feature in .env, then restart web
-# AI_ASSISTANT_ENABLED=True
+```bash
+# In .env, then restart web:
+AI_ASSISTANT_ENABLED=True
+AI_ASSISTANT_PROVIDER=mistral
+AI_ASSISTANT_API_KEY=your-mistral-api-key
+AI_ASSISTANT_MODEL=mistral-small-latest
 ```
 
-Sizing: the default `qwen3:1.7b` model needs roughly 2-4 GB of RAM (CPU-only inference). The first question after startup loads the model (10-20 extra seconds); warm questions take about 5-30 seconds. Any Ollama chat model can be used instead via `AI_ASSISTANT_MODEL`. Without the profile (or if Ollama is down) the palette works exactly as before.
-
-On macOS, Docker containers cannot use the Metal GPU, which caps you at small models. For noticeably better answer phrasing, install the [native Ollama app](https://ollama.com/download) on the host, pull a 4B-class model (`ollama pull qwen3:4b`), and point Cairn at it instead of the compose profile: `AI_ASSISTANT_OLLAMA_URL=http://host.docker.internal:11434` and `AI_ASSISTANT_MODEL=qwen3:4b` in `.env` (do not start the `ai` profile then, to avoid a port conflict on 11434). Model guidance and measurements: [docs/modules/assistant/](modules/assistant/README.md).
+Other backends are configured the same way: `openai` for OpenAI (ChatGPT) or any OpenAI-compatible endpoint (vLLM, LiteLLM, LocalAI...), `anthropic` for Claude, and `ollama` for a self-hosted, no-egress deployment pointed at your own [Ollama](https://ollama.com/) instance. With a third-party provider, the question text and the compact record fields used for routing leave the platform. Provider setup, model guidance, the data-egress detail and semantic search are all documented in [docs/modules/assistant/](modules/assistant/README.md).
 
 ## Scheduled lifecycle commands
 
diff --git a/docs/mcp-server.md b/docs/mcp-server.md
@@ -173,4 +173,4 @@ Additional tools:
 
 | Tool | Description |
 | ---- | ----------- |
-| `ask_assistant` | Ask the Ask Cairn natural-language assistant a read-only question about GRC data (e.g. "Which decisions were made at the last management review?"). Requires the optional Ollama sidecar (`AI_ASSISTANT_ENABLED`); the answer cites real records and data access enforces the caller's permissions. See [docs/modules/assistant/](modules/assistant/README.md). |
+| `ask_assistant` | Ask the Ask Cairn natural-language assistant a read-only question about GRC data (e.g. "Which decisions were made at the last management review?"). Requires the optional assistant feature (`AI_ASSISTANT_ENABLED`, backed by a pluggable LLM provider); the answer cites real records and data access enforces the caller's permissions. See [docs/modules/assistant/](modules/assistant/README.md). |
diff --git a/docs/modules/README.md b/docs/modules/README.md
@@ -15,7 +15,7 @@ docs/modules/
 │   └── ebios-rm/                EBIOS RM workshops (W0-W5) per ANSSI v1.5
 ├── management-review/           ISO 27001 §9.3 management review entities
 ├── governance/                  Cross-cutting platform governance (lifecycle workflow framework)
-└── assistant/                   Ask Cairn: optional local AI question mode (no persistent entities)
+└── assistant/                   Ask Cairn: optional AI question mode, pluggable LLM provider (no persistent entities)
 ```
 
 Each module directory contains: