Update docs for reranker

w1z1x0 · w1z1x0 · commit 7a71334862b6 · 2026-04-27T00:28:36.000+08:00
diff --git a/docs/docs/components/clients/consistentmi.md b/docs/docs/components/clients/consistentmi.md
@@ -24,7 +24,7 @@ ConsistentMI simulates clients in motivational interviewing (MI) sessions with c
 
 1. **Load Profile**: Reads the character JSON (personas, beliefs, acceptable plans, motivation topics) and initializes `stage` and `receptivity`.
 2. **Initialize Prompts**: Builds a system prompt that anchors the client’s behavior/goal and injects personas + beliefs for consistency.
-3. **Track Topic Engagement**: Matches the therapist’s latest utterance to a motivation topic, then uses the topic graph distance to update `engagement` and count repeated off-topic turns.
+3. **Track Topic Engagement**: Matches the therapist’s latest utterance to a motivation topic using a reranker-backed topic matcher, then uses the topic graph distance to update `engagement` and count repeated off-topic turns. If reranking is unavailable or returns no valid scores, ConsistentMI falls back to lexical matching.
 4. **Verify Motivation (Optional)**: If the therapist addresses the client’s core motivation, the client enters a short `Motivation` state for an acknowledging response.
 5. **Sample a Stage-Consistent Action**: An LLM predicts an action distribution conditioned on recent context and the current stage.
 6. **Select Grounding Detail**: For actions like `Inform/Downplay/Blame/Hesitate/Plan`, the client selects a relevant persona/belief/plan (only when the therapist asks a question) to ground the next reply.
@@ -58,16 +58,43 @@ response = client.generate_response(
 print(response)
 ```
 
+> ⚠️ **Hint:**
+>
+> - ConsistentMI uses a local reranker served through vLLM's OpenAI-compatible `/rerank` endpoint.
+> - Set `LOCAL_BASE_URL` and `LOCAL_API_KEY` in `.env`; PatientHub reuses them for the reranker.
+> - Use `reranker_model_type=LOCAL`.
+> - Set `reranker_model_name` to the LiteLLM vLLM route, e.g. `hosted_vllm/BAAI/bge-reranker-v2-m3`.
+> - If the reranker server runs on the same machine, use `127.0.0.1` in `LOCAL_BASE_URL`; `0.0.0.0` is a listen address, not a client destination.
+
 ## Configuration
 
-| Option             | Description                      | Default                                        |
-| ------------------ | -------------------------------- | ---------------------------------------------- |
-| `prompt_path`      | Path to prompt file              | `data/prompts/client/consistentMI.yaml`        |
-| `data_path`        | Path to character file           | `data/characters/ConsistentMI.json`            |
-| `data_idx`         | Character index                  | `0`                                            |
-| `topics_path`      | Topics from Wiki                 | `data/resources/ConsistentMI/topics.json`      |
-| `topic_graph_path` | Correlation between topics       | `data/resources/ConsistentMI/topic_graph.json` |
-| `model_retriever`  | retrieve the most relevant topic | None                                           |
+| Option                | Description                                             | Default                                        |
+| --------------------- | ------------------------------------------------------- | ---------------------------------------------- |
+| `prompt_path`         | Path to prompt file                                     | `data/prompts/client/consistentMI.yaml`        |
+| `data_path`           | Path to character file                                  | `data/characters/ConsistentMI.json`            |
+| `data_idx`            | Character index                                         | `0`                                            |
+| `topics_path`         | Topics from Wiki                                        | `data/resources/ConsistentMI/topics.json`      |
+| `topic_graph_path`    | Correlation between topics                              | `data/resources/ConsistentMI/topic_graph.json` |
+| `reranker_model_type` | Provider key for topic reranking                        | `LOCAL`                                        |
+| `reranker_model_name` | LiteLLM model route for the reranker                    | `hosted_vllm/BAAI/bge-reranker-v2-m3`          |
+
+### Local Reranker Example
+
+```yaml
+client:
+  agent_name: consistentMI
+  model_type: OPENAI
+  model_name: gpt-4o
+  reranker_model_type: LOCAL
+  reranker_model_name: hosted_vllm/BAAI/bge-reranker-v2-m3
+```
+
+With a local vLLM reranker server, your `.env` should contain:
+
+```bash
+LOCAL_BASE_URL=http://127.0.0.1:7891/v1
+LOCAL_API_KEY=EMPTY
+```
 
 ## Character Data Format
 
diff --git a/docs/docs/getting-started/configuration.md b/docs/docs/getting-started/configuration.md
@@ -22,11 +22,13 @@ For example,
 OPENAI_API_KEY=your_openai_key
 OPENAI_BASE_URL=https://api.openai.com
 
-# For VLLM (n this case, model_type = VLLM)
-VLLM_BASE_URL=http://127.0.0.1
-VLLM_API_KEY=None
+# For local OpenAI-compatible servers (model_type = LOCAL)
+LOCAL_BASE_URL=http://127.0.0.1:8000/v1
+LOCAL_API_KEY=EMPTY
 ```
 
+`model_type` selects the environment-variable prefix. For example, `model_type=LOCAL` makes PatientHub read `LOCAL_BASE_URL` and `LOCAL_API_KEY`.
+
 ## Model Configuration
 
 ### Using OpenAI (Default)
@@ -65,9 +67,14 @@ config = {
 ```yaml
 client:
   agent_name: consistentMI
-  initial_stage: precontemplation # precontemplation, contemplation, preparation, action
+  model_type: OPENAI
+  model_name: gpt-4o
+  reranker_model_type: LOCAL
+  reranker_model_name: hosted_vllm/BAAI/bge-reranker-v2-m3
 ```
 
+`ConsistentMI` uses the main `model_type` / `model_name` pair for response generation and a separate `reranker_model_type` / `reranker_model_name` pair for topic matching. The reranker currently reuses `LOCAL_BASE_URL` and `LOCAL_API_KEY`.
+
 #### SimPatient
 
 ```yaml
diff --git a/docs/docs/getting-started/installation.md b/docs/docs/getting-started/installation.md
@@ -75,6 +75,44 @@ LOCAL_API_KEY=EMPTY
 
 Then set your config to use `model_type=LOCAL` and `model_name` to the model name exposed by your vLLM server.
 
+### Local Reranker Models via vLLM
+
+`ConsistentMI` can also use a local reranker served by vLLM's OpenAI-compatible `/rerank` endpoint.
+
+1) Start a reranker model with vLLM:
+
+```bash
+vllm serve BAAI/bge-reranker-v2-m3 --host 0.0.0.0 --port 7891
+```
+
+2) Point `LOCAL_BASE_URL` at the reranker server:
+
+```bash
+LOCAL_BASE_URL=http://127.0.0.1:7891/v1
+LOCAL_API_KEY=EMPTY
+```
+
+3) Use the LiteLLM vLLM route in `ConsistentMI`:
+
+```yaml
+client:
+  agent_name: consistentMI
+  reranker_model_type: LOCAL
+  reranker_model_name: hosted_vllm/BAAI/bge-reranker-v2-m3
+```
+
+:::tip Localhost vs 0.0.0.0
+Use `0.0.0.0` for the server listen address, but use `127.0.0.1` or the machine's real IP in `LOCAL_BASE_URL`.
+:::
+
+:::tip Proxy settings
+If your shell exports `http_proxy` or `https_proxy`, local requests to the reranker may be routed through the proxy instead of reaching your vLLM server. For local testing, either unset those variables or set:
+
+```bash
+export NO_PROXY=127.0.0.1,localhost
+```
+:::
+
 :::note vLLM fails to start
 it’s usually a CUDA/driver mismatch on the serving machine—check your NVIDIA driver/CUDA runtime and use a vLLM version compatible with your environment.
 :::
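
The docs in this change describe reranker-backed topic matching with a lexical fallback but do not show the flow. The sketch below illustrates one way it could work. It is not PatientHub's actual implementation. The function names (`rerank_scores`, `lexical_score`, `match_topic`) are hypothetical, the response shape (`results[].index` and `results[].relevance_score`) is assumed from vLLM's rerank API, and the word-overlap score is a stand-in for whatever lexical matcher ConsistentMI uses.

```python
import json
import os
import urllib.request


def rerank_scores(query, documents, model="BAAI/bge-reranker-v2-m3", timeout=5.0):
    """Ask the rerank endpoint for one relevance score per document.

    Assumption: LOCAL_BASE_URL ends in /v1 and the server exposes /v1/rerank.
    """
    base_url = os.environ.get("LOCAL_BASE_URL", "http://127.0.0.1:7891/v1")
    api_key = os.environ.get("LOCAL_API_KEY", "EMPTY")
    payload = json.dumps(
        {"model": model, "query": query, "documents": documents}
    ).encode("utf-8")
    request = urllib.request.Request(
        base_url.rstrip("/") + "/rerank",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        results = json.load(response)["results"]
    # Results may arrive sorted by score, so place each one by its index.
    scores = [None] * len(documents)
    for item in results:
        scores[item["index"]] = float(item["relevance_score"])
    return scores


def lexical_score(query, document):
    """Word-overlap (Jaccard) similarity; a stand-in lexical matcher."""
    query_words = set(query.lower().split())
    document_words = set(document.lower().split())
    union = query_words | document_words
    return len(query_words & document_words) / len(union) if union else 0.0


def match_topic(utterance, topics, scorer=rerank_scores):
    """Return the best-matching topic, falling back to lexical matching."""
    if not topics:
        raise ValueError("topics must not be empty")
    try:
        scores = scorer(utterance, topics)
        if len(scores) != len(topics) or any(s is None for s in scores):
            raise ValueError("reranker returned no valid scores")
    except Exception:
        # Reranker unavailable or unusable: fall back to word overlap.
        scores = [lexical_score(utterance, topic) for topic in topics]
    best = max(range(len(topics)), key=scores.__getitem__)
    return topics[best]
```

Passing a different `scorer` makes the fallback path easy to exercise without a running server; for example, a scorer that raises `ConnectionError` forces lexical matching.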