
Commit 0e9dcb5

Document and test server-side LLM sampling (#127)
1 parent 55bbe02 commit 0e9dcb5

File tree: 10 files changed, +340 -1 lines changed

README.md

Lines changed: 9 additions & 0 deletions

@@ -415,6 +415,15 @@ EnrichMCP adds three critical layers on top of MCP:

The result: AI agents can work with your data as naturally as a developer using an ORM.

## Server-Side LLM Sampling

EnrichMCP can request language model completions through MCP's **sampling**
feature. Call `ctx.ask_llm()` or the `ctx.sampling()` alias from any resource
and the connected client will choose an LLM and pay for the usage. You can tune
behavior using options like `model_preferences`, `allow_tools`, and
`max_tokens`. See [docs/server_side_llm.md](docs/server_side_llm.md) for more
details.

## Examples

Check out the [examples directory](examples/README.md):
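As a quick orientation, a complete resource using this new API could look like the following minimal sketch. It is not taken from the committed files: the app title, resource name, and prompt are illustrative, while `EnrichMCP`, `EnrichContext`, `ask_llm()`, and `prefer_smart_model()` come from this commit.

```python
from enrichmcp import EnrichContext, EnrichMCP, prefer_smart_model

app = EnrichMCP(title="Support Desk", description="Demo of server-side LLM sampling")


@app.retrieve
async def draft_reply(question: str, ctx: EnrichContext) -> str:
    """Ask the client's LLM to draft a short answer."""
    result = await ctx.ask_llm(
        question,
        system_prompt="Answer in at most two sentences.",
        model_preferences=prefer_smart_model(),
        max_tokens=150,
    )
    # The client runs the model and returns the completion to the server.
    return result.content.text
```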

docs/api/context.md

Lines changed: 15 additions & 0 deletions

@@ -25,6 +25,21 @@ context = EnrichContext()

The context exposes a `cache` attribute for storing values across the request,
user, or global scopes.

## LLM Integration

Use `ask_llm()` (or the `sampling()` alias) to request completions from the client-side LLM. See the [Server-Side LLM guide](../server_side_llm.md) for more details:

```python
from enrichmcp import prefer_fast_model

result = await ctx.ask_llm(
    "Summarize our latest sales numbers",
    model_preferences=prefer_fast_model(),
    max_tokens=200,
)
print(result.content.text)
```

## Extending Context

For now, if you need context functionality, you can extend the base class:

docs/server_side_llm.md

Lines changed: 58 additions & 0 deletions

@@ -0,0 +1,58 @@

# Server-Side LLM Sampling

MCP includes a **sampling** feature that lets the server ask the client to run an LLM request.
This keeps API keys and billing on the client side while giving your EnrichMCP
application the ability to generate text or run tool-aware prompts.

`EnrichContext.ask_llm()` (and its alias `sampling()`) is the helper used to make
these requests. The method mirrors the MCP sampling API and supports a number of
tuning parameters.

## Parameters

| Name | Description |
|------|-------------|
| `messages` | Text or `SamplingMessage` objects to send to the LLM. Strings are converted to user messages automatically. |
| `system_prompt` | Optional system prompt that defines overall behavior. |
| `max_tokens` | Maximum number of tokens the client should generate. Defaults to 1000. |
| `temperature` | Sampling temperature for controlling randomness. |
| `model_preferences` | `ModelPreferences` object describing cost, speed and intelligence priorities. Use `prefer_fast_model()` or `prefer_smart_model()` as shortcuts. |
| `allow_tools` | Controls what tools the LLM can see: `"none"`, `"thisServer"`, or `"allServers"`. |
| `stop_sequences` | Strings that stop generation when encountered. |
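To make the table concrete, here is an illustrative fragment (not from the committed docs) that mixes a plain string with a `SamplingMessage` and combines several tuning options; it assumes it runs inside a resource that already has `ctx` and a `draft_text` string:

```python
from mcp.types import SamplingMessage, TextContent

messages = [
    SamplingMessage(role="user", content=TextContent(type="text", text="Please review:")),
    draft_text,  # plain strings are converted to user messages automatically
]
result = await ctx.ask_llm(
    messages,
    system_prompt="You are a meticulous copy editor.",
    temperature=0.2,
    max_tokens=300,
    stop_sequences=["\n\n"],
    allow_tools="none",
)
print(result.content.text)
```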
### Model Preferences

`ModelPreferences` let the server express whether it cares more about cost,
speed or intelligence when the client chooses an LLM. Two convenience functions
are provided:

```python
from enrichmcp import prefer_fast_model, prefer_smart_model
```

Use `prefer_fast_model()` when low latency and price are most important. Use
`prefer_smart_model()` when you need the best reasoning capability.
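If neither preset fits, a `ModelPreferences` can be constructed directly. The sketch below uses the same fields the presets use in `src/enrichmcp/context.py`; the hint name and priority values are illustrative:

```python
from mcp.types import ModelHint, ModelPreferences

balanced = ModelPreferences(
    hints=[ModelHint(name="claude-3-sonnet")],  # hint name is illustrative
    costPriority=0.5,
    speedPriority=0.5,
    intelligencePriority=0.7,
)

result = await ctx.ask_llm("Outline a migration plan", model_preferences=balanced)
```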
### Tool Access

Set `allow_tools` to allow the client LLM to inspect available MCP tools.
This enables context-aware answers where the LLM can suggest reading or calling
other resources.

## Example

```python
@app.retrieve
async def summarize(text: str, ctx: EnrichContext) -> str:
    result = await ctx.ask_llm(
        f"Summarize this: {text}",
        model_preferences=prefer_fast_model(),
        max_tokens=200,
        allow_tools="thisServer",
    )
    return result.content.text
```

MCP sampling gives your server lightweight LLM features without storing API
credentials. See the [travel planner example](../examples/server_side_llm_travel_planner) for a complete
implementation.
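Because sampling is fulfilled by the client, it can help to see the other side of the exchange. The following is a minimal client-side sketch, not part of this commit, that assumes the `mcp` Python SDK's `sampling_callback` hook on `ClientSession`; the actual LLM call is stubbed out:

```python
from typing import Any

from mcp.types import CreateMessageRequestParams, CreateMessageResult, TextContent


async def handle_sampling(
    context: Any, params: CreateMessageRequestParams
) -> CreateMessageResult:
    # params carries the server's request: messages, maxTokens, modelPreferences, ...
    reply_text = "stubbed LLM reply"  # a real client would call its LLM provider here
    return CreateMessageResult(
        role="assistant",
        content=TextContent(type="text", text=reply_text),
        model="client-chosen-model",
    )


# Registered when constructing the client session, e.g.:
# ClientSession(read_stream, write_stream, sampling_callback=handle_sampling)
```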

examples/README.md

Lines changed: 1 addition & 0 deletions

@@ -14,6 +14,7 @@ This directory contains examples demonstrating how to use EnrichMCP.

- [basic_memory](basic_memory) - simple note-taking API using FileMemoryStore
- [caching](caching) - request caching with ContextCache
- [openai_chat_agent](openai_chat_agent) - interactive chat client
- [server_side_llm_travel_planner](server_side_llm_travel_planner) - LLM-backed travel suggestions

## Hello World
examples/server_side_llm_travel_planner/README.md

Lines changed: 20 additions & 0 deletions

@@ -0,0 +1,20 @@

# Server-Side LLM Travel Planner

This example demonstrates how an EnrichMCP server can use the `ctx.sampling()`
helper to ask the client for language model assistance.

The API exposes two resources:

- `list_destinations()` – returns a list of predefined destinations
- `plan_trip(preferences)` – uses LLM sampling to pick the top three destinations
  that match the user's preferences

Run the server:

```bash
python app.py
```

Then invoke it with an MCP client such as `mcp_use` or the `openai_chat_agent`
example. Describe your travel preferences and the server will respond with three
suggested destinations.
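For a rough idea of how an invocation could look without either of those clients, here is a sketch using the `mcp` Python SDK's stdio client. Assumptions: the SDK's `stdio_client`/`ClientSession` API, that the resource is exposed under its function name `plan_trip`, and a canned sampling callback standing in for a real LLM:

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from mcp.types import CreateMessageResult, TextContent


async def fake_sampling(context, params) -> CreateMessageResult:
    # Stand-in for a real LLM: always "choose" the same three destinations.
    return CreateMessageResult(
        role="assistant",
        content=TextContent(type="text", text='["Paris", "Tokyo", "Sydney"]'),
        model="stub",
    )


async def main() -> None:
    server = StdioServerParameters(command="python", args=["app.py"])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write, sampling_callback=fake_sampling) as session:
            await session.initialize()
            result = await session.call_tool(
                "plan_trip", {"preferences": "warm weather and good food"}
            )
            print(result)


asyncio.run(main())
```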
examples/server_side_llm_travel_planner/app.py

Lines changed: 88 additions & 0 deletions

@@ -0,0 +1,88 @@

```python
"""Travel planner example using server-side LLM sampling."""

from __future__ import annotations

import json
from typing import Annotated

from pydantic import Field

from enrichmcp import EnrichContext, EnrichMCP, EnrichModel, prefer_fast_model

app = EnrichMCP(
    title="Travel Planner",
    description="Suggest destinations based on user preferences using LLM sampling",
)


class Destination(EnrichModel):
    """Popular travel destination."""

    name: str = Field(description="Name of the destination")
    region: str = Field(description="Region of the world")
    summary: str = Field(description="Short description of the location")


DESTINATIONS = [
    Destination(
        name="Paris",
        region="Europe",
        summary="Romantic city known for art, fashion and the Eiffel Tower",
    ),
    Destination(
        name="Tokyo",
        region="Asia",
        summary="Bustling metropolis blending modern tech and ancient temples",
    ),
    Destination(
        name="New York",
        region="North America",
        summary="Iconic skyline, diverse food and world-class museums",
    ),
    Destination(
        name="Sydney",
        region="Australia",
        summary="Harbour city with famous opera house and beautiful beaches",
    ),
    Destination(
        name="Cape Town",
        region="Africa",
        summary="Mountain backdrop, coastal views and vibrant culture",
    ),
]


@app.retrieve
def list_destinations() -> list[Destination]:
    """Return the full list of available destinations."""

    return DESTINATIONS


@app.retrieve
async def plan_trip(
    preferences: Annotated[str, Field(description="Your travel preferences")],
    ctx: EnrichContext,
) -> list[Destination]:
    """Return three destinations that best match the given preferences."""

    bullet_list = "\n".join(f"- {d.name}: {d.summary}" for d in DESTINATIONS)
    prompt = (
        "Select the three best destinations from the list below based on the "
        "given preferences. Reply with a JSON list of names only.\nPreferences: "
        f"{preferences}\n\n{bullet_list}"
    )
    result = await ctx.sampling(
        prompt,
        model_preferences=prefer_fast_model(),
        max_tokens=50,
    )
    try:
        names = json.loads(result.content.text)
    except Exception:
        return []
    return [d for d in DESTINATIONS if d.name in names]


if __name__ == "__main__":
    app.run()
```
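One design note on the parsing step in `plan_trip`: the prompt asks for "a JSON list of names only", but models sometimes wrap JSON in prose or code fences. A hypothetical hardening (not part of the example) could extract the first JSON array before parsing:

```python
import json
import re


def extract_name_list(text: str) -> list[str]:
    """Pull the first JSON array out of an LLM reply, tolerating surrounding prose."""
    match = re.search(r"\[.*?\]", text, re.DOTALL)
    if match is None:
        return []
    try:
        parsed = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []
    return [item for item in parsed if isinstance(item, str)]
```

With a helper like this, `plan_trip` could call `extract_name_list(result.content.text)` instead of `json.loads` directly.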

mkdocs.yml

Lines changed: 1 addition & 0 deletions

```diff
@@ -63,6 +63,7 @@ nav:
   - Home: index.md
   - Getting Started: getting-started.md
   - Core Concepts: concepts.md
+  - Server-Side LLM: server_side_llm.md
   - Examples: examples.md
   - SQLAlchemy: sqlalchemy.md
   - API Reference:
```

src/enrichmcp/__init__.py

Lines changed: 10 additions & 1 deletion

```diff
@@ -24,9 +24,15 @@
 # Public exports
 from typing import TYPE_CHECKING
 
+from mcp.types import ModelPreferences
+
 from .app import EnrichMCP
 from .cache import MemoryCache, RedisCache
-from .context import EnrichContext
+from .context import (
+    EnrichContext,
+    prefer_fast_model,
+    prefer_smart_model,
+)
 from .datamodel import (
     DataModelSummary,
     EntityDescription,
@@ -67,6 +73,7 @@
     "FieldDescription",
     "MemoryCache",
     "ModelDescription",
+    "ModelPreferences",
     "PageResult",
     "PaginatedResult",
     "PaginationParams",
@@ -77,6 +84,8 @@
     "ToolKind",
     "__version__",
     "combine_lifespans",
+    "prefer_fast_model",
+    "prefer_smart_model",
 ]
 
 # Add SQLAlchemy to exports if available
```

src/enrichmcp/context.py

Lines changed: 92 additions & 0 deletions

```diff
@@ -4,7 +4,16 @@
 Provides a thin wrapper over FastMCP's Context for request handling.
 """
 
+from typing import Literal
+
 from mcp.server.fastmcp import Context  # pyright: ignore[reportMissingTypeArgument]
+from mcp.types import (
+    CreateMessageResult,
+    ModelHint,
+    ModelPreferences,
+    SamplingMessage,
+    TextContent,
+)
 
 from .cache import ContextCache
 
@@ -36,3 +45,86 @@ def cache(self) -> ContextCache:
         if self._cache is None:
             raise ValueError("Cache is not configured")
         return self._cache
+
+    # ------------------------------------------------------------------
+    # LLM Integration
+    # ------------------------------------------------------------------
+
+    def _convert_messages(
+        self, messages: str | list[str | SamplingMessage]
+    ) -> list[SamplingMessage]:
+        """Convert plain strings to ``SamplingMessage`` objects."""
+
+        if isinstance(messages, str):
+            messages = [messages]
+
+        converted: list[SamplingMessage] = []
+        for msg in messages:
+            if isinstance(msg, SamplingMessage):
+                converted.append(msg)
+            elif isinstance(msg, str):
+                converted.append(
+                    SamplingMessage(
+                        role="user",
+                        content=TextContent(type="text", text=msg),
+                    )
+                )
+            else:
+                raise TypeError("messages must be str or SamplingMessage")
+        return converted
+
+    async def ask_llm(
+        self,
+        messages: str | list[str | SamplingMessage],
+        *,
+        system_prompt: str | None = None,
+        max_tokens: int = 1000,
+        temperature: float | None = None,
+        model_preferences: ModelPreferences | None = None,
+        allow_tools: Literal["none", "thisServer", "allServers"] | None = "none",
+        stop_sequences: list[str] | None = None,
+    ) -> CreateMessageResult:
+        """Request LLM sampling via the connected client."""
+
+        sampling_messages = self._convert_messages(messages)
+        session = self._request_context.session  # type: ignore[attr-defined]
+        return await session.create_message(
+            messages=sampling_messages,
+            system_prompt=system_prompt,
+            max_tokens=max_tokens,
+            temperature=temperature,
+            model_preferences=model_preferences,
+            include_context=allow_tools,
+            stop_sequences=stop_sequences,
+        )
+
+    async def sampling(
+        self,
+        messages: str | list[str | SamplingMessage],
+        **kwargs,
+    ) -> CreateMessageResult:
+        """Alias for :meth:`ask_llm`."""
+
+        return await self.ask_llm(messages, **kwargs)
+
+
+def prefer_fast_model() -> ModelPreferences:
+    """Model preferences optimized for speed and cost."""
+
+    return ModelPreferences(
+        hints=[ModelHint(name="gpt-4o-mini"), ModelHint(name="claude-3-haiku")],
+        costPriority=0.8,
+        speedPriority=0.9,
+        intelligencePriority=0.3,
+    )
+
+
+def prefer_smart_model() -> ModelPreferences:
+    """Model preferences optimized for intelligence and capability."""
+
+    return ModelPreferences(
+        hints=[ModelHint(name="gpt-4o"), ModelHint(name="claude-3-opus")],
+        costPriority=0.2,
+        speedPriority=0.3,
+        intelligencePriority=0.9,
+    )
```