
Commit 0e9dcb5

Document and test server-side LLM sampling (#127)
1 parent 55bbe02 commit 0e9dcb5

File tree: 10 files changed, +340 -1 lines changed

README.md

Lines changed: 9 additions & 0 deletions

@@ -415,6 +415,15 @@ EnrichMCP adds three critical layers on top of MCP:

The result: AI agents can work with your data as naturally as a developer using an ORM.

## Server-Side LLM Sampling

EnrichMCP can request language model completions through MCP's **sampling**
feature. Call `ctx.ask_llm()` or the `ctx.sampling()` alias from any resource
and the connected client will choose an LLM and pay for the usage. You can tune
behavior using options like `model_preferences`, `allow_tools`, and
`max_tokens`. See [docs/server_side_llm.md](docs/server_side_llm.md) for more
details.

## Examples

Check out the [examples directory](examples/README.md):
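As a quick orientation, a complete resource using this new API could look like the following minimal sketch. It is not taken from the committed files: the app title, resource name, and prompt are illustrative, while `EnrichMCP`, `EnrichContext`, `ask_llm()`, and `prefer_smart_model()` come from this commit.

```python
from enrichmcp import EnrichContext, EnrichMCP, prefer_smart_model

app = EnrichMCP(title="Support Desk", description="Demo of server-side LLM sampling")


@app.retrieve
async def draft_reply(question: str, ctx: EnrichContext) -> str:
    """Ask the client's LLM to draft a short answer."""
    result = await ctx.ask_llm(
        question,
        system_prompt="Answer in at most two sentences.",
        model_preferences=prefer_smart_model(),
        max_tokens=150,
    )
    # The client runs the model and returns the completion to the server.
    return result.content.text
```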

docs/api/context.md

Lines changed: 15 additions & 0 deletions

@@ -25,6 +25,21 @@ context = EnrichContext()

The context exposes a `cache` attribute for storing values across the request,
user, or global scopes.

## LLM Integration

Use `ask_llm()` (or the `sampling()` alias) to request completions from the client-side LLM. See the [Server-Side LLM guide](../server_side_llm.md) for more details:

```python
from enrichmcp import prefer_fast_model

result = await ctx.ask_llm(
    "Summarize our latest sales numbers",
    model_preferences=prefer_fast_model(),
    max_tokens=200,
)
print(result.content.text)
```

## Extending Context

For now, if you need context functionality, you can extend the base class:

docs/server_side_llm.md

Lines changed: 58 additions & 0 deletions

@@ -0,0 +1,58 @@

# Server-Side LLM Sampling

MCP includes a **sampling** feature that lets the server ask the client to run an LLM request.
This keeps API keys and billing on the client side while giving your EnrichMCP
application the ability to generate text or run tool-aware prompts.

`EnrichContext.ask_llm()` (and its alias `sampling()`) is the helper used to make
these requests. The method mirrors the MCP sampling API and supports a number of
tuning parameters.

## Parameters

| Name | Description |
|------|-------------|
| `messages` | Text or `SamplingMessage` objects to send to the LLM. Strings are converted to user messages automatically. |
| `system_prompt` | Optional system prompt that defines overall behavior. |
| `max_tokens` | Maximum number of tokens the client should generate. Defaults to 1000. |
| `temperature` | Sampling temperature for controlling randomness. |
| `model_preferences` | `ModelPreferences` object describing cost, speed and intelligence priorities. Use `prefer_fast_model()` or `prefer_smart_model()` as shortcuts. |
| `allow_tools` | Controls what tools the LLM can see: `"none"`, `"thisServer"`, or `"allServers"`. |
| `stop_sequences` | Strings that stop generation when encountered. |
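To make the table concrete, here is an illustrative fragment (not from the committed docs) that mixes a plain string with a `SamplingMessage` and combines several tuning options; it assumes it runs inside a resource that already has `ctx` and a `draft_text` string:

```python
from mcp.types import SamplingMessage, TextContent

messages = [
    SamplingMessage(role="user", content=TextContent(type="text", text="Please review:")),
    draft_text,  # plain strings are converted to user messages automatically
]
result = await ctx.ask_llm(
    messages,
    system_prompt="You are a meticulous copy editor.",
    temperature=0.2,
    max_tokens=300,
    stop_sequences=["\n\n"],
    allow_tools="none",
)
print(result.content.text)
```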
### Model Preferences

`ModelPreferences` let the server express whether it cares more about cost,
speed or intelligence when the client chooses an LLM. Two convenience functions
are provided:

```python
from enrichmcp import prefer_fast_model, prefer_smart_model
```

Use `prefer_fast_model()` when low latency and price are most important. Use
`prefer_smart_model()` when you need the best reasoning capability.
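If neither preset fits, a `ModelPreferences` can be constructed directly. The sketch below uses the same fields the presets use in `src/enrichmcp/context.py`; the hint name and priority values are illustrative:

```python
from mcp.types import ModelHint, ModelPreferences

balanced = ModelPreferences(
    hints=[ModelHint(name="claude-3-sonnet")],  # hint name is illustrative
    costPriority=0.5,
    speedPriority=0.5,
    intelligencePriority=0.7,
)

result = await ctx.ask_llm("Outline a migration plan", model_preferences=balanced)
```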
### Tool Access

Set `allow_tools` to allow the client LLM to inspect available MCP tools.
This enables context-aware answers where the LLM can suggest reading or calling
other resources.

## Example

```python
@app.retrieve
async def summarize(text: str, ctx: EnrichContext) -> str:
    result = await ctx.ask_llm(
        f"Summarize this: {text}",
        model_preferences=prefer_fast_model(),
        max_tokens=200,
        allow_tools="thisServer",
    )
    return result.content.text
```

MCP sampling gives your server lightweight LLM features without storing API
credentials. See the [travel planner example](../examples/server_side_llm_travel_planner) for a complete
implementation.
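Because sampling is fulfilled by the client, it can help to see the other side of the exchange. The following is a minimal client-side sketch, not part of this commit, that assumes the `mcp` Python SDK's `sampling_callback` hook on `ClientSession`; the actual LLM call is stubbed out:

```python
from typing import Any

from mcp.types import CreateMessageRequestParams, CreateMessageResult, TextContent


async def handle_sampling(
    context: Any, params: CreateMessageRequestParams
) -> CreateMessageResult:
    # params carries the server's request: messages, maxTokens, modelPreferences, ...
    reply_text = "stubbed LLM reply"  # a real client would call its LLM provider here
    return CreateMessageResult(
        role="assistant",
        content=TextContent(type="text", text=reply_text),
        model="client-chosen-model",
    )


# Registered when constructing the client session, e.g.:
# ClientSession(read_stream, write_stream, sampling_callback=handle_sampling)
```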

examples/README.md

Lines changed: 1 addition & 0 deletions

@@ -14,6 +14,7 @@ This directory contains examples demonstrating how to use EnrichMCP.

- [basic_memory](basic_memory) - simple note-taking API using FileMemoryStore
- [caching](caching) - request caching with ContextCache
- [openai_chat_agent](openai_chat_agent) - interactive chat client
- [server_side_llm_travel_planner](server_side_llm_travel_planner) - LLM-backed travel suggestions

## Hello World
examples/server_side_llm_travel_planner/README.md

Lines changed: 20 additions & 0 deletions

@@ -0,0 +1,20 @@

# Server-Side LLM Travel Planner

This example demonstrates how an EnrichMCP server can use the `ctx.sampling()`
helper to ask the client for language model assistance.

The API exposes two resources:

- `list_destinations()` – returns a list of predefined destinations
- `plan_trip(preferences)` – uses LLM sampling to pick the top three destinations
  that match the user's preferences

Run the server:

```bash
python app.py
```

Then invoke it with an MCP client such as `mcp_use` or the `openai_chat_agent`
example. Describe your travel preferences and the server will respond with three
suggested destinations.
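For a rough idea of how an invocation could look without either of those clients, here is a sketch using the `mcp` Python SDK's stdio client. Assumptions: the SDK's `stdio_client`/`ClientSession` API, that the resource is exposed under its function name `plan_trip`, and a canned sampling callback standing in for a real LLM:

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from mcp.types import CreateMessageResult, TextContent


async def fake_sampling(context, params) -> CreateMessageResult:
    # Stand-in for a real LLM: always "choose" the same three destinations.
    return CreateMessageResult(
        role="assistant",
        content=TextContent(type="text", text='["Paris", "Tokyo", "Sydney"]'),
        model="stub",
    )


async def main() -> None:
    server = StdioServerParameters(command="python", args=["app.py"])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write, sampling_callback=fake_sampling) as session:
            await session.initialize()
            result = await session.call_tool(
                "plan_trip", {"preferences": "warm weather and good food"}
            )
            print(result)


asyncio.run(main())
```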
examples/server_side_llm_travel_planner/app.py

Lines changed: 88 additions & 0 deletions

@@ -0,0 +1,88 @@

```python
"""Travel planner example using server-side LLM sampling."""

from __future__ import annotations

import json
from typing import Annotated

from pydantic import Field

from enrichmcp import EnrichContext, EnrichMCP, EnrichModel, prefer_fast_model

app = EnrichMCP(
    title="Travel Planner",
    description="Suggest destinations based on user preferences using LLM sampling",
)


class Destination(EnrichModel):
    """Popular travel destination."""

    name: str = Field(description="Name of the destination")
    region: str = Field(description="Region of the world")
    summary: str = Field(description="Short description of the location")


DESTINATIONS = [
    Destination(
        name="Paris",
        region="Europe",
        summary="Romantic city known for art, fashion and the Eiffel Tower",
    ),
    Destination(
        name="Tokyo",
        region="Asia",
        summary="Bustling metropolis blending modern tech and ancient temples",
    ),
    Destination(
        name="New York",
        region="North America",
        summary="Iconic skyline, diverse food and world-class museums",
    ),
    Destination(
        name="Sydney",
        region="Australia",
        summary="Harbour city with famous opera house and beautiful beaches",
    ),
    Destination(
        name="Cape Town",
        region="Africa",
        summary="Mountain backdrop, coastal views and vibrant culture",
    ),
]


@app.retrieve
def list_destinations() -> list[Destination]:
    """Return the full list of available destinations."""

    return DESTINATIONS


@app.retrieve
async def plan_trip(
    preferences: Annotated[str, Field(description="Your travel preferences")],
    ctx: EnrichContext,
) -> list[Destination]:
    """Return three destinations that best match the given preferences."""

    bullet_list = "\n".join(f"- {d.name}: {d.summary}" for d in DESTINATIONS)
    prompt = (
        "Select the three best destinations from the list below based on the "
        "given preferences. Reply with a JSON list of names only.\nPreferences: "
        f"{preferences}\n\n{bullet_list}"
    )
    result = await ctx.sampling(
        prompt,
        model_preferences=prefer_fast_model(),
        max_tokens=50,
    )
    try:
        names = json.loads(result.content.text)
    except Exception:
        return []
    return [d for d in DESTINATIONS if d.name in names]


if __name__ == "__main__":
    app.run()
```
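One design note on the parsing step in `plan_trip`: the prompt asks for "a JSON list of names only", but models sometimes wrap JSON in prose or code fences. A hypothetical hardening (not part of the example) could extract the first JSON array before parsing:

```python
import json
import re


def extract_name_list(text: str) -> list[str]:
    """Pull the first JSON array out of an LLM reply, tolerating surrounding prose."""
    match = re.search(r"\[.*?\]", text, re.DOTALL)
    if match is None:
        return []
    try:
        parsed = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []
    return [item for item in parsed if isinstance(item, str)]
```

With a helper like this, `plan_trip` could call `extract_name_list(result.content.text)` instead of `json.loads` directly.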

mkdocs.yml

Lines changed: 1 addition & 0 deletions

```diff
@@ -63,6 +63,7 @@ nav:
   - Home: index.md
   - Getting Started: getting-started.md
   - Core Concepts: concepts.md
+  - Server-Side LLM: server_side_llm.md
   - Examples: examples.md
   - SQLAlchemy: sqlalchemy.md
   - API Reference:
```

src/enrichmcp/__init__.py

Lines changed: 10 additions & 1 deletion

```diff
@@ -24,9 +24,15 @@
 # Public exports
 from typing import TYPE_CHECKING
 
+from mcp.types import ModelPreferences
+
 from .app import EnrichMCP
 from .cache import MemoryCache, RedisCache
-from .context import EnrichContext
+from .context import (
+    EnrichContext,
+    prefer_fast_model,
+    prefer_smart_model,
+)
 from .datamodel import (
     DataModelSummary,
     EntityDescription,
@@ -67,6 +73,7 @@
     "FieldDescription",
     "MemoryCache",
     "ModelDescription",
+    "ModelPreferences",
     "PageResult",
     "PaginatedResult",
     "PaginationParams",
@@ -77,6 +84,8 @@
     "ToolKind",
     "__version__",
     "combine_lifespans",
+    "prefer_fast_model",
+    "prefer_smart_model",
 ]
 
 # Add SQLAlchemy to exports if available
```

src/enrichmcp/context.py

Lines changed: 92 additions & 0 deletions

```diff
@@ -4,7 +4,16 @@
 Provides a thin wrapper over FastMCP's Context for request handling.
 """
 
+from typing import Literal
+
 from mcp.server.fastmcp import Context  # pyright: ignore[reportMissingTypeArgument]
+from mcp.types import (
+    CreateMessageResult,
+    ModelHint,
+    ModelPreferences,
+    SamplingMessage,
+    TextContent,
+)
 
 from .cache import ContextCache
 
@@ -36,3 +45,86 @@ def cache(self) -> ContextCache:
         if self._cache is None:
             raise ValueError("Cache is not configured")
         return self._cache
+
+    # ------------------------------------------------------------------
+    # LLM Integration
+    # ------------------------------------------------------------------
+
+    def _convert_messages(
+        self, messages: str | list[str | SamplingMessage]
+    ) -> list[SamplingMessage]:
+        """Convert plain strings to ``SamplingMessage`` objects."""
+
+        if isinstance(messages, str):
+            messages = [messages]
+
+        converted: list[SamplingMessage] = []
+        for msg in messages:
+            if isinstance(msg, SamplingMessage):
+                converted.append(msg)
+            elif isinstance(msg, str):
+                converted.append(
+                    SamplingMessage(
+                        role="user",
+                        content=TextContent(type="text", text=msg),
+                    )
+                )
+            else:
+                raise TypeError("messages must be str or SamplingMessage")
+        return converted
+
+    async def ask_llm(
+        self,
+        messages: str | list[str | SamplingMessage],
+        *,
+        system_prompt: str | None = None,
+        max_tokens: int = 1000,
+        temperature: float | None = None,
+        model_preferences: ModelPreferences | None = None,
+        allow_tools: Literal["none", "thisServer", "allServers"] | None = "none",
+        stop_sequences: list[str] | None = None,
+    ) -> CreateMessageResult:
+        """Request LLM sampling via the connected client."""
+
+        sampling_messages = self._convert_messages(messages)
+        session = self._request_context.session  # type: ignore[attr-defined]
+        return await session.create_message(
+            messages=sampling_messages,
+            system_prompt=system_prompt,
+            max_tokens=max_tokens,
+            temperature=temperature,
+            model_preferences=model_preferences,
+            include_context=allow_tools,
+            stop_sequences=stop_sequences,
+        )
+
+    async def sampling(
+        self,
+        messages: str | list[str | SamplingMessage],
+        **kwargs,
+    ) -> CreateMessageResult:
+        """Alias for :meth:`ask_llm`."""
+
+        return await self.ask_llm(messages, **kwargs)
+
+
+def prefer_fast_model() -> ModelPreferences:
+    """Model preferences optimized for speed and cost."""
+
+    return ModelPreferences(
+        hints=[ModelHint(name="gpt-4o-mini"), ModelHint(name="claude-3-haiku")],
+        costPriority=0.8,
+        speedPriority=0.9,
+        intelligencePriority=0.3,
+    )
+
+
+def prefer_smart_model() -> ModelPreferences:
+    """Model preferences optimized for intelligence and capability."""
+
+    return ModelPreferences(
+        hints=[ModelHint(name="gpt-4o"), ModelHint(name="claude-3-opus")],
+        costPriority=0.2,
+        speedPriority=0.3,
+        intelligencePriority=0.9,
+    )
```