Sometimes, you need the MCP Client and the MCP Server to collaborate to achieve a common goal. You might have a case where the Server requires the help of an LLM that sits on the client. For this situation, sampling is what you should use.
Let's explore some use cases and how to build a solution involving sampling.
In this lesson, we focus on explaining when and where to use Sampling and how to configure it.
In this chapter, we will:
- Explain what Sampling is and when to use it.
- Show how to configure Sampling in MCP.
- Provide examples of Sampling in action.
Sampling is an advanced feature that works in the following way:
```mermaid
sequenceDiagram
participant User
participant MCP Client
participant LLM
participant MCP Server
User->>MCP Client: Author blog post
MCP Client->>MCP Server: Tool call (blog post draft)
MCP Server->>MCP Client: Sampling request (create summary)
MCP Client->>LLM: Generate blog post summary
LLM->>MCP Client: Summary result
MCP Client->>MCP Server: Sampling response (summary)
MCP Server->>MCP Client: Complete blog post (draft + summary)
MCP Client->>User: Blog post ready
```
Now that we have a high-level view of a realistic scenario, let's look at the sampling request the server sends to the client. Here's what such a request can look like in JSON-RPC format:
```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "sampling/createMessage",
  "params": {
    "messages": [
      {
        "role": "user",
        "content": {
          "type": "text",
          "text": "Create a blog post summary of the following blog post: <BLOG POST>"
        }
      }
    ],
    "modelPreferences": {
      "hints": [
        {
          "name": "claude-3-sonnet"
        }
      ],
      "intelligencePriority": 0.8,
      "speedPriority": 0.5
    },
    "systemPrompt": "You are a helpful assistant.",
    "maxTokens": 100
  }
}
```

There are a few things here worth calling out:
- Prompt: the text under content -> text is our prompt, an instruction asking the LLM to summarize the blog post content.
- modelPreferences: as the name says, this is a preference, a recommendation of which configuration to use with the LLM. The client, and ultimately the user, decides whether to follow these recommendations or change them. Here, the request recommends a model (through hints) as well as speed and intelligence priorities.
- systemPrompt: your usual system prompt, which gives the LLM a persona and contains guidance instructions.
- maxTokens: the maximum number of tokens the server requests for this task; the client may use fewer.
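To make the field list concrete, here's a minimal sketch that assembles the same sampling/createMessage request as a plain Python dictionary using only the standard library. The helper name build_sampling_request and its range check are hypothetical illustrations; in practice an MCP SDK builds this message for you.

```python
import json


def build_sampling_request(request_id: int, prompt: str, *, model_hint: str,
                           intelligence: float, speed: float,
                           system_prompt: str, max_tokens: int) -> dict:
    """Assemble a JSON-RPC sampling/createMessage request (hypothetical helper)."""
    # The MCP spec expresses priorities as numbers between 0 and 1.
    for name, value in (("intelligence", intelligence), ("speed", speed)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} priority must be between 0 and 1")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "sampling/createMessage",
        "params": {
            "messages": [
                {"role": "user", "content": {"type": "text", "text": prompt}}
            ],
            # Only a recommendation: the client makes the final model choice.
            "modelPreferences": {
                "hints": [{"name": model_hint}],
                "intelligencePriority": intelligence,
                "speedPriority": speed,
            },
            "systemPrompt": system_prompt,
            "maxTokens": max_tokens,
        },
    }


request = build_sampling_request(
    1,
    "Create a blog post summary of the following blog post: <BLOG POST>",
    model_hint="claude-3-sonnet",
    intelligence=0.8,
    speed=0.5,
    system_prompt="You are a helpful assistant.",
    max_tokens=100,
)
print(json.dumps(request, indent=2))
```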
Next comes the response. The MCP Client sends it back to the MCP Server after calling the LLM, waiting for its reply, and wrapping the result in this message. Here's what it can look like in JSON-RPC:
```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "role": "assistant",
    "content": {
      "type": "text",
      "text": "Here's your abstract <ABSTRACT>"
    },
    "model": "gpt-5",
    "stopReason": "endTurn"
  }
}
```

Note that the response contains an abstract of the blog post, just as we asked. Also note that the model used is "gpt-5" rather than the requested "claude-3-sonnet". This illustrates that the client (and ultimately the user) has the final say over which model runs; the model preferences in your sampling request are only a recommendation.
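Since the client owns the model decision, here's a sketch of one way a client might honor hints. The MCP specification describes hints as substrings that can match model names flexibly and leaves the final choice to the client. The function name pick_model and its fallback rule are assumptions for illustration, not how any particular client (such as VS Code) actually behaves.

```python
def pick_model(hints: list[dict], available: list[str], default: str) -> str:
    """Return the first available model whose name contains a hint (hypothetical policy)."""
    for hint in hints:
        name = hint.get("name", "")
        for model in available:
            # Hints act as substrings, so "claude-3-sonnet" would also
            # match a longer, versioned model name.
            if name and name in model:
                return model
    # No hint matched: the client is free to choose another model.
    return default


hints = [{"name": "claude-3-sonnet"}]
print(pick_model(hints, ["gpt-5"], default="gpt-5"))  # gpt-5
```

With only "gpt-5" available, no hint matches and the fallback is used, which mirrors the response above.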
Now that we understand the main flow and a useful task for it (creating a blog post plus an abstract), let's see what we need to do to get it working.
Sampling messages aren't limited to text; you can also send images and audio. Here's how the content object differs for each type:
Text content

```json
{
  "type": "text",
  "text": "The message content"
}
```

Image content

```json
{
  "type": "image",
  "data": "base64-encoded-image-data",
  "mimeType": "image/jpeg"
}
```

Audio content

```json
{
  "type": "audio",
  "data": "base64-encoded-audio-data",
  "mimeType": "audio/wav"
}
```

NOTE: for more detailed info on Sampling, check out the official docs
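Binary content travels as base64 text inside the JSON. Here's a minimal standard-library sketch of building each content object; the helper names text_content and binary_content are hypothetical, and the byte strings are placeholders rather than a real image or recording.

```python
import base64


def text_content(text: str) -> dict:
    """Build a text content object."""
    return {"type": "text", "text": text}


def binary_content(kind: str, data: bytes, mime_type: str) -> dict:
    """Build an image or audio content object with base64-encoded data."""
    if kind not in ("image", "audio"):
        raise ValueError("kind must be 'image' or 'audio'")
    return {
        "type": kind,
        # JSON can't carry raw bytes, so the payload is base64-encoded text.
        "data": base64.b64encode(data).decode("ascii"),
        "mimeType": mime_type,
    }


image = binary_content("image", b"\xff\xd8\xff placeholder bytes", "image/jpeg")
audio = binary_content("audio", b"RIFF placeholder bytes", "audio/wav")
```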
Note: if you're only building a server, you don't need to do much here.
In a client, you need to declare the sampling capability like so:
```json
{
  "capabilities": {
    "sampling": {}
  }
}
```

The client sends this capability to the server when it initializes the connection, which tells the server it may send sampling requests.
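On the server side, this means a sampling request only makes sense if the client declared the capability. Here's a standard-library sketch that inspects the capabilities object from a client's initialize request; the function name and the trimmed example payloads are illustrative assumptions.

```python
def client_supports_sampling(initialize_params: dict) -> bool:
    """Return True if the client's initialize params declare sampling (hypothetical helper)."""
    capabilities = initialize_params.get("capabilities", {})
    # The capability is an empty object ({}), which is falsy in Python,
    # so test for the key rather than the truthiness of its value.
    return "sampling" in capabilities


with_sampling = {"capabilities": {"sampling": {}}}
without_sampling = {"capabilities": {"roots": {"listChanged": True}}}
print(client_supports_sampling(with_sampling))     # True
print(client_supports_sampling(without_sampling))  # False
```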
Let's code a sampling server together. We'll need to do the following:
- Create a tool on the server.
- Have the tool create a sampling request.
- Have the tool wait for the client to answer the sampling request.
- Produce the tool result.
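These four steps can be sketched with plain asyncio and a stand-in for the client. FakeClientSession and its create_message coroutine are hypothetical stubs that mimic the shape of the round trip (the tool awaits the client's answer); they are not part of the MCP SDK. A stub like this can also help when unit-testing tool logic without a running client.

```python
import asyncio
import json


class FakeClientSession:
    """Hypothetical stand-in for the client side of a sampling round trip."""

    async def create_message(self, prompt: str, max_tokens: int) -> str:
        await asyncio.sleep(0)  # pretend to wait for the client's LLM
        return f"Abstract ({max_tokens} tokens max) of: {prompt[:20]}..."


async def create_blog(title: str, content: str, session: FakeClientSession) -> str:
    # Step 1: this coroutine plays the role of the server tool.
    prompt = f"Create an abstract of the following blog post: title: {title} and draft: {content}"
    # Steps 2 and 3: send the sampling request and wait for the answer.
    abstract = await session.create_message(prompt, max_tokens=100)
    # Step 4: produce the tool result.
    return json.dumps({"title": title, "abstract": abstract})


print(asyncio.run(create_blog("Hello", "World", FakeClientSession())))
```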
Let's see the code step by step:

```python
@mcp.tool()
async def create_blog(title: str, content: str, ctx: Context[ServerSession, None]) -> str:
    """Create a blog post and generate a summary"""
```

Extend your tool with the following code:
```python
    post = BlogPost(
        id=len(posts) + 1,
        title=title,
        content=content,
        abstract=""
    )

    prompt = f"Create an abstract of the following blog post: title: {title} and draft: {content} "
    result = await ctx.session.create_message(
        messages=[
            SamplingMessage(
                role="user",
                content=TextContent(type="text", text=prompt),
            )
        ],
        max_tokens=100,
    )
```
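Finally, store the abstract from the sampling response and return the result:

```python
    # The client may return non-text content, so only read .text for text results
    if result.content.type == "text":
        post.abstract = result.content.text
    else:
        post.abstract = str(result.content)
    posts.append(post)

    # return the complete blog post
    return json.dumps({
        "id": post.id,
        "abstract": post.abstract
    })
```

Putting it all together, the complete server looks like this: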
```python
import json
from typing import List

from pydantic import BaseModel

from mcp.server.fastmcp import Context, FastMCP
from mcp.server.session import ServerSession
from mcp.types import SamplingMessage, TextContent

mcp = FastMCP("Blog post generator")


class BlogPost(BaseModel):
    id: int
    title: str
    content: str
    abstract: str


# In-memory store for created posts
posts: List[BlogPost] = []


@mcp.tool()
async def create_blog(title: str, content: str, ctx: Context[ServerSession, None]) -> str:
    """Create a blog post and generate a summary"""
    post = BlogPost(
        id=len(posts) + 1,
        title=title,
        content=content,
        abstract=""
    )

    # Ask the client's LLM to write the abstract through a sampling request
    prompt = f"Create an abstract of the following blog post: title: {title} and draft: {content} "
    result = await ctx.session.create_message(
        messages=[
            SamplingMessage(
                role="user",
                content=TextContent(type="text", text=prompt),
            )
        ],
        max_tokens=100,
    )

    # The client may return non-text content, so only read .text for text results
    if result.content.type == "text":
        post.abstract = result.content.text
    else:
        post.abstract = str(result.content)
    posts.append(post)

    # return the complete blog post
    return json.dumps({
        "id": post.id,
        "abstract": post.abstract
    })


if __name__ == "__main__":
    print("Starting server...")
    # Serve over Streamable HTTP; by default FastMCP listens on port 8000 at /mcp
    mcp.run(transport="streamable-http")
    # run app with: python server.py
```

To test this out in Visual Studio Code, do the following:
- Start the server in a terminal, for example with python server.py.
- Add it to mcp.json (and make sure it's started), for example:

  ```json
  {
    "servers": {
      "blog-server": {
        "type": "http",
        "url": "http://localhost:8000/mcp"
      }
    }
  }
  ```

- Type a prompt such as:
  create a blog post named "Where Python comes from", the content is "Python is actually named after Monty Python's Flying Circus"
- Allow sampling to happen. The first time you test this, you'll see an additional dialog asking you to allow sampling, which you need to accept. After that, you'll see the usual dialog asking for permission to run the tool.
- Inspect the results. They're rendered in GitHub Copilot Chat, and you can also inspect the raw JSON response.
Bonus: Visual Studio Code has built-in tooling support for sampling. You can configure sampling access for an installed server like so:
- Navigate to the Extensions section.
- Select the cog icon for your installed server in the "MCP SERVERS - INSTALLED" section.
- Select "Configure Model Access". Here you can choose which models GitHub Copilot is allowed to use when performing sampling. You can also see recent sampling requests by selecting "Show Sampling requests".
In this assignment, you'll build a slightly different sampling integration, one that generates product descriptions. Here's your scenario:
Scenario: A back-office worker at an e-commerce company needs help, because writing product descriptions takes far too much time. Build a solution where you can call a tool "create_product" with "title" and "keywords" as arguments, and it produces a complete product, including a "description" field populated by the client's LLM.
TIP: use what you learned earlier to build this server and a tool that issues a sampling request.
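If you'd like a starting point, here's one possible skeleton for the parts of the assignment that don't involve MCP: a product model and the prompt you might send in the sampling request. The Product dataclass, the id field, treating keywords as a list, and the prompt wording are all assumptions; the sampling call itself is left to you.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Product:
    id: int
    title: str
    keywords: list[str]
    description: str = ""  # to be filled in by the client's LLM


def build_description_prompt(title: str, keywords: list[str]) -> str:
    """Prompt text you could send in the sampling request (wording is an assumption)."""
    return (
        f"Write a short product description for '{title}'. "
        f"Mention these keywords: {', '.join(keywords)}."
    )


product = Product(id=1, title="Trail Running Shoes", keywords=["lightweight", "waterproof"])
print(build_description_prompt(product.title, product.keywords))
# After sampling, set product.description from the response and return the product.
product.description = "<DESCRIPTION FROM SAMPLING>"
print(json.dumps(asdict(product)))
```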
Sampling is a powerful feature that allows the server to delegate tasks to the client when it needs the help of an LLM.