Description
Goal
Enable users to integrate their own Model Context Protocol (MCP) servers into the Note Companion chat functionality. This would let users extend the chat's capabilities with custom tools and data sources of their own.
Revised Approach & Constraints
Based on further discussion:
- Execution Context: MCP client initialization (`experimental_createMCPClient`) and the execution of tools provided by these clients MUST happen client-side, within the context of the Obsidian plugin environment.
- Configuration: Users will define MCP server configurations in a standard JSON file named `mcp.json`, located in a dedicated Note Companion folder within their vault (e.g., `_NoteCompanion/Config/mcp.json`, similar to paths defined in `packages/plugin/settings.ts`). The plugin will be responsible for reading this file.
- Server-Side LLM (Default Assumption): The core AI generation (`generateText`/`streamText`) will likely remain server-side (e.g., in `packages/web/app/api/(new-ai)/route.ts`) due to API key management and model access.
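For illustration, a minimal `mcp.json` might look like the following. The exact schema is still to be decided; the field names here simply mirror the `MCPConfig` interface sketched later in this issue:

```json
{
  "servers": [
    {
      "id": "filesystem",
      "transport": {
        "type": "stdio",
        "options": {
          "command": "npx",
          "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/allowed/dir"]
        }
      }
    },
    {
      "id": "remote-search",
      "transport": {
        "type": "sse",
        "url": "https://example.com/mcp/sse"
      }
    }
  ]
}
```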
Core Technical Challenge: Server-Client Tool Invocation
The central problem is enabling the server-side AI generation process to discover and invoke tools whose execution logic resides exclusively on the client (the Obsidian plugin).
When the server-side LLM decides to use a tool defined in the user's `mcp.json`:
- How does the server signal to the specific user's client plugin that a tool needs to be run?
- How does the client execute the tool via its locally initialized MCP client?
- How does the client securely and reliably return the tool's result back to the server-side generation process to continue the LLM turn?
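Whichever mechanism is chosen (options below), the server and plugin will need to agree on a small message contract for delegated tool calls. A rough sketch, with all names hypothetical:

```ts
// Hypothetical wire format for delegating a tool call to the client and
// returning its result; the real shape depends on the transport chosen below.
interface ClientToolRequest {
  requestId: string; // Correlates the request with its eventual result
  sessionId: string; // Identifies the user's chat session / plugin instance
  serverId: string;  // Which entry in mcp.json should handle the call
  toolName: string;  // Tool name as advertised in the schemas sent to the server
  args: unknown;     // Arguments produced by the LLM, matching the tool's schema
}

interface ClientToolResult {
  requestId: string; // Echoed back so the server can resume the right turn
  ok: boolean;       // Whether execution succeeded
  result?: unknown;  // Tool output on success
  error?: string;    // Error message on failure (also reported back to the LLM)
}
```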
Potential Communication Mechanisms (Server-Client for Tools)
We need a robust mechanism for the server API route to communicate with the client plugin for tool calls:
- Plugin as Proxy/Listener:
- Idea: Plugin establishes a persistent WebSocket connection to the backend upon startup. Server sends tool requests over this socket; plugin executes and sends results back (a rough plugin-side sketch follows this list).
- Pros: Direct communication channel.
- Cons: Connection management complexity, scaling concerns on the backend, handling dropped connections/plugin restarts.
- Modified AI SDK Stream:
- Idea: Intercept the Vercel AI SDK's streaming response on the client. If a special "client-tool-request" message is detected, pause rendering, execute the tool via the plugin, send the result back to a dedicated server endpoint, which then resumes the original stream with the tool result injected.
- Pros: Leverages existing streaming connection.
- Cons: Complex interception logic, potentially fragile, requires careful coordination between client and server state. Needs a way for the server to 'pause' its generation.
- Dedicated RPC Channel:
- Idea: Use a simple HTTP endpoint on the plugin (if possible via Obsidian API?) or have the plugin poll a server endpoint for pending tool requests associated with its session/user.
- Pros: Potentially simpler than WebSockets.
- Cons: Polling is inefficient; local HTTP server in plugin might be complex/restricted. Requires secure identification of the client.
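As a rough illustration of option 1, a plugin-side listener might look like this. The endpoint URL, the `ClientToolRequest`/`ClientToolResult` shapes, and `executeClientTool` all come from the other sketches in this issue and are assumptions, not existing code:

```ts
// Hypothetical plugin-side listener for option 1 (WebSocket proxy).
function connectToolChannel(backendUrl: string, sessionId: string): WebSocket {
  const socket = new WebSocket(`${backendUrl}/tool-channel?session=${sessionId}`);

  socket.onmessage = async (event) => {
    const request = JSON.parse(event.data) as ClientToolRequest;
    let reply: ClientToolResult;
    try {
      // Run the tool via the locally initialized MCP client (see executeClientTool below).
      const result = await executeClientTool(request.serverId, request.toolName, request.args);
      reply = { requestId: request.requestId, ok: true, result };
    } catch (error) {
      reply = { requestId: request.requestId, ok: false, error: String(error) };
    }
    socket.send(JSON.stringify(reply));
  };

  socket.onclose = () => {
    // TODO: reconnect with backoff; the server must tolerate a temporarily absent client.
  };

  return socket;
}
```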
Example MCP Client Usage (Client-Side)
```ts
// This code would run CLIENT-SIDE (e.g., within the plugin)
import { experimental_createMCPClient, type ToolSet } from 'ai';
import { Experimental_StdioMCPTransport } from 'ai/mcp-stdio'; // Or other transports

interface MCPConfig {
  servers: Array<{
    id: string; // Unique ID for this server config
    transport: {
      type: 'stdio' | 'sse' | string; // Add other types as needed
      options?: any; // e.g., { command: string, args: string[] } for stdio
      url?: string;  // e.g., for sse
    };
  }>;
}

interface ActiveClient {
  id: string;
  client: Awaited<ReturnType<typeof experimental_createMCPClient>>;
  tools: ToolSet;
}

const activeClients: Map<string, ActiveClient> = new Map();

async function initializeUserMCPClients(mcpConfig: MCPConfig): Promise<Map<string, ToolSet>> {
  const allToolSchemas: Map<string, ToolSet> = new Map();
  await shutdownUserMCPClients(); // Close existing clients first

  for (const config of mcpConfig.servers) {
    try {
      let transport;
      if (config.transport.type === 'stdio') {
        transport = new Experimental_StdioMCPTransport(config.transport.options);
      } else if (config.transport.type === 'sse') {
        transport = { type: 'sse' as const, url: config.transport.url! };
      } else {
        console.warn('Unsupported MCP transport type:', config.transport.type);
        continue;
      }

      const client = await experimental_createMCPClient({ transport });
      const toolSet = await client.tools();

      activeClients.set(config.id, { id: config.id, client, tools: toolSet });
      allToolSchemas.set(config.id, toolSet); // Store schemas by server ID
      console.log(`Initialized MCP client ${config.id} with tools:`, Object.keys(toolSet));
    } catch (error) {
      console.error('Failed to initialize MCP client:', config.id, error);
    }
  }

  // TODO: Need a mechanism to send the SCHEMAS from `allToolSchemas` to the SERVER
  // so the server-side LLM knows which tools are available.
  return allToolSchemas;
}

async function shutdownUserMCPClients() {
  const promises = [];
  for (const activeClient of activeClients.values()) {
    console.log(`Shutting down MCP client ${activeClient.id}`);
    promises.push(
      activeClient.client
        .close()
        .catch(e => console.error(`Error closing client ${activeClient.id}:`, e))
    );
  }
  await Promise.all(promises);
  activeClients.clear();
}

// Later, when the server requests tool execution via one of the communication mechanisms:
async function executeClientTool(serverId: string, toolName: string, args: any): Promise<any> {
  const activeClient = activeClients.get(serverId);
  if (!activeClient) {
    throw new Error(`Client-side MCP server ${serverId} not found or not active.`);
  }
  if (!activeClient.tools[toolName]) {
    throw new Error(`Tool ${toolName} not found in client-side MCP server ${serverId}.`);
  }

  try {
    console.log(`Executing client-side tool: ${serverId}/${toolName} with args:`, args);
    // Assuming the tool object itself has an execute method or similar.
    // This detail depends on the exact structure returned by experimental_createMCPClient and client.tools().
    // It might be `activeClient.client.executeTool(toolName, args)` or a similar pattern.
    const toolFunction = activeClient.tools[toolName].execute;
    if (typeof toolFunction !== 'function') {
      throw new Error(`Tool ${toolName} on server ${serverId} does not have an execute function.`);
    }
    const result = await toolFunction(args);
    console.log(`Tool ${serverId}/${toolName} execution result:`, result);
    return result;
  } catch (error) {
    console.error(`Error executing tool ${serverId}/${toolName}:`, error);
    throw error; // Re-throw to be sent back to the server
  }
}
```
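One way to address the TODO above is to strip the execution logic and send only serializable schema information to the backend. The endpoint name and the exact shape of each tool object are assumptions (they depend on the AI SDK version), so treat this as a sketch:

```ts
// Hypothetical: extract serializable tool descriptors from the ToolSets and
// register them with the backend so the server-side LLM can see them.
interface SerializableToolSchema {
  serverId: string;
  name: string;
  description?: string;
  parameters?: unknown; // JSON-schema-like parameter definition, if exposed by the SDK
}

async function publishToolSchemas(allToolSchemas: Map<string, ToolSet>, sessionId: string) {
  const schemas: SerializableToolSchema[] = [];
  for (const [serverId, toolSet] of allToolSchemas) {
    for (const [name, t] of Object.entries(toolSet)) {
      schemas.push({
        serverId,
        name,
        description: (t as any).description,
        // NOTE: assumption -- the SDK may expose parameters as a Zod schema or a JSON schema;
        // it may need conversion (e.g., zod-to-json-schema) before being sent over the wire.
        parameters: (t as any).parameters,
      });
    }
  }
  // Hypothetical endpoint; this could also happen as part of chat session initiation.
  await fetch('/api/mcp/register-tools', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ sessionId, schemas }),
  });
}
```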
Discussion Points (Expanded)
- Communication Mechanisms:
- Guess: WebSockets seem most robust for bidirectional, persistent communication but add backend complexity. Modified stream feels hacky. Polling/RPC might be simplest if plugin limitations allow a local server or background task.
- Security:
- Guess: Primary concern is the result transmission back to the server. Ensure results don't leak sensitive local data unintentionally. User needs clear warnings about what tools can access. Credentials for the MCP servers themselves remain client-side, which is good.
- Tool Discovery:
- Guess: Plugin needs to read `mcp.json` on startup/change, initialize clients, extract tool schemas (the definition, not the execution logic), and send these schemas to the backend API (e.g., via a dedicated endpoint or during chat session initiation). The server then includes these schemas in the `tools` parameter for `generateText` (see the server-side sketch after this list).
- Error Handling:
- Guess: Server needs timeouts for client tool calls. Client needs to handle MCP server unavailability or execution errors gracefully and report back to the server. What happens if the plugin isn't running or the communication channel is down when the server needs a tool? Server probably needs to report failure back to the LLM.
- Latency:
- Guess: Each client-side tool call adds a full server -> client -> server round trip during the LLM generation. This could significantly increase perceived latency compared to server-side tools.
- AI SDK Changes:
- Guess: Ideally, the AI SDK could offer a built-in mechanism or pattern for delegating specific tool calls to a registered client-side handler via the existing stream, simplifying the communication logic.
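To make the tool discovery and error handling points concrete, here is one possible server-side shape: the API route rebuilds `tools` from the schemas the client registered, and each `execute` delegates back to the plugin over whichever channel is chosen, with a timeout. `requestClientToolExecution` and the registered schema shape are hypothetical; the `tool()` and `jsonSchema()` helpers come from the `ai` package, though exact option names vary across SDK versions:

```ts
// Server-side (e.g., in the API route): rebuild tools from the schemas the
// client registered, delegating execution back to that client's plugin.
import { tool, jsonSchema, type ToolSet } from 'ai';

// Hypothetical: sends a ClientToolRequest over the chosen channel (WebSocket, polling, ...)
// and resolves with the client's result, or rejects on timeout / disconnect.
declare function requestClientToolExecution(
  sessionId: string,
  serverId: string,
  toolName: string,
  args: unknown,
  opts: { timeoutMs: number }
): Promise<unknown>;

interface RegisteredToolSchema {
  serverId: string;
  name: string;
  description?: string;
  parameters: unknown; // JSON schema sent up by the plugin
}

function buildDelegatedTools(sessionId: string, schemas: RegisteredToolSchema[]): ToolSet {
  const tools: ToolSet = {};
  for (const schema of schemas) {
    tools[`${schema.serverId}_${schema.name}`] = tool({
      description: schema.description ?? `Client-side tool ${schema.name}`,
      parameters: jsonSchema(schema.parameters as any),
      // The round trip to the plugin is where the extra latency comes from;
      // a timeout keeps a dead client from stalling the whole generation.
      execute: async (args) =>
        requestClientToolExecution(sessionId, schema.serverId, schema.name, args, {
          timeoutMs: 30_000,
        }),
    });
  }
  return tools;
}

// Sketch of usage inside the route handler:
// const result = streamText({
//   model,
//   messages,
//   tools: buildDelegatedTools(sessionId, registeredSchemas),
// });
```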
Alternative Path: Fully Client-Side Chat
This approach avoids the server-client tool communication problem by moving the entire generation process client-side.
- Core Idea: Instead of calling `/api/chat` (or similar), the chat UI component in `packages/web` would directly use `streamText` or `generateText` from the `ai` package.
- Plugin Interaction: The chat UI (running in Obsidian's webview or communicating heavily with the plugin) would:
  - Request the LLM API key (securely stored/provided by the user via plugin settings). This is the major security hurdle.
  - Ask the plugin to read `_NoteCompanion/Config/mcp.json`.
  - Ask the plugin to initialize the `experimental_createMCPClient` instances.
  - Aggregate tools: combine the tools from the user's MCP clients with any standard tools provided directly by the plugin (like file access, note creation, etc., exposed via the plugin's API).
- AI SDK Usage: The React component calls `streamText` directly, passing the aggregated tools. Tool execution happens directly in the client/plugin context (see the sketch after this list).
- Required Work:
  - API Route (`packages/web/app/api/...`): Gut the AI generation logic. It might become unnecessary or only handle ancillary functions.
  - Chat UI (`packages/web/app/(chat)/...`):
    - Remove the `useChat` hook (or equivalent using the API route).
    - Implement state management for messages, loading, and errors directly.
    - Add logic to fetch the API key, load the MCP config, initialize clients, and aggregate tools.
    - Call `streamText` directly within the component's event handlers.
  - Plugin (`packages/plugin`):
    - Provide a secure way to store/retrieve the user's LLM API key.
    - Expose functions to read `mcp.json`, initialize/shutdown MCP clients, and potentially provide access to standard plugin actions as tools.
  - Security: Solve the LLM API key exposure problem. Storing it client-side is generally insecure. Options: leverage Obsidian's storage (if deemed secure enough), require user input per session, or proxy through a very minimal backend solely for key management (defeats the purpose?). This is the biggest blocker.
  - Bundling: Ensure all necessary `@ai-sdk/*`, `ai`, and MCP transport packages can be correctly bundled and run within the Obsidian/web client environment.
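A minimal sketch of what the client-side call could look like, assuming the plugin already exposes helpers for the API key, the user's MCP tools, and its own built-in tools (all helper names are hypothetical):

```ts
// Hypothetical client-side chat turn for the fully client-side alternative.
// getApiKey(), getMcpTools(), and getPluginTools() are assumed plugin helpers.
import { streamText, type CoreMessage, type ToolSet } from 'ai';
import { createOpenAI } from '@ai-sdk/openai';

async function runClientSideTurn(
  messages: CoreMessage[],
  getApiKey: () => Promise<string>,
  getMcpTools: () => ToolSet,
  getPluginTools: () => ToolSet,
  onToken: (delta: string) => void
) {
  // The API key never leaves the client in this model -- which is exactly
  // why storing and handling it safely is the biggest blocker.
  const openai = createOpenAI({ apiKey: await getApiKey() });

  // Aggregate user-defined MCP tools with the plugin's own tools
  // (file access, note creation, ...).
  const tools: ToolSet = { ...getMcpTools(), ...getPluginTools() };

  const result = streamText({
    model: openai('gpt-4o'),
    messages,
    tools,
    maxSteps: 5, // allow multi-step tool use entirely on the client
  });

  for await (const delta of result.textStream) {
    onToken(delta);
  }
}
```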
- Pros:
- Eliminates the server-client round trip during generation for tool calls.
- Conceptually simpler tool execution flow (all local).
- Cons:
- MAJOR security risk with LLM API key handling client-side.
- Significant refactoring of the current chat architecture.
- Loses server-side logging, control, and potential caching capabilities.
- Might be harder to implement complex multi-step tool interactions if all state is client-side.
- Increases client-side bundle size and complexity.