Feature: Enable User-Defined MCP Server Integration in Chat (Client-Side Execution) #387

Goal

Enable users to integrate their own Model Context Protocol (MCP) servers into the Note Companion chat functionality. This would let users extend the chat's capabilities with their own custom tools and data sources.

Revised Approach & Constraints

Based on further discussion:

  1. Execution Context: MCP client initialization (experimental_createMCPClient) and the execution of tools provided by these clients MUST happen client-side, within the context of the Obsidian plugin environment.
  2. Configuration: Users will define MCP server configurations in a standard JSON file named mcp.json, located in a dedicated Note Companion folder within their vault (e.g., _NoteCompanion/Config/mcp.json, similar to paths defined in packages/plugin/settings.ts). The plugin will be responsible for reading this file (see the example after this list).
  3. Server-Side LLM (Default Assumption): The core AI generation (generateText/streamText) will likely remain server-side (e.g., in packages/web/app/api/(new-ai)/route.ts) due to API key management and model access.
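
For illustration, a hypothetical mcp.json might look like the following. The field names match the MCPConfig interface sketched later in this issue; the server IDs, command, and URL are placeholders, not a confirmed format:

{
  "servers": [
    {
      "id": "local-notes-tools",
      "transport": {
        "type": "stdio",
        "options": { "command": "node", "args": ["./my-mcp-server.js"] }
      }
    },
    {
      "id": "remote-search",
      "transport": { "type": "sse", "url": "https://example.com/mcp/sse" }
    }
  ]
}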

Core Technical Challenge: Server-Client Tool Invocation

The central problem is enabling the server-side AI generation process to discover and invoke tools whose execution logic resides exclusively on the client (the Obsidian plugin).

When the server-side LLM decides to use a tool defined in the user's mcp.json, three questions arise (a message-shape sketch follows the list):

  1. How does the server signal to the specific user's client plugin that a tool needs to be run?
  2. How does the client execute the tool via its locally initialized MCP client?
  3. How does the client securely and reliably return the tool's result back to the server-side generation process to continue the LLM turn?
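
One way to make these questions concrete is to pin down the wire format first. The following TypeScript types are a hypothetical sketch of the envelope any of the mechanisms below would carry; none of these names exist in the codebase yet:

// Hypothetical message shapes for server <-> client tool invocation.
// The server sends a ToolCallRequest when the LLM emits a tool call whose
// execution lives in the plugin; the plugin replies with a ToolCallResult.

interface ToolCallRequest {
  requestId: string;   // correlates request and result across the channel
  sessionId: string;   // identifies the chat session / user
  serverId: string;    // which entry in mcp.json owns the tool
  toolName: string;
  args: unknown;       // arguments produced by the LLM, matching the tool's schema
}

interface ToolCallResult {
  requestId: string;
  ok: boolean;
  result?: unknown;    // tool output on success
  error?: string;      // serialized error message on failure
}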

Potential Communication Mechanisms (Server-Client for Tools)

We need a robust mechanism for the server API route to communicate with the client plugin for tool calls (a WebSocket-based sketch follows the list):

  • Plugin as Proxy/Listener:
    • Idea: Plugin establishes a persistent WebSocket connection to the backend upon startup. Server sends tool requests over this socket; plugin executes and sends results back.
    • Pros: Direct communication channel.
    • Cons: Connection management complexity, scaling concerns on the backend, handling dropped connections/plugin restarts.
  • Modified AI SDK Stream:
    • Idea: Intercept the Vercel AI SDK's streaming response on the client. If a special "client-tool-request" message is detected, pause rendering, execute the tool via the plugin, send the result back to a dedicated server endpoint, which then resumes the original stream with the tool result injected.
    • Pros: Leverages existing streaming connection.
    • Cons: Complex interception logic, potentially fragile, requires careful coordination between client and server state. Needs a way for the server to 'pause' its generation.
  • Dedicated RPC Channel:
    • Idea: Use a simple HTTP endpoint on the plugin (if possible via Obsidian API?) or have the plugin poll a server endpoint for pending tool requests associated with its session/user.
    • Pros: Potentially simpler than WebSockets.
    • Cons: Polling is inefficient; local HTTP server in plugin might be complex/restricted. Requires secure identification of the client.
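
As a sketch of the first option, the plugin side of a WebSocket bridge could look roughly like this. The endpoint URL and auth token are assumptions, the message shapes reuse the hypothetical ToolCallRequest/ToolCallResult types above, and executeClientTool is defined in the code further down:

// CLIENT-SIDE (plugin): minimal sketch of the "Plugin as Proxy/Listener" option.
// Assumes the backend exposes a WebSocket endpoint and that executeClientTool
// (defined below) dispatches to the locally initialized MCP clients.

function connectToolBridge(wsUrl: string, authToken: string): WebSocket {
  const ws = new WebSocket(`${wsUrl}?token=${encodeURIComponent(authToken)}`);

  ws.onmessage = async (event) => {
    const request = JSON.parse(event.data) as ToolCallRequest;
    let reply: ToolCallResult;
    try {
      const result = await executeClientTool(request.serverId, request.toolName, request.args);
      reply = { requestId: request.requestId, ok: true, result };
    } catch (error) {
      reply = { requestId: request.requestId, ok: false, error: String(error) };
    }
    ws.send(JSON.stringify(reply));
  };

  ws.onclose = () => {
    // Reconnection/backoff handling would go here (one of the cons listed above).
  };

  return ws;
}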

Example MCP Client Usage (Client-Side)

// This code would run CLIENT-SIDE (e.g., within the plugin)
import { experimental_createMCPClient, type ToolSet } from 'ai';
import { Experimental_StdioMCPTransport } from 'ai/mcp-stdio'; // Or other transports

interface MCPConfig {
  servers: Array<{
    id: string; // Unique ID for this server config
    transport: {
      type: 'stdio' | 'sse' | string; // Add other types as needed
      options?: any; // e.g., { command: string, args: string[] } for stdio
      url?: string; // e.g., for sse
    };
  }>;
}

interface ActiveClient {
  id: string;
  client: Awaited<ReturnType<typeof experimental_createMCPClient>>; // createMCPClient is async, so unwrap the Promise
  tools: ToolSet;
}

const activeClients: Map<string, ActiveClient> = new Map();

async function initializeUserMCPClients(mcpConfig: MCPConfig): Promise<Map<string, ToolSet>> {
  const allToolSchemas: Map<string, ToolSet> = new Map();
  await shutdownUserMCPClients(); // Close existing first

  for (const config of mcpConfig.servers) {
    try {
      let transport;
      if (config.transport.type === 'stdio') {
        transport = new Experimental_StdioMCPTransport(config.transport.options);
      } else if (config.transport.type === 'sse') {
        if (!config.transport.url) {
          console.warn('SSE transport for MCP server', config.id, 'is missing a url; skipping.');
          continue;
        }
        transport = { type: 'sse', url: config.transport.url };
      } else {
        console.warn('Unsupported MCP transport type:', config.transport.type);
        continue;
      }

      const client = await experimental_createMCPClient({ transport });
      const toolSet = await client.tools();

      activeClients.set(config.id, { id: config.id, client, tools: toolSet });
      allToolSchemas.set(config.id, toolSet); // Store schemas by server ID

      console.log(`Initialized MCP client ${config.id} with tools:`, Object.keys(toolSet));

    } catch (error) {
      console.error('Failed to initialize MCP client:', config.id, error);
    }
  }
  // TODO: Need a mechanism to send the SCHEMAS from `allToolSchemas` to the SERVER
  // so the server-side LLM knows which tools are available (see the registration sketch after this block).
  return allToolSchemas;
}

async function shutdownUserMCPClients() {
  const promises = [];
  for (const activeClient of activeClients.values()) {
    console.log(`Shutting down MCP client ${activeClient.id}`);
    promises.push(activeClient.client.close().catch(e => console.error(`Error closing client ${activeClient.id}:`, e)));
  }
  await Promise.all(promises);
  activeClients.clear();
}


// Later, when the server requests tool execution via one of the communication mechanisms:
async function executeClientTool(serverId: string, toolName: string, args: any): Promise<any> {
  const activeClient = activeClients.get(serverId);
  if (!activeClient) {
    throw new Error(`Client-side MCP server ${serverId} not found or not active.`);
  }
  const tool = activeClient.tools[toolName];
  if (!tool) {
    throw new Error(`Tool ${toolName} not found in client-side MCP server ${serverId}.`);
  }

  try {
    console.log(`Executing client-side tool: ${serverId}/${toolName} with args:`, args);
    // The exact invocation depends on the shape returned by client.tools();
    // tools from experimental_createMCPClient expose an execute function.
    // Note: depending on the AI SDK version, execute may expect a second
    // options argument (e.g. { toolCallId, messages }).
    if (typeof tool.execute !== 'function') {
      throw new Error(`Tool ${toolName} on server ${serverId} does not have an execute function.`);
    }
    const result = await tool.execute(args);
    console.log(`Tool ${serverId}/${toolName} execution result:`, result);
    return result;
  } catch (error) {
    console.error(`Error executing tool ${serverId}/${toolName}:`, error);
    throw error; // Re-throw so the failure can be reported back to the server
  }
}
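
To address the TODO above, the plugin could strip each tool down to its declarative schema (name, description, parameters) and register those with the backend. A hypothetical sketch, assuming a /api/mcp-tools endpoint that does not exist yet:

// CLIENT-SIDE: send tool SCHEMAS (not execution logic) to the server so the
// server-side LLM can include them in the `tools` parameter of generateText.
// The endpoint and payload shape are assumptions.

interface SerializedToolSchema {
  serverId: string;
  toolName: string;
  description?: string;
  parameters: unknown; // intended as JSON Schema for the tool's arguments
}

async function registerToolSchemas(
  apiBaseUrl: string,
  sessionId: string,
  allToolSchemas: Map<string, ToolSet>
): Promise<void> {
  const schemas: SerializedToolSchema[] = [];
  for (const [serverId, toolSet] of allToolSchemas) {
    for (const [toolName, tool] of Object.entries(toolSet)) {
      schemas.push({
        serverId,
        toolName,
        description: tool.description,
        // Depending on the SDK version, this may need conversion from the
        // SDK's schema wrapper to plain JSON Schema before serializing.
        parameters: tool.parameters,
      });
    }
  }

  await fetch(`${apiBaseUrl}/api/mcp-tools`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ sessionId, schemas }),
  });
}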

Discussion Points (Expanded)

  • Communication Mechanisms:
    • Guess: WebSockets seem most robust for bidirectional, persistent communication but add backend complexity. Modified stream feels hacky. Polling/RPC might be simplest if plugin limitations allow a local server or background task.
  • Security:
    • Guess: Primary concern is the result transmission back to the server. Ensure results don't leak sensitive local data unintentionally. User needs clear warnings about what tools can access. Credentials for the MCP servers themselves remain client-side, which is good.
  • Tool Discovery:
    • Guess: Plugin needs to read mcp.json on startup/change, initialize clients, extract tool schemas (the definition, not the execution logic), and send these schemas to the backend API (e.g., via a dedicated endpoint or during chat session initiation). The server then includes these schemas in the tools parameter for generateText.
  • Error Handling:
    • Guess: The server needs timeouts for client tool calls (see the sketch after this list), and the client needs to handle MCP server unavailability or execution errors gracefully and report them back to the server. If the plugin isn't running or the communication channel is down when the server needs a tool, the server should report the failure back to the LLM as the tool result.
  • Latency:
    • Guess: Each client-side tool call adds a full server -> client -> server round trip during the LLM generation. This could significantly increase perceived latency compared to server-side tools.
  • AI SDK Changes:
    • Guess: Ideally, the AI SDK could offer a built-in mechanism or pattern for delegating specific tool calls to a registered client-side handler via the existing stream, simplifying the communication logic.
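
For the error-handling point, a server-side timeout around the client round trip could be as simple as Promise.race. The sketch below assumes a requestToolFromClient callback standing in for whichever communication mechanism is chosen:

// SERVER-SIDE sketch: bound the client tool round trip with a timeout so a
// missing plugin or dead channel surfaces as a tool error the LLM can see.
// requestToolFromClient is a placeholder, not an existing function.

async function callClientToolWithTimeout(
  requestToolFromClient: () => Promise<unknown>,
  timeoutMs = 30_000
): Promise<unknown> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`Client tool call timed out after ${timeoutMs}ms`)),
      timeoutMs
    );
  });
  try {
    return await Promise.race([requestToolFromClient(), timeout]);
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}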

Alternative Path: Fully Client-Side Chat

This approach avoids the server-client tool communication problem by moving the entire generation process client-side.

  • Core Idea: Instead of calling /api/chat (or similar), the chat UI component in packages/web would directly use streamText or generateText from the ai package.
  • Plugin Interaction: The chat UI (running in Obsidian's webview or communicating heavily with the plugin) would:
    1. Request the LLM API Key (securely stored/provided by the user via plugin settings). This is the major security hurdle.
    2. Ask the plugin to read _NoteCompanion/Config/mcp.json.
    3. Ask the plugin to initialize the experimental_createMCPClient instances.
    4. Aggregate tools: Combine the tools from the user's MCP clients with any standard tools provided directly by the plugin (like file access, note creation etc., exposed via plugin's API).
  • AI SDK Usage: The React component calls streamText directly, passing the aggregated tools. Tool execution happens directly in the client/plugin context (see the sketch at the end of this section).
  • Required Work:
    • API Route (packages/web/app/api/...): Gut the AI generation logic. It might become unnecessary or only handle ancillary functions.
    • Chat UI (packages/web/app/(chat)/...):
      • Remove useChat hook (or equivalent using the API route).
      • Implement state management for messages, loading, errors directly.
      • Add logic to fetch API key, load MCP config, initialize clients, aggregate tools.
      • Call streamText directly within the component's event handlers.
    • Plugin (packages/plugin):
      • Provide secure way to store/retrieve user's LLM API key.
      • Expose functions to read mcp.json, initialize/shutdown MCP clients, and potentially provide access to standard plugin actions as tools.
    • Security: Solve the LLM API key exposure problem. Storing it client-side is generally insecure. Options: leverage Obsidian's storage (if deemed secure enough), require user input per session, proxy through a very minimal backend solely for key management (defeats the purpose?). This is the biggest blocker.
    • Bundling: Ensure all necessary @ai-sdk/*, ai, and MCP transport packages can be correctly bundled and run within the Obsidian/web client environment.
  • Pros:
    • Eliminates the server-client round trip during generation for tool calls.
    • Conceptually simpler tool execution flow (all local).
  • Cons:
    • MAJOR security risk with LLM API key handling client-side.
    • Significant refactoring of the current chat architecture.
    • Loses server-side logging, control, and potential caching capabilities.
    • Might be harder to implement complex multi-step tool interactions if all state is client-side.
    • Increases client-side bundle size and complexity.
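
As a rough sketch of what this fully client-side path could look like, assuming the API key problem were solved; the model choice, helper names, and tool aggregation are all placeholders:

// CLIENT-SIDE sketch of the alternative path: no /api/chat round trip.
// Assumes the plugin has already loaded mcp.json, initialized the MCP clients
// (initializeUserMCPClients above), and somehow securely provided an API key.

import { streamText, type ToolSet } from 'ai';
import { createOpenAI } from '@ai-sdk/openai';

async function runClientSideChat(
  apiKey: string,
  userMessage: string,
  mcpToolSets: Map<string, ToolSet>,
  pluginTools: ToolSet // standard tools exposed by the plugin (file access, etc.)
) {
  // Aggregate tools from every MCP client plus the plugin's built-in tools.
  // (Name collisions between servers would need handling in a real version.)
  const tools: ToolSet = { ...pluginTools };
  for (const toolSet of mcpToolSets.values()) {
    Object.assign(tools, toolSet);
  }

  const openai = createOpenAI({ apiKey }); // MAJOR caveat: key lives client-side

  const result = streamText({
    model: openai('gpt-4o'), // placeholder model
    messages: [{ role: 'user', content: userMessage }],
    tools, // execution happens locally; no server round trip per tool call
  });

  for await (const textPart of result.textStream) {
    // Render into the chat UI as chunks arrive.
    console.log(textPart);
  }
}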
