
feat: add async job mode for MCP server #558

Merged
buger merged 5 commits into main from feat/mcp-improvements
Mar 21, 2026

Conversation

@buger
Contributor

@buger buger commented Mar 21, 2026

Summary

  • Adds opt-in async job mode (start_job/get_job) for MCP server to work around Claude Code's long-running tool call timeout limitations
  • Instead of blocking on a single tool call, the client gets a job_id immediately and calls get_job which uses long polling (waits up to 59 seconds by default)
  • Jobs are backed by the existing TaskStore (SQLite), so they're visible via visor tasks list and visor tasks show
  • Works for both standalone (visor mcp-server --async) and runner (visor --mcp --mcp-async) modes

How It Works

  1. Client calls start_job with the message/workflow parameters
  2. Server creates a task, kicks off execution in the background, returns immediately with a job_id (8-char UUID prefix)
  3. Client calls get_job with the job_id — the server holds the connection open for up to 59 seconds (configurable), checking every 500ms
  4. If the job finishes during the wait, the result is returned immediately
  5. If the timeout is reached, a "still running" response is returned and the client calls get_job again

This drastically reduces round-trips compared to short-interval polling. A 10-minute job requires ~10 polls (59s each) instead of ~60 polls (10s each).
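The flow above can be sketched from the client's perspective. This is illustrative only: `callTool` is a hypothetical stand-in for whatever MCP client invocation the caller uses, not an API in this PR.

```typescript
// Minimal sketch of the start_job/get_job client loop described above.
// `callTool` is a hypothetical MCP client call, not part of this PR.
type JobResponse = { job_id: string; done: boolean; result: unknown };

async function runJob(
  callTool: (name: string, args: Record<string, unknown>) => Promise<JobResponse>,
  message: string
): Promise<unknown> {
  // start_job returns immediately with a job_id
  const started = await callTool('start_job', { message });
  let job = started;
  // Each get_job call long-polls server-side, so this loop stays tight:
  // a 10-minute job means roughly ten iterations, not sixty.
  while (!job.done) {
    job = await callTool('get_job', { job_id: started.job_id });
  }
  return job.result;
}
```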

Response Shape

Every response from both start_job and get_job uses the same standardized shape:

{
  "job_id": "a1b2c3d4",
  "status": "queued|running|completed|failed|expired",
  "done": false,
  "progress": { "percent": 0, "step": "queued", "message": "Job accepted" },
  "polling": { "recommended_next_action": "get_job", "recommended_delay_seconds": 0 },
  "result": null,
  "error": null,
  "user_message": "The job has started.",
  "next_instruction_for_model": "Call get_job with this job_id..."
}

Changes

  • src/mcp-job-manager.ts (NEW) — JobManager class: thin wrapper over TaskStore with long polling, 8-char job IDs
  • src/mcp-server.ts — asyncMode + longPollTimeout options; registers start_job/get_job when enabled
  • src/runners/mcp-server-runner.ts — Same async mode support for runner
  • src/runners/runner-factory.ts — Passes asyncMode and longPollTimeout from CLI/config/env
  • src/cli.ts / src/types/cli.ts — --mcp-async CLI flag
  • src/cli-main.ts — --async and --poll-timeout flags for mcp-server subcommand
  • tests/unit/mcp-job-manager.test.ts (NEW) — 14 unit tests

Configuration

# Standalone
visor mcp-server --config workflow.yaml --async
visor mcp-server --async --poll-timeout 30  # custom timeout (seconds)

# Runner mode
visor --mcp --mcp-auth-token secret --mcp-async

# Via env var
VISOR_MCP_ASYNC=true visor --mcp ...
VISOR_MCP_POLL_TIMEOUT=30 visor --mcp --mcp-async ...

# Via config
mcp_server:
  async_mode: true
  long_poll_timeout: 30  # seconds, default 59

Test plan

  • 14 unit tests for JobManager covering:
    • Start job returns immediately with running status
    • Long-poll returns result when job completes during wait
    • Immediate return for already-completed jobs
    • Failure handling (Error and non-Error thrown values)
    • Expired/unknown job IDs return immediately (no waiting)
    • 8-char hex job IDs (UUID prefix)
    • Short ID prefix lookup
    • Task visibility in TaskStore
    • workflowId passthrough
    • Plain string result extraction
    • Concurrent getJob calls on same job
    • Full response shape validation
    • Configurable long poll timeout
  • Full test suite passes
  • Manual test: start MCP server with --async, call start_job and get_job via curl
  • Manual test: verify Claude Code can poll get_job for long-running workflows

🤖 Generated with Claude Code

Claude Code and other MCP clients have timeout issues with long-running
tool calls. This adds an opt-in async mode that replaces blocking tools
with start_job (returns immediately with job_id) and get_job (polls for
status/result).

Usage:
  visor mcp-server --config workflow.yaml --async
  visor --mcp --mcp-auth-token secret --mcp-async

Features:
- JobManager with 6-char hex job IDs, idempotency keys, TTL cleanup
- Consistent response shape with progress, polling hints, and model instructions
- Works for both standalone (mcp-server) and runner (--mcp) modes
- Configurable via CLI flags, YAML config, or VISOR_MCP_ASYNC env var

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@probelabs
Contributor

probelabs Bot commented Mar 21, 2026

PR Overview: Async Job Mode for MCP Server

Summary

This PR introduces an opt-in async job mode for the MCP (Model Context Protocol) server to address timeout limitations in Claude Code and other MCP clients when handling long-running tool calls. The implementation uses a long polling pattern (start_job/get_job) that dramatically reduces round-trips compared to traditional short-interval polling.

Files Changed

New Files

  • src/mcp-job-manager.ts (+378 lines) — Core JobManager class implementing async job lifecycle, long polling, and TaskStore integration
  • tests/unit/mcp-job-manager.test.ts (+306 lines) — Comprehensive unit tests (14 test cases)

Modified Files

  • src/mcp-server.ts (+123/-4) — Adds asyncMode/longPollTimeout options, getOrCreateTaskStore(), and registerAsyncJobTools()
  • src/runners/mcp-server-runner.ts (+81/-16) — Integrates async mode for runner-based MCP servers with TaskStore initialization
  • src/runners/runner-factory.ts (+6) — Passes asyncMode and longPollTimeout from CLI/config/env
  • src/cli.ts (+2) — Adds --mcp-async CLI flag
  • src/cli-main.ts (+2) — Adds --async and --poll-timeout flags for mcp-server subcommand
  • src/types/cli.ts (+2) — Adds mcpAsync option to CliOptions interface

Architecture & Impact Assessment

What This PR Accomplishes

  1. Solves Timeout Problem: Claude Code and other MCP clients have timeouts for long-running tool calls. The async pattern returns immediately with a job ID, then uses long polling (waits up to 59 seconds per request) to retrieve results.

  2. TaskStore Integration: Jobs are persisted as regular Visor tasks using the existing SQLite-backed TaskStore, making them visible via visor tasks list and visor tasks show.

  3. Efficient Polling: A 10-minute job requires ~10 polls (59s each) instead of ~60 polls (10s each) — 6x reduction in round-trips.

  4. Dual Mode Support: Works for both standalone MCP server (visor mcp-server --async) and runner mode (visor --mcp --mcp-async).

Key Technical Changes

JobManager Class (src/mcp-job-manager.ts)

  • Wraps TaskStore for job persistence (SQLite-backed)
  • Uses 8-character hex job IDs (UUID prefix) for user-friendly identifiers
  • Background heartbeat timer (60s intervals) for task liveness
  • Maps TaskState → JobStatus for MCP responses
  • Extracts results from task status messages, artifacts, and engine results
  • Long polling with 500ms check interval (configurable timeout, default 59s)
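The long-poll behavior in the last bullet can be sketched as follows. This is an assumed shape based on the description above, not the actual implementation; `getStatus` is a hypothetical stand-in for the TaskStore lookup.

```typescript
// Sketch of the long-poll wait: check the task state every 500ms until it
// finishes or the configured deadline (default 59s) passes. `getStatus`
// is a hypothetical stand-in for the TaskStore-backed status lookup.
const POLL_INTERVAL_MS = 500;

async function waitForJob(
  getStatus: (jobId: string) => 'running' | 'completed' | 'failed',
  jobId: string,
  timeoutMs = 59_000
): Promise<string> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const status = getStatus(jobId);
    if (status !== 'running') return status; // finished: return immediately
    await new Promise(resolve => setTimeout(resolve, POLL_INTERVAL_MS));
  }
  return 'running'; // deadline hit: caller should issue another get_job
}
```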

Response Schema

interface JobResponse {
  job_id: string;           // 8-char UUID prefix
  status: 'queued' | 'running' | 'completed' | 'failed' | 'expired';
  done: boolean;
  progress: { percent: number | null; step: string; message: string };
  polling: {
    recommended_next_action: 'get_job' | 'none';
    recommended_delay_seconds: number;
  };
  result: any;
  error: { code: string; message: string; retryable: boolean } | null;
  user_message: string;
  next_instruction_for_model: string;
}

MCP Tool Registration

When asyncMode is enabled:

  • start_job (or start_{toolName}): Accepts workflow parameters, returns job response immediately
  • get_job: Accepts job_id, uses long polling to wait for completion

When asyncMode is disabled: Existing blocking tool behavior is preserved.
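The conditional registration can be sketched like this. It is illustrative only: `ToolRegistry` and `registerTool` are stand-ins for the real MCP server API, which is not shown in this summary; only the branching on asyncMode mirrors the PR.

```typescript
// Illustrative sketch of async-vs-blocking tool registration. The
// `ToolRegistry` interface is hypothetical; only the asyncMode branch
// reflects the behavior described above.
interface ToolRegistry {
  registerTool(name: string, handler: (args: any) => Promise<unknown>): void;
}

function registerTools(
  server: ToolRegistry,
  asyncMode: boolean,
  runBlocking: (args: any) => Promise<unknown>,
  startJob: (args: any) => Promise<unknown>,
  getJob: (args: any) => Promise<unknown>
): void {
  if (asyncMode) {
    // Async mode: job-based tools replace the blocking call
    server.registerTool('start_job', startJob);
    server.registerTool('get_job', getJob);
  } else {
    // Async mode off: existing blocking behavior is preserved
    server.registerTool('run', runBlocking);
  }
}
```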

Affected System Components

  1. Standalone MCP Server (visor mcp-server --async)

    • Uses registerAsyncJobTools() in both createHttpMcpServer() and startMcpServer()
    • Creates TaskStore if not provided via options
  2. Runner-based MCP Server (visor --mcp --mcp-async)

    • McpServerRunner conditionally registers async tools when asyncMode is true
    • Integrates with existing handleMessage() workflow
  3. Configuration Chain

    • CLI flags → CliOptions.mcpAsync → RunnerFactory → McpServerRunner/createHttpMcpServer
    • Environment variable: VISOR_MCP_ASYNC=true, VISOR_MCP_POLL_TIMEOUT=30
    • Config file: mcp_server: { async_mode: true, long_poll_timeout: 30 }
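One plausible resolver for this chain is sketched below. The CLI-over-env-over-config precedence is an assumption for illustration; the actual order in runner-factory.ts is not spelled out in this summary.

```typescript
// Hypothetical sketch of resolving async mode from the three sources above,
// assuming CLI flag wins over env var, which wins over config file.
function resolveAsyncMode(
  cliFlag: boolean | undefined,
  env: Record<string, string | undefined>,
  config: { async_mode?: boolean } | undefined
): boolean {
  if (cliFlag !== undefined) return cliFlag;             // --mcp-async / --async
  if (env.VISOR_MCP_ASYNC !== undefined) {
    return env.VISOR_MCP_ASYNC === 'true';               // VISOR_MCP_ASYNC=true
  }
  return config?.async_mode ?? false;                    // mcp_server.async_mode
}
```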

Component Flow Diagram

sequenceDiagram
    participant Client as Claude Code / MCP Client
    participant MCP as MCP Server
    participant JM as JobManager
    participant TS as "TaskStore (SQLite)"
    participant WF as Workflow Executor

    Client->>MCP: start_job(message, workflow)
    MCP->>JM: startJob(handler, opts)
    JM->>TS: createTask()
    JM->>TS: updateTaskState(working)
    JM-->>MCP: JobResponse {job_id, status: running}
    MCP-->>Client: Immediate response (job_id: a1b2c3d4)
    
    par Background Execution
        JM->>WF: execute workflow
        WF-->>JM: result/error
        JM->>TS: updateTaskState(completed/failed)
    and
        loop Every 60s
            JM->>TS: heartbeat()
        end
    end

    Client->>MCP: get_job(job_id)
    MCP->>JM: getJob(job_id)
    
    loop Long Poll (up to 59s)
        JM->>TS: getTask(job_id)
        alt Task completed/failed
            JM-->>MCP: JobResponse {done: true, result}
            MCP-->>Client: Final result
        else Still running after timeout
            JM-->>MCP: JobResponse {done: false, status: running}
            MCP-->>Client: Poll again
        end
    end

Configuration Options

  • CLI (standalone) — visor mcp-server --config workflow.yaml --async --poll-timeout 30
  • CLI (runner) — visor --mcp --mcp-auth-token secret --mcp-async
  • Environment — VISOR_MCP_ASYNC=true VISOR_MCP_POLL_TIMEOUT=30 visor --mcp
  • Config file — mcp_server: { async_mode: true, long_poll_timeout: 30 }

Scope Discovery & Context Expansion

Directly Affected Modules

  • MCP Server Core (src/mcp-server.ts) — Tool registration, TaskStore integration
  • MCP Server Runner (src/runners/mcp-server-runner.ts) — Runner integration with TaskStore
  • CLI Layer (src/cli.ts, src/cli-main.ts) — Flag parsing
  • Type Definitions (src/types/cli.ts) — Interface updates

Related Components

  • TaskStore (src/agent-protocol/task-store) — SQLite persistence layer for jobs
  • Workflow Executor — Called by async job handlers
  • Session Management — session_id parameter preserved in async mode
  • Agent Protocol — Task state management and lifecycle
  • A2A Frontend — Also uses TaskStore for HTTP task management

Testing Coverage

  • Unit Tests: 14 tests covering:
    • Job lifecycle (start → complete → get)
    • Long polling behavior (early return, timeout)
    • Failure handling (Error and non-Error thrown values)
    • Job ID format (8-char hex) and prefix lookup
    • TaskStore visibility and metadata
    • Concurrent getJob calls
    • Configurable timeout
    • Full response shape validation

Potential Review Focus Areas

  1. TaskStore Lifecycle: JobManager creates TaskStore if not provided — ensure proper initialization/shutdown
  2. Heartbeat Timer: Uses unref() to avoid blocking process exit
  3. Error Handling: Failed jobs include retryable: true flag; silent catch blocks for state transitions
  4. Backward Compatibility: Async mode is opt-in; existing blocking behavior unchanged
  5. Result Extraction: Complex logic to extract text from various result formats (MCP content, engine results)
  6. Security Considerations: No job ownership validation (any authenticated client can access any job_id), 8-char IDs have limited entropy (32 bits)

Usage Example

# Start MCP server in async mode
visor mcp-server --config workflow.yaml --async

# Client flow:
# 1. Call start_job with workflow parameters
# 2. Receive immediate response with job_id (8-char hex)
# 3. Poll get_job (waits up to 59s per call)
# 4. Extract result from final response when done=true

# Jobs are visible via:
visor tasks list      # Shows async jobs alongside regular tasks
visor tasks show <id>  # Full task details

References

Code Locations

  • src/mcp-job-manager.ts:1-378 — JobManager class implementation
  • src/mcp-job-manager.ts:237-343 — startJob() method with task creation and background execution
  • src/mcp-job-manager.ts:345-393 — getJob() method with long polling logic
  • src/mcp-job-manager.ts:395-410 — resolveTask() for short ID prefix lookup
  • src/mcp-server.ts:92-99 — McpServerOptions interface (asyncMode, longPollTimeout, taskStore)
  • src/mcp-server.ts:521-615 — registerAsyncJobTools() function
  • src/mcp-server.ts:515-519 — getOrCreateTaskStore() helper
  • src/mcp-server.ts:683-688 — Async mode conditional in createHttpMcpServer()
  • src/mcp-server.ts:921-928 — Async mode conditional in startMcpServer()
  • src/runners/mcp-server-runner.ts:29-35 — McpFrontendOptions interface
  • src/runners/mcp-server-runner.ts:111-189 — Async tool registration in runner
  • src/cli.ts:45 — --mcp-async CLI flag definition
  • src/cli-main.ts:1319-1320 — --async and --poll-timeout flag parsing
  • src/types/cli.ts:80-81 — CliOptions.mcpAsync type definition
  • src/runners/runner-factory.ts:153-161 — Async mode option passing
  • tests/unit/mcp-job-manager.test.ts:1-306 — Unit tests
Metadata
  • Review Effort: 3 / 5
  • Primary Label: feature

Powered by Visor from Probelabs

Last updated: 2026-03-21T17:43:57.734Z | Triggered by: pr_updated | Commit: bb362b2

💡 TIP: You can chat with Visor using /visor ask <your question>

@probelabs
Contributor

probelabs Bot commented Mar 21, 2026

Security Issues (8)

Severity Location Issue
🟠 Error src/mcp-job-manager.ts:336
The resolveTask() method uses linear search through up to 200 tasks to find prefix matches, allowing unauthorized access to other users' jobs. An attacker can brute-force 8-character hex prefixes (16^8 = 4 billion combinations) to enumerate and access any job ID.
💡 Suggestion: Add authorization checks to ensure users can only access their own jobs. Consider using a cryptographically random job ID with higher entropy (at least 12-16 characters) or implement proper access control. Add rate limiting to prevent brute-force attacks.
🔧 Suggested Fix
private resolveTask(jobId: string, userId?: string): import('./agent-protocol/types').AgentTask | null {
  // Try direct lookup first (full UUID)
  let task = this.taskStore.getTask(jobId);

  // Authorization check: only allow access to own tasks
  if (task && userId && task.metadata?.user_id !== userId) {
    return null;
  }

  // If not found and looks like a short ID, search by prefix WITH authorization
  if (!task && jobId.length < 36 && jobId.length >= 8) {
    const { tasks } = this.taskStore.listTasks({ limit: 200, metadata: userId ? { user_id: userId } : undefined });
    task = tasks.find(t => t.id.startsWith(jobId)) || null;
  }

  return task;
}

🟠 Error src/mcp-server.ts:595
The registerAsyncJobTools() function registers start_job and get_job tools without any authorization checks. Any authenticated client can start jobs and query any job_id.
💡 Suggestion: Implement proper authorization to ensure users can only access their own jobs. Pass user context from the MCP session to JobManager.
🔧 Suggested Fix
async (args: { job_id: string }) => {
  // Extract user context from MCP session
  const userId = (server as any).getSessionUser?.(req) || 'anonymous';
  // Pass userId to enforce access control
  const response = await jobManager.getJob(args.job_id, userId);
  return { content: [{ type: 'text' as const, text: JSON.stringify(response, null, 2) }] };
}
🟠 Error src/runners/mcp-server-runner.ts:145
In async mode, all jobs are stored in a shared TaskStore without tenant isolation. Different users or workflow runs could potentially access each other's jobs via job ID enumeration.
💡 Suggestion: Implement proper multi-tenancy by adding user/session context to job metadata and enforcing access control in resolveTask().
🔧 Suggested Fix
const response = jobManager.startJob(
  async () => this.handleMessage(args.message, args.session_id),
  {
    messageText: args.message,
    workflowId: allChecks.join(','),
    metadata: {
      user_id: 'mcp-user', // TODO: Extract from auth context
      session_id: args.session_id,
    },
  }
);
🟡 Warning src/mcp-job-manager.ts:236
No input validation on job_id parameter in getJob(). Malicious or malformed job IDs could cause unexpected behavior or database errors.
💡 Suggestion: Validate job_id format before processing. Ensure it's a valid hex string (or hyphenated UUID) of reasonable length.
🔧 Suggested Fix
async getJob(jobId: string): Promise<JobResponse> {
  // Validate job_id format
  if (!jobId || typeof jobId !== 'string') {
    return buildExpiredResponse(jobId);
  }

  // Ensure job_id is hex (hyphens allowed so full UUIDs still pass) and reasonable length
  if (!/^[0-9a-fA-F-]{8,36}$/.test(jobId)) {
    return buildExpiredResponse(jobId);
  }

  const resolvedTask = this.resolveTask(jobId);
🟡 Warning src/mcp-job-manager.ts:287
Error objects are converted to strings and exposed to clients via error.message field, potentially revealing internal implementation details, file paths, or stack traces.
💡 Suggestion: Sanitize error messages before exposing them to clients. Log detailed errors server-side but return generic messages to clients.
🔧 Suggested Fix
.catch(err => {
  clearInterval(heartbeatTimer);
  // Sanitize error message for client exposure
  const errorText = err instanceof Error ? err.message : String(err);
  const sanitizedError = errorText.length > 500 ? errorText.slice(0, 500) + '...' : errorText;

  // Log full error server-side for debugging
  logger.error(`[MCP-AsyncJob] Job ${task.id.slice(0, 8)} failed: ${errorText}`);

  const failMsg = {
    message_id: crypto.randomUUID(),
    role: 'agent' as const,
    parts: [{ text: sanitizedError }],
  };
🟡 Warning src/mcp-job-manager.ts:236
Long polling with configurable timeout (default 59s) and 500ms intervals allows resource exhaustion. An attacker can create many jobs and continuously poll getJob(), consuming server resources and database connections.
💡 Suggestion: Implement rate limiting per client/IP, add a maximum concurrent poll limit, and consider exponential backoff for repeated polls on the same job.
🔧 Suggested Fix
// Add rate limiting and connection tracking
private activePolls = new Map<string, number>(); // job_id -> count
private readonly MAX_CONCURRENT_POLLS = 100;
private readonly MAX_POLLS_PER_JOB = 10;

async getJob(jobId: string): Promise<JobResponse> {
  // Enforce rate limits
  const totalActivePolls = Array.from(this.activePolls.values()).reduce((a, b) => a + b, 0);
  if (totalActivePolls > this.MAX_CONCURRENT_POLLS) {
    throw new Error('Server busy - too many active polls');
  }

  const currentPolls = this.activePolls.get(jobId) || 0;
  if (currentPolls > this.MAX_POLLS_PER_JOB) {
    throw new Error('Too many polls on this job - please wait');
  }

  this.activePolls.set(jobId, currentPolls + 1);
  try {
    // ... existing polling logic
  } finally {
    this.activePolls.set(jobId, Math.max(0, (this.activePolls.get(jobId) || 0) - 1));
  }
}

🟡 Warning src/mcp-job-manager.ts:236
The long polling loop checks every 500ms without any maximum iteration limit. A bug in task state transitions could cause infinite loops.
💡 Suggestion: Add a maximum number of poll iterations to prevent infinite loops.
🔧 Suggested Fix
// Long poll: check every 500ms for up to the configured timeout
const POLL_INTERVAL_MS = 500;
const MAX_WAIT_MS = this.longPollTimeoutMs;
const MAX_ITERATIONS = Math.ceil(MAX_WAIT_MS / POLL_INTERVAL_MS) + 1;
const deadline = Date.now() + MAX_WAIT_MS;
let iterations = 0;
while (Date.now() < deadline && iterations < MAX_ITERATIONS) {
  iterations++;
  await new Promise(resolve => setTimeout(resolve, POLL_INTERVAL_MS));
🟡 Warning src/cli-main.ts:1319
The --poll-timeout flag accepts any number without validation. Extremely large values could cause resource exhaustion.
💡 Suggestion: Validate and clamp the poll timeout to a reasonable range (e.g., 1-300 seconds).
🔧 Suggested Fix
asyncMode: mcpArgs.includes('--async'),
longPollTimeout: getArg('--poll-timeout') ? Math.min(300, Math.max(1, Number(getArg('--poll-timeout')))) : undefined,

Architecture Issues (11)

Severity Location Issue
🟠 Error src/mcp-job-manager.ts:1-327
The JobManager class duplicates significant functionality already provided by the existing TaskStore + trackExecution pattern. The codebase already has a mature async job system (TaskQueue, TaskStore, trackExecution) that handles task lifecycle, state transitions, persistence, heartbeats, and cleanup. JobManager reimplements these concerns (task creation, state updates, heartbeat timer, result extraction) instead of reusing the existing infrastructure.
💡 Suggestion: Instead of creating a new JobManager, leverage the existing trackExecution function which already wraps engine execution with task lifecycle management. For MCP async mode, simply create a task via taskStore.createTask(), transition to 'working', and return the task ID. The client can then poll taskStore.getTask() directly. This eliminates 300+ lines of duplicated code and reuses the proven, well-tested task lifecycle infrastructure.
🔧 Suggested Fix
Replace JobManager with direct TaskStore usage:
  1. In start_job: Create task with taskStore.createTask(), transition to 'working', return task ID
  2. In get_job: Call taskStore.getTask() and build response from AgentTask
  3. Remove duplicate heartbeat, result extraction, and state management logic

The trackExecution function already handles heartbeats, state transitions, and result extraction. Use it for the actual execution.

🟠 Error src/mcp-job-manager.ts:189-267
The startJob method contains complex inline task creation and background execution logic that duplicates trackExecution. It manually creates tasks, transitions state, sets up heartbeats, extracts results from multiple formats, and handles errors - all of which trackExecution already does. This is a clear violation of DRY and creates maintenance burden.
💡 Suggestion: Use trackExecution directly:

const { task } = await trackExecution(
  { taskStore, source: 'mcp', workflowId, messageText, metadata },
  () => executeWorkflow(args)
);
return { job_id: task.id, ... };

This eliminates 80+ lines of manual lifecycle management and ensures consistent behavior across all frontends.

🔧 Suggested Fix
Replace entire startJob method with trackExecution call.
🟠 Error src/mcp-job-manager.ts:189-267
The JobManager.startJob method manually manages task lifecycle (create, transition to working, heartbeat, complete/fail) while the rest of the codebase uses trackExecution for this purpose. This creates inconsistent patterns - Slack, Telegram, Email, TUI, and A2A frontends all use trackExecution, but MCP async mode uses its own custom implementation.
💡 Suggestion: Follow the established pattern used by all other frontends: use trackExecution to wrap the execution handler. This ensures consistent behavior, error handling, telemetry, and state transitions across all transport mechanisms.
🔧 Suggested Fix
Replace JobManager.startJob with trackExecution wrapper, same pattern as other frontends.
🟠 Error src/mcp-job-manager.ts:1-327
The entire JobManager class may be YAGNI (You Aren't Gonna Need It). The PR description states this is for Claude Code timeout issues, but the codebase already has async task execution via TaskQueue + TaskStore. Instead of building a new job system, MCP could simply expose the existing task endpoints (create task, poll task status) as MCP tools.
💡 Suggestion: Consider whether MCP async mode could be implemented by exposing existing TaskStore operations as MCP tools: create_task (returns task ID), get_task (polls status). This would reuse the entire existing infrastructure without adding 327 lines of new code. The A2A frontend already demonstrates this pattern with its /tasks endpoints.
🔧 Suggested Fix
Remove JobManager entirely, expose TaskStore.createTask and TaskStore.getTask as MCP tools directly.
🟡 Warning src/mcp-job-manager.ts:189-267
The result extraction logic in startJob (lines 223-248) is a special-case implementation of what trackExecution already does generically. It tries multiple formats (MCP content, string, engine result with history) - the same logic already exists in track-execution.ts:197-223. This creates two divergent implementations of the same concern.
💡 Suggestion: Use trackExecution which already has the proven result extraction logic. If the MCP-specific format is needed, add a transformation layer after trackExecution returns.
🔧 Suggested Fix
Let trackExecution handle result extraction, then transform to MCP format if needed.
🟡 Warning src/mcp-job-manager.ts:269-288
The getJob method performs an inefficient linear search through up to 200 tasks when looking up by short ID prefix. For every get_job call, it calls listTasks({ limit: 200 }) and iterates through all results. This is O(n) and doesn't scale.
💡 Suggestion: Either: (1) Store short ID → full ID mapping in a Map for O(1) lookup, (2) Add a database index on task ID prefix, or (3) Require full UUIDs only. The current approach of scanning 200 tasks per poll is wasteful.
🔧 Suggested Fix
Add a Map<string, string> for shortId → fullId mapping, updated on task creation.
🟡 Warning src/mcp-server.ts:518-599
The registerAsyncJobTools function duplicates tool registration logic. It has two nearly identical branches for fixed vs generic workflow mode (lines 540-567 and 569-596), differing only in the schema passed. This is the same pattern already handled in the non-async code paths.
💡 Suggestion: Extract the common tool registration pattern into a shared helper that takes the schema as a parameter. The async vs blocking mode should only affect the handler wrapper, not the registration structure.
🔧 Suggested Fix
Create a helper function that registers tools with a given schema and handler, reducing duplication between async and sync modes.
🟡 Warning src/runners/mcp-server-runner.ts:109-163
The async mode tool registration in McpServerRunner duplicates the same pattern as registerAsyncJobTools in mcp-server.ts. Both create start_job and get_job tools with nearly identical logic. This violates DRY and creates two implementations to maintain.
💡 Suggestion: Extract the async job tool registration into a shared utility function that can be reused by both mcp-server.ts and mcp-server-runner.ts. The function should accept the job manager, workflow executor, and tool name as parameters.
🔧 Suggested Fix
Move async tool registration to a shared helper in mcp-server.ts, imported by both files.
🟡 Warning src/mcp-job-manager.ts:27-176
The buildResponse function and JobResponse type create an MCP-specific response format that duplicates information already present in AgentTask. AgentTask already has status, artifacts, history, and metadata. The transformation adds complexity without clear benefit - the MCP client could consume AgentTask directly with a simple formatter.
💡 Suggestion: Either: (1) Return AgentTask directly from get_job and let the MCP layer format it, or (2) Make the transformation a simple view/adapter rather than a complex nested structure. The current 150-line response builder is over-engineered.
🔧 Suggested Fix
Simplify to a thin adapter that maps AgentTask fields to a flat MCP response structure.
🟡 Warning src/mcp-job-manager.ts:189-267
The heartbeat timer implementation (lines 207-212) duplicates heartbeat logic that should be handled by trackExecution. The existing trackExecution already manages heartbeats (track-execution.ts:177-182). Implementing it separately creates two heartbeat mechanisms that may behave differently.
💡 Suggestion: Use trackExecution which already handles heartbeat management. If custom heartbeat timing is needed for MCP, add it as a parameter to trackExecution rather than reimplementing the mechanism.
🔧 Suggested Fix
Remove manual heartbeat timer, rely on trackExecution's built-in heartbeat.
🟡 Warning src/mcp-server.ts:518-599
The registerAsyncJobTools function is tightly coupled to JobManager implementation details. It imports JobManager from mcp-job-manager and calls its methods directly. This creates a dependency chain that makes it difficult to change the job management implementation or reuse the registration logic.
💡 Suggestion: Define an interface for job management (startJob, getJob) and have registerAsyncJobTools accept that interface rather than the concrete JobManager class. This allows different implementations (e.g., direct TaskStore usage, mocked for tests) and improves testability.
🔧 Suggested Fix
Define JobManager interface, accept it as parameter rather than importing concrete class.

Quality Issues (27)

Severity Location Issue
🟠 Error src/mcp-job-manager.ts:267
If handler() throws before the heartbeat timer is set, the timer variable won't be defined but the catch block tries to clear it. This could cause a ReferenceError.
💡 Suggestion: Move the heartbeat timer setup before the handler() call, or check if heartbeatTimer is defined before clearing it in the catch block.
🟠 Error tests/unit/mcp-job-manager.test.ts:137
Tests use arbitrary setTimeout delays (50ms, 100ms) to wait for async operations. This creates timing-dependent tests that may fail on slow systems or pass incorrectly on fast systems.
💡 Suggestion: Use a more reliable synchronization mechanism such as: (1) Await a promise that resolves when the job actually completes, (2) Use a mock clock library, or (3) Poll the job status until done=true with a timeout.
🟠 Error src/mcp-job-manager.ts:267
The PR description mentions idempotency keys as a feature, but the implementation doesn't actually use them. The startJob method accepts an opts parameter but there's no idempotency_key handling.
💡 Suggestion: Either implement the idempotency key feature (add parameter, check for existing jobs with same key, return existing job_id) or remove it from the PR description and documentation.
🟠 Error src/mcp-job-manager.ts:267
The PR description mentions automatic cleanup after 1 hour, but there's no TTL cleanup logic in the JobManager implementation. Jobs will persist indefinitely in the TaskStore.
💡 Suggestion: Either implement the TTL cleanup feature (add a timer that calls taskStore.deleteExpiredTasks() or similar) or remove the 1-hour cleanup claim from the PR description.
🟠 Error src/mcp-job-manager.ts:267
PR description claims 6-character hex job IDs using crypto.randomBytes(3), but implementation uses task.id.slice(0, 8) which is the first 8 characters of a UUID (not 6 hex chars).
💡 Suggestion: Either update the implementation to use 6-character hex IDs as documented, or update the PR description and comments to reflect the actual 8-character UUID prefix behavior.
🟡 Warning tests/unit/mcp-job-manager.test.ts:137
Test uses magic number 50 (setTimeout delay) without clear semantic meaning. This arbitrary timeout could cause flaky tests if the async handler takes longer than 50ms to complete.
💡 Suggestion: Replace with a named constant or use a more reliable synchronization mechanism like awaiting the job completion promise or using a mock clock.
🟡 Warning tests/unit/mcp-job-manager.test.ts:137
Test uses magic number 100 (setTimeout delay) without clear semantic meaning. The test waits 100ms for async completion but this value is arbitrary - if the handler takes longer, tests will flake.
💡 Suggestion: Use a named constant like TEST_ASYNC_DELAY or await a promise that resolves when the job actually completes. Consider using a mock clock.
🟡 Warning tests/unit/mcp-job-manager.test.ts:137
Test uses magic number 50 for setTimeout delay. This arbitrary timeout value should be a named constant to make test behavior explicit.
💡 Suggestion: Extract it to a named constant like TEST_ASYNC_COMPLETION_DELAY_MS = 50 to clarify its purpose in tests.
🟡 Warning tests/unit/mcp-job-manager.test.ts:137
Test uses magic number 8 for job_id length check. While this is documented as 'first 8 chars of UUID', it should be a named constant to make the intent explicit and avoid duplication.
💡 Suggestion: Extract it to a named constant like JOB_ID_PREFIX_LENGTH = 8 and reference it in both the implementation and the tests.
🟡 Warning tests/unit/mcp-job-manager.test.ts:137
Test uses magic number 10 for task list limit. This arbitrary value should be a named constant to make its purpose clear.
💡 Suggestion: Extract it to a named constant like TEST_TASK_LIST_LIMIT = 10 to improve test readability.
🟡 Warning src/mcp-job-manager.ts:267
Magic number 60_000 (60000ms) for heartbeat interval should be a named constant to make the duration clear and maintainable.
💡 Suggestion: Extract it to a named constant like HEARTBEAT_INTERVAL_MS = 60_000 to improve readability.
🟡 Warning src/mcp-job-manager.ts:267
Magic number 10 for polling delay appears in multiple places and should be a named constant to ensure consistency across the codebase.
💡 Suggestion: Extract it to a named constant like DEFAULT_POLLING_DELAY_SECONDS = 10 in a shared constants file.
🟡 Warning src/mcp-job-manager.ts:267
Magic number 8 for job_id prefix length appears multiple times and should be a named constant to ensure consistency.
💡 Suggestion: Extract it to a named constant like JOB_ID_PREFIX_LENGTH = 8 and reference it throughout the codebase.
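Taken together, the constants flagged above could live in one shared module. The file layout and names here are illustrative, not the actual source tree.

```typescript
// Sketch only: a shared constants module covering the magic numbers
// flagged in these findings.
export const JOB_ID_PREFIX_LENGTH = 8;           // first 8 chars of the task UUID
export const DEFAULT_POLLING_DELAY_SECONDS = 10; // suggested client retry delay
export const HEARTBEAT_INTERVAL_MS = 60_000;     // task-store heartbeat period

// Example use in the JobManager:
export function toShortJobId(taskId: string): string {
  return taskId.slice(0, JOB_ID_PREFIX_LENGTH);
}
```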
🟡 Warning src/mcp-job-manager.ts:292
Magic number 36 for UUID length check is an implementation detail that should be a named constant.
💡 Suggestion: Extract it to a named constant like UUID_LENGTH = 36, or use a UUID validation library instead of checking the length.
🟡 Warning src/mcp-job-manager.ts:294
getJob() searches through up to 200 tasks to find a job by prefix. This is an O(n) linear search that will degrade as the number of jobs grows.
💡 Suggestion: Consider maintaining a separate Map<string, string> index from short IDs to full IDs, or a prefix-based data structure such as a trie, for O(1) lookups.
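The Map-based variant of that suggestion is small; a sketch, assuming the index lives alongside the JobManager and is populated whenever a job is created:

```typescript
// Sketch only: an in-memory index from 8-char short IDs to full task
// UUIDs, giving O(1) getJob lookups instead of scanning recent tasks.
class ShortIdIndex {
  private index = new Map<string, string>();

  register(fullTaskId: string): string {
    const shortId = fullTaskId.slice(0, 8);
    this.index.set(shortId, fullTaskId);
    return shortId;
  }

  resolve(shortId: string): string | undefined {
    return this.index.get(shortId);
  }

  remove(fullTaskId: string): void {
    this.index.delete(fullTaskId.slice(0, 8));
  }
}
```

A fallback scan of the TaskStore would still be needed after a restart, since this index is not persisted.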
🟡 Warning src/mcp-job-manager.ts:294
Magic number 200 for task search limit is arbitrary and should be a named constant.
💡 Suggestion: Extract it to a named constant like TASK_SEARCH_LIMIT = 200 to make the intent explicit.
🟡 Warning src/mcp-job-manager.ts:226
Empty catch block ignores errors during result text extraction. This could silently fail and return 'Execution completed' even when extraction fails.
💡 Suggestion: Add logging, or restructure the code to handle extraction errors explicitly. At minimum, add a comment explaining why the error is ignored.
🟡 Warning src/mcp-job-manager.ts:267
Heartbeat timer is created but there's no guarantee clearInterval will be called if the process exits abnormally. While unref() helps, the timer could leak in some error scenarios.
💡 Suggestion: Consider adding a cleanup method, or use process exit handlers to ensure timers are cleared. Document the lifetime of the timer.
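One way to pin down the timer's lifetime is to own it in a small object with an idempotent stop method, so every shutdown path can call it safely. The class shape is hypothetical, not the PR's code.

```typescript
// Sketch only: tie the heartbeat timer to start/stop methods so it is
// always cleared, even when shutdown runs more than once.
class HeartbeatSketch {
  private timer?: ReturnType<typeof setInterval>;

  start(beat: () => void, intervalMs = 60_000): void {
    this.stop(); // never leak a previous timer
    this.timer = setInterval(beat, intervalMs);
  }

  // Returns true if an active timer was cleared; safe to call repeatedly.
  stop(): boolean {
    if (this.timer !== undefined) {
      clearInterval(this.timer);
      this.timer = undefined;
      return true;
    }
    return false;
  }
}
```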
🟡 Warning tests/unit/mcp-job-manager.test.ts:137
No tests for concurrent job scenarios - what happens when multiple jobs are started simultaneously or when get_job is called before the job completes?
💡 Suggestion: Add tests for concurrent job execution, race conditions between start and get operations, and polling during job execution.
🟡 Warning tests/unit/mcp-job-manager.test.ts:137
No tests for invalid inputs - what happens with empty job_id, special characters, or extremely long job_id strings?
💡 Suggestion: Add tests for edge cases such as an empty string, null/undefined, very long strings, and special characters in the job_id parameter.
🟡 Warning tests/unit/mcp-job-manager.test.ts:137
No tests for handler that throws synchronously vs asynchronously - the error handling path may differ.
💡 Suggestion: Add tests for both synchronous and asynchronous errors in handlers to ensure error handling works correctly in both cases.
🟡 Warning src/mcp-job-manager.ts:226
Result text extraction logic is complex and nested with multiple fallback paths. This makes it hard to understand what value will actually be returned.
💡 Suggestion: Extract the result extraction logic into a separate, well-tested function with clear documentation of the fallback priority.
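A sketch of that extraction as a single documented function. The result shape (a `content` array of `{ type, text }` entries) is an assumption based on typical MCP tool results; the real fallback chain may differ.

```typescript
// Sketch only: one documented fallback chain for result text extraction.
// MaybeToolResult is an assumed shape, not the actual Visor type.
interface MaybeToolResult {
  content?: Array<{ type?: string; text?: string }>;
}

function extractResultText(result: unknown): string {
  // Priority 1: plain string results pass through unchanged.
  if (typeof result === "string") return result;
  // Priority 2: first text entry in an MCP-style content array.
  const content = (result as MaybeToolResult)?.content;
  if (Array.isArray(content)) {
    const textEntry = content.find(
      (c) => c?.type === "text" && typeof c.text === "string"
    );
    if (textEntry?.text) return textEntry.text;
  }
  // Priority 3: generic fallback, matching the current behavior.
  return "Execution completed";
}
```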
🟡 Warning src/mcp-job-manager.ts:267
Empty catch block when clearing interval in heartbeat timer suppresses errors without logging. This could hide issues with the task store.
💡 Suggestion: Add at least a debug log statement to track when heartbeat failures occur, or remove the try-catch if errors should propagate.
🟡 Warning src/mcp-job-manager.ts:267
Empty catch block when updating task state to 'failed' suppresses all errors. This could leave the task in an inconsistent state.
💡 Suggestion: Add logging to track when state transitions fail, or consider rethrowing critical errors that prevent proper task state management.
🟡 Warning src/mcp-job-manager.ts:267
Empty catch block when updating task state to 'completed' suppresses errors. This could result in jobs completing successfully but the task store not being updated.
💡 Suggestion: Add logging to track when state transitions fail, and consider whether the job should be marked as failed if the state update fails.
🟡 Warning src/mcp-server.ts:518
getOrCreateTaskStore() creates a new SqliteTaskStore if none provided, but there's no corresponding cleanup/shutdown logic. This could lead to resource leaks when the server is created multiple times.
💡 Suggestion: Document who is responsible for shutting down the TaskStore, or add a shutdown method that cleans up any store created by getOrCreateTaskStore.
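The ownership rule can be made explicit with a flag: the server closes the store only if it created it. The interfaces below are hypothetical stand-ins for SqliteTaskStore and the server class.

```typescript
// Sketch only: track whether the server owns its TaskStore so shutdown()
// closes only stores it created, never caller-provided ones.
interface ClosableTaskStore {
  close(): void;
}

class ServerSketch {
  private taskStore?: ClosableTaskStore;
  private ownsTaskStore = false;

  constructor(
    private makeStore: () => ClosableTaskStore,
    provided?: ClosableTaskStore
  ) {
    this.taskStore = provided;
  }

  getOrCreateTaskStore(): ClosableTaskStore {
    if (this.taskStore === undefined) {
      this.taskStore = this.makeStore();
      this.ownsTaskStore = true; // we created it, so we must close it
    }
    return this.taskStore;
  }

  shutdown(): void {
    if (this.ownsTaskStore && this.taskStore !== undefined) {
      this.taskStore.close();
      this.taskStore = undefined;
      this.ownsTaskStore = false;
    }
  }
}
```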
🟡 Warning src/runners/mcp-server-runner.ts:109
Async job tool registration logic is duplicated between mcp-server.ts (registerAsyncJobTools) and mcp-server-runner.ts. This creates maintenance burden and inconsistency risk.
💡 Suggestion: Extract the common async job tool registration logic into a shared function in mcp-job-manager.ts or a new utility module, and call it from both places.
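In outline, the shared helper would take whatever both entry points already have in hand. The `ToolRegistrar` interface below is reduced to the minimum needed for illustration; the real MCP SDK registration API differs.

```typescript
// Sketch only: one registration helper called from both mcp-server.ts and
// mcp-server-runner.ts. ToolRegistrar is a hypothetical abstraction, not
// the MCP SDK's actual server type.
type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;

interface ToolRegistrar {
  registerTool(name: string, handler: ToolHandler): void;
}

function registerAsyncJobTools(
  server: ToolRegistrar,
  startJob: ToolHandler,
  getJob: ToolHandler
): void {
  server.registerTool("start_job", startJob);
  server.registerTool("get_job", getJob);
}
```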

Powered by Visor from Probelabs

Last updated: 2026-03-21T17:37:12.081Z | Triggered by: pr_updated | Commit: bb362b2

💡 TIP: You can chat with Visor using /visor ask <your question>

buger and others added 4 commits March 21, 2026 16:50
Instead of a standalone in-memory Map, JobManager now delegates to the
existing TaskStore (SQLite-backed). Jobs are stored as regular Visor
tasks, visible via `visor tasks list` and `visor tasks show`.

- Removed duplicate storage, cleanup timers, and idempotency index
- Job IDs are first 8 chars of UUID (matching `visor tasks` CLI format)
- start_job creates a task, transitions to working, runs handler async
- get_job queries TaskStore by ID (supports short ID prefix lookup)
- MCP server creates a SqliteTaskStore if none is provided

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The get_job long poll timeout is now configurable via --poll-timeout CLI
flag, long_poll_timeout config option, or VISOR_MCP_POLL_TIMEOUT env var.
Defaults to 59 seconds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add tests for plain string results, non-Error thrown values, concurrent
getJob calls, and full response shape validation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Log warning instead of silently ignoring state update failures in
  error handler
- Fix misleading test comment and tighten timing assertions
- Use named constant for timeout and validate bounds precisely

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@buger buger merged commit 67a3272 into main Mar 21, 2026
13 of 16 checks passed