
feat: add async job mode for MCP server #558

Merged
buger merged 5 commits into main from feat/mcp-improvements
Mar 21, 2026

Conversation

@buger
Contributor

@buger buger commented Mar 21, 2026

Summary

  • Adds opt-in async job mode (start_job/get_job) for MCP server to work around Claude Code's long-running tool call timeout limitations
  • Instead of blocking on a single tool call, the client gets a job_id immediately and calls get_job which uses long polling (waits up to 59 seconds by default)
  • Jobs are backed by the existing TaskStore (SQLite), so they're visible via visor tasks list and visor tasks show
  • Works for both standalone (visor mcp-server --async) and runner (visor --mcp --mcp-async) modes

How It Works

  1. Client calls start_job with the message/workflow parameters
  2. Server creates a task, kicks off execution in the background, returns immediately with a job_id (8-char UUID prefix)
  3. Client calls get_job with the job_id — the server holds the connection open for up to 59 seconds (configurable), checking every 500ms
  4. If the job finishes during the wait, the result is returned immediately
  5. If the timeout is reached, a "still running" response is returned and the client calls get_job again

This drastically reduces round-trips compared to short-interval polling. A 10-minute job requires ~10 polls (59s each) instead of ~60 polls (10s each).
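The flow above can be sketched from the client's perspective. This is illustrative only: `callTool` is a hypothetical stand-in for whatever MCP client invocation the caller uses, not an API in this PR.

```typescript
// Minimal sketch of the start_job/get_job client loop described above.
// `callTool` is a hypothetical MCP client call, not part of this PR.
type JobResponse = { job_id: string; done: boolean; result: unknown };

async function runJob(
  callTool: (name: string, args: Record<string, unknown>) => Promise<JobResponse>,
  message: string
): Promise<unknown> {
  // start_job returns immediately with a job_id
  const started = await callTool('start_job', { message });
  let job = started;
  // Each get_job call long-polls server-side, so this loop stays tight:
  // a 10-minute job means roughly ten iterations, not sixty.
  while (!job.done) {
    job = await callTool('get_job', { job_id: started.job_id });
  }
  return job.result;
}
```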

Response Shape

Every response from both start_job and get_job uses the same standardized shape:

{
  "job_id": "a1b2c3d4",
  "status": "queued|running|completed|failed|expired",
  "done": false,
  "progress": { "percent": 0, "step": "queued", "message": "Job accepted" },
  "polling": { "recommended_next_action": "get_job", "recommended_delay_seconds": 0 },
  "result": null,
  "error": null,
  "user_message": "The job has started.",
  "next_instruction_for_model": "Call get_job with this job_id..."
}

Changes

  • src/mcp-job-manager.ts (NEW) — JobManager class: thin wrapper over TaskStore with long polling, 8-char job IDs
  • src/mcp-server.ts — asyncMode + longPollTimeout options; registers start_job/get_job when enabled
  • src/runners/mcp-server-runner.ts — Same async mode support for runner
  • src/runners/runner-factory.ts — Passes asyncMode and longPollTimeout from CLI/config/env
  • src/cli.ts / src/types/cli.ts — --mcp-async CLI flag
  • src/cli-main.ts — --async and --poll-timeout flags for mcp-server subcommand
  • tests/unit/mcp-job-manager.test.ts (NEW) — 14 unit tests

Configuration

# Standalone
visor mcp-server --config workflow.yaml --async
visor mcp-server --async --poll-timeout 30  # custom timeout (seconds)

# Runner mode
visor --mcp --mcp-auth-token secret --mcp-async

# Via env var
VISOR_MCP_ASYNC=true visor --mcp ...
VISOR_MCP_POLL_TIMEOUT=30 visor --mcp --mcp-async ...

# Via config
mcp_server:
  async_mode: true
  long_poll_timeout: 30  # seconds, default 59

Test plan

  • 14 unit tests for JobManager covering:
    • Start job returns immediately with running status
    • Long-poll returns result when job completes during wait
    • Immediate return for already-completed jobs
    • Failure handling (Error and non-Error thrown values)
    • Expired/unknown job IDs return immediately (no waiting)
    • 8-char hex job IDs (UUID prefix)
    • Short ID prefix lookup
    • Task visibility in TaskStore
    • workflowId passthrough
    • Plain string result extraction
    • Concurrent getJob calls on same job
    • Full response shape validation
    • Configurable long poll timeout
  • Full test suite passes
  • Manual test: start MCP server with --async, call start_job and get_job via curl
  • Manual test: verify Claude Code can poll get_job for long-running workflows

🤖 Generated with Claude Code

Claude Code and other MCP clients have timeout issues with long-running
tool calls. This adds an opt-in async mode that replaces blocking tools
with start_job (returns immediately with job_id) and get_job (polls for
status/result).

Usage:
  visor mcp-server --config workflow.yaml --async
  visor --mcp --mcp-auth-token secret --mcp-async

Features:
- JobManager with 6-char hex job IDs, idempotency keys, TTL cleanup
- Consistent response shape with progress, polling hints, and model instructions
- Works for both standalone (mcp-server) and runner (--mcp) modes
- Configurable via CLI flags, YAML config, or VISOR_MCP_ASYNC env var

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@probelabs
Contributor

probelabs Bot commented Mar 21, 2026

PR Overview: Async Job Mode for MCP Server

Summary

This PR introduces an opt-in async job mode for the MCP (Model Context Protocol) server to address timeout limitations in Claude Code and other MCP clients when handling long-running tool calls. The implementation uses a long polling pattern (start_job/get_job) that dramatically reduces round-trips compared to traditional short-interval polling.

Files Changed

New Files

  • src/mcp-job-manager.ts (+378 lines) — Core JobManager class implementing async job lifecycle, long polling, and TaskStore integration
  • tests/unit/mcp-job-manager.test.ts (+306 lines) — Comprehensive unit tests (14 test cases)

Modified Files

  • src/mcp-server.ts (+123/-4) — Adds asyncMode/longPollTimeout options, getOrCreateTaskStore(), and registerAsyncJobTools()
  • src/runners/mcp-server-runner.ts (+81/-16) — Integrates async mode for runner-based MCP servers with TaskStore initialization
  • src/runners/runner-factory.ts (+6) — Passes asyncMode and longPollTimeout from CLI/config/env
  • src/cli.ts (+2) — Adds --mcp-async CLI flag
  • src/cli-main.ts (+2) — Adds --async and --poll-timeout flags for mcp-server subcommand
  • src/types/cli.ts (+2) — Adds mcpAsync option to CliOptions interface

Architecture & Impact Assessment

What This PR Accomplishes

  1. Solves Timeout Problem: Claude Code and other MCP clients have timeouts for long-running tool calls. The async pattern returns immediately with a job ID, then uses long polling (waits up to 59 seconds per request) to retrieve results.

  2. TaskStore Integration: Jobs are persisted as regular Visor tasks using the existing SQLite-backed TaskStore, making them visible via visor tasks list and visor tasks show.

  3. Efficient Polling: A 10-minute job requires ~10 polls (59s each) instead of ~60 polls (10s each) — 6x reduction in round-trips.

  4. Dual Mode Support: Works for both standalone MCP server (visor mcp-server --async) and runner mode (visor --mcp --mcp-async).

Key Technical Changes

JobManager Class (src/mcp-job-manager.ts)

  • Wraps TaskStore for job persistence (SQLite-backed)
  • Uses 8-character hex job IDs (UUID prefix) for user-friendly identifiers
  • Background heartbeat timer (60s intervals) for task liveness
  • Maps TaskState → JobStatus for MCP responses
  • Extracts results from task status messages, artifacts, and engine results
  • Long polling with 500ms check interval (configurable timeout, default 59s)
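The long-poll behavior in the last bullet can be sketched as follows. This is an assumed shape based on the description above, not the actual implementation; `getStatus` is a hypothetical stand-in for the TaskStore lookup.

```typescript
// Sketch of the long-poll wait: check the task state every 500ms until it
// finishes or the configured deadline (default 59s) passes. `getStatus`
// is a hypothetical stand-in for the TaskStore-backed status lookup.
const POLL_INTERVAL_MS = 500;

async function waitForJob(
  getStatus: (jobId: string) => 'running' | 'completed' | 'failed',
  jobId: string,
  timeoutMs = 59_000
): Promise<string> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const status = getStatus(jobId);
    if (status !== 'running') return status; // finished: return immediately
    await new Promise(resolve => setTimeout(resolve, POLL_INTERVAL_MS));
  }
  return 'running'; // deadline hit: caller should issue another get_job
}
```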

Response Schema

interface JobResponse {
  job_id: string;           // 8-char UUID prefix
  status: 'queued' | 'running' | 'completed' | 'failed' | 'expired';
  done: boolean;
  progress: { percent: number | null; step: string; message: string };
  polling: {
    recommended_next_action: 'get_job' | 'none';
    recommended_delay_seconds: number;
  };
  result: any;
  error: { code: string; message: string; retryable: boolean } | null;
  user_message: string;
  next_instruction_for_model: string;
}

MCP Tool Registration

When asyncMode is enabled:

  • start_job (or start_{toolName}): Accepts workflow parameters, returns job response immediately
  • get_job: Accepts job_id, uses long polling to wait for completion

When asyncMode is disabled: Existing blocking tool behavior is preserved.
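The conditional registration can be sketched like this. It is illustrative only: `ToolRegistry` and `registerTool` are stand-ins for the real MCP server API, which is not shown in this summary; only the branching on asyncMode mirrors the PR.

```typescript
// Illustrative sketch of async-vs-blocking tool registration. The
// `ToolRegistry` interface is hypothetical; only the asyncMode branch
// reflects the behavior described above.
interface ToolRegistry {
  registerTool(name: string, handler: (args: any) => Promise<unknown>): void;
}

function registerTools(
  server: ToolRegistry,
  asyncMode: boolean,
  runBlocking: (args: any) => Promise<unknown>,
  startJob: (args: any) => Promise<unknown>,
  getJob: (args: any) => Promise<unknown>
): void {
  if (asyncMode) {
    // Async mode: job-based tools replace the blocking call
    server.registerTool('start_job', startJob);
    server.registerTool('get_job', getJob);
  } else {
    // Async mode off: existing blocking behavior is preserved
    server.registerTool('run', runBlocking);
  }
}
```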

Affected System Components

  1. Standalone MCP Server (visor mcp-server --async)

    • Uses registerAsyncJobTools() in both createHttpMcpServer() and startMcpServer()
    • Creates TaskStore if not provided via options
  2. Runner-based MCP Server (visor --mcp --mcp-async)

    • McpServerRunner conditionally registers async tools when asyncMode is true
    • Integrates with existing handleMessage() workflow
  3. Configuration Chain

    • CLI flags → CliOptions.mcpAsync → RunnerFactory → McpServerRunner/createHttpMcpServer
    • Environment variable: VISOR_MCP_ASYNC=true, VISOR_MCP_POLL_TIMEOUT=30
    • Config file: mcp_server: { async_mode: true, long_poll_timeout: 30 }
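One plausible resolver for this chain is sketched below. The CLI-over-env-over-config precedence is an assumption for illustration; the actual order in runner-factory.ts is not spelled out in this summary.

```typescript
// Hypothetical sketch of resolving async mode from the three sources above,
// assuming CLI flag wins over env var, which wins over config file.
function resolveAsyncMode(
  cliFlag: boolean | undefined,
  env: Record<string, string | undefined>,
  config: { async_mode?: boolean } | undefined
): boolean {
  if (cliFlag !== undefined) return cliFlag;             // --mcp-async / --async
  if (env.VISOR_MCP_ASYNC !== undefined) {
    return env.VISOR_MCP_ASYNC === 'true';               // VISOR_MCP_ASYNC=true
  }
  return config?.async_mode ?? false;                    // mcp_server.async_mode
}
```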

Component Flow Diagram

sequenceDiagram
    participant Client as Claude Code / MCP Client
    participant MCP as MCP Server
    participant JM as JobManager
    participant TS as "TaskStore (SQLite)"
    participant WF as Workflow Executor

    Client->>MCP: start_job(message, workflow)
    MCP->>JM: startJob(handler, opts)
    JM->>TS: createTask()
    JM->>TS: updateTaskState(working)
    JM-->>MCP: JobResponse {job_id, status: running}
    MCP-->>Client: Immediate response (job_id: a1b2c3d4)
    
    par Background Execution
        JM->>WF: execute workflow
        WF-->>JM: result/error
        JM->>TS: updateTaskState(completed/failed)
    and
        loop Every 60s
            JM->>TS: heartbeat()
        end
    end

    Client->>MCP: get_job(job_id)
    MCP->>JM: getJob(job_id)
    
    loop Long Poll (up to 59s)
        JM->>TS: getTask(job_id)
        alt Task completed/failed
            JM-->>MCP: JobResponse {done: true, result}
            MCP-->>Client: Final result
        else Still running after timeout
            JM-->>MCP: JobResponse {done: false, status: running}
            MCP-->>Client: Poll again
        end
    end

Configuration Options

  • CLI (standalone) — visor mcp-server --config workflow.yaml --async --poll-timeout 30
  • CLI (runner) — visor --mcp --mcp-auth-token secret --mcp-async
  • Environment — VISOR_MCP_ASYNC=true VISOR_MCP_POLL_TIMEOUT=30 visor --mcp
  • Config file — mcp_server: { async_mode: true, long_poll_timeout: 30 }

Scope Discovery & Context Expansion

Directly Affected Modules

  • MCP Server Core (src/mcp-server.ts) — Tool registration, TaskStore integration
  • MCP Server Runner (src/runners/mcp-server-runner.ts) — Runner integration with TaskStore
  • CLI Layer (src/cli.ts, src/cli-main.ts) — Flag parsing
  • Type Definitions (src/types/cli.ts) — Interface updates

Related Components

  • TaskStore (src/agent-protocol/task-store) — SQLite persistence layer for jobs
  • Workflow Executor — Called by async job handlers
  • Session Management — session_id parameter preserved in async mode
  • Agent Protocol — Task state management and lifecycle
  • A2A Frontend — Also uses TaskStore for HTTP task management

Testing Coverage

  • Unit Tests: 14 tests covering:
    • Job lifecycle (start → complete → get)
    • Long polling behavior (early return, timeout)
    • Failure handling (Error and non-Error thrown values)
    • Job ID format (8-char hex) and prefix lookup
    • TaskStore visibility and metadata
    • Concurrent getJob calls
    • Configurable timeout
    • Full response shape validation

Potential Review Focus Areas

  1. TaskStore Lifecycle: JobManager creates TaskStore if not provided — ensure proper initialization/shutdown
  2. Heartbeat Timer: Uses unref() to avoid blocking process exit
  3. Error Handling: Failed jobs include retryable: true flag; silent catch blocks for state transitions
  4. Backward Compatibility: Async mode is opt-in; existing blocking behavior unchanged
  5. Result Extraction: Complex logic to extract text from various result formats (MCP content, engine results)
  6. Security Considerations: No job ownership validation (any authenticated client can access any job_id), 8-char IDs have limited entropy (32 bits)

Usage Example

# Start MCP server in async mode
visor mcp-server --config workflow.yaml --async

# Client flow:
# 1. Call start_job with workflow parameters
# 2. Receive immediate response with job_id (8-char hex)
# 3. Poll get_job (waits up to 59s per call)
# 4. Extract result from final response when done=true

# Jobs are visible via:
visor tasks list      # Shows async jobs alongside regular tasks
visor tasks show <id>  # Full task details

References

Code Locations

  • src/mcp-job-manager.ts:1-378 — JobManager class implementation
  • src/mcp-job-manager.ts:237-343 — startJob() method with task creation and background execution
  • src/mcp-job-manager.ts:345-393 — getJob() method with long polling logic
  • src/mcp-job-manager.ts:395-410 — resolveTask() for short ID prefix lookup
  • src/mcp-server.ts:92-99 — McpServerOptions interface (asyncMode, longPollTimeout, taskStore)
  • src/mcp-server.ts:521-615 — registerAsyncJobTools() function
  • src/mcp-server.ts:515-519 — getOrCreateTaskStore() helper
  • src/mcp-server.ts:683-688 — Async mode conditional in createHttpMcpServer()
  • src/mcp-server.ts:921-928 — Async mode conditional in startMcpServer()
  • src/runners/mcp-server-runner.ts:29-35 — McpFrontendOptions interface
  • src/runners/mcp-server-runner.ts:111-189 — Async tool registration in runner
  • src/cli.ts:45 — --mcp-async CLI flag definition
  • src/cli-main.ts:1319-1320 — --async and --poll-timeout flag parsing
  • src/types/cli.ts:80-81 — CliOptions.mcpAsync type definition
  • src/runners/runner-factory.ts:153-161 — Async mode option passing
  • tests/unit/mcp-job-manager.test.ts:1-306 — Unit tests
Metadata
  • Review Effort: 3 / 5
  • Primary Label: feature

Powered by Visor from Probelabs

Last updated: 2026-03-21T17:43:57.734Z | Triggered by: pr_updated | Commit: bb362b2

💡 TIP: You can chat with Visor using /visor ask <your question>

@probelabs
Contributor

probelabs Bot commented Mar 21, 2026

Security Issues (8)

Severity Location Issue
🟠 Error src/mcp-job-manager.ts:336
The resolveTask() method uses linear search through up to 200 tasks to find prefix matches, allowing unauthorized access to other users' jobs. An attacker can brute-force 8-character hex prefixes (16^8 = 4 billion combinations) to enumerate and access any job ID.
💡 Suggestion: Add authorization checks to ensure users can only access their own jobs. Consider using a cryptographically random job ID with higher entropy (at least 12-16 characters) or implement proper access control. Add rate limiting to prevent brute-force attacks.
🔧 Suggested Fix
private resolveTask(jobId: string, userId?: string): import('./agent-protocol/types').AgentTask | null {
  // Try direct lookup first (full UUID)
  let task = this.taskStore.getTask(jobId);

  // Authorization check: only allow access to own tasks
  if (task && userId && task.metadata?.user_id !== userId) {
    return null;
  }

  // If not found and looks like a short ID, search by prefix WITH authorization
  if (!task && jobId.length < 36 && jobId.length >= 8) {
    const { tasks } = this.taskStore.listTasks({ limit: 200, metadata: userId ? { user_id: userId } : undefined });
    task = tasks.find(t => t.id.startsWith(jobId)) || null;
  }

  return task;
}

🟠 Error src/mcp-server.ts:595
The registerAsyncJobTools() function registers start_job and get_job tools without any authorization checks. Any authenticated client can start jobs and query any job_id.
💡 Suggestion: Implement proper authorization to ensure users can only access their own jobs. Pass user context from the MCP session to JobManager.
🔧 Suggested Fix
async (args: { job_id: string }) => {
  // Extract user context from MCP session
  const userId = (server as any).getSessionUser?.(req) || 'anonymous';
  // Pass userId to enforce access control
  const response = await jobManager.getJob(args.job_id, userId);
  return { content: [{ type: 'text' as const, text: JSON.stringify(response, null, 2) }] };
}
🟠 Error src/runners/mcp-server-runner.ts:145
In async mode, all jobs are stored in a shared TaskStore without tenant isolation. Different users or workflow runs could potentially access each other's jobs via job ID enumeration.
💡 Suggestion: Implement proper multi-tenancy by adding user/session context to job metadata and enforcing access control in resolveTask().
🔧 Suggested Fix
const response = jobManager.startJob(
  async () => this.handleMessage(args.message, args.session_id),
  {
    messageText: args.message,
    workflowId: allChecks.join(','),
    metadata: {
      user_id: 'mcp-user', // TODO: Extract from auth context
      session_id: args.session_id,
    },
  }
);
🟡 Warning src/mcp-job-manager.ts:236
No input validation on job_id parameter in getJob(). Malicious or malformed job IDs could cause unexpected behavior or database errors.
💡 Suggestion: Validate job_id format before processing. Ensure it's a valid hex string (or hyphenated UUID) of reasonable length.
🔧 Suggested Fix
async getJob(jobId: string): Promise<JobResponse> {
  // Validate job_id format
  if (!jobId || typeof jobId !== 'string') {
    return buildExpiredResponse(jobId);
  }

  // Ensure job_id is hex (hyphens allowed so full UUIDs still pass) and reasonable length
  if (!/^[0-9a-fA-F-]{8,36}$/.test(jobId)) {
    return buildExpiredResponse(jobId);
  }

  const resolvedTask = this.resolveTask(jobId);
🟡 Warning src/mcp-job-manager.ts:287
Error objects are converted to strings and exposed to clients via error.message field, potentially revealing internal implementation details, file paths, or stack traces.
💡 Suggestion: Sanitize error messages before exposing them to clients. Log detailed errors server-side but return generic messages to clients.
🔧 Suggested Fix
.catch(err => {
  clearInterval(heartbeatTimer);
  // Sanitize error message for client exposure
  const errorText = err instanceof Error ? err.message : String(err);
  const sanitizedError = errorText.length > 500 ? errorText.slice(0, 500) + '...' : errorText;

  // Log full error server-side for debugging
  logger.error(`[MCP-AsyncJob] Job ${task.id.slice(0, 8)} failed: ${errorText}`);

  const failMsg = {
    message_id: crypto.randomUUID(),
    role: 'agent' as const,
    parts: [{ text: sanitizedError }],
  };
🟡 Warning src/mcp-job-manager.ts:236
Long polling with configurable timeout (default 59s) and 500ms intervals allows resource exhaustion. An attacker can create many jobs and continuously poll getJob(), consuming server resources and database connections.
💡 Suggestion: Implement rate limiting per client/IP, add a maximum concurrent poll limit, and consider exponential backoff for repeated polls on the same job.
🔧 Suggested Fix
// Add rate limiting and connection tracking
private activePolls = new Map<string, number>(); // job_id -> count
private readonly MAX_CONCURRENT_POLLS = 100;
private readonly MAX_POLLS_PER_JOB = 10;

async getJob(jobId: string): Promise<JobResponse> {
  // Enforce rate limits
  const totalActivePolls = Array.from(this.activePolls.values()).reduce((a, b) => a + b, 0);
  if (totalActivePolls > this.MAX_CONCURRENT_POLLS) {
    throw new Error('Server busy - too many active polls');
  }

  const currentPolls = this.activePolls.get(jobId) || 0;
  if (currentPolls > this.MAX_POLLS_PER_JOB) {
    throw new Error('Too many polls on this job - please wait');
  }

  this.activePolls.set(jobId, currentPolls + 1);
  try {
    // ... existing polling logic
  } finally {
    this.activePolls.set(jobId, Math.max(0, (this.activePolls.get(jobId) || 0) - 1));
  }
}

🟡 Warning src/mcp-job-manager.ts:236
The long polling loop checks every 500ms without any maximum iteration limit. A bug in task state transitions could cause infinite loops.
💡 Suggestion: Add a maximum number of poll iterations to prevent infinite loops.
🔧 Suggested Fix
// Long poll: check every 500ms for up to the configured timeout
const POLL_INTERVAL_MS = 500;
const MAX_WAIT_MS = this.longPollTimeoutMs;
const MAX_ITERATIONS = Math.ceil(MAX_WAIT_MS / POLL_INTERVAL_MS) + 1;
const deadline = Date.now() + MAX_WAIT_MS;
let iterations = 0;
while (Date.now() < deadline && iterations < MAX_ITERATIONS) {
  iterations++;
  await new Promise(resolve => setTimeout(resolve, POLL_INTERVAL_MS));
🟡 Warning src/cli-main.ts:1319
The --poll-timeout flag accepts any number without validation. Extremely large values could cause resource exhaustion.
💡 Suggestion: Validate and clamp the poll timeout to a reasonable range (e.g., 1-300 seconds).
🔧 Suggested Fix
asyncMode: mcpArgs.includes('--async'),
longPollTimeout: getArg('--poll-timeout') ? Math.min(300, Math.max(1, Number(getArg('--poll-timeout')))) : undefined,

Architecture Issues (11)

Severity Location Issue
🟠 Error src/mcp-job-manager.ts:1-327
The JobManager class duplicates significant functionality already provided by the existing TaskStore + trackExecution pattern. The codebase already has a mature async job system (TaskQueue, TaskStore, trackExecution) that handles task lifecycle, state transitions, persistence, heartbeats, and cleanup. JobManager reimplements these concerns (task creation, state updates, heartbeat timer, result extraction) instead of reusing the existing infrastructure.
💡 Suggestion: Instead of creating a new JobManager, leverage the existing trackExecution function which already wraps engine execution with task lifecycle management. For MCP async mode, simply create a task via taskStore.createTask(), transition to 'working', and return the task ID. The client can then poll taskStore.getTask() directly. This eliminates 300+ lines of duplicated code and reuses the proven, well-tested task lifecycle infrastructure.
🔧 Suggested Fix
Replace JobManager with direct TaskStore usage:
  1. In start_job: Create task with taskStore.createTask(), transition to 'working', return task ID
  2. In get_job: Call taskStore.getTask() and build response from AgentTask
  3. Remove duplicate heartbeat, result extraction, and state management logic

The trackExecution function already handles heartbeats, state transitions, and result extraction. Use it for the actual execution.

🟠 Error src/mcp-job-manager.ts:189-267
The startJob method contains complex inline task creation and background execution logic that duplicates trackExecution. It manually creates tasks, transitions state, sets up heartbeats, extracts results from multiple formats, and handles errors - all of which trackExecution already does. This is a clear violation of DRY and creates maintenance burden.
💡 Suggestion: Use trackExecution directly:

const { task } = await trackExecution(
  { taskStore, source: 'mcp', workflowId, messageText, metadata },
  () => executeWorkflow(args)
);
return { job_id: task.id, ... };

This eliminates 80+ lines of manual lifecycle management and ensures consistent behavior across all frontends.

🔧 Suggested Fix
Replace entire startJob method with trackExecution call.
🟠 Error src/mcp-job-manager.ts:189-267
The JobManager.startJob method manually manages task lifecycle (create, transition to working, heartbeat, complete/fail) while the rest of the codebase uses trackExecution for this purpose. This creates inconsistent patterns - Slack, Telegram, Email, TUI, and A2A frontends all use trackExecution, but MCP async mode uses its own custom implementation.
💡 Suggestion: Follow the established pattern used by all other frontends: use trackExecution to wrap the execution handler. This ensures consistent behavior, error handling, telemetry, and state transitions across all transport mechanisms.
🔧 Suggested Fix
Replace JobManager.startJob with trackExecution wrapper, same pattern as other frontends.
🟠 Error src/mcp-job-manager.ts:1-327
The entire JobManager class may be YAGNI (You Aren't Gonna Need It). The PR description states this is for Claude Code timeout issues, but the codebase already has async task execution via TaskQueue + TaskStore. Instead of building a new job system, MCP could simply expose the existing task endpoints (create task, poll task status) as MCP tools.
💡 Suggestion: Consider whether MCP async mode could be implemented by exposing existing TaskStore operations as MCP tools: create_task (returns task ID), get_task (polls status). This would reuse the entire existing infrastructure without adding 327 lines of new code. The A2A frontend already demonstrates this pattern with its /tasks endpoints.
🔧 Suggested Fix
Remove JobManager entirely, expose TaskStore.createTask and TaskStore.getTask as MCP tools directly.
🟡 Warning src/mcp-job-manager.ts:189-267
The result extraction logic in startJob (lines 223-248) is a special-case implementation of what trackExecution already does generically. It tries multiple formats (MCP content, string, engine result with history) - the same logic already exists in track-execution.ts:197-223. This creates two divergent implementations of the same concern.
💡 Suggestion: Use trackExecution which already has the proven result extraction logic. If the MCP-specific format is needed, add a transformation layer after trackExecution returns.
🔧 Suggested Fix
Let trackExecution handle result extraction, then transform to MCP format if needed.
🟡 Warning src/mcp-job-manager.ts:269-288
The getJob method performs an inefficient linear search through up to 200 tasks when looking up by short ID prefix. For every get_job call, it calls listTasks({ limit: 200 }) and iterates through all results. This is O(n) and doesn't scale.
💡 Suggestion: Either: (1) Store short ID → full ID mapping in a Map for O(1) lookup, (2) Add a database index on task ID prefix, or (3) Require full UUIDs only. The current approach of scanning 200 tasks per poll is wasteful.
🔧 Suggested Fix
Add a Map<string, string> for shortId → fullId mapping, updated on task creation.
🟡 Warning src/mcp-server.ts:518-599
The registerAsyncJobTools function duplicates tool registration logic. It has two nearly identical branches for fixed vs generic workflow mode (lines 540-567 and 569-596), differing only in the schema passed. This is the same pattern already handled in the non-async code paths.
💡 Suggestion: Extract the common tool registration pattern into a shared helper that takes the schema as a parameter. The async vs blocking mode should only affect the handler wrapper, not the registration structure.
🔧 Suggested Fix
Create a helper function that registers tools with a given schema and handler, reducing duplication between async and sync modes.
🟡 Warning src/runners/mcp-server-runner.ts:109-163
The async mode tool registration in McpServerRunner duplicates the same pattern as registerAsyncJobTools in mcp-server.ts. Both create start_job and get_job tools with nearly identical logic. This violates DRY and creates two implementations to maintain.
💡 Suggestion: Extract the async job tool registration into a shared utility function that can be reused by both mcp-server.ts and mcp-server-runner.ts. The function should accept the job manager, workflow executor, and tool name as parameters.
🔧 Suggested Fix
Move async tool registration to a shared helper in mcp-server.ts, imported by both files.
🟡 Warning src/mcp-job-manager.ts:27-176
The buildResponse function and JobResponse type create an MCP-specific response format that duplicates information already present in AgentTask. AgentTask already has status, artifacts, history, and metadata. The transformation adds complexity without clear benefit - the MCP client could consume AgentTask directly with a simple formatter.
💡 Suggestion: Either: (1) Return AgentTask directly from get_job and let the MCP layer format it, or (2) Make the transformation a simple view/adapter rather than a complex nested structure. The current 150-line response builder is over-engineered.
🔧 Suggested Fix
Simplify to a thin adapter that maps AgentTask fields to a flat MCP response structure.
🟡 Warning src/mcp-job-manager.ts:189-267
The heartbeat timer implementation (lines 207-212) duplicates heartbeat logic that should be handled by trackExecution. The existing trackExecution already manages heartbeats (track-execution.ts:177-182). Implementing it separately creates two heartbeat mechanisms that may behave differently.
💡 Suggestion: Use trackExecution which already handles heartbeat management. If custom heartbeat timing is needed for MCP, add it as a parameter to trackExecution rather than reimplementing the mechanism.
🔧 Suggested Fix
Remove manual heartbeat timer, rely on trackExecution's built-in heartbeat.
🟡 Warning src/mcp-server.ts:518-599
The registerAsyncJobTools function is tightly coupled to JobManager implementation details. It imports JobManager from mcp-job-manager and calls its methods directly. This creates a dependency chain that makes it difficult to change the job management implementation or reuse the registration logic.
💡 Suggestion: Define an interface for job management (startJob, getJob) and have registerAsyncJobTools accept that interface rather than the concrete JobManager class. This allows different implementations (e.g., direct TaskStore usage, mocked for tests) and improves testability.
🔧 Suggested Fix
Define JobManager interface, accept it as parameter rather than importing concrete class.

Quality Issues (27)

Severity Location Issue
🟠 Error src/mcp-job-manager.ts:267
If handler() throws before the heartbeat timer is set, the timer variable won't be defined but the catch block tries to clear it. This could cause a ReferenceError.
💡 Suggestion: Move the heartbeat timer setup before the handler() call, or check if heartbeatTimer is defined before clearing it in the catch block.
🟠 Error tests/unit/mcp-job-manager.test.ts:137
Tests use arbitrary setTimeout delays (50ms, 100ms) to wait for async operations. This creates timing-dependent tests that may fail on slow systems or pass incorrectly on fast systems.
💡 Suggestion: Use a more reliable synchronization mechanism such as: (1) Await a promise that resolves when the job actually completes, (2) Use a mock clock library, or (3) Poll the job status until done=true with a timeout.
🟠 Error src/mcp-job-manager.ts:267
The PR description mentions idempotency keys as a feature, but the implementation doesn't actually use them. The startJob method accepts an opts parameter but there's no idempotency_key handling.
💡 Suggestion: Either implement the idempotency key feature (add parameter, check for existing jobs with same key, return existing job_id) or remove it from the PR description and documentation.
🟠 Error src/mcp-job-manager.ts:267
The PR description mentions automatic cleanup after 1 hour, but there's no TTL cleanup logic in the JobManager implementation. Jobs will persist indefinitely in the TaskStore.
💡 Suggestion: Either implement the TTL cleanup feature (add a timer that calls taskStore.deleteExpiredTasks() or similar) or remove the 1-hour cleanup claim from the PR description.
🟠 Error src/mcp-job-manager.ts:267
PR description claims 6-character hex job IDs using crypto.randomBytes(3), but implementation uses task.id.slice(0, 8) which is the first 8 characters of a UUID (not 6 hex chars).
💡 Suggestion: Either update the implementation to use 6-character hex IDs as documented, or update the PR description and comments to reflect the actual 8-character UUID prefix behavior.
🟡 Warning tests/unit/mcp-job-manager.test.ts:137
Test uses magic number 50 (setTimeout delay) without clear semantic meaning. This arbitrary timeout could cause flaky tests if the async handler takes longer than 50ms to complete.
💡 Suggestion: Replace with a named constant or use a more reliable synchronization mechanism like awaiting the job completion promise or using a mock clock.
🟡 Warning tests/unit/mcp-job-manager.test.ts:137
Test uses magic number 100 (setTimeout delay) without clear semantic meaning. The test waits 100ms for async completion but this value is arbitrary - if the handler takes longer, tests will flake.
💡 Suggestion: Use a named constant like TEST_ASYNC_DELAY or await a promise that resolves when the job actually completes. Consider using a mock clock.
🟡 Warning tests/unit/mcp-job-manager.test.ts:137
Test uses magic number 50 for setTimeout delay. This arbitrary timeout value should be a named constant to make test behavior explicit.
💡 Suggestion: Extract it to a named constant like TEST_ASYNC_COMPLETION_DELAY_MS = 50 to clarify its purpose in tests.
🟡 Warning tests/unit/mcp-job-manager.test.ts:137
Test uses magic number 8 for job_id length check. While this is documented as 'first 8 chars of UUID', it should be a named constant to make the intent explicit and avoid duplication.
💡 Suggestion: Extract it to a named constant like JOB_ID_PREFIX_LENGTH = 8 and reference it in both the implementation and the tests.
🟡 Warning tests/unit/mcp-job-manager.test.ts:137
Test uses magic number 10 for task list limit. This arbitrary value should be a named constant to make its purpose clear.
💡 Suggestion: Extract it to a named constant like TEST_TASK_LIST_LIMIT = 10 to improve test readability.
🟡 Warning src/mcp-job-manager.ts:267
Magic number 60_000 (60000ms) for heartbeat interval should be a named constant to make the duration clear and maintainable.
💡 Suggestion: Extract it to a named constant like HEARTBEAT_INTERVAL_MS = 60_000 to improve readability.
🟡 Warning src/mcp-job-manager.ts:267
Magic number 10 for polling delay appears in multiple places and should be a named constant to ensure consistency across the codebase.
💡 Suggestion: Extract it to a named constant like DEFAULT_POLLING_DELAY_SECONDS = 10 in a shared constants file.
🟡 Warning src/mcp-job-manager.ts:267
Magic number 8 for job_id prefix length appears multiple times and should be a named constant to ensure consistency.
💡 Suggestion: Extract it to a named constant like JOB_ID_PREFIX_LENGTH = 8 and reference it throughout the codebase.
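Taken together, the constants flagged above could live in one shared module. The file layout and names here are illustrative, not the actual source tree.

```typescript
// Sketch only: a shared constants module covering the magic numbers
// flagged in these findings.
export const JOB_ID_PREFIX_LENGTH = 8;           // first 8 chars of the task UUID
export const DEFAULT_POLLING_DELAY_SECONDS = 10; // suggested client retry delay
export const HEARTBEAT_INTERVAL_MS = 60_000;     // task-store heartbeat period

// Example use in the JobManager:
export function toShortJobId(taskId: string): string {
  return taskId.slice(0, JOB_ID_PREFIX_LENGTH);
}
```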
🟡 Warning src/mcp-job-manager.ts:292
Magic number 36 for UUID length check is an implementation detail that should be a named constant.
💡 Suggestion: Extract it to a named constant like UUID_LENGTH = 36, or use a UUID validation library instead of checking the length.
🟡 Warning src/mcp-job-manager.ts:294
getJob() searches through up to 200 tasks to find a job by prefix. This is an O(n) linear search that will degrade as the number of jobs grows.
💡 Suggestion: Consider maintaining a separate Map<string, string> index from short IDs to full IDs, or a prefix-based data structure such as a trie, for O(1) lookups.
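The Map-based variant of that suggestion is small; a sketch, assuming the index lives alongside the JobManager and is populated whenever a job is created:

```typescript
// Sketch only: an in-memory index from 8-char short IDs to full task
// UUIDs, giving O(1) getJob lookups instead of scanning recent tasks.
class ShortIdIndex {
  private index = new Map<string, string>();

  register(fullTaskId: string): string {
    const shortId = fullTaskId.slice(0, 8);
    this.index.set(shortId, fullTaskId);
    return shortId;
  }

  resolve(shortId: string): string | undefined {
    return this.index.get(shortId);
  }

  remove(fullTaskId: string): void {
    this.index.delete(fullTaskId.slice(0, 8));
  }
}
```

A fallback scan of the TaskStore would still be needed after a restart, since this index is not persisted.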
🟡 Warning src/mcp-job-manager.ts:294
Magic number 200 for task search limit is arbitrary and should be a named constant.
💡 Suggestion: Extract it to a named constant like TASK_SEARCH_LIMIT = 200 to make the intent explicit.
🟡 Warning src/mcp-job-manager.ts:226
Empty catch block ignores errors during result text extraction. This could silently fail and return 'Execution completed' even when extraction fails.
💡 Suggestion: Add logging, or restructure the code to handle extraction errors explicitly. At minimum, add a comment explaining why the error is ignored.
🟡 Warning src/mcp-job-manager.ts:267
Heartbeat timer is created but there's no guarantee clearInterval will be called if the process exits abnormally. While unref() helps, the timer could leak in some error scenarios.
💡 Suggestion: Consider adding a cleanup method, or use process exit handlers to ensure timers are cleared. Document the lifetime of the timer.
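One way to pin down the timer's lifetime is to own it in a small object with an idempotent stop method, so every shutdown path can call it safely. The class shape is hypothetical, not the PR's code.

```typescript
// Sketch only: tie the heartbeat timer to start/stop methods so it is
// always cleared, even when shutdown runs more than once.
class HeartbeatSketch {
  private timer?: ReturnType<typeof setInterval>;

  start(beat: () => void, intervalMs = 60_000): void {
    this.stop(); // never leak a previous timer
    this.timer = setInterval(beat, intervalMs);
  }

  // Returns true if an active timer was cleared; safe to call repeatedly.
  stop(): boolean {
    if (this.timer !== undefined) {
      clearInterval(this.timer);
      this.timer = undefined;
      return true;
    }
    return false;
  }
}
```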
🟡 Warning tests/unit/mcp-job-manager.test.ts:137
No tests for concurrent job scenarios - what happens when multiple jobs are started simultaneously or when get_job is called before the job completes?
💡 Suggestion: Add tests for concurrent job execution, race conditions between start and get operations, and polling during job execution.
🟡 Warning tests/unit/mcp-job-manager.test.ts:137
No tests for invalid inputs - what happens with empty job_id, special characters, or extremely long job_id strings?
💡 Suggestion: Add tests for edge cases such as an empty string, null/undefined, very long strings, and special characters in the job_id parameter.
🟡 Warning tests/unit/mcp-job-manager.test.ts:137
No tests for handler that throws synchronously vs asynchronously - the error handling path may differ.
💡 Suggestion: Add tests for both synchronous and asynchronous errors in handlers to ensure error handling works correctly in both cases.
🟡 Warning src/mcp-job-manager.ts:226
Result text extraction logic is complex and nested with multiple fallback paths. This makes it hard to understand what value will actually be returned.
💡 Suggestion: Extract the result extraction logic into a separate, well-tested function with clear documentation of the fallback priority.
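A sketch of that extraction as a single documented function. The result shape (a `content` array of `{ type, text }` entries) is an assumption based on typical MCP tool results; the real fallback chain may differ.

```typescript
// Sketch only: one documented fallback chain for result text extraction.
// MaybeToolResult is an assumed shape, not the actual Visor type.
interface MaybeToolResult {
  content?: Array<{ type?: string; text?: string }>;
}

function extractResultText(result: unknown): string {
  // Priority 1: plain string results pass through unchanged.
  if (typeof result === "string") return result;
  // Priority 2: first text entry in an MCP-style content array.
  const content = (result as MaybeToolResult)?.content;
  if (Array.isArray(content)) {
    const textEntry = content.find(
      (c) => c?.type === "text" && typeof c.text === "string"
    );
    if (textEntry?.text) return textEntry.text;
  }
  // Priority 3: generic fallback, matching the current behavior.
  return "Execution completed";
}
```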
🟡 Warning src/mcp-job-manager.ts:267
Empty catch block when clearing interval in heartbeat timer suppresses errors without logging. This could hide issues with the task store.
💡 Suggestion: Add at least a debug log statement to track when heartbeat failures occur, or remove the try-catch if errors should propagate.
🟡 Warning src/mcp-job-manager.ts:267
Empty catch block when updating task state to 'failed' suppresses all errors. This could leave the task in an inconsistent state.
💡 Suggestion: Add logging to track when state transitions fail, or consider rethrowing critical errors that prevent proper task state management.
🟡 Warning src/mcp-job-manager.ts:267
Empty catch block when updating task state to 'completed' suppresses errors. This could result in jobs completing successfully but the task store not being updated.
💡 Suggestion: Add logging to track when state transitions fail, and consider whether the job should be marked as failed if the state update fails.
🟡 Warning src/mcp-server.ts:518
getOrCreateTaskStore() creates a new SqliteTaskStore if none provided, but there's no corresponding cleanup/shutdown logic. This could lead to resource leaks when the server is created multiple times.
💡 Suggestion: Document who is responsible for shutting down the TaskStore, or add a shutdown method that cleans up any store created by getOrCreateTaskStore.
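The ownership rule can be made explicit with a flag: the server closes the store only if it created it. The interfaces below are hypothetical stand-ins for SqliteTaskStore and the server class.

```typescript
// Sketch only: track whether the server owns its TaskStore so shutdown()
// closes only stores it created, never caller-provided ones.
interface ClosableTaskStore {
  close(): void;
}

class ServerSketch {
  private taskStore?: ClosableTaskStore;
  private ownsTaskStore = false;

  constructor(
    private makeStore: () => ClosableTaskStore,
    provided?: ClosableTaskStore
  ) {
    this.taskStore = provided;
  }

  getOrCreateTaskStore(): ClosableTaskStore {
    if (this.taskStore === undefined) {
      this.taskStore = this.makeStore();
      this.ownsTaskStore = true; // we created it, so we must close it
    }
    return this.taskStore;
  }

  shutdown(): void {
    if (this.ownsTaskStore && this.taskStore !== undefined) {
      this.taskStore.close();
      this.taskStore = undefined;
      this.ownsTaskStore = false;
    }
  }
}
```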
🟡 Warning src/runners/mcp-server-runner.ts:109
Async job tool registration logic is duplicated between mcp-server.ts (registerAsyncJobTools) and mcp-server-runner.ts. This creates maintenance burden and inconsistency risk.
💡 Suggestion: Extract the common async job tool registration logic into a shared function in mcp-job-manager.ts or a new utility module, and call it from both places.
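In outline, the shared helper would take whatever both entry points already have in hand. The `ToolRegistrar` interface below is reduced to the minimum needed for illustration; the real MCP SDK registration API differs.

```typescript
// Sketch only: one registration helper called from both mcp-server.ts and
// mcp-server-runner.ts. ToolRegistrar is a hypothetical abstraction, not
// the MCP SDK's actual server type.
type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;

interface ToolRegistrar {
  registerTool(name: string, handler: ToolHandler): void;
}

function registerAsyncJobTools(
  server: ToolRegistrar,
  startJob: ToolHandler,
  getJob: ToolHandler
): void {
  server.registerTool("start_job", startJob);
  server.registerTool("get_job", getJob);
}
```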

Powered by Visor from Probelabs

Last updated: 2026-03-21T17:37:12.081Z | Triggered by: pr_updated | Commit: bb362b2

💡 TIP: You can chat with Visor using /visor ask <your question>

buger and others added 4 commits March 21, 2026 16:50
Instead of a standalone in-memory Map, JobManager now delegates to the
existing TaskStore (SQLite-backed). Jobs are stored as regular Visor
tasks, visible via `visor tasks list` and `visor tasks show`.

- Removed duplicate storage, cleanup timers, and idempotency index
- Job IDs are first 8 chars of UUID (matching `visor tasks` CLI format)
- start_job creates a task, transitions to working, runs handler async
- get_job queries TaskStore by ID (supports short ID prefix lookup)
- MCP server creates a SqliteTaskStore if none is provided

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The get_job long poll timeout is now configurable via --poll-timeout CLI
flag, long_poll_timeout config option, or VISOR_MCP_POLL_TIMEOUT env var.
Defaults to 59 seconds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add tests for plain string results, non-Error thrown values, concurrent
getJob calls, and full response shape validation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Log warning instead of silently ignoring state update failures in
  error handler
- Fix misleading test comment and tighten timing assertions
- Use named constant for timeout and validate bounds precisely

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@buger buger merged commit 67a3272 into main Mar 21, 2026
13 of 16 checks passed