fix: add response_schema guidance to agent system prompt (#5078)

RafaelPo · github-actions[bot] · commit 10c43e976b97 · 2026-03-25T16:47:12.000Z
## Summary - Adds a dedicated **"Specifying the response schema"** subsection to both the MCP server instructions (`app.py`, used by Claude Desktop / claude.ai) and the everyrow-cc agent system prompt (`sdk_manager.py`) - Explicitly documents the `response_schema` parameter for Agent, Rank, and Single Agent operations with a concrete JSON Schema example - Calls out the anti-pattern of describing output fields only in the task prompt (which causes fallback to `DefaultAgentResponse` — a single `answer` string) ## Context Investigated session `a1f34a56` (G7 Average Temperatures) on Claude Desktop where Claude described three structured output fields (`avg_temp_celsius`, `temp_description`, `data_source`) in the task prompt but didn't pass a `response_schema`, so all rows got `{"answer": "..."}` with inconsistent formats. ## Test plan - [ ] Deploy MCP server to staging and run a multi-field agent operation via Claude Desktop — verify Claude passes `response_schema` with typed fields - [ ] Run same test via everyrow-cc to verify both code paths - [ ] Verify the schema example matches validation in `models.py:_validate_response_schema` 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Sourced from commit d003e1fa451df96c2decf8f0675d3132da3b160f
diff --git a/futuresearch-mcp/src/futuresearch_mcp/app.py b/futuresearch-mcp/src/futuresearch_mcp/app.py
@@ -99,11 +99,25 @@ async def no_auth_http_lifespan(_server: FastMCP):
 3. **Rank** — quantitative rating. Prefer an objective metric with units when possible. \
 Use a subjective 0-100 score only if necessary.
 4. **Agent** — open-ended web research when Classify, Rank, and Forecast don't fit. \
-Specify a response schema with descriptive column names (include units, e.g. \
-`population_millions`). Don't add reasoning/justification fields — users can \
-inspect the research behind each row.
+Pass `response_schema` for multi-field output (see below). Don't add reasoning/justification fields — \
+users can inspect the research behind each row.
 5. **Dedupe / Merge** — data consolidation.
 
+## Specifying the response schema (Agent, Rank, Single Agent)
+
+Agent, Rank, and Single Agent accept a `response_schema` parameter. If omitted, results \
+default to a single `{{"answer": string}}` field. Pass it whenever you need typed or \
+multi-field output. The schema is a JSON Schema object:
+
+```json
+{{"type": "object", "properties": {{"avg_temp_celsius": {{"type": "number", "description": "Average annual temperature in Celsius"}}, "data_source": {{"type": "string", "description": "Source of the data"}}}}, "required": ["avg_temp_celsius", "data_source"]}}
+```
+
+- Use descriptive field names with units (e.g. `population_millions`, `revenue_usd`)
+- Mark fields as `required` unless the data may legitimately be unavailable for some rows
+- Do NOT describe the desired output columns only in the `task` prompt — the schema \
+is what controls the output structure
+
 ## Workflow
 1. **Ingest data** — pass `data` (inline list of dicts) or an `artifact_id` \
 (from `futuresearch_upload_data` or `futuresearch_request_upload_url`) to any processing tool.
diff --git a/futuresearch-mcp/src/futuresearch_mcp/tools.py b/futuresearch-mcp/src/futuresearch_mcp/tools.py
@@ -244,6 +244,12 @@ async def futuresearch_agent(
     requested research fields for each row. Agents run in parallel to save
     time and are optimized to find accurate answers at minimum cost.
 
+    `task` describes WHAT to research in natural language. `response_schema`
+    defines the OUTPUT STRUCTURE as a JSON Schema. If omitted, results default
+    to a single {"answer": string} field. Pass it whenever you need typed or
+    multi-field output. Do NOT describe desired output columns only in `task`
+    — the schema is what controls the output structure.
+
     Examples:
     - "Find this company's latest funding round and lead investors"
     - "Research the CEO's background and previous companies"
@@ -323,6 +329,12 @@ async def futuresearch_single_agent(
     to research a single question. The agent can search the web, read pages, and
     return structured results.
 
+    `task` describes WHAT to research in natural language. `response_schema`
+    defines the OUTPUT STRUCTURE as a JSON Schema. If omitted, results default
+    to a single {"answer": string} field. Pass it whenever you need typed or
+    multi-field output. Do NOT describe desired output columns only in `task`
+    — the schema is what controls the output structure.
+
     Examples:
     - "Find the current CEO of Apple and their background"
     - "Research the latest funding round for this company" (with input_data: {"company": "Stripe"})
@@ -401,6 +413,11 @@ async def futuresearch_rank(
     table. Conducts research, and can also apply judgment to the results if the
     criteria are qualitative.
 
+    `task` describes WHAT to score in natural language. `response_schema`
+    optionally defines extra output columns as a JSON Schema. If omitted,
+    only the score column is returned. Do NOT describe desired output columns
+    only in `task` — the schema is what controls the output structure.
+
     Examples:
     - "Estimate this drug's peak annual sales in billions of dollars"
     - "What is this country's 5-year GDP growth rate as a percentage?"
@@ -409,12 +426,6 @@ async def futuresearch_rank(
     This function submits the task and returns immediately with a task_id.
 
     Then immediately follow the instructions in the response to monitor progress.
-
-    Args:
-        params: RankInput
-
-    Returns:
-        Success message containing task_id for monitoring progress
     """
     logger.info(
         "futuresearch_rank: task=%.80s rows=%s",