
Conversation

@sawa3030
Contributor

Motivation

This PR introduces structured output support to enhance the safety and clarity of communication between the MCP server and clients.

Description of the Changes

  • Introduced two new schemas: StudyInfo and TrialInfo, and applied them as structured return types when the output conforms to these formats.
  • Updated pyproject.toml to enable structured output.

@toshihikoyanase toshihikoyanase self-assigned this Sep 12, 2025
Member

@toshihikoyanase toshihikoyanase left a comment

Thank you for working on the structured output support. The PR is not ready for review, but I have an early comment on the class names.

pyproject.toml Outdated
"plotly>=6.0.1",
"torch>=2.7.0",
"bottle>=0.13.4",
"fastmcp>=2.12.2",
Member

[Question] Do we need to migrate from the official mcp library to fastmcp to support structured output? mcp==1.14.0 seems to support the structured output feature.

https://pypi.org/project/mcp/

Contributor Author

Thank you for pointing it out. I’ve updated the MCP version to mcp[cli]>=1.10.0 and removed fastmcp.

Member

Thank you for your update. I've confirmed that the mcp package supports structured output since 1.10.0 in https://pypi.org/project/mcp/1.10.0/ and https://github.com/modelcontextprotocol/python-sdk/releases/tag/v1.10.0. Since the current implementation uses Pydantic, we may need to apply the fix modelcontextprotocol/python-sdk#1099, which was included in mcp==1.11.0. I'll check the behavior and report the results soon.

Member

[Notes] Ah, the change does not contain any field aliases. I believe that modelcontextprotocol/python-sdk#1099 is not required for optuna mcp.

@sawa3030
Contributor Author

sawa3030 commented Sep 16, 2025

Design Choices

  • I'm unsure whether the return types of the MCP tools in this PR are appropriate. To keep the implementation simple, I introduced two new return types, StudyResponse and TrialResponse, and replaced some existing return values with these types, while leaving others as simple string returns.
  • As far as I have tested, structured output does not currently accept the Image type.

@sawa3030 sawa3030 force-pushed the add/structured-output3 branch from a1abca0 to e001fd5 Compare September 16, 2025 08:23
@sawa3030 sawa3030 marked this pull request as ready for review September 16, 2025 08:25
@toshihikoyanase
Member

To keep the implementation simple, I introduced two new return types, StudyResponse and TrialResponse, and replaced some existing return values with these types, while leaving others as simple string returns.

According to the official Python SDK documentation, defining return types as Pydantic models is supported. This is also demonstrated in the example code, so using this approach seems reasonable.

An alternative method mentioned is using dataclasses, as the existing TrialToAdd class is implemented as a dataclass. However, considering factors such as code cleanliness (e.g., whether to place descriptions in docstrings or Field descriptions) and the fact that this approach doesn't introduce additional dependencies, using Pydantic models appears to be the better choice. Converting TrialToAdd to a Pydantic model might be a follow-up to this PR.
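A rough sketch of the trade-off between the two styles (the class and field names here are illustrative, not the actual optuna-mcp code):

```python
from dataclasses import dataclass

from pydantic import BaseModel, Field


# Dataclass style: field descriptions live in the docstring, which does not
# flow into the generated JSON schema.
@dataclass
class TrialToAddDataclass:
    """A trial to add to a study.

    Attributes:
        params: Parameter values of the trial.
    """

    params: dict[str, float]


# Pydantic style: descriptions attach to fields and appear in the output schema.
class TrialToAddModel(BaseModel):
    params: dict[str, float] = Field(description="Parameter values of the trial.")


schema = TrialToAddModel.model_json_schema()
print(schema["properties"]["params"]["description"])
```

With the Pydantic style, the field description is visible to MCP clients through the tool's output schema, which is the main argument above for preferring it.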

Regarding the change in return values, since optuna-mcp is still in alpha version, such changes are acceptable. I will make sure to verify if existing examples continue to work as expected.

As far as I have tested, structured output does not currently accept the Image type.

According to the 2025-06-18 specification, image content is distinguished from structured content, so the current signature of the plot_ function can be maintained.

@sawa3030
Contributor Author

Thank you for the discussions. Everything should now be updated. PTAL.

@sawa3030
Contributor Author

I’ve also tested this PR locally with some existing examples using Copilot (Optimizing the 2D-Sphere function, Optuna dashboard, Cookie Recipe, and Matplotlib Configuration). It would be greatly appreciated if the PR could be verified in a different environment as well.

Member

@toshihikoyanase toshihikoyanase left a comment

Thank you for your update and confirmation using Copilot.
I also checked the behavior using Claude Desktop with some examples, such as sphere2d and cookie recipes, and they worked as expected.

I'd like to discuss error handling methods and the types of return values. Please take a look.

Comment on lines 123 to 125
return CallToolResult(
content=[TextContent(type="text", text="No storage specified.")], isError=True
)
Member

Instead of using CallToolResult to return error information to MCP clients, we may be able to use the MCP SDK's error-handling feature.
For instance, the following PR adds implementation that catches mcp.McpError and returns them as responses:
modelcontextprotocol/python-sdk#62

Based on this approach, the code will be like this:

Suggested change
return CallToolResult(
content=[TextContent(type="text", text="No storage specified.")], isError=True
)
raise McpError(ErrorData(code=INTERNAL_ERROR, message="No storage specified."))

Member

The type of return values will be simplified from list[StudyResponse] | CallToolResult to list[StudyResponse].

This significantly reduces the OutputSchema returned to the client.

This PR (569 lines): get_all_study_name_output_schema_call_tool_result.json
McpError (66 lines): get_all_study_name_output_schema_mcp_error.json

Contributor Author

Thank you for the detailed investigation. I have some concerns regarding the intended usage of McpError. Based on my understanding, McpError is primarily designed for protocol-level errors (as suggested here) and may not be appropriate in this context. Since using CallToolResult for the error makes the return type ambiguous, I started to think that simply raising a RuntimeError would be sufficient.

Member

Thank you for your suggestion. I checked the current implementation of the Python SDK regarding the RuntimeError handling.
https://github.com/modelcontextprotocol/python-sdk/blob/71889d7387f070cd872cab7c9aa3d1ff1fa5a5d2/src/mcp/server/lowlevel/server.py#L687-L704

It seems to handle exceptions other than McpError by automatically returning an ErrorData response as follows:

response = types.ErrorData(code=0, message=str(err), data=None)
await message.respond(response)

In this case, error code 0 is used. However, the JSON-RPC 2.0 specification does not assign code 0; it is considered an unassigned value.

When catching a McpError, the ErrorData object contained within it is returned directly. For example:

err = McpError(ErrorData(code=INTERNAL_ERROR, message="No storage specified."))
response = err.error
await message.respond(response)

So, McpError allows servers to return specific error codes, making it a more client-friendly implementation.

Regarding the documentation in your message, it seems to mention JSON-RPC Error Codes, and InternalError (-32603) appears to be a protocol-specified internal error.
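For reference, the error-response shape these codes belong to can be sketched with the standard library alone (the helper name is hypothetical; only the JSON-RPC 2.0 structure is taken from the spec):

```python
import json

INTERNAL_ERROR = -32603  # "Internal error", assigned by the JSON-RPC 2.0 spec
UNASSIGNED = 0           # code 0 carries no meaning in the spec, as noted above


def make_error_response(request_id: int, code: int, message: str) -> str:
    """Build a JSON-RPC 2.0 error response as a JSON string."""
    return json.dumps(
        {"jsonrpc": "2.0", "id": request_id, "error": {"code": code, "message": message}}
    )


print(make_error_response(1, INTERNAL_ERROR, "No storage specified."))
```

Raising McpError with an explicit ErrorData lets the server fill the `code` field with a spec-assigned value like -32603 instead of the unassigned 0.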

The following server implementation also uses the McpError exception:
https://dev.to/yigit-konur/error-handling-in-mcp-typescript-sdk-2ol7#server-error-handling

      throw new McpError(
        ErrorCode.InternalError,
        `Error validating elicitation response: ${error}`,
      );

Contributor Author

Thank you very much for the clarification. I understand now and will proceed with the necessary revisions.

Contributor Author

When revising the tests, I noticed FastMCP wraps all errors in ToolError (ref), so in Optuna-MCP usage, there seems to be no practical difference between a McpError and a regular exception.
Still, assuming FastMCP may expose error codes in the future, I’ve implemented errors with McpError. I’d appreciate your thoughts. PTAL.

Member

When revising the tests, I noticed FastMCP wraps all errors in ToolError (ref), so in Optuna-MCP usage, there seems to be no practical difference between a McpError and a regular exception.

Thank you for your detailed investigation. I didn't realize that FastMCP included such additional error handling.

Still, assuming FastMCP may expose error codes in the future, I’ve implemented errors with McpError.

I agree. While FastMCP's error handling may currently have some potential issues (e.g., modelcontextprotocol/python-sdk#698), I believe these will be addressed in future improvements. We can revisit the error handling design after some updates to the official SDK.

Member

@toshihikoyanase toshihikoyanase left a comment

Thank you for your update. I found that the current PR has several items that likely require design discussion.
I don't think we need to resolve them in this PR, so could you add your feedback to the comments, please?

class StudyResponse(BaseModel):
study_name: str
sampler_name: (
typing.Literal["TPESampler", "NSGAIISampler", "RandomSampler", "GPSampler"] | None
Member

[Notes] A nit, but supported samplers are listed in the definition of set_sampler as well. We may define SamplerName=typing.Literal["TPESampler", "NSGAIISampler", "RandomSampler", "GPSampler"] to prevent the inconsistency.

But we can work on it in a follow-up PR.
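A minimal sketch of that shared alias (the `SamplerName` name follows the suggestion above; the validation helper is hypothetical):

```python
import typing

# Single source of truth for the supported samplers, referenced by both
# set_sampler's argument type and StudyResponse.sampler_name.
SamplerName = typing.Literal["TPESampler", "NSGAIISampler", "RandomSampler", "GPSampler"]


def validate_sampler_name(name: str) -> bool:
    """Check a name against the one shared Literal instead of a duplicated list."""
    return name in typing.get_args(SamplerName)
```

Because both call sites reference the same Literal, adding a sampler later only requires touching one definition.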

sampler_name: (
typing.Literal["TPESampler", "NSGAIISampler", "RandomSampler", "GPSampler"] | None
) = Field(default=None, description="The name of the sampler used in the study.")
directions: list[str] | None = Field(
Member

The type of the directions attribute is not consistent with the argument of create_study.

Suggested change
directions: list[str] | None = Field(
directions: list[typing.Literal["minimize", "maximize"]] | None = Field(

)

@mcp.tool()
@mcp.tool(structured_output=True)
Member

[Notes] Since metric names correspond to the study's directions, we might be able to include the response in StudyResponse as follows:

StudyResponse:
    ...
    metric_names: list[str] | None = Field(default=None, description="...")

But this new approach may need further discussion, so we can work on it in a follow-up PR.

return f"metric_names set to {json.dumps(metric_names)}"

@mcp.tool()
@mcp.tool(structured_output=True)
Member

ditto.

)

@mcp.tool()
@mcp.tool(structured_output=True)
Member

An alternative approach is to return the list of TrialResponse.
So, how about keeping the structured_output=False for future updates?

Suggested change
@mcp.tool(structured_output=True)
@mcp.tool()

Contributor Author

The current implementation of MCP sets the default value of structured_output to None, which automatically chooses the return type (either structured or unstructured). In the case of get_trials, structured output is returned if structured_output is not specified.

To preserve the current behavior of main for future updates, I explicitly set structured_output=False. This also simplifies the test cases, as mentioned in this discussion.

Comment on lines +351 to +357
TrialResponse(
trial_number=trial.number,
params=trial.params,
values=trial.values,
user_attrs=trial.user_attrs,
system_attrs=trial.system_attrs,
)
Member

Hmm, I found that the proposed best_trial returns more information, such as user_attrs and system_attrs, than the previous implementation, while the proposed best_trials returns less information, such as distributions, datetime_start, datetime_end, and intermediate_values.

This PR did not introduce this gap between best_trial and best_trials, and we can address this issue in a follow-up PR.

Contributor Author

Thank you for pointing that out. At the moment, I don’t have a strong preference regarding what should be returned in best_trial and best_trials. This issue will be addressed in a follow-up PR.

return Image(data=plotly.io.to_image(fig), format="png")

@mcp.tool()
@mcp.tool(structured_output=True)
Member

Alternatively, we can define the response type for the optuna dashboard, such as:

class OptunaDashboardResponse(BaseModel):
    url: str = Field(description="The URL of the Optuna dashboard.")

But this lacks information on whether the server has been newly started or not, so we can discuss this approach in a separate PR.

Contributor Author

Defining a response type for the Optuna dashboard sounds like a good idea. This will also be addressed in a follow-up PR.

Comment on lines +97 to +99
assert len(result) == 2
assert isinstance(result, Sequence)
assert isinstance(result[0], list)
Member

[Notes] These lines for the type checking are necessary but verbose. Perhaps we could explore ways to simplify the test cases in the future.
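One possible direction is a small helper that performs the narrowing once (the helper name and result shape are hypothetical, based on the checks quoted above):

```python
from collections.abc import Sequence
from typing import Any


def unpack_tool_result(result: Any) -> tuple[list, dict]:
    """Collapse the repeated isinstance/len checks into one assertion helper."""
    assert isinstance(result, Sequence) and len(result) == 2
    content, structured = result
    assert isinstance(content, list)
    assert isinstance(structured, dict)
    return content, structured


content, structured = unpack_tool_result((["text"], {"result": 1}))
```

Each test would then start with one call instead of three assertions, and type checkers see the narrowed types from the return annotation.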

Contributor Author

I totally agree with your point. But to keep this PR as focused and simple as possible, this issue will be addressed in a follow-up PR.

Member

Thanks! That makes sense.

Comment on lines 216 to 219
for user_attrs in (user_attrs_from_text, user_attrs_from_dict):
r = json.loads(":".join(user_attrs.split(":")[1:]).strip())
assert isinstance(r, list)
assert r == metric_names
Member

Hmm, I felt the output data was not structured when I saw these lines for the result parsing. Not a strong opinion, but we might include the metric names in StudyResponse as I mentioned above.

Comment on lines 261 to 268
lines_from_text = result[0][0].text.strip().split("\n")
lines_from_dict = result[1]["result"].strip().split("\n")

for lines in (lines_from_text, lines_from_dict):
assert len(lines) == 3
assert lines[0] == ("Trials: ")
assert lines[1].startswith(",number")
assert lines[2].startswith("0,0")
Member

Enabling the structured_output at mcp.tool might make the output more complicated than the current main.

@sawa3030 sawa3030 force-pushed the add/structured-output3 branch from a35a6d6 to 6518c7b Compare October 2, 2025 03:15
@sawa3030
Contributor Author

sawa3030 commented Oct 2, 2025

Thank you for all the comments and suggestions. Some points have been left as they are and will be discussed in follow-up PRs, while others have been updated in this PR. PTAL.

Member

@toshihikoyanase toshihikoyanase left a comment

Thank you for your update. I confirmed that the structured output worked as expected. We found some unresolved issues during the review process, so we'll work on them in follow-up PRs. LGTM!

@toshihikoyanase toshihikoyanase merged commit b7f097f into optuna:main Oct 2, 2025
4 checks passed