GH-5884: Reuse JsonSchemaGenerator in BeanOutputConverter#5897
GH-5884: Reuse JsonSchemaGenerator in BeanOutputConverter#5897ChoMinGi wants to merge 4 commits intospring-projects:mainfrom
Conversation
Signed-off-by: Mingi Cho <81455273+ChoMinGi@users.noreply.github.com>
|
@ThomasVitale Could your please review this PR? |
|
I'll have a look later today, thanks |
| configBuilder.with(new KotlinModule()); | ||
| String schemaString = JsonSchemaGenerator.generateForType(this.type); | ||
| JsonNode jsonNode; | ||
| try { |
There was a problem hiding this comment.
I'm a bit unsure about this part. I understand it's meant to maintain support for postProcessSchema(). I'm thinking if we should take this opportunity for the 2.0.0 release to remove the postProcessSchema() entirely. Until now it was necessary because the entire JSON Schema generation logic was included in this class and we needed a way to make it customizable that wouldn't require re-inventing the whole logic. And I know it's been used to work around some of the issues in the logic (e.g. the cumbersome handling of required fields or missing support for annotations) which this PR solved once and for all.
Now the whole logic is one call to JsonSchemaGenerator.generateForType(this.type). We could remove the postProcessSchema() method, make the BeanOutputConverter.generateSchema() method protected, and identify that one as the extension point to customize the JSON Schema generation logic. That would make this even more flexible since developers would still have the option to apply post-processing logic, but they would also have the option to completely change the generation logic if they wanted to.
My main concern with the suggested solution is the extra Jackson deserialisation+serialization operations that are always performed, even though there's no post-processing logic defined (which is probably most of the times). Jackson operations are not cheap. With the additional risk of exceptions being thrown performing an operation that is often not needed.
There was a problem hiding this comment.
@ThomasVitale
Thanks for raising this. I agree with that direction.
I kept the deserialize/post-process/serialize flow only to preserve the existing postProcessSchema() extension point.
But since this is for 2.0.0 and schema generation is now delegated to JsonSchemaGenerator.generateForType(this.type), removing postProcessSchema() and making generateSchema() the protected extension point seems cleaner and more flexible.
That keeps the same construction-time extension lifecycle as today, while avoiding the extra Jackson roundtrip in the default path. Subclasses can still post-process the generated schema by calling super.generateSchema(), or replace the generation logic entirely.
I'll update the PR accordingly and add the migration note under a new Upgrading to 2.0.0-M6 section in upgrade-notes.adoc.
There was a problem hiding this comment.
Updated in the latest push.
postProcessSchema() has been removed, generateSchema() is now protected, and BeanOutputConverter now uses the JsonSchemaGenerator.generateForType(this.type) output directly without the Jackson deserialize/serialize roundtrip.
I also updated the affected schema format assertions, removed the obsolete line separator tests, and added the 2.0.0-M6 upgrade notes for the required-property behavior changes and the migration from postProcessSchema() to generateSchema().
Tests: BeanOutputConverterTest, JsonSchemaGeneratorTests, and KotlinBeanOutputConverterTests pass locally.
|
@ChoMinGi thanks so much for the PR. About your open question. The
The logic handling that is in the I think it's correct to apply this same logic to structured output conversion.
So I think that's an acceptable change as it's actually about fixing existing bugs in the Still, we're changing the behaviour, so it's important to make people aware of it. In the context of this PR, I would suggest mentioning the two behaviour differences in the upgrades notes for the upcoming release here in a new "Upgrading to 2.0.0-M6" section. And then I would perhaps suggest creating a separate task for reviewing and consolidating the documentation about JSON Schema generation between Tool Calling and Structured Output Conversion. |
…hema() as extension point Signed-off-by: Mingi Cho <81455273+ChoMinGi@users.noreply.github.com>
…nges Signed-off-by: Mingi Cho <81455273+ChoMinGi@users.noreply.github.com>
Signed-off-by: Mingi Cho <81455273+ChoMinGi@users.noreply.github.com>
|
@ThomasVitale I agree the JSON Schema documentation consolidation between Tool Calling and Structured Output Conversion is better handled as a follow-up. I can open a separate issue for that. |
|
Follow-up issue created: #5915 |
|
This appears to introduce a behavioral change in how Previously, With the current delegation to This is visible in the updated test expectations where I think this may effectively relax schema strictness for nested objects, which could impact
If this change is intentional, it would be helpful to document it in the upgrade notes |
Fixes #5884
Replace
BeanOutputConverter's internal victools schema generation withJsonSchemaGenerator.generateForType()delegation to ensure consistent JSON Schema behavior between tool calling and structured outputs.RESPECT_JSONPROPERTY_ORDERand conditionalKotlinModuletoJsonSchemaGeneratorBeanOutputConverterpostProcessSchema(JsonNode)extension pointprotected generateSchema()method as the new extension point — subclasses can post-process by callingsuper.generateSchema()or replace the generation logic entirelyJsonSchemaGenerator(Jackson's default pretty printer)Behavior changes:
@JsonPropertywithout explicitrequired = trueno longer forced intorequiredarrayString?) no longer forced intorequiredThis also addresses the root cause behind #5797 and #5341, as both stem from
BeanOutputConvertermaintaining a separate schema generation setup.Open question(resolved)
requiredbehavior changes for fields annotated with@JsonProperty(without explicitrequired = true). Is this acceptable, or should backward compatibility be preserved?