AIJsonUtilities.CreateJsonSchema does not support complex objects with nested references #113698

Open
@jamesmcroft

Description

When AIJsonUtilities.CreateJsonSchema is used to build the schema passed to OpenAI's ChatResponseFormat.CreateJsonSchemaFormat (with jsonSchemaIsStrict set to true), complex objects that contain nested objects and lists are not handled correctly. The generated JSON schema contains $ref pointers that target locations inside the property hierarchy rather than definitions at the top level, which violates the requirements for OpenAI strict schemas.

Reproduction Steps

  1. Define the following C# models with nesting and lists:
    namespace ConsoleTests.Models;
    
    public class SimpleObject
    {
        public string? Name { get; set; }
        public List<SimpleItem>? Items { get; set; }
    }
    
    public class SimpleItem
    {
        public int Id { get; set; }
        public float? Value { get; set; }
        public SimpleItem SubItem { get; set; }
    }
  2. Generate the response format for the above model with a helper such as the following CreateJsonSchemaFormat<T> method, which wraps AIJsonUtilities.CreateJsonSchema, ensuring jsonSchemaIsStrict is set to true:
    public static ChatResponseFormat CreateJsonSchemaFormat<T>(
            string jsonSchemaFormatName,
            string? jsonSchemaFormatDescription = null,
            bool? jsonSchemaIsStrict = null)
    {
        var formatObjectType = typeof(T);
        var type = formatObjectType.IsGenericType && formatObjectType.GetGenericTypeDefinition() == typeof(Nullable<>) ? Nullable.GetUnderlyingType(formatObjectType)! : formatObjectType;
    
        var jsonSchema = AIJsonUtilities.CreateJsonSchema(type, jsonSchemaFormatDescription, serializerOptions: JsonSerializerOptions.Default, inferenceOptions: new AIJsonSchemaCreateOptions()
        {
            IncludeSchemaKeyword = false,
            IncludeTypeInEnumSchemas = true,
            DisallowAdditionalProperties = true,
            RequireAllProperties = true
        }).ToString();
    
        return ChatResponseFormat.CreateJsonSchemaFormat(
            jsonSchemaFormatName,
            jsonSchema: BinaryData.FromString(jsonSchema),
            jsonSchemaFormatDescription: jsonSchemaFormatDescription,
            jsonSchemaIsStrict: jsonSchemaIsStrict
        );
    }
  3. Notice the generated JSON schema contains nested $ref pointers, e.g.:
    {
      "type": "object",
      "properties": {
        "Name": { "type": ["string", "null"] },
        "Items": {
          "type": ["array", "null"],
          "items": {
            "type": "object",
            "properties": {
              "Id": { "type": "integer" },
              "Value": { "type": ["number", "null"] },
              "SubItem": {
                "type": "object",
                "properties": {
                  "Id": { "type": "integer" },
                  "Value": { "type": ["number", "null"] },
                  "SubItem": {
                    "$ref": "#/properties/Items/items/properties/SubItem"
                  }
                },
                "additionalProperties": false,
                "required": ["Id", "Value", "SubItem"]
              }
            },
            "additionalProperties": false,
            "required": ["Id", "Value", "SubItem"]
          }
        }
      },
      "additionalProperties": false,
      "required": ["Name", "Items"]
    }
  4. Use the generated schema with the OpenAI SDK's ChatCompletionOptions. The following error is thrown:
    System.ClientModel.ClientResultException: 'HTTP 400 (invalid_request_error: )
    Parameter: response_format
    
    Invalid schema for response_format 'simpleObject': In context=('properties', 'Items', 'type', '0', 'items', 'properties', 'SubItem', 'properties', 'SubItem'), reference can only point to definitions defined at the top level of the schema.'
    

Expected behavior

The output JSON schema should define complex objects, such as SimpleItem, at the top level (using a $defs section) so that all $ref pointers are valid. For instance, the schema should be structured similarly to:

{
    "type": "object",
    "$defs": {
        "SimpleItem": {
            "type": "object",
            "properties": {
                "Id": { "type": "integer" },
                "Value": {
                    "anyOf": [
                        { "type": "number" },
                        { "type": "null" }
                    ]
                },
                "SubItem": {
                    "$ref": "#/$defs/SimpleItem"
                }
            },
            "additionalProperties": false,
            "required": ["Id", "Value", "SubItem"]
        }
    },
    "properties": {
        "Name": {
            "anyOf": [
                { "type": "string" },
                { "type": "null" }
            ]
        },
        "Items": {
            "anyOf": [
                {
                    "items": { "$ref": "#/$defs/SimpleItem" },
                    "type": "array"
                },
                { "type": "null" }
            ]
        }
    },
    "additionalProperties": false,
    "required": ["Name", "Items"]
}

This format places the definitions at the top level, so every $ref pointer satisfies the restriction enforced for strict schemas when using the OpenAI SDK.

Actual behavior

The generated schema emits $ref pointers that target paths inside the property hierarchy (for example, #/properties/Items/items/properties/SubItem) rather than entries in a top-level definitions section. This leads to an invalid schema error (HTTP 400) when the schema is used with ChatCompletionOptions in the OpenAI SDK, because only $ref pointers that reference definitions at the top level of the schema are accepted.
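
To make the rule concrete, the following is a minimal sketch of a check that lists the offending pointers in a schema. SchemaRefChecker and FindNonTopLevelRefs are hypothetical names, not part of any library. The check assumes that only the root pointer "#" and pointers under "#/$defs/" count as top-level; that assumption is inferred from the error message above and has not been verified against the service.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json.Nodes;

// Hypothetical helper, not part of Microsoft.Extensions.AI or the OpenAI SDK.
public static class SchemaRefChecker
{
    // Returns every "$ref" value that is neither the root pointer ("#")
    // nor a pointer into the top-level "$defs" section (assumed rule).
    public static List<string> FindNonTopLevelRefs(JsonNode? node)
    {
        var offending = new List<string>();
        Visit(node, offending);
        return offending;
    }

    private static void Visit(JsonNode? node, List<string> offending)
    {
        if (node is JsonObject obj)
        {
            // A string-valued "$ref" keyword on this object is a reference to inspect.
            if (obj["$ref"] is JsonValue value
                && value.TryGetValue<string>(out var pointer)
                && pointer != "#"
                && !pointer.StartsWith("#/$defs/", StringComparison.Ordinal))
            {
                offending.Add(pointer);
            }

            // Recurse into every property value.
            foreach (var property in obj)
            {
                Visit(property.Value, offending);
            }
        }
        else if (node is JsonArray array)
        {
            // Recurse into every array element (for example, "anyOf" entries).
            foreach (var item in array)
            {
                Visit(item, offending);
            }
        }
    }
}
```

Running this over the schema from step 3 (parsed with JsonNode.Parse) should report the single pointer #/properties/Items/items/properties/SubItem. A model property that is literally named "$ref" would be misread by this sketch.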

Regression?

I don't believe this has ever worked.

Known Workarounds

The workaround for this is challenging to implement: developers must provide a custom TransformSchemaNode to rewrite the invalid JSON schema.
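
As an illustration of the kind of rewrite involved, here is a rough sketch that post-processes the generated schema after the fact instead of hooking into TransformSchemaNode. It moves each $ref target into a top-level $defs section and repoints the references. SchemaRefHoister, HoistRefsToDefs, and the generated "HoistedN" definition names are hypothetical. The sketch assumes the schema is available as a System.Text.Json.Nodes.JsonObject, and its output has not been verified against the OpenAI service.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json.Nodes;

// Hypothetical helper, not part of Microsoft.Extensions.AI or the OpenAI SDK.
public static class SchemaRefHoister
{
    // Moves every "$ref" target that is not already under "#/$defs/" into "$defs"
    // and rewrites the references. Mutates and returns the given schema.
    public static JsonObject HoistRefsToDefs(JsonObject schema)
    {
        var refs = new List<JsonObject>();
        CollectRefs(schema, refs);

        // Distinct pointers to hoist, deepest first, so that moving an outer
        // target cannot invalidate the path of an inner one.
        var pointers = refs
            .Select(r => r["$ref"]!.GetValue<string>())
            .Where(p => p.StartsWith("#/", StringComparison.Ordinal)
                     && !p.StartsWith("#/$defs/", StringComparison.Ordinal))
            .Distinct()
            .OrderByDescending(p => p.Count(c => c == '/'))
            .ToList();
        if (pointers.Count == 0) return schema;

        if (schema["$defs"] is not JsonObject defs)
        {
            defs = new JsonObject();
            schema["$defs"] = defs;
        }

        var counter = 0;
        foreach (var pointer in pointers)
        {
            // Split the JSON Pointer into unescaped segments.
            var segments = pointer.Substring(2).Split('/')
                .Select(s => s.Replace("~1", "/").Replace("~0", "~"))
                .ToArray();

            JsonNode? parent = schema;
            for (var i = 0; i < segments.Length - 1; i++) parent = Child(parent, segments[i]);
            var last = segments[segments.Length - 1];
            var target = Child(parent, last);
            if (target is null) continue; // unresolvable pointer: leave it untouched

            string name;
            do { name = $"Hoisted{counter++}"; } while (defs.ContainsKey(name));
            var newRef = $"#/$defs/{name}";

            // Detach the target from its parent and leave a reference in its place.
            if (parent is JsonObject parentObject)
            {
                parentObject.Remove(last);
                parentObject[last] = new JsonObject { ["$ref"] = newRef };
            }
            else if (parent is JsonArray parentArray)
            {
                var index = int.Parse(last);
                parentArray.RemoveAt(index);
                parentArray.Insert(index, new JsonObject { ["$ref"] = newRef });
            }
            defs[name] = target;

            // Repoint every original reference, including any inside the moved target.
            foreach (var r in refs)
            {
                if (r["$ref"]!.GetValue<string>() == pointer) r["$ref"] = newRef;
            }
        }
        return schema;
    }

    private static void CollectRefs(JsonNode? node, List<JsonObject> refs)
    {
        if (node is JsonObject obj)
        {
            if (obj["$ref"] is JsonValue v && v.TryGetValue<string>(out _)) refs.Add(obj);
            foreach (var property in obj) CollectRefs(property.Value, refs);
        }
        else if (node is JsonArray array)
        {
            foreach (var item in array) CollectRefs(item, refs);
        }
    }

    private static JsonNode? Child(JsonNode? node, string segment) => node switch
    {
        JsonObject obj => obj[segment],
        JsonArray arr when int.TryParse(segment, out var i) && i >= 0 && i < arr.Count => arr[i],
        _ => null
    };
}
```

In the helper from step 2, this could be applied to the jsonSchema string before it is wrapped in BinaryData, for example via SchemaRefHoister.HoistRefsToDefs(JsonNode.Parse(jsonSchema)!.AsObject()).ToJsonString(). The generated definition names are arbitrary; a real fix in the schema exporter would presumably name definitions after the .NET types, as in the expected output above.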

Configuration

.NET SDK:
Version: 9.0.200
Commit: 90e8b202f2
Workload version: 9.0.200-manifests.a3a1a094
MSBuild version: 17.13.8+cbc39bea8

Runtime Environment:
OS Name: Windows
OS Version: 10.0.26100
OS Platform: Windows
RID: win-x64
Base Path: C:\Program Files\dotnet\sdk\9.0.200\

I don't believe this is OS or .NET version specific.

Other information

No response

Metadata

Assignees: No one assigned

Labels: area-System.Text.Json, enhancement (Product code improvement that does NOT require public API changes/additions)
