
[Bug]: SchemaLLMPathExtractor generates incompatible JSON schemas for GPT and Gemini APIs #20629

@MarioRicoIbanez


Bug Description

Summary

SchemaLLMPathExtractor generates Pydantic schemas with "additionalProperties": true in nested anyOf schemas. This causes errors when using structured output with:

  • GPT-5-mini-2025-08-07: BadRequestError - requires additionalProperties: false
  • Gemini-3-flash-preview: ValueError - doesn't support additionalProperties at all

Environment

  • Python: 3.12.12
  • Platform: macOS (Darwin 25.0.0)
  • UV: 0.9.2
  • LlamaIndex Core: 0.14.13
  • Pydantic: 2.11.7
  • Pydantic Core: 2.33.2
  • OpenAI: 1.109.1
  • Google GenAI: 1.59.0
  • LlamaIndex LLM Integrations:
    • llama-index-llms-openai: 0.6.13
    • llama-index-llms-google-genai: 0.8.4

Reproduction Steps

1. Create a SchemaLLMPathExtractor with Literal types

from typing import Literal
from llama_index.core.indices.property_graph.transformations.schema_llm import SchemaLLMPathExtractor
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-5-mini-2025-08-07", temperature=0.0)

entities = Literal["PERSON", "ORG", "LOCATION"]
relations = Literal["FOUNDED", "LOCATED_IN", "MANUFACTURES"]
entity_props = ["description"]

validation_schema = [
    ("PERSON", "FOUNDED", "ORG"),
    ("PERSON", "LOCATED_IN", "LOCATION"),
    ("ORG", "LOCATED_IN", "LOCATION"),
]

extractor = SchemaLLMPathExtractor(
    llm=llm,
    possible_entities=entities,
    possible_relations=relations,
    possible_entity_props=entity_props,
    kg_validation_schema=validation_schema,
    strict=True
)

2. Inspect the generated schema

import json

schema_cls = extractor.kg_schema_cls
json_schema = schema_cls.model_json_schema()

# Look for additionalProperties in the nested anyOf schemas
print(json.dumps(json_schema, indent=2))

3. Observe the problematic schema structure

{
  "$defs": {
    "Entity": {
      "properties": {
        "properties": {
          "anyOf": [
            {
              "additionalProperties": true,
              "type": "object"
            },
            {
              "type": "null"
            }
          ]
        }
      }
    }
  }
}

4. Attempt extraction with GPT-5-mini

from llama_index.core.schema import TextNode

node = TextNode(text="Elon Musk founded SpaceX in 2002.")
results = await extractor.acall([node])  # ← Fails with BadRequestError
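Since acall is awaited, the snippet assumes an async context; in a plain script a minimal driver along these lines can be used, reusing extractor and TextNode from the steps above:

import asyncio

async def main():
    node = TextNode(text="Elon Musk founded SpaceX in 2002.")
    # Currently raises BadRequestError when the model enforces structured output
    return await extractor.acall([node])

results = asyncio.run(main())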

Error Output (GPT-5-mini):

BadRequestError: Error code: 400 - {'error': {'message': "Invalid schema for response_format 'KGSchema': In context=('properties', 'properties', 'anyOf', '0'), 'additionalProperties' is required to be supplied and to be false.", 'type': 'invalid_request_error', 'param': 'response_format'}}

Error Output (Gemini-3-flash-preview):

ValueError: additionalProperties is not supported in the Gemini API.

Root Cause Analysis

The issue occurs in SchemaLLMPathExtractor.__init__() at the schema generation stage:

File: llama_index/core/indices/property_graph/transformations/schema_llm.py
Line: ~74 (schema creation with create_model())

When creating the schema for optional entity properties:

entity_cls = create_model(
    "Entity",
    type=(...),
    name=(...),
    properties=(
        Optional[Dict[str, Any]],  # ← This causes the problem
        Field(...)
    ),
)

Pydantic automatically generates an anyOf schema for the optional dict:

"properties": {
  "anyOf": [
    { "additionalProperties": true, "type": "object" },
    { "type": "null" }
  ]
}

The value true is incompatible with both OpenAI's structured output requirements and Gemini's API constraints.
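For reference, the same anyOf shape can be reproduced with Pydantic alone, without LlamaIndex. This is a minimal sketch; the exact JSON output may vary slightly between Pydantic versions:

import json
from typing import Any, Dict, Optional

from pydantic import Field, create_model

# Minimal model mirroring the optional "properties" field on the generated Entity class
DemoEntity = create_model(
    "DemoEntity",
    properties=(Optional[Dict[str, Any]], Field(default=None)),
)

# Should show an anyOf whose object branch carries additionalProperties: true,
# matching the schema dump above
print(json.dumps(DemoEntity.model_json_schema()["properties"]["properties"], indent=2))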

Current Behavior

  • ✅ Works with default Gemini model (no structured output enforcement)
  • ❌ Fails with GPT-5-mini (BadRequestError)
  • ❌ Fails with Gemini-3-flash-preview (ValueError)
  • ⚠️ Works with older models but causes compatibility warnings

Expected Behavior

The generated schema should have "additionalProperties": false in all nested anyOf object schemas to satisfy both API requirements:

  • GPT: Explicitly requires false
  • Gemini: Accepts false as a valid constraint
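
For the Entity example above, the corrected fragment would then read:

"properties": {
  "anyOf": [
    { "additionalProperties": false, "type": "object" },
    { "type": "null" }
  ]
}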

Proposed Solution

Pass a ConfigDict with a json_schema_extra callback to the create_model() calls in SchemaLLMPathExtractor.__init__() so the generated schema is post-processed:

from pydantic import ConfigDict

def clean_schema(schema, info=None):
    """Clean additionalProperties in nested anyOf schemas for API compatibility."""
    def fix_props(obj):
        if isinstance(obj, dict):
            if 'anyOf' in obj:
                for alt in obj['anyOf']:
                    if isinstance(alt, dict) and alt.get('type') == 'object':
                        alt['additionalProperties'] = False
            for value in obj.values():
                fix_props(value)
        elif isinstance(obj, list):
            for item in obj:
                fix_props(item)

    fix_props(schema)
    return schema

# When creating models, pass json_schema_extra:
entity_cls = create_model(
    "Entity",
    type=(...),
    name=(...),
    properties=(...),
    __config__=ConfigDict(json_schema_extra=clean_schema)
)

Alternatively, post-process the schema more cleanly by patching model_json_schema() after model creation (see the workaround below).
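
Whichever variant is chosen, a quick sanity check along these lines (illustrative only, not part of LlamaIndex) confirms that no additionalProperties: true remains anywhere in the generated schema:

def has_open_additional_properties(obj) -> bool:
    """Return True if any nested schema still contains additionalProperties: true."""
    if isinstance(obj, dict):
        if obj.get("additionalProperties") is True:
            return True
        return any(has_open_additional_properties(v) for v in obj.values())
    if isinstance(obj, list):
        return any(has_open_additional_properties(item) for item in obj)
    return False

assert not has_open_additional_properties(extractor.kg_schema_cls.model_json_schema())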

Impact

  • Severity: High - Blocks usage with latest GPT and Gemini models
  • Affected Users: Anyone using SchemaLLMPathExtractor with structured output APIs
  • Workaround: Override in subclass (see implementation below)

Workaround (Temporary)

Until this is fixed in LlamaIndex, override SchemaLLMPathExtractor in your code:

from llama_index.core.indices.property_graph.transformations.schema_llm import SchemaLLMPathExtractor

def _clean_schema_for_apis(schema, info=None):
    """Fix additionalProperties for API compatibility."""
    def fix_props(obj):
        if isinstance(obj, dict):
            if 'anyOf' in obj:
                for alt in obj['anyOf']:
                    if isinstance(alt, dict) and alt.get('type') == 'object':
                        alt['additionalProperties'] = False
            for value in obj.values():
                fix_props(value)
        elif isinstance(obj, list):
            for item in obj:
                fix_props(item)
    fix_props(schema)
    return schema

class FixedSchemaLLMPathExtractor(SchemaLLMPathExtractor):
    """SchemaLLMPathExtractor with API compatibility fix."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

        # Patch the schema class to clean additionalProperties
        schema_cls = self.kg_schema_cls
        original_method = schema_cls.model_json_schema

        def patched_model_json_schema(*a, **kw):
            schema = original_method(*a, **kw)
            _clean_schema_for_apis(schema)
            return schema

        schema_cls.model_json_schema = patched_model_json_schema
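
The subclass is a drop-in replacement. Reusing the llm, entities, relations, entity_props, and validation_schema names from the reproduction steps:

import json

extractor = FixedSchemaLLMPathExtractor(
    llm=llm,
    possible_entities=entities,
    possible_relations=relations,
    possible_entity_props=entity_props,
    kg_validation_schema=validation_schema,
    strict=True,
)

# The nested anyOf object schemas now carry additionalProperties: false
print(json.dumps(extractor.kg_schema_cls.model_json_schema(), indent=2))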

Test Case

Create a test file demonstrating the issue and fix:

import asyncio
from typing import Literal
from llama_index.core.indices.property_graph.transformations.schema_llm import SchemaLLMPathExtractor
from llama_index.core.schema import TextNode
from llama_index.llms.openai import OpenAI

async def test_schema_compatibility():
    llm = OpenAI(model="gpt-5-mini-2025-08-07", temperature=0.0)

    extractor = SchemaLLMPathExtractor(
        llm=llm,
        possible_entities=Literal["PERSON", "ORG"],
        possible_relations=Literal["FOUNDED"],
        possible_entity_props=["description"],
        kg_validation_schema=[("PERSON", "FOUNDED", "ORG")],
        strict=True
    )

    node = TextNode(text="Elon Musk founded SpaceX.")

    # With the fix applied, this should not raise an error
    results = await extractor.acall([node])
    assert len(results) > 0

if __name__ == "__main__":
    asyncio.run(test_schema_compatibility())
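
A schema-only variant of the test can assert the fix without any API call; this sketch assumes the FixedSchemaLLMPathExtractor from the workaround above and uses MockLLM so no API key is needed:

from typing import Literal

from llama_index.core.llms import MockLLM

def test_schema_has_no_open_additional_properties():
    extractor = FixedSchemaLLMPathExtractor(
        llm=MockLLM(),
        possible_entities=Literal["PERSON", "ORG"],
        possible_relations=Literal["FOUNDED"],
        possible_entity_props=["description"],
        kg_validation_schema=[("PERSON", "FOUNDED", "ORG")],
        strict=True,
    )

    def contains_open_objects(obj):
        # True if any nested schema still allows arbitrary extra properties
        if isinstance(obj, dict):
            return obj.get("additionalProperties") is True or any(
                contains_open_objects(v) for v in obj.values()
            )
        if isinstance(obj, list):
            return any(contains_open_objects(item) for item in obj)
        return False

    assert not contains_open_objects(extractor.kg_schema_cls.model_json_schema())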

Additional Context

Schema Analysis Output

Root cause location: $defs.Entity.properties.properties.anyOf[0]
Problem: additionalProperties = true
Requirement (GPT-5-mini): additionalProperties must be false
Requirement (Gemini): additionalProperties must not exist
Solution: Set additionalProperties = false in all nested object schemas

Related Issues

  • Affects both SchemaLLMPathExtractor and potentially other schema generation in llama-index
  • Similar issues may occur with other optional Dict/object fields

Requested Action

Please implement the proposed solution to ensure SchemaLLMPathExtractor generates schemas compatible with:

  1. OpenAI's GPT-5-mini structured output requirements
  2. Google Gemini's API constraints
  3. All other LLM providers using structured output

Created: 2025-02-05
Environment: xcert project
Reproduction: Confirmed with llama-index-core 0.14.13

Version

0.14.13

Steps to Reproduce

.

Relevant Logs/Tracebacks

See the error outputs under "Reproduction Steps" above.
