Currently, schema generation for different LLM providers requires models to inherit from OpenAISchema or be wrapped with the @openai_schema decorator. This creates an unnecessary inheritance requirement and couples schema generation to class-based patterns.
We should refactor the schema generation logic into standalone, provider-agnostic functions.
Current usage pattern: `response_model.openai_schema` (where `response_model` inherits from `OpenAISchema`)
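The coupling can be illustrated with a minimal stand-in (a hypothetical simplification; the real `OpenAISchema` and `classproperty` live in `instructor/function_calls.py` and build the schema from the model's fields and docstring):

```python
class classproperty:
    """Descriptor that exposes a method as a read-only property on the class."""

    def __init__(self, fget):
        self.fget = fget

    def __get__(self, obj, owner):
        return self.fget(owner)


class OpenAISchema:
    @classproperty
    def openai_schema(cls):
        # Placeholder: the real property derives an OpenAI function schema
        # from the Pydantic model's fields and docstring.
        return {"name": cls.__name__, "parameters": {}}


class UserDetail(OpenAISchema):  # inheritance is required today
    pass


print(UserDetail.openai_schema["name"])  # prints UserDetail
```

Every model that wants a schema must opt into this base class (or the `@openai_schema` decorator), which is exactly the coupling this proposal removes.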
Affected files with usage counts:
- `instructor/utils/` (12 calls across cerebras.py, writer.py, fireworks.py, openai.py, mistral.py)
- `instructor/process_response.py` (11 calls)
- `instructor/dsl/parallel.py` (3 calls - handles parallel tools)
- `instructor/distil.py` (1 call)
- `instructor/function_calls.py` (13 calls - method definitions and internal usage)
- `instructor/utils/core.py` (1 call - decorator application)
- `instructor/utils/anthropic.py` (1 call - anthropic_schema)
- `instructor/utils/google.py` (1 call - gemini_schema)
- Examples and tests (20+ calls)
Total: ~60 usages across codebase
Proposed standalone functions:

```python
from __future__ import annotations

import functools
from typing import Any, Type

from docstring_parser import parse  # used by the docstring-driven schema logic
from pydantic import BaseModel


@functools.lru_cache(maxsize=256)
def generate_openai_schema(model: Type[BaseModel]) -> dict[str, Any]:
    """Generate an OpenAI function schema from a Pydantic model."""
    # Move logic from OpenAISchema.openai_schema here
    ...


def generate_anthropic_schema(model: Type[BaseModel]) -> dict[str, Any]:
    """Generate an Anthropic tool schema from a Pydantic model."""
    # Move logic from OpenAISchema.anthropic_schema here
    ...


def generate_gemini_schema(model: Type[BaseModel]) -> Any:
    """Generate a Gemini function schema from a Pydantic model."""
    # Move logic from OpenAISchema.gemini_schema here
    ...
```

The existing `OpenAISchema` class then delegates to the new functions so existing call sites keep working:

```python
class OpenAISchema(BaseModel):
    # classproperty is instructor's existing descriptor helper
    @classproperty
    def openai_schema(cls):
        return generate_openai_schema(cls)

    @classproperty
    def anthropic_schema(cls):
        return generate_anthropic_schema(cls)

    @classproperty
    def gemini_schema(cls):
        return generate_gemini_schema(cls)
```

Phase 1: Add new functions, maintain backward compatibility
- All existing `response_model.openai_schema` calls continue working
- New code can use `generate_openai_schema(response_model)` directly
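The new entry point removes the inheritance requirement entirely. A runnable sketch (the schema body is a placeholder; the real logic moves in from `OpenAISchema`):

```python
import functools


@functools.lru_cache(maxsize=256)
def generate_openai_schema(model):
    # Placeholder body; the real implementation builds the OpenAI
    # function schema from the model's fields and docstring.
    return {"name": model.__name__, "parameters": {}}


class PlainModel:  # no OpenAISchema inheritance needed any more
    pass


schema = generate_openai_schema(PlainModel)
assert schema["name"] == "PlainModel"

# Repeated calls for the same class hit the LRU cache and
# return the identical object, not a fresh dict.
assert generate_openai_schema(PlainModel) is schema
```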
Phase 2: Internal migration
- Replace internal usage in utils/ and process_response.py
- Update parallel tools handling in dsl/parallel.py
Phase 3: Deprecation
- Mark the `@openai_schema` decorator as deprecated
- Encourage users to migrate to the standalone functions
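One conventional shape for the deprecation (a sketch; the exact message text and `stacklevel` are assumptions, not shipped wording, and the body is simplified):

```python
import warnings


def openai_schema(cls):
    """Deprecated decorator kept for backward compatibility."""
    warnings.warn(
        "The @openai_schema decorator is deprecated; "
        "use generate_openai_schema(model) instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return cls  # simplified: the real decorator wraps the class


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")

    @openai_schema
    class UserDetail:
        pass

# The decorated class still works, but its use is flagged once at decoration time.
assert issubclass(caught[0].category, DeprecationWarning)
```

Using `DeprecationWarning` with `stacklevel=2` points the warning at the user's decoration site rather than at instructor internals.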
Benefits:
- No inheritance requirement - Any Pydantic model can generate schemas
- Provider-agnostic - Clean separation of schema generation logic
- Better testability - Functions are easier to unit test
- Performance - LRU cache maintains current performance characteristics
- Backward compatibility - Zero breaking changes during transition
- Cleaner API - More functional approach vs class-based inheritance
Implementation steps:
- Create `instructor/schema_utils.py` with standalone functions
- Update `OpenAISchema` class to delegate to new functions
- Add comprehensive tests comparing old vs new output
- Update internal usage in utils/ (12 locations)
- Update process_response.py (11 locations)
- Update parallel tools handling in dsl/parallel.py
- Update distil.py usage
- Mark decorator as deprecated with warning
- Update documentation and examples
- Run full test suite to ensure no regressions
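The "tests comparing old vs new output" item can start from an equivalence check like the following (self-contained sketch with placeholder classes; a real test would import the models and functions from instructor):

```python
import functools


@functools.lru_cache(maxsize=256)
def generate_openai_schema(model):
    # Placeholder for the standalone function under test.
    return {"name": model.__name__, "parameters": {}}


class classproperty:
    def __init__(self, fget):
        self.fget = fget

    def __get__(self, obj, owner):
        return self.fget(owner)


class OpenAISchema:
    @classproperty
    def openai_schema(cls):
        return generate_openai_schema(cls)  # old path delegates to new function


class UserDetail(OpenAISchema):
    pass


class Invoice(OpenAISchema):
    pass


def test_old_and_new_schemas_match():
    for model in (UserDetail, Invoice):
        # Delegation plus the LRU cache means both paths return
        # the very same object, not merely an equal one.
        assert model.openai_schema is generate_openai_schema(model)


test_old_and_new_schemas_match()
```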
Key considerations:
- Parallel tools: `dsl/parallel.py` uses both `openai_schema(model).openai_schema` and `openai_schema(model).anthropic_schema` patterns
- Caching: the current `@classproperty` provides implicit memoization; maintain it with `@lru_cache`
- Error handling: preserve current validation and error behavior
- Provider compatibility: ensure schema output remains identical for all providers
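The caching consideration is directly testable: `functools.lru_cache` exposes `cache_info()`, so a regression test can assert that repeated schema generation for the same model class is served from the cache (sketch with a placeholder body):

```python
import functools


@functools.lru_cache(maxsize=256)
def generate_openai_schema(model):
    # Placeholder body; only the caching behavior matters here.
    return {"name": model.__name__}


class UserDetail:
    pass


generate_openai_schema(UserDetail)  # first call: cache miss
generate_openai_schema(UserDetail)  # second call: cache hit

info = generate_openai_schema.cache_info()
assert (info.misses, info.hits) == (1, 1)
```

Note that `lru_cache` keys on the class object itself, so the cache holds a reference to each model class it has seen; with `maxsize=256` this mirrors the bounded memoization the class property provides today.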
This refactoring will modernize the schema generation approach while maintaining full backward compatibility.