Skip to content

feat(v2): PR 1/7 - Core infrastructure with registry-based architecture#2018

Closed
jxnl wants to merge 2 commits intomainfrom
feat/v2-pr1-core-infrastructure
Closed

feat(v2): PR 1/7 - Core infrastructure with registry-based architecture#2018
jxnl wants to merge 2 commits intomainfrom
feat/v2-pr1-core-infrastructure

Conversation

@jxnl
Copy link
Collaborator

@jxnl jxnl commented Jan 18, 2026

Description

This is PR 1 of 7 in the stacked PR series implementing Instructor V2 registry-based architecture.

Core Infrastructure

This PR introduces the foundational v2 infrastructure:

ModeRegistry (instructor/v2/core/registry.py)

  • O(1) handler lookups via (Provider, Mode) tuples
  • Lazy loading: handlers register on import via decorators
  • Queryable API: discover available modes, list providers for a mode
  • Mode normalization: converts provider-specific modes to generic modes

Handler System (instructor/v2/core/handler.py, protocols.py)

  • ModeHandler abstract base class
  • Type-safe protocol interfaces: RequestHandler, ResponseParser, ReaskHandler
  • @register_mode_handler decorator for registration

Patch Mechanism (instructor/v2/core/patch.py)

  • Unified patch_v2() function
  • Validates mode registration at patch time (fail fast)
  • Auto-detects sync/async
  • Integrates with registry handlers

Retry Logic (instructor/v2/core/retry.py)

  • Registry-based retry system
  • Uses handler's handle_reask() for retry preparation
  • Preserves full attempt context

Exception Handling (instructor/v2/core/exceptions.py)

  • RegistryError: Mode not registered or handler lookup failure
  • ValidationContextError: Conflicting context/validation_context parameters
  • InstructorRetryException: Max retries exceeded

Mode System (instructor/mode.py)

  • DEPRECATED_TO_CORE mapping for legacy mode normalization
  • Deprecation warnings for provider-specific modes

Changes

  • Add instructor/v2/core/ with 7 new files
  • Add instructor/v2/__init__.py with core exports
  • Update instructor/mode.py with deprecation mapping
  • Add tests/v2/test_registry.py and conftest.py

Testing

uv run pytest tests/v2/test_registry.py -v

Stacked PRs

  • PR 1 (this): Core infrastructure
  • PR 2: Mode normalization docs + tests
  • PR 3: Anthropic + OpenAI providers
  • PR 4: GenAI + Cohere + Mistral providers
  • PR 5: Remaining providers
  • PR 6: Unified test infrastructure
  • PR 7: Documentation + cleanup

This PR was written by Cursor


Note

Establishes the foundational v2 architecture centered on a queryable, lazy-loaded mode registry and handler contracts.

  • Mode registry: Adds ModeRegistry with O(1) (Provider, Mode) lookups, lazy registration on import, discovery APIs, and normalize_mode for legacy-to-core mapping
  • Handler system: Defines ModeHandler base class plus type-safe RequestHandler/ReaskHandler/ResponseParser protocols and a @register_mode_handler decorator
  • Patching: Adds patch_v2() with sync/async wrappers, default model injection, templating integration, and validated mode dispatch via registry
  • Retry: Implements v2 retry for sync/async using registry reask and response_parser, with streaming short-circuit for iterable/partial models
  • Exceptions: Centralizes registry/context validation errors and merges/deprecates validation_context in favor of context
  • Modes: Updates Mode with GENAI alias, adds single-shot deprecation warnings and DEPRECATED_TO_CORE mapping; exposes reset helpers
  • Tests: Adds registry tests and pytest config for provider/env-based skips

Written by Cursor Bugbot for commit 8b77563. Configure here.

@github-actions github-actions bot added enhancement New feature or request python Pull requests that update python code size:L This PR changes 100-499 lines, ignoring generated files. status:pending-merge Related PR is pending merge labels Jan 18, 2026
@cloudflare-workers-and-pages
Copy link

cloudflare-workers-and-pages bot commented Jan 18, 2026

Deploying with  Cloudflare Workers  Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status Name Latest Commit Preview URL Updated (UTC)
✅ Deployment successful!
View logs
instructor 64b7a3e Commit Preview URL

Branch Preview URL
Jan 18 2026, 06:48 PM

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

)
+ "\n"
)
# endregion agent log
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug logging writes to hardcoded local file path

High Severity

Debug logging code writes to a hardcoded local path /Users/jasonliu/dev/instructor/.cursor/debug.log. This will cause FileNotFoundError on any machine where this directory doesn't exist, breaking all async retry operations. The blocks are marked with # region agent log comments indicating they were meant to be temporary debugging statements.

Additional Locations (1)

Fix in Cursor Fix in Web

GENAI_TOOLS = "genai_tools"
GENAI_STRUCTURED_OUTPUTS = "genai_structured_outputs"
GENAI_JSON = "genai_json"
GENAI_STRUCTURED_OUTPUTS = "genai_json" # Backwards compatibility alias
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enum alias changes value breaking backwards compatibility

Medium Severity

GENAI_STRUCTURED_OUTPUTS and GENAI_JSON share the same value "genai_json", making GENAI_STRUCTURED_OUTPUTS an alias for GENAI_JSON in Python's enum. This changes Mode.GENAI_STRUCTURED_OUTPUTS.value from "genai_structured_outputs" to "genai_json" and Mode.GENAI_STRUCTURED_OUTPUTS.name returns "GENAI_JSON". Code serializing or comparing mode values will break.

Fix in Cursor Fix in Web

response_model=response_model,
validation_context=context,
strict=strict,
stream=False,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sync retry ignores stream parameter causing parse mismatch

Medium Severity

The retry_sync_v2 function hardcodes stream=False when calling response_parser, ignoring any stream=True passed in kwargs. The async version correctly extracts stream = kwargs.get("stream", False) and passes it through. If a caller uses the sync wrapper with stream=True, the API will return a streaming response, but the parser will be told it's non-streaming, causing incorrect parsing behavior or failures.

Fix in Cursor Fix in Web

- Add ModeRegistry for O(1) handler lookups via (Provider, Mode) tuples
- Add ModeHandler base class and protocol interfaces
- Add patch_v2() function for unified provider patching
- Add registry-based retry logic with handler integration
- Add exception hierarchy (RegistryError, ValidationContextError)
- Add mode normalization with deprecation warnings
- Add @register_mode_handler decorator for handler registration
- Add registry unit tests

This PR was written by [Cursor](https://cursor.com)
@jxnl jxnl force-pushed the feat/v2-pr1-core-infrastructure branch from 8b77563 to d0a96e2 Compare January 18, 2026 18:41
- Remove debug logging blocks in retry.py that wrote to hardcoded local path
- Fix GENAI_STRUCTURED_OUTPUTS enum value to avoid alias collision
- Fix sync retry to extract stream parameter from kwargs like async version
@jxnl jxnl closed this Jan 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request python Pull requests that update python code size:L This PR changes 100-499 lines, ignoring generated files. status:pending-merge Related PR is pending merge

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant