feat(v2): PR 4/7 - GenAI, Cohere, and Mistral providers #2021

Closed
jxnl wants to merge 2 commits into feat/v2-pr3-anthropic-openai from feat/v2-pr4-genai-cohere-mistral
jxnl (Collaborator) commented Jan 18, 2026

Description

This is PR 4 of 7 in the stacked PR series implementing the Instructor V2 registry-based architecture.

Base: PR 3 (feat/v2-pr3-anthropic-openai)

Providers Added

GenAI (instructor/v2/providers/genai/)

  • Supported Modes: TOOLS, JSON
  • Special Features: Handles safety_settings correctly, supports cached content

Cohere (instructor/v2/providers/cohere/)

  • Supported Modes: TOOLS, JSON_SCHEMA, MD_JSON
  • Legacy Mode Mapping: COHERE_TOOLS -> TOOLS, COHERE_JSON_SCHEMA -> JSON_SCHEMA

Mistral (instructor/v2/providers/mistral/)

  • Supported Modes: TOOLS, JSON_SCHEMA, MD_JSON
  • Legacy Mode Mapping: MISTRAL_TOOLS -> TOOLS, MISTRAL_STRUCTURED_OUTPUTS -> JSON_SCHEMA
  • Special Features: Handles dict or string tool call arguments
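The "dict or string tool call arguments" behavior listed for Mistral can be illustrated with a minimal sketch. The helper name `parse_tool_arguments` is hypothetical and not taken from this PR; the sketch only assumes that arguments arrive either as an already-decoded dict or as a JSON string.

```python
import json
from typing import Any, Dict


def parse_tool_arguments(arguments: Any) -> Dict[str, Any]:
    """Hypothetical helper: normalize tool call arguments to a dict."""
    if isinstance(arguments, dict):
        # Already decoded by the SDK; use as-is.
        return arguments
    if isinstance(arguments, str):
        # JSON-encoded arguments; decode and require a JSON object.
        parsed = json.loads(arguments)
        if not isinstance(parsed, dict):
            raise ValueError("tool call arguments must decode to a JSON object")
        return parsed
    raise TypeError(f"unsupported arguments type: {type(arguments).__name__}")
```

Either input shape then feeds the same validation step, so downstream parsing does not need to branch on the argument type.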

Changes

  • Add instructor/v2/providers/genai/ (3 files)
  • Add instructor/v2/providers/cohere/ (3 files)
  • Add instructor/v2/providers/mistral/ (3 files)
  • Update instructor/v2/__init__.py with provider exports
  • Add provider-specific test files

Testing

uv run pytest tests/v2/test_provider_modes.py::test_mode_is_registered -v

Stacked PRs

  • PR 1: Core infrastructure
  • PR 2: Mode normalization docs + tests
  • PR 3: Anthropic + OpenAI providers
  • PR 4 (this): GenAI + Cohere + Mistral providers
  • PR 5: Remaining providers
  • PR 6: Unified test infrastructure
  • PR 7: Documentation + cleanup

This PR was written by Cursor


Note

Adds three new providers to the v2 registry architecture with client factories and mode handlers, plus exports and tests.

  • Providers: Implement from_genai, from_cohere, from_mistral with sync/async wrappers, streaming support, and default model injection where applicable; update instructor/v2/__init__.py exports
  • Mode handlers:
    • GenAI: TOOLS, JSON with system/config mapping, multimodal content conversion, streaming Partial support, and legacy mode mapping
    • Cohere: TOOLS, JSON_SCHEMA, MD_JSON with V1 (chat_history/message) and V2 (messages) format handling, response parsing, and re-ask flows
    • Mistral: TOOLS, JSON_SCHEMA, MD_JSON adapting to chat.complete/stream, tool_choice="any", schema/markdown parsing, and dict/string tool args
  • Compatibility: Normalize legacy provider-specific modes to generic modes via registry checks
  • Tests: Add focused unit/integration tests for handlers and client factories (including SDK-absent paths)

Written by Cursor Bugbot for commit d0469c0.
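The legacy mode mapping described in this PR can be sketched roughly as follows. The `Mode` enum, `LEGACY_MODE_MAP`, and `normalize_mode` are hypothetical stand-ins written for illustration; the PR itself normalizes modes via registry checks, which this sketch replaces with a plain dictionary lookup. Only the legacy names and their targets come from the description above.

```python
from enum import Enum


class Mode(str, Enum):
    """Hypothetical stand-in for the library's generic mode enum."""

    TOOLS = "tools"
    JSON_SCHEMA = "json_schema"
    MD_JSON = "md_json"


# Legacy names and targets are copied from the PR description;
# the dictionary itself is an illustrative assumption.
LEGACY_MODE_MAP = {
    "COHERE_TOOLS": Mode.TOOLS,
    "COHERE_JSON_SCHEMA": Mode.JSON_SCHEMA,
    "MISTRAL_TOOLS": Mode.TOOLS,
    "MISTRAL_STRUCTURED_OUTPUTS": Mode.JSON_SCHEMA,
}


def normalize_mode(name: str) -> Mode:
    """Map a legacy provider-specific mode name to its generic mode."""
    if name in LEGACY_MODE_MAP:
        return LEGACY_MODE_MAP[name]
    # Generic names pass through; unknown names raise KeyError.
    return Mode[name]
```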

cloudflare-workers-and-pages bot commented Jan 18, 2026

Deploying with Cloudflare Workers

The latest updates on your project.

Status: ✅ Deployment successful!
Name: instructor
Latest Commit: bf8f011
Updated (UTC): Jan 18 2026, 06:55 PM

github-actions bot added labels Jan 18, 2026: dependencies, documentation, enhancement, size:L (this PR changes 100-499 lines, ignoring generated files)

cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

)
+ "\n"
)
# endregion agent log

Debug logging code writes to hardcoded local path

High Severity

The from_cohere function contains debug logging code that writes to a hardcoded local path /Users/jasonliu/dev/instructor/.cursor/debug.log. This code attempts to open a file on every call to create a Cohere client. On any machine without this exact directory structure, this will raise a FileNotFoundError, causing the entire from_cohere function to fail. The code block is marked with # region agent log comments indicating it was temporary debugging code that was accidentally included in the commit.
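One way to keep diagnostics without depending on a machine-specific path is the standard `logging` module. This is a minimal sketch, not the fix applied in the PR; the logger name and the `log_client_creation` helper are hypothetical.

```python
import logging

# Hypothetical logger name; any module-level logger works the same way.
logger = logging.getLogger("instructor.v2.providers.cohere")


def log_client_creation(client_type: str) -> None:
    """Hypothetical replacement for the file-based debug block.

    Emits a DEBUG record through the logging framework instead of
    opening a hardcoded file, so it cannot raise FileNotFoundError.
    """
    logger.debug("creating Cohere client: %s", client_type)
```

With no handler configured the call is effectively a no-op, and a developer who wants the output can opt in locally with `logging.basicConfig(level=logging.DEBUG)`.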

# Add instruction based on client version
if "messages" in new_kwargs:
    # V2 format: prepend to messages
    new_kwargs["messages"].insert(0, {"role": "user", "content": instruction})

Shallow copy causes mutation of caller's kwargs messages

Medium Severity

The handlers use kwargs.copy() (shallow copy) then mutate nested objects like kwargs["messages"].insert(), append(), or extend(). Since shallow copy shares references to nested objects, these mutations affect the original kwargs passed by the caller. The test file acknowledges this with the comment "messages list is modified in place though". If callers reuse kwargs across multiple requests, instructions and messages will accumulate unexpectedly.
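The aliasing problem can be reproduced in isolation. The two functions below are hypothetical reductions of a handler's request preparation, written only to contrast the shallow-copy behavior with one possible fix (building a new messages list); they are not the PR's code.

```python
def prepare_request_shallow(kwargs: dict, instruction: str) -> dict:
    """Shallow copy: new_kwargs["messages"] is the caller's list object."""
    new_kwargs = kwargs.copy()
    new_kwargs["messages"].insert(0, {"role": "user", "content": instruction})
    return new_kwargs


def prepare_request_safe(kwargs: dict, instruction: str) -> dict:
    """Build a fresh list so the caller's messages stay untouched."""
    new_kwargs = kwargs.copy()
    new_kwargs["messages"] = [
        {"role": "user", "content": instruction},
        *kwargs["messages"],
    ]
    return new_kwargs


# Shallow version: the instruction leaks into the caller's list.
caller_kwargs = {"messages": [{"role": "user", "content": "Extract the user."}]}
prepare_request_shallow(caller_kwargs, "Return JSON.")
leaked = len(caller_kwargs["messages"])  # 2

# Safe version: the caller's list keeps its single message.
caller_kwargs = {"messages": [{"role": "user", "content": "Extract the user."}]}
prepare_request_safe(caller_kwargs, "Return JSON.")
preserved = len(caller_kwargs["messages"])  # 1
```

Copying only the list is enough for `insert`, `append`, and `extend`; code that edits the message dicts themselves would need `copy.deepcopy` or per-message copies.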

elif messages and isinstance(messages[0]["content"], str):
    messages[0]["content"] += f"\n\n{message}"
elif messages and isinstance(messages[0]["content"], list):
    messages[0]["content"][0]["text"] += f"\n\n{message}"

Missing empty list check causes IndexError on multimodal content

Low Severity

In MistralMDJSONHandler.prepare_request, the code checks isinstance(messages[0]["content"], list) but doesn't verify the list is non-empty before accessing messages[0]["content"][0]["text"]. If a message has an empty content list (valid in multimodal formats), this raises an IndexError when trying to access index 0 of an empty list.
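A guarded variant might look like the sketch below. `append_instruction` is a hypothetical standalone helper, not the handler's actual code, and the `{"type": "text", "text": ...}` chunk shape used for the empty-list case is an assumption about the multimodal content format.

```python
from typing import Any, Dict, List


def append_instruction(messages: List[Dict[str, Any]], message: str) -> None:
    """Hypothetical helper: append an instruction to the first message.

    Handles string content, non-empty list content, and empty list content
    without indexing into an empty list.
    """
    if not messages:
        return
    content = messages[0]["content"]
    if isinstance(content, str):
        messages[0]["content"] = content + f"\n\n{message}"
    elif isinstance(content, list):
        if content and isinstance(content[0], dict) and "text" in content[0]:
            content[0]["text"] += f"\n\n{message}"
        else:
            # Empty list, or first chunk is not text: add a text chunk instead.
            content.insert(0, {"type": "text", "text": message})
```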

jxnl (Collaborator, Author) commented Jan 18, 2026

- Add instructor/v2/providers/genai/ with handlers for TOOLS, JSON modes
- Add instructor/v2/providers/cohere/ with handlers for TOOLS, JSON_SCHEMA, MD_JSON modes
- Add instructor/v2/providers/mistral/ with handlers for TOOLS, JSON_SCHEMA, MD_JSON modes
- Update instructor/v2/__init__.py with from_genai, from_cohere, from_mistral exports
- Add tests/v2/test_genai_integration.py
- Add tests/v2/test_cohere_handlers.py
- Add tests/v2/test_mistral_client.py and test_mistral_handlers.py

This PR was written by [Cursor](https://cursor.com)

jxnl force-pushed the feat/v2-pr4-genai-cohere-mistral branch from df648e9 to 9e715eb on January 18, 2026 18:50

- Remove debug logging in Cohere client
- Fix shallow copy mutation in Cohere handlers (copy messages list)
- Add empty list check in Mistral MD_JSON handler

jxnl closed this Jan 24, 2026