@actuallyrizzn commented Nov 25, 2025

Venice AI Integration

Overview

This PR adds Venice AI as a new LLM provider in Letta, enabling chat completions, streaming, tool calling, and embeddings generation with Venice AI models.

Changes

Core Implementation

  • letta/llm_api/venice_client.py (NEW): Venice AI LLM client implementation

    • Extends LLMClientBase with all required abstract methods
    • Supports synchronous and asynchronous requests
    • Implements streaming with Server-Sent Events (SSE) parsing
    • Handles embeddings generation
    • Comprehensive error handling with retry logic
    • Tool/function calling support
  • letta/llm_api/venice.py (NEW): Helper function for Venice API model listing

    • venice_get_model_list_async(): Queries Venice API for available models
  • letta/schemas/providers/venice.py (NEW): Venice provider implementation

    • VeniceProvider: Dynamically lists models from Venice API
    • Auto-registers when venice_api_key is configured
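As a rough sketch of the client shape described above (the real implementation extends `LLMClientBase` in Letta; the class name, base URL, and payload-building helper here are illustrative, not the PR's actual code):

```python
import json
import urllib.request

VENICE_BASE_URL = "https://api.venice.ai/api/v1"  # assumed base URL; verify against Venice docs


class MiniVeniceClient:
    """Minimal illustration of an OpenAI-compatible chat-completions client."""

    def __init__(self, api_key: str, base_url: str = VENICE_BASE_URL):
        self.api_key = api_key
        self.base_url = base_url

    def build_request(self, model: str, messages: list, tools=None, stream: bool = False) -> dict:
        """Build an OpenAI-style chat completion payload."""
        payload = {"model": model, "messages": messages, "stream": stream}
        if tools:
            payload["tools"] = tools  # OpenAI-compatible tool/function specs
        return payload

    def chat(self, model: str, messages: list) -> dict:
        """Send a synchronous chat completion request and return the parsed JSON."""
        req = urllib.request.Request(
            f"{self.base_url}/chat/completions",
            data=json.dumps(self.build_request(model, messages)).encode(),
            headers={
                "Authorization": f"Bearer {self.api_key}",
                "Content-Type": "application/json",
            },
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
```

The real client adds async variants, SSE streaming, embeddings, and retry logic on top of this request shape.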

Configuration

  • letta/schemas/enums.py: Added venice to ProviderType enum
  • letta/llm_api/llm_client.py: Registered Venice in LLM client factory
  • letta/settings.py: Added venice_api_key to ModelSettings
  • letta/server/server.py: Auto-registers VeniceProvider when API key is set
  • letta/schemas/providers/__init__.py: Exports VeniceProvider
  • letta/schemas/llm_config.py: Added "venice" to model_endpoint_type Literal
  • letta/schemas/model.py: Added "venice" to model_endpoint_type Literal (deprecated schema)
  • letta/schemas/embedding_config.py: Added "venice" to embedding_endpoint_type Literal
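The enum and factory changes above amount to a registration pattern along these lines (a self-contained sketch; the real enum and dispatch live in `letta/schemas/enums.py` and `letta/llm_api/llm_client.py`, and `VeniceClient` here is a stand-in):

```python
from enum import Enum


class ProviderType(str, Enum):
    # Subset of Letta's enum, for illustration only.
    openai = "openai"
    anthropic = "anthropic"
    venice = "venice"  # added by this PR


class VeniceClient:
    """Stand-in for letta/llm_api/venice_client.py."""


# Illustrative factory dispatch: provider type -> client class.
_CLIENTS = {ProviderType.venice: VeniceClient}


def create_client(provider: ProviderType):
    """Instantiate the client registered for a provider type."""
    try:
        return _CLIENTS[provider]()
    except KeyError:
        raise ValueError(f"no client registered for {provider.value}")
```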

Testing

  • tests/test_venice_client.py: Comprehensive unit tests for VeniceClient (100% coverage)
  • tests/test_venice_provider.py: Unit tests for VeniceProvider (100% coverage)
  • tests/test_venice_helper.py: Unit tests for helper functions (100% coverage)
  • tests/test_venice_coverage_comprehensive.py: Additional coverage tests for edge cases
  • tests/test_venice_live_api.py: Live API integration tests (6/6 passing with real API key)
  • tests/integration_test_venice.py: E2E tests (requires database setup)

Features

  • Chat Completions: synchronous and asynchronous requests
  • Streaming: Server-Sent Events (SSE) streaming support
  • Tool Calling: OpenAI-compatible function/tool calling
  • Embeddings: batch embeddings generation
  • Error Handling: comprehensive error mapping with retry logic
  • Dynamic Model Listing: models discovered from the Venice API
  • BYOK Support: Bring Your Own Key provider support
  • 100% Test Coverage: unit, integration, and E2E tests
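The PR body doesn't spell out the retry logic, but "error mapping with retry" typically means bounded exponential backoff on transient HTTP errors; a generic sketch under that assumption (the exception class, status set, and delays are illustrative):

```python
import time


class HTTPStatusError(Exception):
    """Minimal stand-in for a client error carrying an HTTP status code."""

    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status


RETRYABLE_STATUS = {429, 500, 502, 503, 504}  # assumed transient statuses


def with_retries(call, max_attempts: int = 3, base_delay: float = 0.5):
    """Run `call`, retrying retryable HTTP errors with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except HTTPStatusError as err:
            # Re-raise immediately on non-retryable errors or the final attempt.
            if err.status not in RETRYABLE_STATUS or attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))
```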

API Compatibility

Venice AI exposes an OpenAI-compatible API, so the integration maps cleanly onto Letta's existing OpenAI-style client code:

  • Request format: OpenAI-compatible messages, tools, parameters
  • Response format: OpenAI-compatible chat completion format
  • Streaming: Server-Sent Events (SSE) format
  • Embeddings: OpenAI-compatible embeddings endpoint
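In the OpenAI-compatible SSE format referenced above, each event arrives as a `data: {json}` line and the stream ends with `data: [DONE]`; a minimal parser for that framing might look like (a sketch of the general format, not the PR's actual parsing code):

```python
import json


def parse_sse(lines):
    """Yield parsed JSON payloads from OpenAI-style SSE lines until 'data: [DONE]'."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end-of-stream sentinel
        yield json.loads(data)
```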

Configuration

Users can configure Venice AI by setting:

export VENICE_API_KEY="your-api-key"

Or in settings:

venice_api_key = "your-api-key"

Models are automatically discovered and available as venice/{model_id}.
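Handles of the form `venice/{model_id}` follow the usual provider-qualified convention; a trivial sketch of how such a handle splits (the model id in the test is hypothetical):

```python
def split_model_handle(handle: str) -> tuple:
    """Split a provider-qualified handle like 'venice/<model_id>' into its parts."""
    provider, sep, model_id = handle.partition("/")
    if not sep or not model_id:
        raise ValueError(f"expected provider/model_id, got {handle!r}")
    return provider, model_id
```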

Testing

  • 100% code coverage for all Venice implementation files
  • 97 unit tests passing
  • 6 live API tests passing with real Venice API key
  • Performance: Request/response times < 2s (tested)

Documentation

User documentation has been prepared (see VENICE_USER_DOCS.md and VENICE_SETUP_GUIDE.md in workspace root - not committed to repo).

Breaking Changes

None. This is a purely additive change.

Checklist

  • All tests passing (97 unit tests, 6 live API tests)
  • 100% test coverage achieved
  • Code follows Letta patterns (matches OpenAI/Anthropic client style)
  • Documentation complete (docstrings, type hints, inline comments)
  • No external dependencies added (extracted SDK code directly)
  • Backward compatible (no breaking changes)
  • Error handling comprehensive
  • Performance acceptable (< 2s response times)
  • Live API tests verified with real API key

Related Issues

Notes

  • Reasoning model detection: is_reasoning_model() queries the Venice API and checks model traits; models whose traits include "reasoning", "reasoner", "thinking", "o1", "o3", or "o4" are identified as reasoning models.
  • Venice does not support inner thoughts in kwargs or the developer role.
  • LLM model list is dynamically fetched from Venice API (not hardcoded)
  • Embedding models: Venice API's /models endpoint doesn't return embedding models, but embeddings work via the /embeddings endpoint. We hardcode common embedding models (text-embedding-ada-002, text-embedding-3-small, text-embedding-3-large, text-embedding-bge-m3) similar to OpenAI's approach.
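The trait check described in the first note reduces to a set intersection once the model spec has been fetched; a sketch of just that check, using the trait names listed above (the real is_reasoning_model() also performs the API query, which is omitted here):

```python
REASONING_TRAITS = {"reasoning", "reasoner", "thinking", "o1", "o3", "o4"}


def has_reasoning_traits(model_spec: dict) -> bool:
    """Return True if any trait in a Venice model spec signals reasoning capability."""
    traits = {str(t).lower() for t in model_spec.get("traits", [])}
    return not traits.isdisjoint(REASONING_TRAITS)
```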

- Introduced the Venice API key in ModelSettings.
- Updated LLMClient to handle Venice as a provider.
- Added Venice to the ProviderType enum.
- Registered VeniceProvider in the providers module.
- Integrated VeniceProvider into the SyncServer for API key management.
- Added support for the Venice endpoint in embedding and LLM configurations.
- Updated VeniceClient to include a default context window for embeddings.
- Improved test coverage for VeniceClient, including error handling and API interactions.
- Refactored mock responses in tests for better clarity and reliability.
- Added comprehensive docstrings to all methods, matching Letta codebase style.
- Added inline comments for complex logic (SSE parsing, error mapping, retry).
- Documented Venice-specific parameters and the error handling approach.
- Cleaned up unused imports (ErrorCode, Dict, Tuple).
- Improved error message clarity and consistency.
- Updated is_reasoning_model() to query the Venice API and check model_spec.traits.
- Models with reasoning traits (reasoning, reasoner, thinking, o1, o3, o4) are correctly identified.
- Added comprehensive tests for reasoning model detection (5 new tests).
- Updated the PR description to reflect the correct behavior.
… doesn't list them

- The Venice API /models endpoint only returns text models, not embedding models.
- Embeddings work via the /embeddings endpoint, but the models aren't discoverable.
- Hardcoded common embedding models (ada-002, embedding-3-small/large, bge-m3).
- All return 1024 dimensions (verified with the live API).
- Updated the test to match the new hardcoded implementation.
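The hardcoded embedding-model list described above might look something like this (model names and the uniform 1024-dimension figure come from the PR notes; the dict name and lookup helper are illustrative):

```python
# Hardcoded because Venice's /models endpoint doesn't list embedding models.
VENICE_EMBEDDING_MODELS = {
    "text-embedding-ada-002": 1024,
    "text-embedding-3-small": 1024,
    "text-embedding-3-large": 1024,
    "text-embedding-bge-m3": 1024,
}


def embedding_dim(model: str) -> int:
    """Look up the output dimensionality of a known Venice embedding model."""
    try:
        return VENICE_EMBEDDING_MODELS[model]
    except KeyError:
        raise ValueError(f"unknown Venice embedding model: {model}")
```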
