## Summary

Add Qwen (Alibaba Cloud's large language model) as a new LLM provider option in Letta, enabling users to use Qwen series models for chat completions, streaming, tool calling, and embeddings generation.
## Motivation

Qwen provides access to a series of high-performance models (including Qwen-2, Qwen-1.5, etc.) through a unified, standardized API. Integrating Qwen as a provider would give Letta users:

- Access to a diverse set of high-quality, Chinese-optimized models through a single provider
- A standardized API format that aligns with common industry practice, enabling seamless integration
- Dynamic model discovery (models are listed from the Qwen API rather than hardcoded), so new model releases become available without code changes
- Support for the core LLM capabilities: chat completions, streaming, tool calling, and embeddings generation
## Proposed Solution

Implement a Qwen provider that:

- Extends Letta's existing `LLMClientBase`, following the established provider patterns
- Supports all core LLM operations: chat completions, streaming responses, embeddings generation, and tool calling
- Dynamically fetches and lists available Qwen models from the official Qwen API
- Implements robust error handling and retry mechanisms for API stability
- Maintains 100% test coverage (unit, integration, and E2E tests)
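The dynamic model discovery step could be sketched roughly as follows; the `data`/`id` response shape is an assumption borrowed from OpenAI-compatible model-listing endpoints, not a confirmed Qwen specification:

```python
# Sketch: map a raw model-list response to Letta-style provider handles.
# The response shape mirrors OpenAI-compatible `/models` listings; the
# actual Qwen fields may differ.

def to_letta_handles(models_response: dict) -> list[str]:
    """Return `qwen/{model_id}` handles for every model in the listing."""
    return [f"qwen/{m['id']}" for m in models_response.get("data", [])]


# Illustrative payload in the OpenAI-compatible shape:
raw = {"data": [{"id": "qwen-2-7b-chat"}, {"id": "qwen-1.5-14b-chat"}]}
handles = to_letta_handles(raw)
```

Because the handles are derived from the live listing, a newly released model shows up without any code change.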
## Benefits

- **Rich model ecosystem**: Users gain access to Qwen's full model catalog, including general-purpose models, specialized models, and different parameter scales (e.g., Qwen-2-7B, Qwen-2-72B, Qwen-1.5-Chat)
- **Seamless integration**: Aligns with Letta's existing agent and LLM infrastructure, requiring no major changes to user workflows
- **Chinese-optimized support**: Qwen models perform strongly at Chinese-language understanding and generation, extending Letta's applicability to Chinese-speaking scenarios
- **Dynamic model discovery**: New Qwen models are picked up automatically via the API, with no manual code updates needed
- **Full feature parity**: Supports all core capabilities required by Letta, including streaming and tool calling, for a consistent user experience across providers
## Implementation Details

The implementation would include:

- **`QwenClient`**: An LLM client class extending `LLMClientBase`, responsible for interacting with Qwen's API
- **`QwenProvider`**: A provider class handling dynamic model listing and provider registration
- Configuration support via environment variables or Letta's settings file (for API key management)
- Comprehensive test coverage: unit tests for client logic, integration tests for API interaction, and E2E tests for end-to-end user workflows
- Detailed documentation: a configuration guide, the list of supported models, and usage examples
## Configuration

Users would configure Qwen by setting the API key via an environment variable or in Letta's settings.

Via environment variable:

`export QWEN_API_KEY="your-api-key"`

Or in Letta's settings:

`qwen_api_key = "your-api-key"`

Models would be discovered automatically and exposed in the format `qwen/{model_id}` (e.g., `qwen/qwen-2-7b-chat`, `qwen/qwen-1.5-14b-chat`).
## Additional Context

Qwen provides a standardized RESTful API that follows industry-common request/response formats, which means:

- Chat completion and tool calling request/response formats are compatible with Letta's existing adapter patterns
- Streaming responses use the Server-Sent Events (SSE) format, consistent with Letta's existing streaming logic
- Embeddings generation uses a dedicated, easy-to-integrate endpoint with clear input/output specifications

These characteristics make integrating Qwen straightforward, low-cost, and maintainable for future iterations.
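As an illustration of why the SSE format keeps streaming integration cheap, here is a minimal parser for an OpenAI-style SSE stream; the `data: ... [DONE]` framing is an assumption carried over from OpenAI-compatible APIs:

```python
import json


def parse_sse_chunks(lines):
    """Yield parsed JSON chunks from `data:` lines, stopping at [DONE]."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alives and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        yield json.loads(payload)


# Illustrative stream in the OpenAI-compatible delta format:
stream = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text = "".join(
    c["choices"][0]["delta"].get("content", "")
    for c in parse_sse_chunks(stream)
)
```

If Qwen's stream framing matches this shape, Letta's existing SSE handling should apply largely unchanged.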
Thank you!