feat: add Forge embedders integration #3385
Adds a Forge embedders integration that mirrors the Perplexity and Mistral OpenAI-compatible embedder integrations. ForgeTextEmbedder and ForgeDocumentEmbedder subclass Haystack's built-in OpenAI embedders and default to the Forge OpenAI-compatible API (https://api.voxell.ai/v1, FORGE_API_KEY, model forge-pro).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The from_dict overrides on ForgeTextEmbedder and ForgeDocumentEmbedder called default_from_dict directly, leaving the serialized api_key as a plain dict. On older Haystack versions (tested floor 2.22.0, exercised by the lowest-direct-dependencies CI job), default_from_dict does not auto-deserialize secrets, so the parent OpenAI embedder __init__ then called .resolve_value() on a dict and raised AttributeError. Deserialize the api_key Secret in place before default_from_dict, matching the standard Haystack integration pattern. Fixes the 4 failing from_dict / round-trip unit tests across all supported Haystack versions.
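The failure mode and the fix described in this commit can be illustrated without Haystack installed. The sketch below is dependency-free and uses hypothetical stand-ins: `EnvSecret` plays the role of a Haystack Secret, `ToyEmbedder` plays the role of the parent OpenAI embedder, and `from_dict_naive` reproduces the old behaviour. None of these names come from the PR, and the serialized shape is only loosely modelled on Haystack's.

```python
import copy
import os
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class EnvSecret:
    """Hypothetical stand-in for an env-var-backed Haystack Secret."""

    env_var: str

    def to_dict(self) -> Dict[str, Any]:
        return {"type": "env_var", "env_vars": [self.env_var]}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "EnvSecret":
        return cls(env_var=data["env_vars"][0])

    def resolve_value(self) -> str:
        return os.environ.get(self.env_var, "")


class ToyEmbedder:
    """Hypothetical stand-in for the parent OpenAI embedder."""

    def __init__(self, api_key: EnvSecret, model: str = "forge-pro") -> None:
        self.api_key = api_key
        self.model = model
        # Resolving eagerly is what exposes the bug: a plain dict has no
        # resolve_value() method, so this line raises AttributeError.
        self._token = api_key.resolve_value()

    def to_dict(self) -> Dict[str, Any]:
        return {
            "init_parameters": {
                "api_key": self.api_key.to_dict(),
                "model": self.model,
            }
        }

    @classmethod
    def from_dict_naive(cls, data: Dict[str, Any]) -> "ToyEmbedder":
        # Old behaviour: api_key is still the serialized dict here.
        return cls(**data["init_parameters"])

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ToyEmbedder":
        params = data["init_parameters"]
        # Fix: rebuild the secret in place before constructing the component.
        params["api_key"] = EnvSecret.from_dict(params["api_key"])
        return cls(**params)


def round_trip_ok() -> bool:
    """Serialize, deserialize with the fixed path, and compare the dicts."""
    original = ToyEmbedder(EnvSecret("FORGE_API_KEY"))
    serialized = original.to_dict()
    restored = ToyEmbedder.from_dict(copy.deepcopy(serialized))
    return restored.to_dict() == serialized
```

Because the fixed `from_dict` mutates its input, `round_trip_ok` passes a deep copy so the serialized dict can still be compared afterwards. In the real integration the equivalent step would use Haystack's own secret-deserialization helper rather than a hand-written class.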
@JCorners68 Thank you for opening this pull request. Great to see that you used the scaffolding script! However, we can't merge this PR without the CLA being agreed to.

Either way, we recommend publishing and releasing this integration yourself using our repository template if you believe it benefits a larger group of users: https://github.com/deepset-ai/custom-component (for example, as a repo under https://github.com/VoxellInc/).

For the moment, Haystack users can use the OpenAITextEmbedder class as follows:

```python
from haystack.components.embedders import OpenAITextEmbedder
from haystack.utils import Secret

# Point the stock OpenAI embedder at the Forge OpenAI-compatible endpoint.
OpenAITextEmbedder(
    api_key=Secret.from_env_var("FORGE_API_KEY"),
    api_base_url="https://api.voxell.ai/v1",
    model="forge-pro",
)
```

We'll monitor requests from the community, and if demand for a dedicated integration increases, potentially with additional features, we will reconsider.
What this adds

A new `forge` embedders integration (`forge-haystack`) under `integrations/forge/`, providing:

- `ForgeTextEmbedder(OpenAITextEmbedder)`
- `ForgeDocumentEmbedder(OpenAIDocumentEmbedder)`

Forge serves an OpenAI-compatible embeddings API, so both components subclass Haystack's built-in OpenAI embedders and default to the Forge endpoint:

- `api_base_url="https://api.voxell.ai/v1"`
- `api_key=Secret.from_env_var("FORGE_API_KEY")`
- `model="forge-pro"` (other accepted strings: `forge-turbo`, `forge-ultra-4k`, plus OpenAI-compatible aliases `text-embedding-3-small`, `text-embedding-3-large`, `text-embedding-ada-002`)

A `dimensions` parameter is exposed (passed through to the underlying OpenAI embedder) since Forge models support Matryoshka representation learning. `to_dict()`/`from_dict()` are implemented via `default_to_dict`/`default_from_dict`, and a `SUPPORTED_MODELS` ClassVar lists the accepted model strings.

This mirrors the existing OpenAI-compatible embedder integrations, namely the merged Perplexity integration (#3262) and `integrations/mistral/`, using the same thin-subclass pattern. No new SDK dependency is added; `haystack-ai` already ships the OpenAI client.

How it was tested

From `integrations/forge/`:

- `hatch run fmt-check`: passes
- `hatch run test:types`: mypy clean (3 source files)
- `hatch run test:unit-cov-retry`: 16 unit tests pass, 100% coverage of both modules (integration tests requiring a real `FORGE_API_KEY` skip, as expected)

Unit tests mirror the Perplexity/Mistral embedder tests: they assert init defaults/metadata and `to_dict`/`from_dict` round-trips, and do not make live API calls.

The scaffold was generated with `scripts/create_new_integration.py --name forge --type embedders`, which also added the GitHub workflow, labeler entry, coverage-comment workflow entry, and the root README row.

I authored 100% of this contribution and have the right to submit it.
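For readers unfamiliar with why a `dimensions` parameter pairs naturally with Matryoshka representation learning: embeddings trained that way stay useful when only a leading prefix of the vector is kept. The sketch below shows the generic client-side idea (keep the first `dimensions` components, then L2-normalize). The helper name `truncate_embedding` is hypothetical, and the PR does not state how Forge shortens vectors server-side, so treat this as an illustration of the technique rather than of Forge's behaviour.

```python
import math
from typing import List, Sequence


def truncate_embedding(vector: Sequence[float], dimensions: int) -> List[float]:
    """Keep the first `dimensions` components and rescale to unit length.

    Hypothetical helper: illustrates generic Matryoshka-style truncation,
    not any specific provider's implementation.
    """
    if not 0 < dimensions <= len(vector):
        raise ValueError("dimensions must be between 1 and len(vector)")
    head = list(vector[:dimensions])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return head
    return [x / norm for x in head]
```

Renormalizing after truncation keeps cosine and dot-product similarity comparable across vectors shortened to the same length.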