Releases: etalab-ia/OpenGateLLM
0.4.0post1
v0.4.0post1 is a patch release with bug fixes and refactoring improvements.
Refactoring
- Refactored routers to better align with clean architecture principles, improving maintainability and code organization (#658)
- Improved formatting of configuration errors for clearer and more actionable feedback (#672)
Bug fixes
- Fixed an issue affecting streaming chat responses in models to ensure proper real-time output delivery (#692)
  Version 0.4.0 introduced a streaming bug that has now been fixed. The handling of streaming and its related errors has been completely revised to use aiter_lines instead of aiter_raw, in order to ensure proper streaming formatting as reliably as possible.
- Added proper threshold handling in the search module to improve result filtering behavior (#684)
Full Changelog: 0.4.0...0.4.0post1
0.4.0
With the release of OpenGateLLM version 0.4.0 (previous version 0.3.7), we have decided to revise our approach based on Elasticsearch's recommended best practices. The main changes are as follows:
- Deprecation of Qdrant support in favor of Elasticsearch
- Consolidation of Elasticsearch indices into a single index
- Conversion of document metadata into a single field with constraints
These major changes require a data migration of your vector store. We provide a migration script (see Migration Script) to help you update your instance.
Introduction
Currently, we support two vector store technologies: Qdrant and Elasticsearch. A few months ago, we decided to focus on Elasticsearch for managing our document collections. We revisit the reasons for this choice here.
However, over the past few weeks, we have encountered scalability issues with Elasticsearch. These problems stem from how we implemented Elasticsearch in OpenGateLLM. To resolve these issues, we decided to revise our approach, which involves major changes and a data migration.
To this end, we detail the modifications we have made and provide a migration script to help you update your instance.
Why Elasticsearch over Qdrant?
In Retrieval-Augmented Generation (RAG), there are 3 classic search methods:
- Lexical search with BM25 (a TF-IDF-based ranking function)
- Semantic search with vector similarity
- Hybrid search combining both (using the Reciprocal Rank Fusion (RRF) algorithm to combine results)
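Reciprocal Rank Fusion only needs the rank of each document in each result list: a document's fused score is the sum of 1 / (k + rank) over the lists that contain it. The following is a minimal sketch assuming 1-based ranks and string document IDs; the function name is illustrative and this is not OpenGateLLM's implementation. k = 60 is the value commonly used in the literature, while OpenGateLLM exposes its own constant through the rff_k search parameter.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in.
    Ranks start at 1, so k = 0 never causes a division by zero.
    """
    scores: defaultdict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


lexical = ["a", "b", "c"]   # e.g. BM25 results
semantic = ["b", "c", "d"]  # e.g. vector-similarity results
print([doc for doc, _ in reciprocal_rank_fusion([lexical, semantic])])  # ['b', 'c', 'a', 'd']
```

Documents found by both methods ("b" and "c") rise to the top even though neither is first in both lists, which is the point of fusing ranks instead of raw scores that live on different scales.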
We sought to offer these 3 search methods to our users. Initially, we implemented Qdrant for semantic search due to its scalability.
However, when we wanted to add lexical and hybrid search, we found that Qdrant does not natively support these methods: its approach relies on deploying a model alongside the vector store.
Additionally, Elasticsearch excels at lexical search with BM25 and natively enables complex filtering on specific fields. For these reasons, Elasticsearch seems like a better solution for RAG search.
OpenGateLLM's goal is to support multiple vector store solutions to give you the choice of the technology that best suits your needs.
Major Changes
End of Qdrant Support
To focus on Elasticsearch support, we have decided to deprecate Qdrant support. This decision was made after consulting the community, which revealed that no one currently uses Qdrant for their OpenGateLLM instance.
Additionally, given the OpenGateLLM team's limited resources, we cannot afford to maintain two vector store solutions at this time.
We do not rule out revisiting this decision in the future if the community requests it. Moreover, the goal remains to support multiple vector store solutions in the long term, once we have the necessary resources.
Consolidation of Elasticsearch Indices into a Single Index
Currently, OpenGateLLM creates an Elasticsearch index for each collection. This approach allows collections to be managed independently, but it does not scale well. Elasticsearch caps the number of shards per node (the cluster.max_shards_per_node setting defaults to 1000), and every index has at least one shard. Multiplying indices can therefore quickly become a performance bottleneck.
A good rule-of-thumb is to ensure you keep the number of shards per node below 20 per GB heap it has configured. A node with a 30GB heap should therefore have a maximum of 600 shards, but the further below this limit you can keep it the better. Source: How many shards should I have in my Elasticsearch cluster?
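Applied to the former one-index-per-collection layout, this rule of thumb gives a quick capacity check. The sketch below assumes a single-node cluster (where replicas cannot be allocated) and one primary shard per index, the Elasticsearch default since 7.0; both function names are hypothetical.

```python
def max_recommended_shards(heap_gb: float, shards_per_gb_heap: int = 20) -> int:
    """Upper bound from the rule of thumb: at most 20 shards per GB of heap."""
    return int(heap_gb * shards_per_gb_heap)


def shards_for_collections(collections: int, primaries_per_index: int = 1,
                           one_index_per_collection: bool = True) -> int:
    """Primary shards on a single node for a given layout (hypothetical helper)."""
    indices = collections if one_index_per_collection else 1
    return indices * primaries_per_index


budget = max_recommended_shards(30)  # 600, as in the quoted example
print(shards_for_collections(1_000) > budget)  # True: 1,000 shards for 1,000 collections
print(shards_for_collections(1_000, one_index_per_collection=False))  # 1
```

With one index per collection the shard count grows linearly with the number of collections, whereas a consolidated index keeps it fixed regardless of how many collections users create.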
To solve this problem, we decided to consolidate all Elasticsearch indices into a single index. To migrate your data, we provide a migration script (see Migration Script).
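With a single index, a collection becomes a filter rather than a separate index: each chunk stores its collection_id, and every search restricts on it. A minimal sketch of such a request body in Elasticsearch's query DSL, assuming hypothetical field names content and collection_id; it is not necessarily the exact query OpenGateLLM sends.

```python
def build_search_body(query: str, collection_ids: list[int], size: int = 10) -> dict:
    """Build an Elasticsearch query DSL body for a lexical (BM25) search.

    Hypothetical field names: "content" holds the chunk text and
    "collection_id" the owning collection.
    """
    if not 1 <= len(collection_ids) <= 100:  # mirrors the 100-collection cap of the search endpoint
        raise ValueError("collection_ids must contain between 1 and 100 IDs")
    return {
        "size": size,
        "query": {
            "bool": {
                # Scored clause: BM25 relevance on the chunk text
                "must": [{"match": {"content": query}}],
                # Filter clause: not scored and cacheable, restricts to the requested collections
                "filter": [{"terms": {"collection_id": collection_ids}}],
            }
        },
    }


body = build_search_body("budget 2024", [12, 42])
```

Putting the collection restriction in the filter context keeps it out of relevance scoring, so BM25 scores depend only on the text match.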
Convert document metadata into a single field with constraints
Currently, when creating a document, users can define arbitrary metadata with the following types: int, str, float, datetime, or bool. Each metadata key is stored in a separate field of the Elasticsearch index. Once all collections share a single index, this dynamic addition of fields quickly becomes problematic: Elasticsearch is not designed to handle thousands of fields in one index (the index.mapping.total_fields.limit setting defaults to 1000), which risks creating performance and scalability issues.
To address this issue, we decided to convert the metadata field into a single field of type flattened. This limits filtering on metadata: all values are stored in a single field and indexed as keywords, so they are interpreted as strings (see the Elasticsearch documentation on the flattened field type). However, our tests have shown that the filtering capabilities of a flattened field appear sufficient for RAG search operations.
Additionally, at the Pydantic level, we have added constraints on the types of data that can be stored in the metadata field.
From now on, the metadata field must comply with the following constraints:
```python
from typing import Annotated

from pydantic import Field, StringConstraints

# Numeric metadata values must stay within ±9999999999999999
MIN_NUMBER, MAX_NUMBER = -9999999999999999, 9999999999999999
MetadataStr = Annotated[str, StringConstraints(strip_whitespace=True, min_length=1, max_length=255)]
MetadataInt = Annotated[int, Field(ge=MIN_NUMBER, le=MAX_NUMBER)]
MetadataFloat = Annotated[float, Field(ge=MIN_NUMBER, le=MAX_NUMBER)]
# Lists hold at most 8 scalar values
MetadataList = Annotated[list[MetadataStr | MetadataInt | MetadataFloat | bool | None], Field(max_length=8)]
# A chunk carries between 1 and 8 metadata keys
ChunkMetadata = Annotated[dict[MetadataStr, MetadataStr | MetadataInt | MetadataFloat | MetadataList | bool | None], Field(description="Extra metadata for the source", min_length=1, max_length=8)]
```

One possible solution would have been to define fields with the flattened type. However, this solution only partially solves the performance problem and limits filtering actions on these fields (they are then stored in a single field and interpreted as str).
To ensure the scalability of the Elasticsearch index, we decided to pre-define metadata for documents. This approach avoids overloading the index with metadata while maintaining type-based filtering capabilities.
Other Changes
Fixes
- Fixed minor bugs in the Playground:
  - User expiration date formatting
  - Removal of the old collection ID type
  - Sorting and filters on Router and Provider pages
  - Removal of all roles and organizations for user creation
- Fixed support for the language parameter for audio transcription models with vLLM and Albert API so that it can be empty.
- The collections parameter in the search endpoint is now correctly typed as list[int].
- The rff_k property of hybrid search now accepts values between 0 and 16384 (both inclusive). This fix improves the readability of the endpoint and fixes a division by zero error.
Improvements
- Improved code readability for form data request declarations.
- The usage key is now returned in stream responses from /v1/chat/completions even if the stream does not end with the [DONE] token.
- The collections parameter in the search endpoint now has a maximum length of 100 to avoid overwhelming the Elasticsearch index.
- collection_id and document_id have been moved to the chunk level. Previously, they were part of the chunk's metadata field, which could have led users to believe these values were editable.
Migration Script
If you are running OpenGateLLM on an existing Elasticsearch instance, we recommend using the migration script to migrate your data. You can find it in the GitHub repository.
Full Changelog: 0.3.7...0.4.0
- fix(playground): expiration user date format when user creation by @leoguillaume in #663
- fix(search): remove old collection ID type by @leoguillaume in #662
- fix(playground): router and provider pages sort and filters by @leoguillaume in #664
- fix(playground): remove all roles and all organizations for user creation by @leoguillaume in #666
- fix(audio): fix request_format for Albert integration by @leoguillaume in #665
- feat(data): consolidate elasticsearch indices into a single index by @leoguillaume in #667
- doc(adr): elasticsearch scaling by @leoguillaume in #668
- feat(elastiscearch): add healthcheck to migration script and complete release note by @leoguillaume in #669
- feat(documents): change default metadata by @leoguillaume in #685
- remove document name from es index by @leoguillaume in #686
- feat(search): fix rff_k division by 0 plus tests by @tibo-pdn in #687
- feat(chunks): change chunk schema by @leoguillaume in #688
0.3.7
What's Changed
- minor improvment in doc by @leoguillaume in #557
- Update API Reference and API Swagger links by @moscaale in #600
- Documentation queuing by @blanch0t in #536
- feat(api): remove web search references (brave, duckduckgo) by @tibo-pdn in #601
- chore(deps): bump qs and express in /docs by @dependabot[bot] in #610
- Update feature_request.md by @leoguillaume in #607
- fix(docs): make quickstart work directly and match documentation by @natoromano in #605
- Clean archi - model endpoint by @moscaale in #522
- feat(github): add PR template with most of the useful sections by @tibo-pdn in #606
- feat(api): remove carbon footprint prefix in provider parameters by @tibo-pdn in #603
- Correct typo in OCR tutorial documentation by @cyrillay in #629
- fix(ocr-beta): forward_request for ocr-beta by @moscaale in #613
- feat(collections): add desc filter on collections creation date by @tibo-pdn in #631
- feat(rerank): change signature of v1rerank endpoint to cohere standard by @tibo-pdn in #611
- feat(models): add request content to basemodelprovider by @leoguillaume in #640
- 608 core fix error when redis key of rate limit as no ttl by @tibo-pdn in #654
New Contributors
- @natoromano made their first contribution in #605
Full Changelog: 0.3.6...0.3.7
0.3.6
What's Changed
- fix(playground): fix app title wrap when too long by @tibo-pdn in #584
- fix(playground): user creation when budget is empty by @leoguillaume in #586
- fix(albert): increase playground timeout by @leoguillaume in #587
- fix(models): disable router and provider pagination for interne funct… by @leoguillaume in #588
- fix: order by router name in limits when displaying roles by @tibo-pdn in #589
- fix(qdrant): /chunks offset issue for Qdrant database by @tibo-pdn in #595
- fix(playground): pagination state shared between classes and components by @tibo-pdn in #591
- fix(playground): never expirred key in playground by @leoguillaume in #599
Full Changelog: 0.3.5...0.3.6
0.3.5post2
What's Changed
- fix(models): disable router and provider pagination for interne funct… by @leoguillaume in #588
Full Changelog: 0.3.5post1...0.3.5post2
0.3.5post1
What's Changed
- fix(playground): fix app title wrap when too long by @tibo-pdn in #584
- fix(playground): user creation when budget is empty by @leoguillaume in #586
- fix(albert): increase playground timeout by @leoguillaume in #587
Full Changelog: 0.3.5...0.3.5post1
0.3.5
What's Changed
- 556 support mistral ocr api by @leoguillaume in #559
- Update issue templates by @leoguillaume in #569
- feat(audio): add support Mistral API for audio transcription by @leoguillaume in #578
- feat(playground): add pagination on router and provider pages by @tibo-pdn in #575
- fix(models): conflict with name and aliases for router creation by @leoguillaume in #579
- feat(playground): create unified header on the app by @tibo-pdn in #580
- feat(playground): sort filter of user page by @leoguillaume in #583
- feat(models): add update providers endpoint by @leoguillaume in #581
Full Changelog: 0.3.4...0.3.5
0.3.4
What's Changed
- minor improvment on playground by @leoguillaume in #552
- fix: adding check collections for all /collections endpoints by @FaheemBEG in #504
- plaground-fix by @leoguillaume in #560
- fix header playground by @leoguillaume in #562
- fix urls by @leoguillaume in #563
- Feat/add unique constraint on organization by @tibo-pdn in #561
New Contributors
Full Changelog: 0.3.3...0.3.4
0.3.3
What's Changed
- fix role page by @leoguillaume in #545
- hotfix: tei format request by @leoguillaume in #548
- hotfix: hidden models by @leoguillaume in #549
- hotfix: update user playground by @leoguillaume in #550
- feat: add search email bar by @leoguillaume in #551
Full Changelog: 0.3.2...0.3.3
0.3.2post3
What's Changed
- hotfix: hidden models by @leoguillaume in #549
Full Changelog: 0.3.2post2...0.3.2post3