Releases: etalab-ia/OpenGateLLM

0.4.0post1

12 Feb 18:56
c619776

v0.4.0post1 is a patch release with bug fixes and refactoring improvements.

Refactoring

  • Refactored routers to better align with clean architecture principles, improving maintainability and code organization (#658)
  • Improved formatting of configuration errors for clearer and more actionable feedback (#672)

Bug fixes

  • Fixed an issue affecting streaming chat responses in models to ensure proper real-time output delivery (#692)
    Version 0.4.0 introduced a streaming bug, which is now fixed. Streaming and its error handling were rewritten to use aiter_lines instead of aiter_raw, so that streamed output is formatted as reliably as possible.
  • Added proper threshold handling in the search module to improve result filtering behavior (#684)
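
To illustrate why line-based iteration matters for streamed chat responses: network chunks can cut a server-sent event in the middle of a line, so forwarding raw chunks may emit malformed lines. The sketch below uses only the standard library and is not OpenGateLLM's actual code. `fake_raw_chunks`, `iter_lines`, and `collect_events` are hypothetical names standing in for a raw byte stream and for the kind of line buffering a line-oriented iterator performs.

```python
import asyncio
from collections.abc import AsyncIterator


async def fake_raw_chunks() -> AsyncIterator[bytes]:
    """Hypothetical upstream: an SSE payload cut at arbitrary byte boundaries."""
    for chunk in (b'data: {"delta": "Hel', b'lo"}\n\ndata: [DO', b"NE]\n\n"):
        yield chunk


async def iter_lines(chunks: AsyncIterator[bytes]) -> AsyncIterator[str]:
    """Buffer raw chunks and yield only complete lines."""
    buffer = b""
    async for chunk in chunks:
        buffer += chunk
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            yield line.decode("utf-8")
    if buffer:  # trailing data without a final newline
        yield buffer.decode("utf-8")


async def collect_events() -> list[str]:
    # Keep non-empty lines only; blank lines merely separate SSE events.
    return [line async for line in iter_lines(fake_raw_chunks()) if line]


events = asyncio.run(collect_events())
print(events)
```

Although the payload arrives in three misaligned chunks, the consumer only ever sees whole `data:` lines.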

Full Changelog: 0.4.0...0.4.0post1

0.4.0

09 Feb 17:45
620de23

With the release of OpenGateLLM version 0.4.0 (previous version 0.3.7), we have decided to revise our approach based on Elasticsearch's recommended best practices. The main changes are as follows:

  • Deprecation of Qdrant support in favor of Elasticsearch
  • Consolidation of Elasticsearch indices into a single index
  • Conversion of document metadata into a single field with constraints

These major changes require a data migration of your vector store. We provide a migration script to help you update your instance (see Migration Script).

Introduction

Until now, we supported two vector store technologies: Qdrant and Elasticsearch. A few months ago, we decided to focus on Elasticsearch for managing our document collections. We revisit the reasons for this choice in the next section.

However, over the past few weeks, we have encountered scalability issues with Elasticsearch. These problems stem from how we implemented Elasticsearch in OpenGateLLM. To resolve these issues, we decided to revise our approach, which involves major changes and a data migration.

To this end, we detail the modifications we have made and provide a migration script to help you update your instance.

Why Elasticsearch over Qdrant?

In Retrieval-Augmented Generation (RAG), there are 3 classic search methods:

  • Lexical search with BM25 (a TF-IDF-style ranking function)
  • Semantic search with vector similarity
  • Hybrid search combining both (using the Reciprocal Rank Fusion (RRF) algorithm to combine results)
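
As a rough illustration of how hybrid search can merge the two result lists, here is a minimal sketch of Reciprocal Rank Fusion. Each document scores the sum of 1 / (k + rank) over the rankings it appears in. The function name, the document IDs, and the default k = 60 (a conventional value for RRF) are assumptions for the example, not OpenGateLLM's implementation or defaults.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of document IDs; best-scored documents come first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks start at 1
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


lexical = ["doc_a", "doc_b", "doc_c"]   # hypothetical BM25 ranking
semantic = ["doc_b", "doc_d", "doc_a"]  # hypothetical vector-similarity ranking
fused = reciprocal_rank_fusion([lexical, semantic])
print([doc_id for doc_id, _ in fused])
```

Documents that rank well in both lists (here `doc_b` and `doc_a`) end up ahead of documents found by only one method.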

We sought to offer these 3 search methods to our users. Initially, we implemented Qdrant for semantic search due to its scalability.

However, when we set out to add lexical and hybrid search, we found that Qdrant does not natively support these methods. Its approach relies on deploying a model alongside the vector store.

Additionally, Elasticsearch excels at lexical search with BM25 and natively enables complex filtering on specific fields. For these reasons, Elasticsearch seems like a better solution for RAG search.

OpenGateLLM's goal is to support multiple vector store solutions to give you the choice of the technology that best suits your needs.

Major Changes

End of Qdrant Support

To focus on Elasticsearch support, we have decided to deprecate Qdrant support. This decision was made after consulting the community: it turns out that no one currently uses Qdrant for their OpenGateLLM instance.

Additionally, the OpenGateLLM team has limited resources and cannot afford to maintain two vector store solutions at this time.

We do not rule out revisiting this decision in the future if the community requests it. Moreover, the goal remains to support multiple vector store solutions in the long term, once we have the necessary resources.

Consolidation of Elasticsearch Indices into a Single Index

Until now, OpenGateLLM created one Elasticsearch index per collection. This approach allows collections to be managed independently, but it does not scale well: Elasticsearch creates at least one shard for each index and, by default, limits the number of shards. Multiplying indices can therefore quickly become a performance bottleneck.

A good rule-of-thumb is to ensure you keep the number of shards per node below 20 per GB heap it has configured. A node with a 30GB heap should therefore have a maximum of 600 shards, but the further below this limit you can keep it the better. Source: How many shards should I have in my Elasticsearch cluster?
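
The arithmetic behind this rule of thumb can be sketched as follows. The helper names are hypothetical, and the example assumes each per-collection index uses a single shard (no replicas), which is a simplification.

```python
SHARDS_PER_GB_HEAP = 20  # rule of thumb quoted above


def max_recommended_shards(heap_gb: float) -> int:
    """Upper bound on shards for a node with the given heap size."""
    return int(heap_gb * SHARDS_PER_GB_HEAP)


def collections_over_budget(n_collections: int, heap_gb: float, shards_per_index: int = 1) -> bool:
    """True if one index per collection would exceed the node's shard budget."""
    return n_collections * shards_per_index > max_recommended_shards(heap_gb)


print(max_recommended_shards(30))          # 30 GB heap
print(collections_over_budget(1000, 30))   # 1000 collections, one index each
```

With one index per collection, a single 30 GB node reaches its recommended ceiling at 600 collections, whereas a single shared index keeps the shard count independent of the number of collections.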

To solve this problem, we decided to consolidate all Elasticsearch indices into a single index. To migrate your data, we provide a migration script (see Migration Script).
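
With a single shared index, a search has to be scoped to collections through a filter rather than through the index name. The sketch below shows one way such a request body could look in the Elasticsearch query DSL. The field names `content` and `collection_id` and the function name are assumptions for illustration, not the actual OpenGateLLM mapping. The limit of 100 collections mirrors the one described under Improvements.

```python
MAX_COLLECTIONS = 100  # maximum length of the collections parameter in these release notes


def build_search_body(collection_ids: list[int], text: str, size: int = 10) -> dict:
    """Build a query body restricted to some collections of a shared index."""
    if len(collection_ids) > MAX_COLLECTIONS:
        raise ValueError(f"at most {MAX_COLLECTIONS} collections per search")
    return {
        "size": size,
        "query": {
            "bool": {
                # Full-text match on the chunk text (hypothetical field name)
                "must": [{"match": {"content": text}}],
                # Non-scoring filter that replaces per-collection indices
                "filter": [{"terms": {"collection_id": collection_ids}}],
            }
        },
    }


body = build_search_body([1, 2], "shard limits")
print(body["query"]["bool"]["filter"])
```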

Conversion of Document Metadata into a Single Field with Constraints

Until now, users creating a document could attach any metadata they wanted, with values of the following types: int, str, float, datetime, or bool. Each metadata key was stored as a separate field in the Elasticsearch index. With all collections consolidated into a single index, this dynamic addition of fields would quickly become problematic: Elasticsearch is not designed to handle thousands of fields in one index efficiently, which risks performance and scalability issues.

To address this issue, we decided to convert the metadata field into a single field of type flattened. This limits filtering on metadata, because all values are stored in a single field and interpreted as str (see the Elasticsearch documentation). However, our tests show that the filtering capabilities of a flattened field seem sufficient for RAG search operations.
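
As an illustration, a mapping with a flattened metadata field and an exact-match filter on one metadata key could look like the sketch below. The `content` field, the `metadata` field name, and the helper function are assumptions for the example. Only the `flattened` type and the dotted `metadata.<key>` query syntax come from Elasticsearch itself.

```python
# Hypothetical index mapping: every metadata key lives under one flattened field,
# so adding new keys does not add new fields to the index mapping.
index_mapping = {
    "mappings": {
        "properties": {
            "content": {"type": "text"},
            "metadata": {"type": "flattened"},
        }
    }
}


def metadata_term_filter(key: str, value: object) -> dict:
    """Exact-match filter on a single metadata key of the flattened field."""
    return {"term": {f"metadata.{key}": value}}


print(metadata_term_filter("author", "alice"))
```

Because flattened values are indexed as keywords, exact matches behave as expected, while range comparisons are lexicographic rather than numeric.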

Additionally, at the Pydantic level, we have added constraints on the types of data that can be stored in the metadata field.

From now on, the metadata field must comply with the following constraints:

from typing import Annotated

from pydantic import Field, StringConstraints

# Bounds applied to every numeric metadata value
MIN_NUMBER, MAX_NUMBER = -9999999999999999, 9999999999999999

# Strings are stripped of surrounding whitespace and must be 1 to 255 characters long
MetadataStr = Annotated[str, StringConstraints(strip_whitespace=True, min_length=1, max_length=255)]
MetadataInt = Annotated[int, Field(ge=MIN_NUMBER, le=MAX_NUMBER)]
MetadataFloat = Annotated[float, Field(ge=MIN_NUMBER, le=MAX_NUMBER)]
# Lists hold at most 8 scalar values
MetadataList = Annotated[list[MetadataStr | MetadataInt | MetadataFloat | bool | None], Field(max_length=8)]

# A chunk carries between 1 and 8 metadata keys
ChunkMetadata = Annotated[dict[MetadataStr, MetadataStr | MetadataInt | MetadataFloat | MetadataList | bool | None], Field(description="Extra metadata for the source", min_length=1, max_length=8)]

One possible solution would have been to define fields with the flattened type. However, this solution only partially solves the performance problem and limits filtering actions on these fields (they are then stored in a single field and interpreted as str).

To ensure the scalability of the Elasticsearch index, we decided to pre-define metadata for documents. This approach avoids overloading the index with metadata while maintaining type-based filtering capabilities.

Other Changes

Fixes

  • Fixed minor bugs in the Playground:

    • User expiration date formatting
    • Removal of the old collection ID type
    • Sorting and filters on Router and Provider pages
    • Removal of all roles and organizations for user creation
  • Fixed support for the language parameter for audio transcription models with vLLM and Albert API so it can be empty.

  • The collections parameter in the search endpoint is now correctly typed as list[int].

  • The rff_k property for hybrid search now accepts values between 0 and 16384 (both included). This fix improves the readability of the endpoint and fixes a division-by-zero error.

Improvements

  • Improved code readability for form data request declarations.
  • Stream responses from /v1/chat/completions now return the usage key even if the stream does not end with the [DONE] token.
  • The collections parameter in the search endpoint now has a maximum length of 100 to avoid overwhelming the Elasticsearch index.
  • collection_id and document_id have been moved to the chunk level. Previously, they were part of the chunk's metadata field, which could have led users to believe these values were editable.

Migration Script

If you are running OpenGateLLM on an existing Elasticsearch instance, we invite you to use the migration script to migrate your data. Find the migration script in the GitHub repository.


Full Changelog: 0.3.7...0.4.0

0.3.7

19 Jan 09:52
759ccfd

What's Changed

  • minor improvement in doc by @leoguillaume in #557
  • Update API Reference and API Swagger links by @moscaale in #600
  • Documentation queuing by @blanch0t in #536
  • feat(api): remove web search references (brave, duckduckgo) by @tibo-pdn in #601
  • chore(deps): bump qs and express in /docs by @dependabot[bot] in #610
  • Update feature_request.md by @leoguillaume in #607
  • fix(docs): make quickstart work directly and match documentation by @natoromano in #605
  • Clean archi - model endpoint by @moscaale in #522
  • feat(github): add PR template with most of the useful sections by @tibo-pdn in #606
  • feat(api): remove carbon footprint prefix in provider parameters by @tibo-pdn in #603
  • Correct typo in OCR tutorial documentation by @cyrillay in #629
  • fix(ocr-beta): forward_request for ocr-beta by @moscaale in #613
  • feat(collections): add desc filter on collections creation date by @tibo-pdn in #631
  • feat(rerank): change signature of v1rerank endpoint to cohere standard by @tibo-pdn in #611
  • feat(models): add request content to basemodelprovider by @leoguillaume in #640
  • 608 core fix error when redis key of rate limit has no ttl by @tibo-pdn in #654

Full Changelog: 0.3.6...0.3.7

0.3.6

20 Dec 16:46
c837ec0

What's Changed

  • fix(playground): fix app title wrap when too long by @tibo-pdn in #584
  • fix(playground): user creation when budget is empty by @leoguillaume in #586
  • fix(albert): increase playground timeout by @leoguillaume in #587
  • fix(models): disable router and provider pagination for interne funct… by @leoguillaume in #588
  • fix: order by router name in limits when displaying roles by @tibo-pdn in #589
  • fix(qdrant): /chunks offset issue for Qdrant database by @tibo-pdn in #595
  • fix(playground): pagination state shared between classes and components by @tibo-pdn in #591
  • fix(playground): never expired key in playground by @leoguillaume in #599

Full Changelog: 0.3.5...0.3.6

0.3.5post2

17 Dec 14:38
dc74f1f

What's Changed

  • fix(models): disable router and provider pagination for interne funct… by @leoguillaume in #588

Full Changelog: 0.3.5post1...0.3.5post2

0.3.5post1

17 Dec 13:06
ce93719

Full Changelog: 0.3.5...0.3.5post1

0.3.5

17 Dec 10:26
0119d94

Full Changelog: 0.3.4...0.3.5

0.3.4

11 Dec 15:26
51a6b54

Full Changelog: 0.3.3...0.3.4

0.3.3

05 Dec 11:11
9a30913

Full Changelog: 0.3.2...0.3.3

0.3.2post3

04 Dec 17:39
12a98f8

Full Changelog: 0.3.2post2...0.3.2post3