Consider limiting the context that is used for the topic derivation and language identification #232

@adubovik

The analytics service runs two expensive ML computations:

  1. Topic derivation
  2. Language identification

Both are CPU-bound, and both take the whole text of the incoming chat completion or embedding request as input.

To reduce the CPU load and make it more predictable, we suggest analyzing only the tail of the request, say the last 10K characters.
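A minimal sketch of that truncation, assuming the request text is already available as a single Python string. The names `MAX_ANALYSIS_CHARS` and `tail_chars` are hypothetical and are not taken from the analytics service code.

```python
# Hypothetical limit, taken from the "10K characters" figure suggested above.
MAX_ANALYSIS_CHARS = 10_000


def tail_chars(text: str, limit: int = MAX_ANALYSIS_CHARS) -> str:
    """Return at most the last `limit` characters of `text`."""
    if limit <= 0:
        # Guard: text[-0:] would return the whole string, not an empty one.
        return ""
    return text[-limit:]
```

With this in place, the cost of both computations is bounded by `limit` rather than by the size of the request.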

The last messages of a chat completion request, plus its system message, usually carry the most weight for the LLM, so the same should hold for the analytics service.
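One possible way to respect that weighting is sketched below: keep the system message, then fill the rest of the character budget with the most recent messages. This is only an illustration under assumptions. It presumes messages are dicts with string `role` and `content` fields, as in the common chat completion format. The function name `select_context` and the budget handling are hypothetical.

```python
from typing import Dict, List


def select_context(messages: List[Dict[str, str]], max_chars: int = 10_000) -> str:
    """Keep system messages, then the newest other messages that fit the budget."""
    max_chars = max(max_chars, 0)
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    other_parts = [m["content"] for m in messages if m["role"] != "system"]

    # System text goes first; cut it if it alone exceeds the budget.
    system_text = "\n".join(system_parts)[:max_chars]
    remaining = max_chars - len(system_text)

    tail: List[str] = []
    # Walk from the newest message backwards until the budget is spent.
    for content in reversed(other_parts):
        if remaining <= 0:
            break
        if not content:
            continue
        piece = content[-remaining:]  # keep only the end of a message that does not fit
        tail.append(piece)
        remaining -= len(piece)

    tail.reverse()  # restore chronological order
    parts = [system_text, *tail] if system_text else tail
    # Note: the newline separators are not counted against the budget.
    return "\n".join(parts)
```

Compared with a plain tail cut, this keeps the system message even when the conversation is much longer than the budget.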

Metadata

Labels: enhancement (New feature or request)
