Consider limiting the context that is used for the topic derivation and language identification

Two expensive ML computations are happening in the analytics service:
1. Topic derivation
2. Language identification

Both are CPU-bound.
Both take as input the whole text of the incoming chat completion or embedding requests.

To ease the CPU load and make it more predictable, we suggest taking only the last, let's say, 10K characters from the request for the analysis.

The last messages of a chat completion request, plus its system message, usually have the most weight for the LLM, so it should be for the analytics service too.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Consider limiting the context that is used for the topic derivation and language identification #232

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Consider limiting the context that is used for the topic derivation and language identification #232

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions