GuideLLM DS Generation Engine #26 #134

Open
@SharonGil

Description

Motivation

GuideLLM currently accepts datasets (DSs) from three sources: Hugging Face (HF), a path to a local DS, or synthetic generation.
To demonstrate the strengths of KVCache-aware routing, we need an easy way to create DSs that represent the use cases where this kind of routing brings the most value, such as RAG-based apps and agentic apps.

The Plan

Create a DS generation engine that receives use-case requirements as parameters and returns a complete, GuideLLM-ready DS matching the use case. The DS can be passed as the --data parameter of the GuideLLM benchmark without any changes.

High-level steps of implementation

The engine will include 2 consecutive layers:

  1. The first layer will receive requirements as parameters (e.g. number of different Apps, system prompt length, tools length, RAG doc length, number of RAG docs per App, etc.) and return a JSON file containing all the Apps the user requested in a human-readable textual form, where all lengths are fully configurable.
    Simplified example:

{ "systemPrompt": "8DzB0vXMMDO1ihCpCNsEBDH2FrHfmnR", "tools": "iSvQglvUQgoapyEWuYjNvgrqRR8DeX6zH6vQfQoC0OSSzcafs1XHHHLnxYS9O", "ragDocs": [ "Iwat4dvnPdrmsLhYEP8RTsR9Es1kc4MI0wIfsFG55", "0xYplap6ennnt6nlhBFMjlJTHNU8kW68JhaHY6TK" ] }

  2. The second layer will receive the output JSON of the first layer as input, along with use-case parameters (e.g. number of users, number of requests per user session, number of users that share the same App, number of documents per user, etc.), and then compress and flatten it into a GuideLLM-ready, prompt-based DS that reflects the use case.
    For example, with 10 users, where every 2 users share the same App and each uses 2 of its documents, the layer will create pairs of consecutive prompts that share the same App's system prompt and tools and differ only in the user prompt and (possibly) in the RAG docs chosen.
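
To make the two layers concrete, here is a minimal Python sketch of how they could fit together. It is an illustration, not a proposed implementation. All function and parameter names (`generate_apps`, `flatten_to_prompts`, `users_per_app`, and so on) are hypothetical. Lengths are counted in characters rather than tokens to keep the example short. The output row shape (one object with a `prompt` field per request) is an assumption about what a GuideLLM-ready DS would look like and should be checked against the GuideLLM documentation.

```python
import random
import string


def random_text(length, rng):
    """Return `length` random alphanumeric characters (placeholder content)."""
    alphabet = string.ascii_letters + string.digits
    return "".join(rng.choice(alphabet) for _ in range(length))


def generate_apps(num_apps, system_prompt_len, tools_len, rag_doc_len,
                  rag_docs_per_app, seed=0):
    """Layer 1 (hypothetical): one record per App, all lengths configurable."""
    rng = random.Random(seed)
    return [
        {
            "systemPrompt": random_text(system_prompt_len, rng),
            "tools": random_text(tools_len, rng),
            "ragDocs": [random_text(rag_doc_len, rng)
                        for _ in range(rag_docs_per_app)],
        }
        for _ in range(num_apps)
    ]


def flatten_to_prompts(apps, num_users, users_per_app, docs_per_user,
                       requests_per_user, user_prompt_len, seed=0):
    """Layer 2 (hypothetical): one flat prompt per request.

    Users that share an App are emitted next to each other, so their
    prompts start with the same system prompt and tools.
    """
    rng = random.Random(seed)
    rows = []
    for user in range(num_users):
        # Consecutive groups of `users_per_app` users map to the same App.
        app = apps[(user // users_per_app) % len(apps)]
        # Each user works with a random subset of that App's RAG docs.
        docs = rng.sample(app["ragDocs"], docs_per_user)
        for _ in range(requests_per_user):
            parts = [app["systemPrompt"], app["tools"], *docs,
                     random_text(user_prompt_len, rng)]
            rows.append({"prompt": "\n".join(parts)})
    return rows


# The scenario from the text: 10 users, every 2 users share an App,
# and each user uses 2 of that App's documents.
apps = generate_apps(num_apps=5, system_prompt_len=32, tools_len=64,
                     rag_doc_len=40, rag_docs_per_app=4)
rows = flatten_to_prompts(apps, num_users=10, users_per_app=2,
                          docs_per_user=2, requests_per_user=1,
                          user_prompt_len=16)
print(len(apps), "apps,", len(rows), "prompts")
```

Serializing `apps` with `json.dump` would give the layer 1 JSON file, and writing each entry of `rows` as one JSON line would give a candidate file for --data. Placing the shared system prompt and tools at the start of every prompt is the design choice that matters here, since a shared prefix is what a KVCache-aware router can reuse.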

Link to issue in Distributed-KV-Cache repo - https://github.com/neuralmagic/llm-d-kv-cache-manager/issues/4
