Description
Motivation
GuideLLM currently accepts three kinds of datasets (DSs): DSs hosted on Hugging Face (HF), paths to local DSs, and synthetic DSs.
To demonstrate the strengths of KVCache-aware routing, we need an easy way to create DSs that represent the use cases where this kind of routing brings the most value, such as RAG-based apps and agentic apps.
The Plan
Create a DS generation engine that receives use-case requirements as parameters and returns a complete, GuideLLM-ready DS that matches the use case. The DS can be fed as the --data parameter of the GuideLLM benchmark without any changes.
High-level steps of implementation
The engine will consist of 2 consecutive layers:
- The first layer receives requirements as parameters (e.g. number of different Apps, system prompt length, tools length, RAG docs length, number of RAG docs per App) and returns a JSON file containing all the Apps required by the user in a textual, human-readable form, where all lengths are fully configurable.
Simplified example:
{ "systemPrompt": "8DzB0vXMMDO1ihCpCNsEBDH2FrHfmnR", "tools": "iSvQglvUQgoapyEWuYjNvgrqRR8DeX6zH6vQfQoC0OSSzcafs1XHHHLnxYS9O", "ragDocs": [ "Iwat4dvnPdrmsLhYEP8RTsR9Es1kc4MI0wIfsFG55", "0xYplap6ennnt6nlhBFMjlJTHNU8kW68JhaHY6TK" ] }
- The second layer receives the output JSON of the first layer as input, along with use-case-related parameters (e.g. number of users, number of requests per user session, number of users that share the same App, number of documents per user), and then compresses and flattens it into a GuideLLM-ready, prompt-based DS that reflects the use case.
For example, with 10 users where every 2 users share the same App and each uses 2 of its documents, the layer creates pairs of consecutive prompts that share the same App's system prompt and tools and differ only in the user prompt and, possibly, in the RAG docs chosen.
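The two layers described above could be sketched roughly as follows. This is only an illustration of the proposed flow, not an existing GuideLLM API: the function names (`generate_apps`, `flatten_to_prompts`, `write_jsonl`) and their parameters are hypothetical, lengths are counted in characters rather than tokens, the text is random alphanumeric filler as in the simplified example, and the `prompt` field name in the output records is an assumption about what the --data input expects.

```python
import json
import random
import string


def random_text(length, rng):
    """Return a random alphanumeric string of exactly `length` characters."""
    alphabet = string.ascii_letters + string.digits
    return "".join(rng.choice(alphabet) for _ in range(length))


def generate_apps(num_apps, system_prompt_len, tools_len,
                  rag_doc_len, rag_docs_per_app, seed=0):
    """Layer 1 (hypothetical): one dict per App, shaped like the JSON example."""
    rng = random.Random(seed)
    return [
        {
            "systemPrompt": random_text(system_prompt_len, rng),
            "tools": random_text(tools_len, rng),
            "ragDocs": [random_text(rag_doc_len, rng)
                        for _ in range(rag_docs_per_app)],
        }
        for _ in range(num_apps)
    ]


def flatten_to_prompts(apps, num_users, users_per_app, docs_per_user,
                       requests_per_user, user_prompt_len, seed=0):
    """Layer 2 (hypothetical): one record per request.

    Users that share an App are emitted consecutively, so their prompts
    start with the same system prompt and tools.
    Requires docs_per_user <= number of RAG docs in each App.
    """
    rng = random.Random(seed)
    records = []
    for user in range(num_users):
        # Consecutive groups of `users_per_app` users map to the same App.
        app = apps[(user // users_per_app) % len(apps)]
        docs = rng.sample(app["ragDocs"], docs_per_user)
        prefix = "\n".join([app["systemPrompt"], app["tools"], *docs])
        for _ in range(requests_per_user):
            user_prompt = random_text(user_prompt_len, rng)
            # "prompt" as the field name is an assumption, not confirmed here.
            records.append({"prompt": prefix + "\n" + user_prompt})
    return records


def write_jsonl(records, path):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
```

For the 10-user example, `generate_apps(5, 32, 64, 40, 4)` followed by `flatten_to_prompts(apps, num_users=10, users_per_app=2, docs_per_user=2, requests_per_user=1, user_prompt_len=16)` yields 10 records in which each consecutive pair shares an App's system prompt and tools. Placing the shared parts first keeps the common prefix as long as possible, which is what KVCache-aware routing is meant to exploit.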
Link to issue in Distributed-KV-Cache repo - https://github.com/neuralmagic/llm-d-kv-cache-manager/issues/4