It would be good to discuss, document, and improve how the recommendation system currently works, including its matching logic and limitations.
Points to Clarify
- Does the system require an exact match for prompt_length and output_length?
- Is it true that the system does not require an exact QPS match today?
- Confirm that there is no interpolation between prompt/output ranges.
- Confirm that if a (prompt_len, output_len) pair doesn't exist, the system simply fails or returns nothing.
- What happens when an exact (prompt_len, output_len) combination exists for multiple QPS values?
- What is the fallback behavior when there is no match at all?
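
To make the discussion concrete, here is a minimal Python sketch of the behavior the questions above seem to describe. It is not the actual implementation: `BenchmarkRow`, `recommend`, and the field names are hypothetical, and every rule in it (exact match on lengths, nearest-QPS selection, returning `None` on a miss) is an assumption that still needs to be confirmed against the real code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BenchmarkRow:
    # Hypothetical record; the real schema may use different names and fields.
    prompt_len: int
    output_len: int
    qps: float
    config: str


def recommend(
    rows: List[BenchmarkRow],
    prompt_len: int,
    output_len: int,
    target_qps: float,
) -> Optional[BenchmarkRow]:
    # Assumption 1: (prompt_len, output_len) must match exactly,
    # with no interpolation between neighboring ranges.
    candidates = [
        r for r in rows
        if r.prompt_len == prompt_len and r.output_len == output_len
    ]

    # Assumption 2: with no exact pair, nothing is returned (no fallback).
    if not candidates:
        return None

    # Assumption 3: QPS is not matched exactly; among rows sharing the
    # same pair, the one closest to the requested QPS is chosen.
    return min(candidates, key=lambda r: abs(r.qps - target_qps))


rows = [
    BenchmarkRow(512, 128, 1.0, "config-a"),
    BenchmarkRow(512, 128, 4.0, "config-b"),
    BenchmarkRow(1024, 256, 2.0, "config-c"),
]

hit = recommend(rows, 512, 128, target_qps=3.5)   # pair exists at two QPS values
miss = recommend(rows, 600, 128, target_qps=1.0)  # pair does not exist
```

Under these assumptions, `hit` is the `config-b` row and `miss` is `None`. If the real system behaves differently on any of the three points, that difference is exactly what the documentation should spell out.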