
Conversation

@kimbochen
Owner

  • Added a warmup phase, configurable via the CLI argument --num-warmups
  • Added endpoint readiness checking: the client probes the endpoint and waits up to 10 minutes for it to become available
  • Updated the sequence length generation logic:
    • --random_range_ratio now matches vLLM: the sequence length is uniformly sampled from [seq_len * (1.0 - random_range_ratio), seq_len * (1.0 + random_range_ratio)]
    • Added an iterative encode-decode loop to minimize the mismatch between requested and actual token counts
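The endpoint readiness check described above could be sketched roughly as follows. This is an illustrative sketch, not the PR's actual code: the function name `wait_for_endpoint`, the probed URL, and the 5-second polling interval are assumptions; only the 10-minute cap comes from the description.

```python
import time
import urllib.request


def wait_for_endpoint(url: str, timeout_s: float = 600.0, interval_s: float = 5.0) -> bool:
    """Poll `url` until it answers with HTTP 200, giving up after
    `timeout_s` seconds (10 minutes by default)."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True  # endpoint is up and serving
        except OSError:
            pass  # connection refused / timed out: server not ready yet
        time.sleep(interval_s)
    return False  # gave up after timeout_s
```

The benchmark client would call this once before issuing requests and abort (or warn) if it returns False.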

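A minimal sketch of the two sequence-length mechanisms, assuming a HuggingFace-style tokenizer with `encode`/`decode` and a `vocab_size` attribute. The function names, the retry cap, and the pad/trim strategy are invented for illustration; only the uniform sampling range and the decode/re-encode iteration are from the description.

```python
import random


def sample_seq_len(seq_len: int, random_range_ratio: float) -> int:
    """Uniformly sample a length in
    [seq_len * (1 - ratio), seq_len * (1 + ratio)], matching vLLM's
    --random-range-ratio behavior."""
    lo = int(seq_len * (1.0 - random_range_ratio))
    hi = int(seq_len * (1.0 + random_range_ratio))
    return random.randint(lo, hi)


def build_prompt(tokenizer, target_len: int, max_iters: int = 10) -> str:
    """Draw random token ids, then iteratively decode and re-encode,
    trimming or padding until the re-encoded length matches target_len.
    A detokenize-retokenize round trip is lossy, so a single pass can
    miss the target; iterating shrinks the mismatch."""
    ids = [random.randrange(tokenizer.vocab_size) for _ in range(target_len)]
    for _ in range(max_iters):
        ids = tokenizer.encode(tokenizer.decode(ids))
        if len(ids) == target_len:
            break
        if len(ids) > target_len:
            ids = ids[:target_len]  # trim surplus tokens
        else:
            ids += [random.randrange(tokenizer.vocab_size)  # pad with fresh random ids
                    for _ in range(target_len - len(ids))]
    return tokenizer.decode(ids)
```

With `random_range_ratio = 0` every request gets exactly `seq_len` tokens; larger ratios spread request lengths to better mimic mixed traffic.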