Testing for parity between cli and server #1459

zphoenixrises · 2025-05-16T06:43:03Z

To test the CLI script:

      python3 -m shortfin_apps.llm.cli \
        --device hip \
        --tokenizer_json=/data/Mistral-Nemo-Instruct-2407-FP8/tokenizer.json \
        --model_config=../artifacts/quark_mistral_config.json \
        --vmfb=../artifacts/quark_mistral_nemo.vmfb \
        --prompt "one two three four five" \
        --parameters /data/Mistral-Nemo-Instruct-2407-FP8/quark_mistral_nemo.irpa \
        --benchmark \
        --benchmark_tasks=64 \
        --decode_steps=64 \
        --token_selection_strategy=multi_greedy \
        --num_beams=8 \
        --device_ids 0 \
        --workers_offline=16

To test the server:

      python -m shortfin_apps.llm.server \
        --tokenizer_json /data/Mistral-Nemo-Instruct-2407-FP8/tokenizer.json \
        --model_config ../artifacts/quark_mistral_config.json \
        --vmfb ../artifacts/quark_mistral_nemo.vmfb \
        --device=hip \
        --device_ids 0 \
        --parameters /data/Mistral-Nemo-Instruct-2407-FP8/quark_mistral_nemo.irpa \
        --token_selection_strategy multi_greedy \
        --num_beams 8 \
        --port 8081
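Since the goal here is CLI/server parity, one option is to build both launches from a single argument list so the model, device, and sampling flags cannot drift apart. The sketch below is a hypothetical bash helper: `SHARED_ARGS`, `run`, `launch_cli`, `launch_server`, `DATA_DIR`, `ARTIFACTS`, and `DRY_RUN` are illustrative names, not part of shortfin. It reuses only flags from the two commands above, normalized to the space-separated `--flag value` form (assuming both entry points parse flags with argparse, which accepts `--flag=value` and `--flag value` interchangeably). With `DRY_RUN=1`, the default, it only prints the commands.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of shortfin): build the CLI and server
# launches from one shared argument list so both load the same artifacts.
set -euo pipefail

# Paths taken from the commands above; override via the environment.
DATA_DIR="${DATA_DIR:-/data/Mistral-Nemo-Instruct-2407-FP8}"
ARTIFACTS="${ARTIFACTS:-../artifacts}"

# Model, device, and sampling arguments that must match for a parity test.
SHARED_ARGS=(
  --tokenizer_json "$DATA_DIR/tokenizer.json"
  --model_config "$ARTIFACTS/quark_mistral_config.json"
  --vmfb "$ARTIFACTS/quark_mistral_nemo.vmfb"
  --parameters "$DATA_DIR/quark_mistral_nemo.irpa"
  --device hip
  --device_ids 0
  --token_selection_strategy multi_greedy
  --num_beams 8
)

# Print the command when DRY_RUN=1 (the default); execute it when DRY_RUN=0.
run() {
  if [[ "${DRY_RUN:-1}" == "1" ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# CLI side: shared args plus the CLI-only benchmark flags from the PR.
launch_cli() {
  run python3 -m shortfin_apps.llm.cli "${SHARED_ARGS[@]}" \
    --prompt "one two three four five" \
    --benchmark --benchmark_tasks=64 --decode_steps=64 --workers_offline=16
}

# Server side: shared args plus the port the benchmark script targets.
launch_server() {
  run python -m shortfin_apps.llm.server "${SHARED_ARGS[@]}" --port 8081
}
```

For example, after sourcing the file (saved under any name, say `parity_args.sh`), run `DRY_RUN=0 launch_server` in one terminal and `DRY_RUN=0 launch_cli` in another. Any change to a shared flag then applies to both sides at once.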

Then run the benchmark script against the server on port 8081 (set MODEL, NUM_GPUS, BON, OUTPUT_TOKENS, and INPUT_TOKENS first):

      OUTPUT="${MODEL}_gpu${NUM_GPUS}_bo${BON}_${OUTPUT_TOKENS}_out_${INPUT_TOKENS}_in.json"
      python3 profiles.py profile --samples 64 \
        --output "$OUTPUT" \
        --output_tokens "$OUTPUT_TOKENS" \
        --input_tokens "$INPUT_TOKENS" \
        --best_of_n "$BON" \
        --batches "8" \
        --server_ports "8081"
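The benchmark snippet leaves its five variables undefined. A hypothetical wrapper that fills them with placeholder values, waits for the server port to accept connections, and sweeps a few input lengths might look like the sketch below. Every value is an assumption rather than something taken from this PR (`mistral_nemo`, one GPU, best-of-8 to echo `--num_beams 8`, 128 and 1024 input tokens, 64 output tokens to echo `--decode_steps=64`), as are the function names; the `profiles.py` flags are exactly the ones used above. `wait_for_port` relies on bash's `/dev/tcp` redirection and only checks that something is listening, which is a rough readiness signal, not a health check.

```bash
#!/usr/bin/env bash
# Hypothetical sweep around the profiles.py call from the PR description.
set -euo pipefail

# Placeholder values (assumptions); override the scalars via the environment.
MODEL="${MODEL:-mistral_nemo}"
NUM_GPUS="${NUM_GPUS:-1}"
BON="${BON:-8}"
INPUT_TOKENS_LIST=(128 1024)
OUTPUT_TOKENS_LIST=(64)

# Print the command when DRY_RUN=1 (the default); execute it when DRY_RUN=0.
run() {
  if [[ "${DRY_RUN:-1}" == "1" ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Poll localhost:$1 once per second for up to $2 seconds (default 600).
# Returns 0 as soon as a TCP connection succeeds, 1 on timeout.
wait_for_port() {
  local port="$1" max_wait="${2:-600}" i
  for ((i = 0; i < max_wait; i++)); do
    if (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Same file-naming scheme as the OUTPUT line in the snippet above.
output_name() {
  local input_tokens="$1" output_tokens="$2"
  echo "${MODEL}_gpu${NUM_GPUS}_bo${BON}_${output_tokens}_out_${input_tokens}_in.json"
}

# One profiles.py run per (input tokens, output tokens) combination.
sweep() {
  local in_tok out_tok
  for in_tok in "${INPUT_TOKENS_LIST[@]}"; do
    for out_tok in "${OUTPUT_TOKENS_LIST[@]}"; do
      run python3 profiles.py profile --samples 64 \
        --output "$(output_name "$in_tok" "$out_tok")" \
        --output_tokens "$out_tok" \
        --input_tokens "$in_tok" \
        --best_of_n "$BON" \
        --batches "8" \
        --server_ports "8081"
    done
  done
}
```

After sourcing it, `wait_for_port 8081 && DRY_RUN=0 sweep` starts profiling once the server is listening; leaving `DRY_RUN` unset first lists the runs the sweep would make.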

Commit 4872bb7: Testing for parity between cli and server
