Description
The model engine is built from Llama 3 70B with tensor parallelism (tp=2) and pipeline parallelism (pp=2), and is deployed with the following Triton launch script:
python3 scripts/launch_triton_server.py --world_size 4 --model_repo=llama_ifb
In this case, how can I use model-analyzer to analyze this parallelized model deployment?
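For context, the `--world_size 4` in the launch command comes from the parallelism settings: one rank per GPU, so tp × pp = 2 × 2 = 4. Below is a minimal sketch of that arithmetic; the helper name `build_launch_command` is hypothetical and only rebuilds the command shown above, it is not part of the launch script.

```python
import shlex


def build_launch_command(tp, pp, model_repo):
    """Rebuild the launch command from the parallelism settings.

    Hypothetical helper for illustration only; assumes one rank per GPU,
    so the world size is the product of the tensor- and pipeline-parallel sizes.
    """
    world_size = tp * pp
    return [
        "python3",
        "scripts/launch_triton_server.py",
        "--world_size",
        str(world_size),
        f"--model_repo={model_repo}",
    ]


# Values taken from the description: tp=2, pp=2, model repo "llama_ifb".
command = build_launch_command(tp=2, pp=2, model_repo="llama_ifb")
print(shlex.join(command))
```

So whatever model-analyzer setup is used has to account for a server that already spans 4 ranks.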