How to analyze large models like Llama 3 70B that require model parallelism?

The model engine is built from Llama 3 70B with tensor parallelism (tp=2) and pipeline parallelism (pp=2), and is deployed with the Triton launch script below:
python3 scripts/launch_triton_server.py --world_size 4 --model_repo=llama_ifb
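
For reference, the `--world_size 4` value in the command follows from the parallelism settings above: as far as I understand, one rank (one GPU) is launched per tensor-parallel/pipeline-parallel shard, so the world size is tp × pp. A minimal sketch of that arithmetic (the helper name `expected_world_size` is hypothetical and not part of any library):

```python
# Hypothetical helper, for illustration only: computes the number of ranks
# (one per GPU) assumed to be needed for an engine built with the given
# tensor-parallel (tp) and pipeline-parallel (pp) sizes.
def expected_world_size(tp_size: int, pp_size: int) -> int:
    if tp_size < 1 or pp_size < 1:
        raise ValueError("tp_size and pp_size must be positive integers")
    return tp_size * pp_size


# Values taken from the deployment described in this post: tp=2, pp=2.
world_size = expected_world_size(tp_size=2, pp_size=2)
print(f"--world_size {world_size}")
```

So the deployment to be analyzed spans 4 GPUs behind a single Triton model repository.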

In this case, how can I use model-analyzer to analyze this parallelized model and deployment?