Llama2-70B Profiling and Pipeline Parallel investigation #25

@naveenmiriyaluredhat

Description

  1. Compare vLLM and TRT-LLM profiles
  2. NVIDIA Nsight Compute (NCU) profiling
  3. NVIDIA Nsight Systems (NSYS) profiling
  4. Compare gaps in Llama2-70B iteration characteristics?
  5. Autotune Llama2-70B on H100 for maximum throughput?
  6. Understand pipeline parallelism (PP)?
  7. Understand the PR from vLLM?
  8. How to measure bubbles in PP?
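
For item 8, a rough starting point might look like the sketch below. It is not taken from vLLM or TRT-LLM. It assumes a simple synchronous schedule where every stage takes the same time per microbatch, in which case the idle ("bubble") share of the timeline is (p - 1) / (m + p - 1) for p stages and m microbatches. The second helper assumes we can extract per-stage busy intervals from a trace (for example an NSYS timeline); the function names and the interval format are hypothetical.

```python
from typing import Iterable, Tuple


def ideal_bubble_fraction(num_stages: int, num_microbatches: int) -> float:
    """Idle fraction of a synchronous pipeline with equal per-stage times.

    Assumption: total time is (m + p - 1) slots, of which p - 1 are idle
    per stage. Real inference schedules may differ.
    """
    if num_stages < 1 or num_microbatches < 1:
        raise ValueError("num_stages and num_microbatches must be >= 1")
    return (num_stages - 1) / (num_microbatches + num_stages - 1)


def measured_bubble_fraction(
    busy_intervals: Iterable[Tuple[float, float]],
    window_start: float,
    window_end: float,
) -> float:
    """Idle fraction of one stage inside [window_start, window_end].

    Assumption: busy_intervals are non-overlapping (start, end) pairs,
    e.g. kernel activity for one GPU exported from a profiler trace.
    """
    window = window_end - window_start
    if window <= 0:
        raise ValueError("window_end must be greater than window_start")
    busy = 0.0
    for start, end in busy_intervals:
        # Clip each interval to the measurement window before summing.
        clipped = min(end, window_end) - max(start, window_start)
        if clipped > 0:
            busy += clipped
    return 1.0 - busy / window


if __name__ == "__main__":
    # 4 stages, 8 microbatches: 3 idle slots out of 11.
    print(ideal_bubble_fraction(4, 8))
    # One stage busy for 4 of 5 time units.
    print(measured_bubble_fraction([(0.0, 2.0), (3.0, 5.0)], 0.0, 5.0))
```

Comparing the measured value per stage against the ideal one would show how much idle time comes from the schedule itself versus other gaps (launch overhead, communication, uneven stage times).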
