Release v0.5.0

@briancoutinho released this 28 May 16:45
· 44 commits to main since this release

Summary

Added

  • Added support for AMD GPUs.
  • Updated pyproject.toml to work around missing stub packages for yaml.
  • Added a trace format validator.
  • Added multiple trace filter classes and demos.
  • Added enhanced trace call stack graph implementation.
  • Added memory timeline view.
  • Added support for trace parser customization.
  • Added support for H100 traces.
  • Added NCCL collective fields to the parser config.
  • Queue length analysis: added a feature to compute the time blocked on a stream that has hit its max queue length.
  • Added kernel_backend to the parser config for Triton / torch.compile() support.
  • Added analysis features for GPU user annotation attribution at the trace and kernel level.
  • Added support for parsing all trace event args.
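
As a rough illustration of the queue length item above, the time a stream spends at its maximum queue length can be estimated from a series of (timestamp, queue length) samples. This is a minimal sketch, not the library's implementation: the function name `time_at_max_queue_length`, the sample layout, and the `end_ts` parameter are all hypothetical, and each sample is assumed to hold until the next one.

```python
from typing import List, Tuple


def time_at_max_queue_length(
    samples: List[Tuple[int, int]],  # hypothetical layout: (timestamp, queue_length), sorted by timestamp
    max_queue_length: int,
    end_ts: int,
) -> int:
    """Sum the time during which the queue length is at or above the maximum.

    Assumes each sample's queue length stays constant until the next sample,
    and that the last sample holds until end_ts.
    """
    blocked = 0
    for i, (ts, qlen) in enumerate(samples):
        next_ts = samples[i + 1][0] if i + 1 < len(samples) else end_ts
        if qlen >= max_queue_length:
            blocked += next_ts - ts
    return blocked


# Toy data: the queue sits at its maximum (3) from t=10 to t=25 and from t=40 to t=45.
samples = [(0, 1), (10, 3), (25, 2), (40, 3), (45, 0)]
blocked = time_at_max_queue_length(samples, max_queue_length=3, end_ts=50)
print(blocked)
```

In a real trace the samples would be derived from kernel launch and kernel start events per stream; the toy list here only stands in for that.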

New Feature: Critical Path Analysis

  • Added lightweight critical path analysis feature.
  • Critical path analysis features: event attribution and summary().
  • Critical path analysis fixes: fixed async memcpy handling and added GPU-to-CPU event-based synchronization.
  • Added save and restore feature for critical path graph.
  • Fixed a bug in critical path analysis related to listing the edges on the critical path.
  • Updated critical path analysis with edge attribution.
  • Improvement: allow filtering of flow events in the overlaid trace.
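
For readers new to the idea, a critical path can be pictured as the longest weighted path through a dependency graph of events. The sketch below is a generic longest-path computation on a small DAG, not the algorithm or API shipped in this release: the function `critical_path`, the node names, and the edge weights are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable, int]  # (source, destination, weight)


def critical_path(edges: List[Edge]) -> Tuple[int, List[Hashable]]:
    """Return (total weight, node list) of the longest path in a DAG."""
    succ: Dict[Hashable, List[Tuple[Hashable, int]]] = defaultdict(list)
    indeg: Dict[Hashable, int] = defaultdict(int)
    nodes = set()
    for u, v, w in edges:
        succ[u].append((v, w))
        indeg[v] += 1
        nodes.update((u, v))

    # Topological order via Kahn's algorithm.
    order = []
    ready = sorted(n for n in nodes if indeg[n] == 0)
    while ready:
        u = ready.pop()
        order.append(u)
        for v, _ in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)

    # Relax edges in topological order, keeping the longest distance to each node.
    dist = {n: 0 for n in nodes}
    prev: Dict[Hashable, Hashable] = {}
    for u in order:
        for v, w in succ[u]:
            if dist[u] + w > dist[v]:
                dist[v] = dist[u] + w
                prev[v] = u

    # Walk back from the node with the largest distance.
    end = max(dist, key=dist.get)
    path = [end]
    while path[-1] in prev:
        path.append(prev[path[-1]])
    return dist[end], path[::-1]


# Toy graph: a CPU op launches a kernel; a later sync waits on both CPU and GPU work.
edges = [
    ("cpu_op_start", "kernel_launch", 5),
    ("kernel_launch", "kernel_end", 30),
    ("cpu_op_start", "cpu_op_end", 8),
    ("cpu_op_end", "sync", 2),
    ("kernel_end", "sync", 1),
]
total, path = critical_path(edges)
print(total, path)
```

In this toy graph the GPU kernel dominates, so the path runs through the kernel rather than the CPU-only branch. Attributing time to each edge on the path is one way to see where an iteration is actually spent.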

Changed

  • Changed the test data path in unit tests from a relative path to a real path to support running tests within IDEs.
  • Added a workaround for overlapping events when using ns resolution traces (pytorch/pytorch#122425).
  • Better handling of CUDA sync events with stream = -1.
  • Fixed the ijson metadata parser for some corner cases.
  • Added an option for ns rounding that also covers ijson loading.
  • Updated the Trace() API to accept a list of files and automatically infer ranks.

Fixed

  • Fixed issue #65 to handle floating point counter values in cupti_counter_analysis.

Full Changelog: v0.2.0...v0.5.0