In this release, we've moved from Python- and platform-specific builds to platform-specific builds only. Rather than supporting only Python 3.10-3.13, we now support Python 3.10 and later, including variants such as the free-threaded 3.14t.
Release Notes
Highlights
- Trace Viewer V2 Enhancements: Significant UI/UX improvements, including a new mouse mode toolbar, area selection, static search, and persistent UI states.
- Performance Optimizations: Faster trace processing by avoiding unnecessary memory copies, using proto arenas, and optimizing compression levels.
- Build & Packaging: Improved multi-Python version build support and resolved various OSS build and macOS compatibility issues.
Detailed Changes
Trace Viewer V2 & Frontend
- New Features:
- Implemented Static Search in Trace Viewer V2 with a Material 3 pill-shaped search box.
- Added a Mouse Mode Toolbar, keyboard shortcuts, and zoom interactions.
- Added support for Area Selection and drag-and-drop for JSON files.
- Added click-to-copy for track names.
- Displayed an area chart for top-level process utilization in the timeline preview.
- UI/UX Improvements:
- Updated fonts and styling to Material 3 specifications.
- Added vertical scrolling and automatic track expansion when navigating to an event.
- Made the detail drawer resizable and preserved its size across sessions.
- Preserved group expanded states across trace data updates.
- Improved counter event hover visual effects.
- Bug Fixes & Stability:
- Fixed canvas resize flickering.
- Fixed mouse wheel shortcuts and multi-host selection.
- Prevented layout shifts by reserving scrollbar width.
- Stabilized Trace Viewer URL generation.
- Internal Refactoring:
- Introduced a scheduler to manage redraws.
- Refactored timeline navigation and drawing logic.
- Renamed Emscripten bindings for better TypeScript readability.
Core Profiler & Backend
- Performance Optimizations:
- Sped up trace processing by avoiding unnecessary memory copies of Perfetto output.
- Used Proto Arenas to speed up `LoadFromFile` and `WriteToCord`.
- Changed default gzip compression level from 6 to 1 for faster processing.
- Reduced memory allocation during the creation of the prefix trie search index.
- Optimized `TraceEvent` serialization for LevelDB storage.
- Features & Protocol Updates:
- Added `TraceDataResponse` proto to significantly reduce payload size over the wire.
- Enabled async full DMA feature in 3P Trace Viewer.
- Added counter lines for megascale collective count and in-flight bytes.
- Smart Suggestions:
- Optimized `EventTimeFractionAnalyzer` to process all events at once.
- Added caching and prevented redundant fetches for smart suggestions.
- Bug Fixes:
- Fixed SparseCore busy and idle time calculations.
- Synchronized viewport initialized ranges and locked concurrent fetches.
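The gzip level change above trades compression ratio for speed. The following is a minimal Python sketch of that trade-off using the standard-library `gzip` module. It is only an illustration: the profiler's own compression path is not shown in these notes, and the `compress_trace` helper and the sample payload are hypothetical.

```python
import gzip


def compress_trace(payload: bytes, level: int = 1) -> bytes:
    """Compress a payload with gzip at the given level (1 = fastest, 9 = smallest)."""
    return gzip.compress(payload, compresslevel=level)


# Hypothetical, highly repetitive payload standing in for serialized trace data.
sample = b'{"name": "event", "ts": 0}\n' * 10_000

fast = compress_trace(sample, level=1)   # new default level in this release
small = compress_trace(sample, level=6)  # previous default level

# Both levels are lossless, so decompression restores the original bytes.
assert gzip.decompress(fast) == sample
assert gzip.decompress(small) == sample

print(len(sample), len(fast), len(small))
```

Level 1 generally produces somewhat larger output than level 6 but spends less CPU time per byte, which is the direction of the change described above.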
Graph Viewer
- Fixed HLO download bug.
- Added fallback to find `HloProto` by node name.
- Added an HLO metadata fixer.
Build, Infrastructure & Packaging
- Added a `CustomBdistWheel` class to enable multi-Python version builds.
- Fixed Kokoro build failures by forcing platlib in `setup.py`.
- Limited macOS `linkopts` to exclude Linux-exclusive flags.
- Included Trace Viewer V2 assets in the pip package and fixed OSS build issues.
- Decoupled heavy `google-cloud-cpp` dependencies.
Documentation & Tests
- Added documentation for Custom Call Profiling and CPU Perf Counters support.
- Updated the demo trace to a modern MaxText workload.
- Added automated load performance benchmarks and various unit tests to improve coverage.
Individual Commits
- Fix Kokoro build failure by forcing platlib in setup.py. by Profiler Team in #2513
- Synchronize mouse mode changes from keyboard shortcuts to UI. by Profiler Team in #2502
- Stabilize Trace Viewer URL generation and remove redundant caches. by Profiler Team in #2492
- [XProf: trace viewer] Code cleanup: Refactor timeline_test.cc by grouping and sorting tests by @lilysjtu2011 in #2505
- [XProf: trace viewer] Extract `GetNextGroupStartLevel` helper function. by @lilysjtu2011 in #2510
- [XProf: trace viewer] Go to next search result on Enter by @lilysjtu2011 in #2506
- [XProf: trace viewer] Expand related tracks when revealing an event by @lilysjtu2011 in #2498
- [XProf: trace viewer] Add vertical scrolling when navigating to an event by @lilysjtu2011 in #2497
- [XProf: trace viewer] Implement static search in Trace Viewer V2 by @lilysjtu2011 in #2496
- Create custom `CustomBdistWheel` class to enable multi-Python version builds by @Matt-Hurd in #2504
- [XProf: trace viewer] Fix canvas resize flickering by @lilysjtu2011 in #2500
- [XProf: trace viewer] Refactor timeline navigation to reveal by @lilysjtu2011 in #2494
- [XProf: trace viewer] Use stable hash function for event color generation by @lilysjtu2011 in #2499
- [XProf: trace viewer] Keep visualization visible when unfolded for process tracks by @lilysjtu2011 in #2487
- Add mouse mode toolbar and fix cursor sync in Trace Viewer v2. by Profiler Team in #2484
- Add mouse mode shortcuts and zoom interaction to Trace Viewer v2. by Profiler Team in #2474
- Limit macos linkopts to exclude linux-exclusive ones. by @Matt-Hurd in #2486
- Replace broken `sed` command with a patch file by @Matt-Hurd in #2475
- [XProf: trace viewer] Add automated load performance benchmark by @lilysjtu2011 in #2473
- Enable async full DMA feature in 3P Trace Viewer. by Profiler Team in #2459
- [XProf: trace viewer] Add JSON drag-and-drop support by @lilysjtu2011 in #2461
- Add HLO metadata fixer by @subhamsoni-google in #2468
- Implement Details Selector in Trace Viewer v2 UI by Profiler Team in #2460
- Implement area selection and migrate click selection to MouseReleased in Trace Viewer. by Profiler Team in #2464
- [XProf: trace viewer] Introduce a scheduler to manage redraws in Trace Viewer v2. by @lilysjtu2011 in #2067
- Fix SparseCore busy and idle time calculations. by @bmass02 in #2465
- Add TraceDataResponse proto definition. This would be used for transferring trace data over the wire significantly reducing the payload size from the current JSON baseline. by Profiler Team in #2436
- Support `uint32_t`, `int64_t`, and `uint64_t` types when converting `absl::any` to `emscripten::val` for WASM bindings. Also add styling for .single-event-table and .selection-info to make single-event tables more compact. by Profiler Team in #2454
- This test verifies that performance counters are successfully collected and can be processed by the xprof plugin when using jax.profiler.trace. The test runs a simple matrix multiplication and addition, then checks for the presence of 'perf_counters' data in the generated xplane file. by Profiler Team in #2453
- Optimize TraceEvent serialization for LevelDB storage. by Profiler Team in #2382
- Fix HLO Download Bug in Graph Viewer. by @smit-hinsu in #2455
- Synchronize viewport initialized ranges and lock concurrent fetches. by Profiler Team in #2451
- Enable option to not use cached smart suggestion results in 3P by @RuiyangZhu in #2440
- SS Optimization: Make EventTimeFractionAnalyzer to process all events at once by @RuiyangZhu in #2406
- Fix Build Error in 3P Xprof Build Script by @RuiyangZhu in #2439
- [XProf: trace viewer] Merge Timeline Level Y Positions into precalculated relative offsets by @lilysjtu2011 in #2435
- [XProf: trace viewer] Cleanup ImGui Table Flags and Extract Pixel Constants by @lilysjtu2011 in #2434
- [XProf: trace viewer] Make trace viewer v2 search box a GM3 pill shape by @lilysjtu2011 in #2432
- [XProf: trace viewer] Refactor Timeline logic by @lilysjtu2011 in #2433
- [XProf: trace viewer] Display an area chart for the top-level process utilization in the timeline preview. by @lilysjtu2011 in #2431
- [XProf: trace viewer] Improve counter event hover visual effect by @lilysjtu2011 in #2430
- Resolve rest of formatting and lint warnings in ts files. by @zzzaries in #2425
- [XProf: trace viewer] Add click to copy track name feature by @lilysjtu2011 in #2429
- [XProf: trace viewer] Format Process Track Name in Two Lines by @lilysjtu2011 in #2428
- [XProf: trace viewer] Update Trace Viewer fonts to Material 3 specifications by @lilysjtu2011 in #2427
- [XProf: trace viewer] UI styling and track alignment improvements. by @lilysjtu2011 in #2426
- Add docs for upcoming cpu perf counters support by @Matt-Hurd in #2419
- Add fallback to find HloProto by node name in Graph Viewer. by @subhamsoni-google in #2391
- [XProf: trace viewer] Separate drawing logic for ruler and vertical lines by @lilysjtu2011 in #2421
- Create EventFractionAnalyzerResults Proto by @RuiyangZhu in #2387
- [XProf: trace viewer] Add secondary container color and fix process track color collision by @lilysjtu2011 in #2420
- [XProf: trace viewer] Add aggregated event preview for collapsed groups. by @lilysjtu2011 in #2395
- [XProf: trace viewer] Rename Emscripten bindings for TypeScript readability. by @lilysjtu2011 in #2388
- Add an option to not use cached smart suggestion results in 1P by @RuiyangZhu in #2394
- Project import generated by Copybara by Profiler Team in #2418
- Don't show legacy DCN view in Trace Viewer by default. Users should use the new by @bbadawi1 in #2404
- [XProf: trace viewer] Fix mouse wheel shortcuts and add input handler tests. by @lilysjtu2011 in #2414
- Change gzip compression level from 6 (default) to 1. by @bbadawi1 in #2412
- Apply code formatting to TypeScript files. by @zzzaries in #2415
- Speed up trace processing by avoiding unnecessary memory copy of perfetto output. by @bbadawi1 in #2410
- Switch EventForest to use new XPlaneVisitor factory. by @bmass02 in #2401
- Use SmartSuggestionProcessor in ConvertMultiXSpacesToSmartSuggestion in xplane_to_tools_data by @xiongbolu-mineral in #2372
- Use proto arenas to speed up LoadFromFile and WriteToCord. by @bbadawi1 in #2409
- Add documentation for Custom Call Profiling in XProf. by @cliveverghese in #2408
- Update demo trace to modern MaxText workload. by @Matt-Hurd in #2407
- [XProf: trace viewer] Add tests to improve test coverage and fix regressions. by @lilysjtu2011 in #2402
- Fix multi-host selection in trace viewer by @muditgokhale2 in #2403
- Use channel_id instead of rendezvous for connecting megascale events. by @bbadawi1 in #2392
- Refractor AddHloProto to get module_id from module by @cliveverghese in #2304
- Switch EventForest to use new XPlaneVisitor factory. by @muditgokhale2 in #2384
- Add counter lines for megascale collective count and in-flight bytes. by @bbadawi1 in #2381
- Use public targets from google_cloud_cpp by @ethanluoyc in #2376
- Add default constructors to hlo_proto_map by @cliveverghese in #2349
- [XProf: trace viewer] Making one-line threads not collapsible. by @lilysjtu2011 in #2370
- [XProf: trace viewer] Preserve group expanded states across trace data updates. by @lilysjtu2011 in #2367
- Only render the timeline graph if the allocation timeline is not empty. by @muditgokhale2 in #2378
- Reduce memory allocation while creation of prefix trie trace events search index. by Profiler Team in #2379
- Switch EventForest to use new XPlaneVisitor factory. by @bmass02 in #2371
- Register BarrierCoresRule for 3P XProf by @xiongbolu-mineral in #2353
- Update xprof WORKSPACE.BAZEL with patch commands for emsdk by Profiler Team in #2374
- Cache smart suggestion result for 3P XProf by @RuiyangZhu in #2366
- Fix missing overview page contents after smart suggestion request by @xiongbolu-mineral in #2359
- Remove unused function ConvertMultiXSpacesToTfDataBottleneckAnalysis from xplane_to_tools_data by @xiongbolu-mineral in #2365
- Include trace viewer v2 assets in the pip package, and fix OSS build issues by Profiler Team in #2351
- [XProf: trace viewer] Make the trace viewer drawer size persistent. by @lilysjtu2011 in #2343
- [XProf: trace viewer] Reserve scrollbar width to avoid layout shift. by @lilysjtu2011 in #2356
- Refactor: Convert recursive event tree traversal to iterative by Profiler Team in #2360
- Refactor: Abstract smart suggestion check in DataServiceV2 by @xiongbolu-mineral in #2354
- Don't populate disaggregated serving latency for irrelevant sessions by Profiler Team in #2358
- Prevent redundant smart suggestion fetches for the same session. by @xiongbolu-mineral in #2352
- Decouple third_party/xprof/convert:repository from heavy google-cloud-cpp dependencies. by Profiler Team in #2344
- Register 3P specific rules in smart suggestion processor and disable it from profile processor framework by @xiongbolu-mineral in #2348
- When a module already has a run_id, use it instead of creating a new one. by @bbadawi1 in #2350
- [XProf: trace viewer] Add angular-split to make the detail drawer resizable. by @lilysjtu2011 in #2341
- Update XProf documentation and README. by Profiler Team in #2346
- Fix documentation typo in OpMetrics proto by @charlesalaras in #2347
Full Changelog: xprof-v2.22.0...xprof-v2.22.2