Releases: openucx/ucc

v1.2.0-rc1

25 May 16:24
c0b5d1f

v1.2.0-rc1 Pre-release

This release includes numerous updates, bug fixes, and improvements across various components. The following is a summary of the changes based on the commit messages:

New Features and Enhancements

CL/HIER

  • Fixed single proc on node issue in alltoall (#658)
  • Implemented allreduce rab pipelined (#608)
  • Added bcast 2step algorithm (#620)
  • Fixed allreduce rab pipeline (#759)

TL/CUDA

  • Fixed cache unmap issue (#642)
  • Implemented reduce scatter linear (#669)
  • Added algorithm selection based on topology (#688)
  • Fixed linear algorithms (#751)
  • Fixed pipelining in linear rs (#770)

TL/UCP

  • Added special service worker (#560)
  • Added scatterv (#663)
  • Added gatherv (#664)
  • Fixed running with npolls 0 (#695)
  • Added knomial allgather (#729)
  • Fixed bug for triggered colls (#757)
  • Added bruck alltoall (#756)
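For readers unfamiliar with the Bruck alltoall named above, the following is a minimal single-process sketch of the textbook scheme. It is an illustration under stated assumptions, not the TL/UCP implementation: the function name `bruck_alltoall` and the list-of-lists buffer layout are hypothetical, and "sending" is simulated by copying between per-rank lists.

```python
def bruck_alltoall(send):
    """Simulate a Bruck-style alltoall.

    send[r][d] is the block that rank r wants delivered to rank d.
    Returns recv, where recv[d][s] is the block rank d received from rank s.
    """
    n = len(send)

    # Phase 1: each rank rotates its buffer so that index i holds the block
    # destined for rank (r + i) % n, i.e. the block that must travel i hops.
    tmp = [[send[r][(r + i) % n] for i in range(n)] for r in range(n)]

    # Phase 2: about log2(n) rounds. In the round for distance k, every rank
    # forwards to rank (r + k) % n all blocks whose index has bit k set.
    k = 1
    while k < n:
        nxt = [row[:] for row in tmp]
        for r in range(n):
            dst = (r + k) % n
            for i in range(n):
                if i & k:
                    nxt[dst][i] = tmp[r][i]
        tmp = nxt
        k <<= 1

    # Phase 3: index i now holds the block that originated at rank (r - i) % n,
    # so undo the rotation to order the blocks by source rank.
    recv = [[None] * n for _ in range(n)]
    for r in range(n):
        for i in range(n):
            recv[r][(r - i) % n] = tmp[r][i]
    return recv
```

The scheme trades extra data movement for fewer communication rounds (logarithmic rather than linear in the team size), which is why it is usually associated with small messages.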

TL/SHARP

  • Fixed memory type check in allreduce (#662)
  • Added support for sharpv3 dt (#661)
  • Fixed assert check (#686)
  • Implemented SHARP OOB fixes (#746)
  • Fixed local rank when NODE SBGP not enabled (#760)
  • Prevented sharp team with team max ppn > 1 (#761)

CORE

  • Fixed memory type score update (#650)
  • Fixed ucc parser build (#666)
  • Implemented ucc_pipeline_params (#675)
  • Changed log level of config_modify (#667)
  • Fixed timeout handle for triggered post (#679)

DOCS

  • Added User Guide (#720)

UCC Version 1.1.0

07 Oct 14:02
cd3fce9

Features

API

  • Added the 128-bit float data type and complex 32-, 64-, and 128-bit float data types
  • Added Active Sets based collectives to support dynamic groups as well as
    point-to-point messaging
  • Added ucc_team_get_attr interface

Core

  • Config file support
  • Fixed component search

CL

  • Added split rail allreduce collective implementation
  • Enabled hierarchical alltoallv and barrier
  • Fixed cleanup bugs

TL

  • Added SELF TL supporting team size one

UCP

  • Added service broadcast
  • Added reduce_scatterv ring algorithm
  • Added k-nomial based gather collective implementation
  • Added one-sided get based algorithms

SHARP

  • Fixed SHARP OOB
  • Added SHARP broadcast

GPU Collectives (CUDA, NCCL TL and RCCL TL)

  • Added RCCL TL to support RCCL collectives
  • Added support for CUDA TL (intranode collectives for NVIDIA GPUs)
  • Added multiring allgatherv, alltoall, reduce-scatter, and reduce-scatterv
    in CUDA TL
  • Added topo based ring construction in CUDA TL to maximize bandwidth
  • Added NCCL gather, scatter, and their vector variants
  • Enabled using multiple streams for collectives
  • Added support for RCCL gather(v), scatter(v), broadcast, allgather(v),
    barrier, alltoall(v), and allreduce collectives
  • Added ROCm memory component
  • Adapted all GPU collectives to executor design

Tests

  • Added tests for triggered collectives in perftests
  • Fixed bugs in multi-threading tests

Utils

  • Added CPU model and vendor detection
  • Several bug fixes in all components

UCC Version 1.1.0 - RC1

07 Sep 15:40
9f22d78

Pre-release

1.1.0

Features

API

  • Added the 128-bit float data type and complex 32-, 64-, and 128-bit float data types
  • Added Active Sets based collectives to support dynamic groups as well as point-to-point messaging

Core

  • Config file support
  • Fixed component search

CL

  • Added split rail allreduce collective implementation
  • Enabled hierarchical alltoallv
  • Fixed cleanup bugs

TL

  • Added SELF TL supporting team size one

UCP

  • Added service broadcast
  • Added reduce_scatterv ring algorithm
  • Added k-nomial based gather collective implementation
  • Added one-sided get based algorithms

SHARP

  • Fixed SHARP OOB
  • Added SHARP broadcast

GPU Collectives (CUDA, NCCL TL and RCCL TL)

  • Added support for CUDA TL (intranode collectives for NVIDIA GPUs)
  • Added multiring allgatherv and alltoall in CUDA TL
  • Added NCCL gather, scatter, and their vector variants
  • Enabled using multiple streams for collectives
  • Added support for RCCL gather(v), scatter(v), broadcast, allgather(v), barrier, alltoall(v), and allreduce collectives
  • Added ROCm memory component
  • Adapted all GPU collectives to executor design

Tests

  • Added tests for triggered collectives in perftests
  • Fixed bugs in multi-threading tests

Utils

  • Added CPU model and vendor detection
  • Several bug fixes in all components

Unified Collective Communication, Version 1.0.0

19 Apr 21:57
c69c53b

1.0.0

Features

API

  • Added Avg reduce operation
  • Added nonblocking team destroy option
  • Added user-defined datatype definitions
  • Added Bfloat16 type
  • Clarified semantics of core abstractions, including teams and contexts
  • Added timeout option
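As a rough illustration of what an Avg reduce operation computes in an allreduce, the sketch below sums each element across all ranks and divides by the number of ranks. The function name `allreduce_avg` and the list-based buffers are assumptions made for this example only. It simulates the semantics in a single process and does not call the UCC API.

```python
def allreduce_avg(contributions):
    """Simulate an allreduce with an averaging reduction.

    contributions[r] is the input vector of rank r (all the same length).
    Every rank ends up with the element-wise mean over all ranks.
    """
    n = len(contributions)
    length = len(contributions[0])
    # Element-wise sum across ranks, then scale by 1/n.
    total = [sum(vec[i] for vec in contributions) for i in range(length)]
    avg = [x / n for x in total]
    # Each rank receives its own copy of the result.
    return [avg[:] for _ in range(n)]
```

An averaging operation of this kind saves the caller from dividing a summed result by the team size, a common step when averaging gradients in data-parallel training.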

Core

  • Added coll scoring and selection support
  • Added support for Triggered collectives
  • Added support for timeouts in collectives
  • Added support for team create without ep in post
  • Added support for multithreaded context progress
  • Added support for nonblocking team destroy

CL

  • Added support for hierarchical collectives
  • Added support for hierarchical allreduce collective operation
  • Added support for collectives based on one-sided communication routines

TL

  • Added SHARP TL

UCP

  • Added Bcast SAG algorithm for large messages
  • Added Knomial based reduce algorithm
  • Made allgather and alltoall conform to the API
  • Added SRA knomial allreduce algorithm
  • Added pairwise alltoall and alltoallv algorithms
  • Added allgather and allgatherv ring algorithms
  • Added support for collective operations based on one-sided semantics
  • Added support for alltoall with one-sided transfer semantics
  • Bug fixes
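The ring allgather listed above can be pictured with a small single-process simulation. This is a sketch of the generic ring scheme under stated assumptions, not the UCP TL code: the function name `ring_allgather` and the per-rank Python lists are hypothetical stand-ins for real buffers and sends.

```python
def ring_allgather(blocks):
    """Simulate a ring allgather.

    blocks[r] is the block contributed by rank r.
    Returns bufs, where bufs[r] is the full gathered buffer on rank r.
    """
    n = len(blocks)
    bufs = [[None] * n for _ in range(n)]
    for r in range(n):
        bufs[r][r] = blocks[r]  # every rank starts with only its own block

    # n - 1 steps: in step s, rank r forwards block (r - s) % n, which it
    # received in the previous step (or owns, when s == 0), to rank r + 1.
    for step in range(n - 1):
        in_flight = [(r, (r - step) % n, bufs[r][(r - step) % n])
                     for r in range(n)]
        for r, idx, val in in_flight:
            bufs[(r + 1) % n][idx] = val
    return bufs
```

Each rank talks only to its two ring neighbours and sends one block per step, so the scheme needs n - 1 steps for a team of n ranks.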

SHARP

  • Added support for switch-based hardware collectives (SHARP)

NCCL

  • Added support for NCCL allreduce, alltoall, alltoallv, barrier, reduce,
    reduce-scatter, bcast, allgather, and allgatherv

Tests

  • Updated tests to cover the newly added algorithms and operations

Unified Collective Communication, Version 1.0.0 - RC2

27 Jan 19:04
c5d3ee5

1.0.0

Features

API

  • Added Avg reduce operation
  • Added nonblocking team destroy option
  • Added user-defined datatype definitions
  • Added Bfloat16 type
  • Clarified semantics of core abstractions, including teams and contexts
  • Added timeout option

Core

  • Added coll scoring and selection support
  • Added support for Triggered collectives
  • Added support for timeouts in collectives
  • Added support for team create without ep in post
  • Added support for multithreaded context progress
  • Added support for nonblocking team destroy

CL

  • Added support for hierarchical collectives
  • Added support for hierarchical allreduce collective operation
  • Added support for collectives based on one-sided communication routines

TL

  • Added SHARP TL

UCP

  • Added Bcast SAG algorithm for large messages
  • Added Knomial based reduce algorithm
  • Made allgather and alltoall conform to the API
  • Added SRA knomial allreduce algorithm
  • Added pairwise alltoall and alltoallv algorithms
  • Added allgather and allgatherv ring algorithms
  • Added support for collective operations based on one-sided semantics
  • Added support for alltoall with one-sided transfer semantics
  • Bug fixes

SHARP

  • Added support for switch-based hardware collectives (SHARP)

NCCL

  • Added support for NCCL allreduce, alltoall, alltoallv, barrier, reduce,
    reduce-scatter, bcast, allgather, and allgatherv

Tests

  • Updated tests to cover the newly added algorithms and operations

Unified Collective Communication, Version 0.1.0 - RC1

30 Jul 20:40
f324a91

This is an early release of the UCC API and its implementation. Major features in this release are detailed below.

Features

API

  • UCC API to support library, contexts, teams, collective operations, execution
    engine, memory types, and triggered operations

Core

  • Added implementation for UCC abstractions - library, context, team,
    collective operations, execution engine, memory types, and triggered
    operations
  • Added support for CUDA and CPU memory types
  • Added support for configuring UCC library and contexts

CL

  • Added support for collectives whose source and destination buffers reside
    in either CPU or device (GPU) memory
  • Added support for UCC_THREAD_MULTIPLE
  • Added support for CUDA stream-based collectives

TL

  • Added support for send/receive based collectives using UCX/UCP as a transport
    layer
  • Added support in the UCP TL for basic collective types, including barrier,
    alltoall, alltoallv, broadcast, allgather, allgatherv, and allreduce
  • Added support for using NCCL as a transport layer
  • Added support in the NCCL TL for collective types including alltoall,
    alltoallv, allgather, allgatherv, allreduce, and broadcast

Tests

  • Added support for unit testing (gtest) infrastructure
  • Added support for MPI tests

Unified Collective Communication, Version 0.1.0

31 Aug 19:18
f324a91

This is an early release of the UCC API and its implementation. Major features in this release are detailed below.

Features

API

  • UCC API to support library, contexts, teams, collective operations, execution
    engine, memory types, and triggered operations

Core

  • Added implementation for UCC abstractions - library, context, team,
    collective operations, execution engine, memory types, and triggered
    operations
  • Added support for CUDA and CPU memory types
  • Added support for configuring UCC library and contexts

CL

  • Added support for collectives whose source and destination buffers reside
    in either CPU or device (GPU) memory
  • Added support for UCC_THREAD_MULTIPLE
  • Added support for CUDA stream-based collectives

TL

  • Added support for send/receive based collectives using UCX/UCP as a transport
    layer
  • Added support in the UCP TL for basic collective types, including barrier,
    alltoall, alltoallv, broadcast, allgather, allgatherv, and allreduce
  • Added support for using NCCL as a transport layer
  • Added support in the NCCL TL for collective types including alltoall,
    alltoallv, allgather, allgatherv, allreduce, and broadcast

Tests

  • Added support for unit testing (gtest) infrastructure
  • Added support for MPI tests