AZP: UCXX integration - tests + builds #11473

Open
Alexey-Rivkin wants to merge 6 commits into openucx:master from Alexey-Rivkin:ucxx-azure-tests

Alexey-Rivkin (Contributor) commented May 20, 2026

What?

Add UCXX_build + UCXX_tests stages to the UCX PR pipeline. Each UCX PR builds UCXX conda packages, libucxx/ucxx wheels, and docs from rapidsai/ucxx, and runs the C++ (CPU+GPU), Python (GPU), and wheel tests.

Why?

Move UCXX CI from RAPIDS GitHub Actions onto UCX's Azure pipeline (mirrors upstream pr.yaml).

How?

Two runner scripts in buildlib/tools/ (build_ucxx.sh, test_ucxx.sh) + container images wrapping rapidsai/ci-conda and rapidsai/ci-wheel. Matrix: CUDA 12 + 13 × x86_64 + aarch64.
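For reference, the matrix above expands to four CUDA/architecture combinations. A minimal sketch of that expansion follows; the `cudaNN-arch` job-name format and the `list_matrix` helper are hypothetical and are not taken from the PR's pipeline YAML.

```shell
#!/usr/bin/env bash
# Sketch only: enumerate the CUDA x architecture matrix from the description.
# The job-name format printed here is hypothetical, not the PR's naming.
set -euo pipefail

cuda_versions=(12 13)
arches=(x86_64 aarch64)

# Print one line per (CUDA major version, CPU architecture) combination.
list_matrix() {
    local cuda arch
    for cuda in "${cuda_versions[@]}"; do
        for arch in "${arches[@]}"; do
            printf 'cuda%s-%s\n' "$cuda" "$arch"
        done
    done
}

list_matrix
```

Each printed line would correspond to one build slice. Which slices also run GPU tests is a separate decision that this sketch does not model.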

Alexey-Rivkin changed the title from "AZP: add UCXX_tests stage to PR pipeline (Phase 1 plumbing)" to "AZP: add UCXX_tests stage to PR pipeline (Phase #1)" on May 20, 2026
Alexey-Rivkin force-pushed the ucxx-azure-tests branch 17 times, most recently from 05531f8 to 1def0bd, on May 24, 2026 15:24
Alexey-Rivkin force-pushed the ucxx-azure-tests branch 3 times, most recently from 1a79d8a to dec2673, on May 24, 2026 19:25
Alexey-Rivkin changed the title from "AZP: add UCXX_tests stage to PR pipeline (Phase #1)" to "AZP: UCXX integration - tests + builds" on May 24, 2026
Alexey-Rivkin force-pushed the ucxx-azure-tests branch 7 times, most recently from 3b30b46 to e78b89e, on May 25, 2026 03:24
Alexey-Rivkin force-pushed the ucxx-azure-tests branch 16 times, most recently from 6f59248 to 47c2eb8, on June 2, 2026 13:52
Alexey-Rivkin marked this pull request as ready for review on June 2, 2026 16:30
Alexey-Rivkin force-pushed the ucxx-azure-tests branch 4 times, most recently from 0a45412 to 24623e9, on June 2, 2026 21:00
# Azure wrapper around rapidsai/ci-conda: chmod /opt/conda so the non-root UID Azure runs
# steps as can use conda/python (rapidsai owns it as root); + adds gdb for stack capture.

ARG BASE_IMAGE=rapidsai/ci-conda:26.06-latest

Contributor

Since you've started, RAPIDS 26.06 was released and all ToT development is now happening for 26.08, including the images we depend on. I suggest targeting 26.08 throughout this PR as well.

Contributor Author

c668a97
Thanks!

Comment thread buildlib/tools/build_ucxx.sh Outdated
Comment on lines +82 to +83
# Upstream ucxx header uses usleep() but omits <unistd.h>; undeclared on
# newer gcc. Affects all C++ phases.
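For readers without the outdated diff at hand, a local workaround of the kind this comment describes could look roughly like the sketch below. It is not the patch from this PR: the `ensure_include` helper is hypothetical, and a temporary file stands in for the upstream header, whose path is not shown in this thread.

```shell
#!/usr/bin/env bash
# Sketch only: idempotently add a missing #include to a header file.
# ensure_include is a hypothetical helper, not code from build_ucxx.sh.
set -euo pipefail

# ensure_include FILE HEADER: prepend '#include <HEADER>' unless already present.
ensure_include() {
    local file=$1 header=$2 tmp
    if grep -qF "#include <${header}>" "$file"; then
        return 0                      # already patched, nothing to do
    fi
    tmp=$(mktemp)
    { printf '#include <%s>\n' "$header"; cat "$file"; } > "$tmp"
    cat "$tmp" > "$file"              # rewrite in place so file permissions are kept
    rm -f "$tmp"
}

# Demo on a throwaway file standing in for the real header.
demo=$(mktemp)
trap 'rm -f "$demo"' EXIT
printf 'inline void nap() { usleep(10); }\n' > "$demo"
ensure_include "$demo" unistd.h
ensure_include "$demo" unistd.h       # second call is a no-op
cat "$demo"
```

Making the helper idempotent matters when a build script and a test script both apply the same patch to the same checkout.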

Contributor

The fix rapidsai/ucxx#674 has been merged on main. If you switch to building main this should not be necessary anymore.

Contributor Author

Will there be a new tag soon?

Contributor

Not too soon, this is why I'd prefer to target main, but Yossi has a preference for stability at this time (understandable) so we may have to wait. The next tag should occur around July 16. Maybe instead of relying on specific tags we can test and target specific commits instead, such that we can do controlled upgrades? I fear keeping an older tag may diverge from RAPIDS CI updates, as you have seen there are several aspects that need to work in tandem (CI images, CI scripts in the project such as ucxx/ci, etc.).

Contributor Author

Thanks for the tip! I went ahead and pinned it to the latest main SHA for now. That lets me drop both patches. Going forward, we can update the SHA or switch to a tag in a controlled manner as RAPIDS advances.
ac9f7b1
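
To illustrate the pinning policy discussed in this thread, a runner script could refuse to proceed unless the UCXX ref is immutable. The sketch below is an assumption about how such a guard might look, not code from this PR. The `ref_kind` helper is hypothetical, and the 40-character SHA is a placeholder rather than a real UCXX commit.

```shell
#!/usr/bin/env bash
# Sketch only: classify a git ref so a pipeline can reject floating branch refs.
# ref_kind is a hypothetical helper; the SHA below is a placeholder.
set -euo pipefail

# ref_kind REF: print "commit" for a full 40-hex-digit SHA, "branch" or "tag"
# for fully qualified refs, and "other" for anything else (e.g. a short SHA).
ref_kind() {
    local ref=$1
    if [[ $ref =~ ^[0-9a-f]{40}$ ]]; then
        echo commit
    elif [[ $ref == refs/heads/* ]]; then
        echo branch
    elif [[ $ref == refs/tags/* ]]; then
        echo tag
    else
        echo other
    fi
}

ref_kind refs/heads/main
ref_kind refs/tags/v0.51.00a
ref_kind 0123456789abcdef0123456789abcdef01234567
```

With a guard like this, a branch ref such as `refs/heads/main` can be rejected up front, while a tag or a full commit SHA is accepted and bumped deliberately.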

Comment thread buildlib/tools/test_ucxx.sh Outdated
Comment on lines +52 to +53
# Upstream ucxx examples header uses usleep() but omits <unistd.h>;
# undeclared on newer gcc. Same patch as build_ucxx.sh.

Contributor

Ditto.

Contributor Author

Ditto

UCXX tests run in rapidsai/ci-conda and ci-wheel base images. Thin
wrappers open /opt/conda and /pyenv so the Azure-injected step user
can use them, and add gdb so ucxx's timeout_with_stack.py can capture
stacks on hangs.

Pull rapidsai/ucxx as a pipeline resource and add two stages gated on
Static_check: UCXX_build (conda + wheel packages, docs, devcontainer,
checks) then UCXX_tests (conda C++/Python on the CPU + GPU matrix).
Covers x86_64 + aarch64, CUDA 12 + 13; GPU tests on amd64/cuda13.
distributed-ucxx excluded (not upstreamed).

build_ucxx.sh and test_ucxx.sh wrap UCXX's ci/*.sh entrypoints for the
Azure agents: stage rapids download shims, set the wheel toolchain, run
the conda/wheel build, C++ gtest and Python test phases. CPU slices
disable CUDA-only gtests; GPU slices force the host CUDA driver so
cuInit matches the MPS daemon. test_client_shutdown is skipped (flaky
teardown under MPS contention).

Each UCX PR must test a fixed UCXX revision; refs/heads/main drifts, so a
green run says nothing durable. Pin to a tag and bump it deliberately as new
UCXX releases are validated.

RAPIDS 26.06 shipped; ToT and the base images we wrap moved to 26.08.

Pin the rapidsai/ucxx resource to a specific main commit (33deb0b) rather
than v0.51.00a. Alpha tags are cut at code-freeze and don't pick up ongoing
main work, and an old tag drifts from RAPIDS CI updates (images, ci/ scripts)
that must move in tandem. A pinned commit stays immutable/reproducible while
letting us do controlled bumps. This commit already includes rapidsai/ucxx#674,
so drop the local <unistd.h> patch in build_ucxx.sh + test_ucxx.sh.
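
The commit messages above say that CPU slices disable CUDA-only gtests but do not show the mechanism. One plausible way, sketched here as an assumption, is GoogleTest's `--gtest_filter` with a negative pattern. The `gtest_filter_for` helper and the `*Cuda*:*RMM*` name patterns are hypothetical and may not match UCXX's actual test names.

```shell
#!/usr/bin/env bash
# Sketch only: choose a GoogleTest filter per slice type.
# gtest_filter_for and the name patterns are hypothetical, not from test_ucxx.sh.
set -euo pipefail

# gtest_filter_for SLICE: print a --gtest_filter argument.
# "cpu" excludes tests matching the CUDA-only patterns; any other slice runs all.
gtest_filter_for() {
    local slice=$1
    local cuda_only='*Cuda*:*RMM*'    # hypothetical CUDA-only test name patterns
    if [[ $slice == cpu ]]; then
        # A leading '-' makes every pattern after it a negative (excluded) pattern.
        printf -- '--gtest_filter=-%s\n' "$cuda_only"
    else
        printf -- '--gtest_filter=*\n'
    fi
}

gtest_filter_for cpu
gtest_filter_for gpu
```

The printed argument would be passed to the gtest binary. Keeping the exclusion list in one variable makes it easier to review when upstream adds or renames CUDA-only tests.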
Labels

WIP-DNM Work in progress / Do not review

2 participants