E2E test suite for distributed workloads on RHOAI covering KFTO v1, Trainer v2, and KubeRay, plus training examples and runtime/test images. Built with Go, Python, Kubernetes, Ray, PyTorch.
- tests/ - E2E test suites (Go)
- examples/ - Training examples (Ray, KFTO, etc.)
- images/ - Runtime and test container images
- tests/kfto/ - KFTO v1 (PyTorchJob) tests
- tests/fms/ - fms-hf-tuning GPU fine-tuning tests
- tests/odh/ - ODH integration tests (Ray, notebooks)
- tests/trainer/ - Kubeflow Trainer v2 tests
- tests/common/support/ - Shared test infrastructure (clients, helpers for Ray, PyTorchJob, Kueue, etc.)
- images/universal/training/ - Key training runtime images
- tests/trainer/ - Main test suite location
go test -run <TestName> -v -timeout 60m ./tests/<suite>/
Example:
go test -run TestRhaiS3FsdpSharedStateCheckpointingCuda -v -timeout 60m ./tests/trainer/

- Logged into OpenShift cluster with admin access
- RHOAI installed with required distributed workload components enabled
- Tests require specific env vars (assertion errors will specify missing vars with context)
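
Missing variables can also be caught before a long test run starts, rather than through an assertion failure partway in. The following is a minimal sketch of such a pre-flight check, not part of the repository: the function name `missing_env_vars` and the variable names `EXAMPLE_VAR_A` and `EXAMPLE_VAR_B` are hypothetical placeholders, and the real variable names are the ones listed in README.md.

```shell
#!/bin/sh
# Pre-flight check (illustrative): print the names of all given variables
# that are unset or empty, separated by spaces. Prints an empty line when
# every variable is set. The names passed in are placeholders chosen by
# the caller, not the suite's real variable names.
missing_env_vars() {
  missing=""
  for name in "$@"; do
    # Look up the value of the variable whose name is stored in $name.
    # Only pass trusted, literal variable names here, since eval is used.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="${missing:+$missing }$name"
    fi
  done
  printf '%s\n' "$missing"
}
```

A caller could capture the result with `missing=$(missing_env_vars EXAMPLE_VAR_A EXAMPLE_VAR_B)` and stop before invoking `go test` if `$missing` is non-empty.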
See the Common environment variables section in README.md for the full env var reference.
make golangci-lint # Run golangci-lint project-wide
go vet ./... # Vet all Go code
make verify-imports # Verify import ordering
make precommit # Run all pre-commit hooks
For quick feedback on specific files instead of running project-wide:
# Go
make golangci-lint LINT_PKG=./tests/common/support/... # Lint a single Go package
go vet ./tests/common/support/... # Vet a single Go package
gofmt -w path/to/file.go # Format a single Go file
# Python
pre-commit run --files path/to/file.py # Run all hooks on a single file
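
The per-package commands above can be driven from a list of changed files. The sketch below is one possible way to do that, under the assumption that it runs from the repository root and that linting the containing directory tree (`./dir/...`) is acceptable; the helper name `go_pkg_patterns` is hypothetical and not something the repository provides.

```shell
#!/bin/sh
# Illustrative helper: read file paths (one per line) on stdin, keep only
# Go files, and print the unique ./dir/... package patterns that contain
# them, sorted. Non-Go files are ignored.
go_pkg_patterns() {
  grep '\.go$' | while IFS= read -r file; do
    printf './%s/...\n' "$(dirname "$file")"
  done | sort -u
}

# Possible usage from the repository root (left commented out because it
# needs git, make, and the Go toolchain):
#
#   git diff --name-only | go_pkg_patterns | while IFS= read -r pkg; do
#     make golangci-lint LINT_PKG="$pkg"
#     go vet "$pkg"
#   done
```

Because each pattern ends in `/...`, subpackages of a changed directory are linted as well, which matches the form used in the single-package commands above.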
See .claude/skills/add-e2e-test/SKILL.md for the full guide on writing E2E tests (namespace isolation, resource naming, cleanup, tags, notebook editing, environment variables).
See images/universal/training/README.md for instructions on updating Python dependencies in training images. Key point: dependencies come from a private AIPCC PyPI index, not public PyPI — always query the index for available versions before pinning.