ci: Ginkgo CLI / runtime version mismatch likely causing AfterSuite timeout in unit-and-integration-tests #695

## Summary

The `unit-and-integration-tests` GitHub Actions job is failing on multiple recent PRs at the same point: the `pkg/controllers/workapplier` AfterSuite teardown times out after the 30s grace period. The CI logs also report a Ginkgo CLI/runtime version mismatch, which is the most likely cause but is **not yet confirmed**. I'm filing this so the right person can verify it before anything is changed.

## Observed in CI

From `unit-and-integration-tests` job logs (e.g. https://github.com/kubefleet-dev/kubefleet/actions/runs/25395689081/job/74481731598?pr=691):

```
Ginkgo detected a version mismatch between the Ginkgo CLI and the version of Ginkgo imported by your packages:
  Ginkgo CLI Version:
    2.19.1
  Mismatched package versions found:
    2.23.4 used by workapplier
```

Then later:

```
[FAILED] in [AfterSuite] - pkg/controllers/workapplier/suite_test.go:467
[FAILED] Expected success, but got an error:
    failed waiting for all runnables to end within grace period of 30s: context deadline exceeded

Ran 290 of 290 Specs in 465.500 seconds
FAIL! -- 290 Passed | 0 Failed | 0 Pending | 0 Skipped
```

All 290 specs pass — the failure is purely in suite teardown.

## Likely root cause

The repo pins **two different Ginkgo CLI versions** across workflow jobs:

```bash
$ grep "ginkgo/v2/ginkgo@v" .github/workflows/ci.yml
go install github.com/onsi/ginkgo/v2/ginkgo@v2.19.1   # unit-and-integration-tests
go install github.com/onsi/ginkgo/v2/ginkgo@v2.23.4   # other job
```

`go.mod` requires `github.com/onsi/ginkgo/v2 v2.23.4`. The `@v2.19.1` pin was added in Aug 2024 and was never bumped when the `go.mod` dependency was updated, while the other job's pin was bumped to `@v2.23.4` at some point.

`.github/workflows/upgrade.yml` also has three `@v2.19.1` references.
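A guard along these lines could catch this kind of drift automatically. It is a minimal POSIX-shell sketch, not something that exists in the repo today. `check_ginkgo_pins` is a hypothetical name. The sketch assumes that the pins only appear as `ginkgo/v2/ginkgo@vX.Y.Z` under `.github/workflows`, and that `grep` supports `-r`/`-o`, which holds for GNU and BSD grep.

```shell
#!/bin/sh
# Hypothetical CI guard (not part of the repo today): flag any Ginkgo CLI pin
# under .github/workflows that differs from the ginkgo/v2 version in go.mod.
check_ginkgo_pins() {
  root="${1:-.}"
  # Accept both forms of the requirement in go.mod:
  #   require github.com/onsi/ginkgo/v2 v2.23.4   (single-line require)
  #   github.com/onsi/ginkgo/v2 v2.23.4           (inside a require block)
  want=$(awk '$1 == "require" && $2 == "github.com/onsi/ginkgo/v2" { print $3; exit }
              $1 == "github.com/onsi/ginkgo/v2" { print $2; exit }' "$root/go.mod" 2>/dev/null)
  if [ -z "$want" ]; then
    echo "github.com/onsi/ginkgo/v2 not found in $root/go.mod" >&2
    return 2
  fi
  # Each match is printed as "file:line:ginkgo/v2/ginkgo@vX.Y.Z".
  matches=$(grep -rHnoE 'ginkgo/v2/ginkgo@v[0-9]+\.[0-9]+\.[0-9]+' \
    "$root/.github/workflows" 2>/dev/null)
  rc=0
  while IFS=: read -r file line match; do
    [ -n "$file" ] || continue
    got="${match#*@}"   # strip everything up to "@", leaving "vX.Y.Z"
    if [ "$got" != "$want" ]; then
      echo "$file:$line: Ginkgo CLI pin $got does not match go.mod $want"
      rc=1
    fi
  done <<EOF
$matches
EOF
  return "$rc"
}
```

You would run it from the repo root as `check_ginkgo_pins .`. A non-zero exit would fail the step, and each mismatched pin is reported as `file:line`, so the three `upgrade.yml` references would show up alongside the `ci.yml` one.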

## Reproduces on every PR

Recent CI runs across unrelated PRs:

| Time | PR branch | Result |
|---|---|---|
| 21:36:45 | configureUpdateRunThreshold | failure |
| 21:32:41 | copilot/fix-timedwait-invalid-time | failure |
| 18:48:20 | fix/override-snapshot-transition-race | failure |
| 18:04:22 | copilot/refactor-policyobservedclus | failure |
| 17:54:41 | configureUpdateRunThreshold | failure |
| 17:42:39 | copilot/fix-timedwait-invalid-time | success ← last green |

Because the CLI version is hardcoded in the workflow, every PR that triggers this job hits the same failure point.

## Proposed fix (needs verification)

Bump the pinned Ginkgo CLI in the unit-test job to match `go.mod`:

```diff
-          go install github.com/onsi/ginkgo/v2/ginkgo@v2.19.1
+          go install github.com/onsi/ginkgo/v2/ginkgo@v2.23.4
```

Apply the same change to the three `@v2.19.1` occurrences in `.github/workflows/upgrade.yml`.
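Both files could be updated in one pass with something like the sketch below. `bump_ginkgo_pins` is a hypothetical helper. It rewrites every `ginkgo/v2/ginkgo@vX.Y.Z` pin in the listed files to a single version. It writes through a temp file rather than using `sed -i`, because the in-place flag differs between GNU and BSD sed.

```shell
#!/bin/sh
# Hypothetical helper: set every Ginkgo CLI pin in the given files to $1.
# Usage: bump_ginkgo_pins <version, e.g. v2.23.4> <file>...
bump_ginkgo_pins() {
  new="$1"; shift
  for f in "$@"; do
    # Replace any vX.Y.Z after "ginkgo/v2/ginkgo@" with the target version,
    # writing to a temp file and moving it back (portable across sed flavours).
    sed -E "s|ginkgo/v2/ginkgo@v[0-9]+\.[0-9]+\.[0-9]+|ginkgo/v2/ginkgo@${new}|g" "$f" > "$f.tmp" &&
      mv "$f.tmp" "$f" || return 1
  done
}
```

For example, `bump_ginkgo_pins "$(go list -m -f '{{.Version}}' github.com/onsi/ginkgo/v2)" .github/workflows/ci.yml .github/workflows/upgrade.yml` would take the target version from the module graph instead of hardcoding it a second time.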

## What I'm NOT certain about

I'm filing this rather than sending a PR straight away because the chain "version mismatch warning → AfterSuite teardown timeout" is a plausible correlation, but I haven't proven causation. Other possible causes I haven't ruled out:

1. The warning is just noise and the timeout has a different root cause (e.g. a regression in workapplier shutdown logic, slow envtest API-server shutdown, or runner resource contention).
2. A specific change in `pkg/controllers/workapplier` introduced a slow shutdown path and happened to land around the same time as the Ginkgo bump.

A quick way to verify before merging the fix:
- Check Ginkgo CHANGELOG between `v2.19.1` and `v2.23.4` for changes to AfterSuite / runnable-drain / grace-period semantics.
- Check whether the workapplier failures started exactly when `go.mod` bumped Ginkgo (or when the 30s grace period was set).
- Try the CLI bump on a test branch and watch a few CI runs.
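For the second check, a sketch like the one below could list the commits that touched the Ginkgo requirement in `go.mod`, so their dates can be compared with the last green run. `ginkgo_bumps` is a hypothetical name. It uses `git log -G`, which matches commits whose diff adds or removes a matching line, rather than `-S`, which only counts occurrences and would miss a pure version change. The "grace period of 30s" wording appears to come from controller-runtime's manager, whose default graceful-shutdown timeout is also 30s. So the value may never have been set explicitly, and that is worth confirming in `suite_test.go`.

```shell
#!/bin/sh
# Hypothetical helper: print commits that changed the ginkgo/v2 requirement
# in go.mod, oldest first, as "<short hash> <date> <subject>".
ginkgo_bumps() {
  repo="${1:-.}"
  git -C "$repo" log --reverse --format='%h %ad %s' --date=short \
    -G 'github.com/onsi/ginkgo/v2 v' -- go.mod
}
```

Running `ginkgo_bumps .` in a checkout would print one line per bump. If the most recent one lines up with the 17:42 → 17:54 transition in the table above, that would support the diagnosis.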

Happy to send the fix PR if a maintainer confirms the diagnosis (or wants to land it speculatively given how broad the impact is right now).