Commit 1301710

fix(ci): parallelize gotest, cleanup output, flakiness (#11113)
* ci: parallelize gotest by separating test/cli into own job

split the Go Test workflow into two parallel jobs:
- `unit-tests`: runs unit tests (excluding test/cli)
- `cli-tests`: runs test/cli end-to-end tests

test/cli takes ~3 minutes (~50% of total gotest time), so running it in parallel should reduce wall-clock CI time by ~1.5-2.5 minutes.

both jobs produce JUnit XML and HTML reports for consistent debugging.

* ci(gotest): reduce noise on test timeout panics

add GOTRACEBACK=single to show only one goroutine stack instead of all when a test timeout panic occurs. this makes CI output much cleaner when tests hang.

* fix(ci): prevent stderr from corrupting test JSON output

- remove 2>&1 which mixed "go: downloading" stderr messages into JSON
- add JSON validation before parsing
- print failed test names for easier debugging

* ci(gotest): use gotestsum for human-readable test output

- replace per-package coverage loop with single gotestsum invocation
- both unit-tests and cli-tests now show human-readable output
- simplified coverage collection (single -coverprofile, no gocovmerge)
- clarified step names to indicate they run tests

* ci: fix codecov uploads by adding token

- add CODECOV_TOKEN to gotest.yml and sharness.yml
- update codecov-action to v5.5.2
- add fail_ci_if_error: false for robustness

codecov stopped receiving coverage data ~1 year ago when they started requiring tokens for public repos

* refactor(make): add test_unit and test_cli targets

- add `make test_unit` for unit tests with coverage (used by CI)
- add `make test_cli` for CLI integration tests (used by CI)
- only disable colors when CI env var is set (local dev gets colors)
- remove legacy targets: test_go_test, test_go_short, test_go_race, test_go_expensive
- update gotest.yml to use make targets instead of inline commands
- add test artifacts to .gitignore

* fix(ci): move client/rpc tests to cli-tests job

client/rpc tests use test/cli/harness which requires the ipfs binary. Move them from test_unit to test_cli where the binary is built.

also:
- update gotestsum to v1.13.0
- simplify workflow step names

* fix(ci): use build tags when listing test packages

go list needs build tags to properly exclude packages like fuse/mfs when running with TEST_FUSE=0 (nofuse tag).

* fix(ci): move test/integration to cli-tests job

test/integration tests need the ipfs binary, move them from test_unit to test_cli.

* fix(test): fix flaky kubo-as-a-library and GetClosestPeers tests

kubo-as-a-library: use `Bootstrap()` instead of raw `Swarm().Connect()` to fix race condition between swarm connection and bitswap peer discovery. `Bootstrap()` properly integrates peers into the routing system, ensuring bitswap learns about connected peers synchronously.

GetClosestPeers: simplify retry logic using `EventuallyWithT` with 10-minute timeout. tests all 4 routing types (`auto`, `autoclient`, `dht`, `dhtclient`) against real bootstrap peers with patient polling.

* fix(example): use bidirectional Swarm().Connect() for reliable bitswap

- connect nodes bidirectionally (A→B and B→A) to simulate mutual peering
- mutual peering protects connection from resource manager culling
- use port 0 for random available ports (avoids CI conflicts)
- enable LoopbackAddressesOnLanDHT for local testing
- move retry logic to test file using require.Eventually

* fix(ci): add test_examples target and parallel example-tests job

- add `make test_examples` target to mk/golang.mk for consistency with test_unit/test_cli
- move example tests to separate parallel CI job (example-tests)
- example: use Bootstrap() with autoconf.FallbackBootstrapPeers for reliable bitswap
- example: increase context timeout to 10 minutes
- test: add 60s per-request timeout to GetClosestPeers (server has 30s routing timeout)
- test: reduce EventuallyWithT to 3 minutes (locally passes in under 1 minute)

* fix(ci): improve test targets, exclusion patterns, and artifact naming

- define COVERPKG_EXCLUDE and UNIT_EXCLUDE as documented variables
- use grep -vE with single regex instead of multiple grep -v calls
- add mkdir -p before rm to ensure directories exist
- add DEPS_GO dependency to test_cli target
- make CLI test timeout configurable via TEST_CLI_TIMEOUT (default 10m)
- fix test_examples cleanup on failure using subshell
- reduce GetClosestPeers test wait time from 3m to 2m
- rename artifacts to match job names: unit-tests-{junit,html}, cli-tests-{junit,html}
- update cli-tests upload-artifact from v5 to v6

* fix(ci): fix unit test exclusion and speed up example test

- fix UNIT_EXCLUDE regex to match client/rpc at end of path
- remove public bootstrap peers from example (only connect to nodeA)
- example test now runs in ~3s instead of timing out

* fix(test): fix flaky TestAddMultipleGCLive race condition

added time.Sleep after spawning GC goroutines to ensure they reach GCLock() before the test proceeds. without this, the adder's maybePauseForGC() might check GCRequested() before GC has even requested the lock, causing the lock to not be released and GC to block indefinitely.

this matches the existing pattern in TestAddGCLive which already had this sleep.

also replaced context.Background() with t.Context() in both TestAddMultipleGCLive and TestAddGCLive for proper test lifecycle management.

* fix(example): use test harness settings for reliable CI

the kubo-as-a-library example was flaky on CI. applied test-harness-like settings that match what transports_test.go uses:
- TCP-only on 127.0.0.1 with random port (no QUIC/UDP)
- explicitly disable non-TCP transports (QUIC, Relay, WebTransport, etc)
- use NilRouterOption (no routing) since we connect peers directly
- bitswap works with directly connected peers without DHT lookups
- 2-minute context timeout
- streaming output in test for debugging
1 parent 55b9475 commit 1301710

13 files changed: 263 additions, 183 deletions
.github/workflows/gotest.yml

Lines changed: 93 additions & 31 deletions
@@ -14,11 +14,13 @@ concurrency:
   cancel-in-progress: true

 jobs:
-  go-test:
+  # Unit tests with coverage collection (uploaded to Codecov)
+  unit-tests:
     if: github.repository == 'ipfs/kubo' || github.event_name == 'workflow_dispatch'
     runs-on: ${{ fromJSON(github.repository == 'ipfs/kubo' && '["self-hosted", "linux", "x64", "2xlarge"]' || '"ubuntu-latest"') }}
-    timeout-minutes: 20
+    timeout-minutes: 15
     env:
+      GOTRACEBACK: single # reduce noise on test timeout panics
       TEST_DOCKER: 0
       TEST_FUSE: 0
       TEST_VERBOSE: 1
@@ -36,41 +38,18 @@ jobs:
           go-version-file: 'go.mod'
       - name: Install missing tools
         run: sudo apt update && sudo apt install -y zsh
-      - name: 👉️ If this step failed, go to «Summary» (top left) → inspect the «Failures/Errors» table
-        env:
-          # increasing parallelism beyond 2 doesn't speed up the tests much
-          PARALLEL: 2
+      - name: Run unit tests
         run: |
-          make -j "$PARALLEL" test/unit/gotest.junit.xml &&
+          make test_unit &&
           [[ ! $(jq -s -c 'map(select(.Action == "fail")) | .[]' test/unit/gotest.json) ]]
       - name: Upload coverage to Codecov
         uses: codecov/codecov-action@671740ac38dd9b0130fbe1cec585b89eea48d3de # v5.5.2
         if: failure() || success()
         with:
           name: unittests
           files: coverage/unit_tests.coverprofile
-      - name: Test kubo-as-a-library example
-        run: |
-          # we want to first test with the kubo version in the go.mod file
-          go test -v ./...
-
-          # we also want to test the examples against the current version of kubo
-          # however, that version might be in a fork so we need to replace the dependency
-
-          # backup the go.mod and go.sum files to restore them after we run the tests
-          cp go.mod go.mod.bak
-          cp go.sum go.sum.bak
-
-          # make sure the examples run against the current version of kubo
-          go mod edit -replace github.com/ipfs/kubo=./../../..
-          go mod tidy
-
-          go test -v ./...
-
-          # restore the go.mod and go.sum files to their original state
-          mv go.mod.bak go.mod
-          mv go.sum.bak go.sum
-        working-directory: docs/examples/kubo-as-a-library
+          token: ${{ secrets.CODECOV_TOKEN }}
+          fail_ci_if_error: false
       - name: Create a proper JUnit XML report
         uses: ipdxco/gotest-json-to-junit-xml@v1
         with:
@@ -80,7 +59,7 @@ jobs:
       - name: Archive the JUnit XML report
         uses: actions/upload-artifact@v6
         with:
-          name: unit
+          name: unit-tests-junit
           path: test/unit/gotest.junit.xml
         if: failure() || success()
       - name: Create a HTML report
@@ -93,7 +72,7 @@ jobs:
       - name: Archive the HTML report
         uses: actions/upload-artifact@v6
         with:
-          name: html
+          name: unit-tests-html
           path: test/unit/gotest.html
         if: failure() || success()
       - name: Create a Markdown report
@@ -106,3 +85,86 @@ jobs:
       - name: Set the summary
         run: cat test/unit/gotest.md >> $GITHUB_STEP_SUMMARY
         if: failure() || success()
+
+  # End-to-end integration/regression tests from test/cli
+  # (Go-based replacement for legacy test/sharness shell scripts)
+  cli-tests:
+    if: github.repository == 'ipfs/kubo' || github.event_name == 'workflow_dispatch'
+    runs-on: ${{ fromJSON(github.repository == 'ipfs/kubo' && '["self-hosted", "linux", "x64", "2xlarge"]' || '"ubuntu-latest"') }}
+    timeout-minutes: 15
+    env:
+      GOTRACEBACK: single # reduce noise on test timeout panics
+      TEST_VERBOSE: 1
+      GIT_PAGER: cat
+      IPFS_CHECK_RCMGR_DEFAULTS: 1
+    defaults:
+      run:
+        shell: bash
+    steps:
+      - name: Check out Kubo
+        uses: actions/checkout@v6
+      - name: Set up Go
+        uses: actions/setup-go@v6
+        with:
+          go-version-file: 'go.mod'
+      - name: Install missing tools
+        run: sudo apt update && sudo apt install -y zsh
+      - name: Run CLI tests
+        env:
+          IPFS_PATH: ${{ runner.temp }}/ipfs-test
+        run: make test_cli
+      - name: Create JUnit XML report
+        uses: ipdxco/gotest-json-to-junit-xml@v1
+        with:
+          input: test/cli/cli-tests.json
+          output: test/cli/cli-tests.junit.xml
+        if: failure() || success()
+      - name: Archive JUnit XML report
+        uses: actions/upload-artifact@v6
+        with:
+          name: cli-tests-junit
+          path: test/cli/cli-tests.junit.xml
+        if: failure() || success()
+      - name: Create HTML report
+        uses: ipdxco/junit-xml-to-html@v1
+        with:
+          mode: no-frames
+          input: test/cli/cli-tests.junit.xml
+          output: test/cli/cli-tests.html
+        if: failure() || success()
+      - name: Archive HTML report
+        uses: actions/upload-artifact@v6
+        with:
+          name: cli-tests-html
+          path: test/cli/cli-tests.html
+        if: failure() || success()
+      - name: Create Markdown report
+        uses: ipdxco/junit-xml-to-html@v1
+        with:
+          mode: summary
+          input: test/cli/cli-tests.junit.xml
+          output: test/cli/cli-tests.md
+        if: failure() || success()
+      - name: Set summary
+        run: cat test/cli/cli-tests.md >> $GITHUB_STEP_SUMMARY
+        if: failure() || success()
+
+  # Example tests (kubo-as-a-library)
+  example-tests:
+    if: github.repository == 'ipfs/kubo' || github.event_name == 'workflow_dispatch'
+    runs-on: ${{ fromJSON(github.repository == 'ipfs/kubo' && '["self-hosted", "linux", "x64", "2xlarge"]' || '"ubuntu-latest"') }}
+    timeout-minutes: 5
+    env:
+      GOTRACEBACK: single
+    defaults:
+      run:
+        shell: bash
+    steps:
+      - name: Check out Kubo
+        uses: actions/checkout@v6
+      - name: Set up Go
+        uses: actions/setup-go@v6
+        with:
+          go-version-file: 'go.mod'
+      - name: Run example tests
+        run: make test_examples

.github/workflows/sharness.yml

Lines changed: 2 additions & 0 deletions
@@ -60,6 +60,8 @@ jobs:
         with:
           name: sharness
           files: kubo/coverage/sharness_tests.coverprofile
+          token: ${{ secrets.CODECOV_TOKEN }}
+          fail_ci_if_error: false
       - name: Aggregate results
         run: find kubo/test/sharness/test-results -name 't*-*.sh.*.counts' | kubo/test/sharness/lib/sharness/aggregate-results.sh > kubo/test/sharness/test-results/summary.txt
       - name: 👉️ If this step failed, go to «Summary» (top left) → «HTML Report» → inspect the «Failures» column

.gitignore

Lines changed: 5 additions & 0 deletions
@@ -28,6 +28,11 @@ go-ipfs-source.tar.gz
 docs/examples/go-ipfs-as-a-library/example-folder/Qm*
 /test/sharness/t0054-dag-car-import-export-data/*.car

+# test artifacts from make test_unit / test_cli
+/test/unit/gotest.json
+/test/unit/gotest.junit.xml
+/test/cli/cli-tests.json
+
 # ignore build output from snapcraft
 /ipfs_*.snap
 /parts

Rules.mk

Lines changed: 7 additions & 8 deletions
@@ -134,15 +134,14 @@ help:
 	@echo ''
 	@echo 'TESTING TARGETS:'
 	@echo ''
-	@echo ' test - Run all tests'
-	@echo ' test_short - Run short go tests and short sharness tests'
-	@echo ' test_go_short - Run short go tests'
-	@echo ' test_go_test - Run all go tests'
+	@echo ' test - Run all tests (test_go_fmt, test_unit, test_cli, test_sharness)'
+	@echo ' test_short - Run fast tests (test_go_fmt, test_unit)'
+	@echo ' test_unit - Run unit tests with coverage (excludes test/cli)'
+	@echo ' test_cli - Run CLI integration tests (requires built binary)'
+	@echo ' test_go_fmt - Check Go source formatting'
 	@echo ' test_go_build - Build kubo for all platforms from .github/build-platforms.yml'
-	@echo ' test_go_expensive - Run all go tests and build all platforms'
-	@echo ' test_go_race - Run go tests with the race detector enabled'
-	@echo ' test_go_lint - Run the `golangci-lint` vetting tool'
+	@echo ' test_go_lint - Run golangci-lint'
 	@echo ' test_sharness - Run sharness tests'
-	@echo ' coverage - Collects coverage info from unit tests and sharness'
+	@echo ' coverage - Collect coverage info from unit tests and sharness'
 	@echo
 .PHONY: help

core/coreunix/add_test.go

Lines changed: 17 additions & 12 deletions
@@ -30,6 +30,7 @@ import (
 const testPeerID = "QmTFauExutTsy4XP6JbMFcw2Wa9645HJt2bTqL6qYDCKfe"

 func TestAddMultipleGCLive(t *testing.T) {
+	ctx := t.Context()
 	r := &repo.Mock{
 		C: config.Config{
 			Identity: config.Identity{
@@ -38,13 +39,13 @@ func TestAddMultipleGCLive(t *testing.T) {
 		},
 		D: syncds.MutexWrap(datastore.NewMapDatastore()),
 	}
-	node, err := core.NewNode(context.Background(), &core.BuildCfg{Repo: r})
+	node, err := core.NewNode(ctx, &core.BuildCfg{Repo: r})
 	if err != nil {
 		t.Fatal(err)
 	}

 	out := make(chan interface{}, 10)
-	adder, err := NewAdder(context.Background(), node.Pinning, node.Blockstore, node.DAG)
+	adder, err := NewAdder(ctx, node.Pinning, node.Blockstore, node.DAG)
 	if err != nil {
 		t.Fatal(err)
 	}
@@ -67,7 +68,7 @@ func TestAddMultipleGCLive(t *testing.T) {

 	go func() {
 		defer close(out)
-		_, _ = adder.AddAllAndPin(context.Background(), slf)
+		_, _ = adder.AddAllAndPin(ctx, slf)
 		// Ignore errors for clarity - the real bug would be gc'ing files while adding them, not this resultant error
 	}()

@@ -80,9 +81,12 @@ func TestAddMultipleGCLive(t *testing.T) {
 	gc1started := make(chan struct{})
 	go func() {
 		defer close(gc1started)
-		gc1out = gc.GC(context.Background(), node.Blockstore, node.Repo.Datastore(), node.Pinning, nil)
+		gc1out = gc.GC(ctx, node.Blockstore, node.Repo.Datastore(), node.Pinning, nil)
 	}()

+	// Give GC goroutine time to reach GCLock (will block there waiting for adder)
+	time.Sleep(time.Millisecond * 100)
+
 	// GC shouldn't get the lock until after the file is completely added
 	select {
 	case <-gc1started:
@@ -119,9 +123,12 @@ func TestAddMultipleGCLive(t *testing.T) {
 	gc2started := make(chan struct{})
 	go func() {
 		defer close(gc2started)
-		gc2out = gc.GC(context.Background(), node.Blockstore, node.Repo.Datastore(), node.Pinning, nil)
+		gc2out = gc.GC(ctx, node.Blockstore, node.Repo.Datastore(), node.Pinning, nil)
 	}()

+	// Give GC goroutine time to reach GCLock
+	time.Sleep(time.Millisecond * 100)
+
 	select {
 	case <-gc2started:
 		t.Fatal("gc shouldn't have started yet")
@@ -155,6 +162,7 @@ func TestAddMultipleGCLive(t *testing.T) {
 }

 func TestAddGCLive(t *testing.T) {
+	ctx := t.Context()
 	r := &repo.Mock{
 		C: config.Config{
 			Identity: config.Identity{
@@ -163,13 +171,13 @@ func TestAddGCLive(t *testing.T) {
 		},
 		D: syncds.MutexWrap(datastore.NewMapDatastore()),
 	}
-	node, err := core.NewNode(context.Background(), &core.BuildCfg{Repo: r})
+	node, err := core.NewNode(ctx, &core.BuildCfg{Repo: r})
 	if err != nil {
 		t.Fatal(err)
 	}

 	out := make(chan interface{})
-	adder, err := NewAdder(context.Background(), node.Pinning, node.Blockstore, node.DAG)
+	adder, err := NewAdder(ctx, node.Pinning, node.Blockstore, node.DAG)
 	if err != nil {
 		t.Fatal(err)
 	}
@@ -193,7 +201,7 @@ func TestAddGCLive(t *testing.T) {
 	go func() {
 		defer close(addDone)
 		defer close(out)
-		_, err := adder.AddAllAndPin(context.Background(), slf)
+		_, err := adder.AddAllAndPin(ctx, slf)
 		if err != nil {
 			t.Error(err)
 		}
@@ -211,7 +219,7 @@ func TestAddGCLive(t *testing.T) {
 	gcstarted := make(chan struct{})
 	go func() {
 		defer close(gcstarted)
-		gcout = gc.GC(context.Background(), node.Blockstore, node.Repo.Datastore(), node.Pinning, nil)
+		gcout = gc.GC(ctx, node.Blockstore, node.Repo.Datastore(), node.Pinning, nil)
 	}()

 	// gc shouldn't start until we let the add finish its current file.
@@ -255,9 +263,6 @@ func TestAddGCLive(t *testing.T) {
 		last = c
 	}

-	ctx, cancel := context.WithTimeout(context.Background(), time.Second*5)
-	defer cancel()
-
 	set := cid.NewSet()
 	err = dag.Walk(ctx, dag.GetLinksWithDAG(node.DAG), last, set.Visit)
 	if err != nil {

coverage/Rules.mk

Lines changed: 4 additions & 23 deletions
@@ -3,33 +3,14 @@ include mk/header.mk
 GOCC ?= go

 $(d)/coverage_deps: $$(DEPS_GO) cmd/ipfs/ipfs
-	rm -rf $(@D)/unitcover && mkdir $(@D)/unitcover
 	rm -rf $(@D)/sharnesscover && mkdir $(@D)/sharnesscover

-ifneq ($(IPFS_SKIP_COVER_BINS),1)
-$(d)/coverage_deps: test/bin/gocovmerge
-endif
-
 .PHONY: $(d)/coverage_deps

-# unit tests coverage
-UTESTS_$(d) := $(shell $(GOCC) list -f '{{if (or (len .TestGoFiles) (len .XTestGoFiles))}}{{.ImportPath}}{{end}}' $(go-flags-with-tags) ./... | grep -v go-ipfs/vendor | grep -v go-ipfs/Godeps)
-
-UCOVER_$(d) := $(addsuffix .coverprofile,$(addprefix $(d)/unitcover/, $(subst /,_,$(UTESTS_$(d)))))
-
-$(UCOVER_$(d)): $(d)/coverage_deps ALWAYS
-	$(eval TMP_PKG := $(subst _,/,$(basename $(@F))))
-	$(eval TMP_DEPS := $(shell $(GOCC) list -f '{{range .Deps}}{{.}} {{end}}' $(go-flags-with-tags) $(TMP_PKG) | sed 's/ /\n/g' | grep ipfs/go-ipfs) $(TMP_PKG))
-	$(eval TMP_DEPS_LIST := $(call join-with,$(comma),$(TMP_DEPS)))
-	$(GOCC) test $(go-flags-with-tags) $(GOTFLAGS) -v -covermode=atomic -json -coverpkg=$(TMP_DEPS_LIST) -coverprofile=$@ $(TMP_PKG) | tee -a test/unit/gotest.json
-
-
-$(d)/unit_tests.coverprofile: $(UCOVER_$(d))
-	gocovmerge $^ > $@
-
-TGTS_$(d) := $(d)/unit_tests.coverprofile
+# unit tests coverage is now produced by test_unit target in mk/golang.mk
+# (outputs coverage/unit_tests.coverprofile and test/unit/gotest.json)

-.PHONY: $(d)/unit_tests.coverprofile
+TGTS_$(d) :=

 # sharness tests coverage
 $(d)/ipfs: GOTAGS += testrunmain
@@ -46,7 +27,7 @@ endif
 export IPFS_COVER_DIR:= $(realpath $(d))/sharnesscover/

 $(d)/sharness_tests.coverprofile: export TEST_PLUGIN=0
-$(d)/sharness_tests.coverprofile: $(d)/ipfs cmd/ipfs/ipfs-test-cover $(d)/coverage_deps test_sharness
+$(d)/sharness_tests.coverprofile: $(d)/ipfs cmd/ipfs/ipfs-test-cover $(d)/coverage_deps test/bin/gocovmerge test_sharness
 	(cd $(@D)/sharnesscover && find . -type f | gocovmerge -list -) > $@
