
Commit e0f08bd

PENT-103-part-1: refactor integration test so it works on clustered org (#336)
1 parent 8eb53d6 commit e0f08bd

8 files changed: +185 −74 lines


.gitignore

+3
@@ -4,3 +4,6 @@ Brewfile.lock.json
 .vscode
 
 dist/
+
+# For all glorious direnv users.
+.envrc

DEVELOPMENT.md

+75 −14
@@ -16,12 +16,72 @@ just --list
 
 # Integration Tests
 
+## Architecture
+
+Agent Stack K8s integration tests depend on a running Buildkite instance. By default, they use the production Buildkite.
+
+```mermaid
+flowchart LR
+    c((Controller)) -->|create jobs| K
+    Buildkite <-->|Pull jobs| c
+    subgraph K8s cluster
+    K(Kube API)
+    end
+```
+
+During a test run, the test suite:
+1. Creates ephemeral pipelines and queues for a given [Buildkite Agent Cluster](https://buildkite.com/docs/clusters/overview).
+2. Runs the controller, which monitors jobs on the target queue in the target Buildkite Cluster and
+   starts new Jobs in a Kubernetes cluster.
+3. Cleans up those ephemeral objects at the end.
+
+To run integration tests locally, we recommend running an individual test. For example:
+
+```bash
+just test -run TestWalkingSkeleton
+```
+
 ## Setup
-For running the integration tests you'll need to add some additional scopes to your Buildkite API token:
+
+Any member of the public should be able to run our integration tests, as long as you are a Buildkite user and have
+access to a Kubernetes cluster.
+
+Concretely, to get the integration tests running locally, you will need:
+1. A valid Buildkite API token (presuming you are a customer of Buildkite).
+2. A valid Buildkite Agent Token for your target Buildkite Cluster. The agent token needs to be installed in your K8s
+   cluster.
+3. Your Buildkite organization name and your target Buildkite Cluster UUID.
+4. Depending on the test cases, you may also need an SSH key; please read below.
+5. Your shell environment will need CLI write access to a k8s cluster (see the quick check after this list).
+
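For the last requirement, here is a quick way to check your current kubeconfig context before going further. These are only suggested verification commands, not part of the repo's tooling; adjust the namespace to wherever you intend the controller to schedule jobs:

```bash
# Should print "yes" if your current context can create the Jobs the controller schedules.
kubectl auth can-i create jobs --namespace default
# The setup steps below also create secrets.
kubectl auth can-i create secrets --namespace default
```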
+### Use environment variables
+
+We find it convenient to supply the API token, organization name, and cluster UUID as environment variables.
+
+```bash
+export BUILDKITE_TOKEN="bkua_**************"
+export ORG="your-cool-org-slug"
+export CLUSTER_UUID="UUID-UUID-UUID-UUID"
+```
+
+### Token Scopes
+
+Required Buildkite API token scopes (a quick way to verify them follows this list):
 
 - `read_artifacts`
 - `read_build_logs`
 - `write_pipelines`
+- `write_clusters`
+
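To double-check which scopes your token actually carries, the Buildkite REST API exposes them on the access-token endpoint. This is a suggested verification step only; it assumes `BUILDKITE_TOKEN` is exported as above and that `jq` is installed:

```bash
# Prints the scopes attached to the token in $BUILDKITE_TOKEN.
curl -sS -H "Authorization: Bearer $BUILDKITE_TOKEN" \
  https://api.buildkite.com/v2/access-token | jq .scopes
```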
+### Install Agent Token
+
+The agent token is used by the k8s jobs rather than by the controller, so install it into your cluster as a secret:
+
+```bash
+kubectl create secret generic buildkite-agent-token --from-literal=BUILDKITE_AGENT_TOKEN=my-agent-token
+```
+
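To confirm the secret landed in the namespace the controller will use (the `default` namespace is assumed here):

```bash
kubectl get secret buildkite-agent-token -o name
```

If you use a different secret name, you will presumably need to point the controller at it via the `agent-token-secret` config field (see `internal/controller/config/config.go` below).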
+### SSH secret
 
 You'll also need to create an SSH secret in your cluster to run [this test pipeline](internal/integration/fixtures/secretref.yaml). This SSH key needs to be associated with your GitHub account to be able to clone this public repo, and must be in a form acceptable to OpenSSH (aka `BEGIN OPENSSH PRIVATE KEY`, not `BEGIN PRIVATE KEY`).
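A minimal sketch of creating that secret, assuming the fixture expects a secret named `agent-stack-k8s` (the name checked by `TestSSHRepoClone` in the test diff below). The key name inside the secret is a placeholder here; check [fixtures/secretref.yaml](internal/integration/fixtures/secretref.yaml) for the exact key it references:

```bash
# SSH_PRIVATE_RSA_KEY is a hypothetical key name; use whatever secretref.yaml expects.
kubectl create secret generic agent-stack-k8s \
  --from-file=SSH_PRIVATE_RSA_KEY="$HOME/.ssh/id_rsa"
```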

@@ -34,13 +94,16 @@ The integration tests on the [`kubernetes-agent-stack`](https://buildkite.com/bu
 
 
 ## Cleanup
-These will be deleted automatically for successful tests, but for unsuccessful tests, then will remain after then end of the test job to allow you to debug them.
-However, this means they should be cleaned up manually. To do this run
+
+In general, pipelines and queues will be deleted automatically for successful tests, but for unsuccessful tests they will remain after the end of the test job to allow you to debug them.
+
+To clean them up:
+
 ```bash
-CLEANUP_PIPELINES=true just cleanup-orphans --org=buildkite-kubernetes-stack --buildkite-token=<buildkite-api-token>
+just cleanup-orphans
 ```
 
-The token will need to have graphql access as well as:
+The token will need to have GraphQL access as well as:
 - `read_artifacts`
 - `write_pipelines`
 
@@ -50,19 +113,17 @@ To clean these out you should run the following in a kubernetes context in the n
 kubectl get -o jsonpath='{.items[*].metadata.name}' jobs | xargs -L1 kubectl delete job
 ```
 
-At the time of writing, the CI pipeline is run in an EKS cluster, `agent-stack-k8s-ci` in the `buildkite-agent` AWS account.
-The controller is deployed to the `buildkite` namespace in that cluster.
-See https://docs.aws.amazon.com/eks/latest/userguide/create-kubeconfig.html for how to obtain a kubeconfig for an EKS cluster.
+## CI ❤️ Integration Test
 
-# Run from source
+At the time of writing, the CI pipeline runs in an EKS cluster, `agent-stack-k8s-ci`, in the `buildkite-agent` AWS account.
+CI deploys the controller into the `buildkite` namespace in that cluster.
 
-First store the agent token in a Kubernetes secret:
+# Run from source
 
-```bash!
-kubectl create secret generic buildkite-agent-token --from-literal=BUILDKITE_AGENT_TOKEN=my-agent-token
-```
+Running from source can be useful for debugging; you will generally need to meet the same requirements as for
+running an integration test.
 
-Next start the controller:
+In this case, you can choose to supply some inputs via CLI parameters instead of environment variables.
 
 ```bash!
 just run --org my-org --buildkite-token my-api-token --debug
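# For a clustered org you will likely also want to pass the cluster UUID.
# The next line is a sketch only: it assumes a --cluster-uuid flag mirroring the
# cluster-uuid config field added in internal/controller/config/config.go below
# (flag name not verified here), and it reuses the env vars exported earlier.
just run --org "$ORG" --buildkite-token "$BUILDKITE_TOKEN" --cluster-uuid "$CLUSTER_UUID" --debug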

internal/controller/config/config.go

+13 −11
@@ -22,17 +22,19 @@ var DefaultAgentImage = "ghcr.io/buildkite/agent:" + version.Version()
 // mapstructure (the module) supports switching the struct tag to "json", viper does not. So we have
 // to have the `mapstructure` tag for viper and the `json` tag is used by the mapstructure!
 type Config struct {
 Debug bool `json:"debug"`
 JobTTL time.Duration `json:"job-ttl"`
 PollInterval time.Duration `json:"poll-interval"`
 AgentTokenSecret string `json:"agent-token-secret" validate:"required"`
 BuildkiteToken string `json:"buildkite-token" validate:"required"`
 Image string `json:"image" validate:"required"`
 MaxInFlight int `json:"max-in-flight" validate:"min=0"`
 Namespace string `json:"namespace" validate:"required"`
 Org string `json:"org" validate:"required"`
 Tags stringSlice `json:"tags" validate:"min=1"`
 ProfilerAddress string `json:"profiler-address" validate:"omitempty,hostname_port"`
+// This field is mandatory for most new orgs.
+// Some old orgs allow an unclustered setup.
 ClusterUUID string `json:"cluster-uuid" validate:"omitempty"`
 AdditionalRedactedVars stringSlice `json:"additional-redacted-vars" validate:"omitempty"`
 PodSpecPatch *corev1.PodSpec `json:"pod-spec-patch" validate:"omitempty"`

internal/integration/integration_test.go

+19 −38
@@ -22,8 +22,7 @@ func TestWalkingSkeleton(t *testing.T) {
 GraphQL: api.NewClient(cfg.BuildkiteToken),
 }.Init()
 ctx := context.Background()
-pipelineID, cleanup := tc.CreatePipeline(ctx)
-t.Cleanup(cleanup)
+pipelineID := tc.PrepareQueueAndPipelineWithCleanup(ctx)
 tc.StartController(ctx, cfg)
 build := tc.TriggerBuild(ctx, pipelineID)
 tc.AssertSuccess(ctx, build)
@@ -44,8 +43,7 @@ func TestPodSpecPatchInStep(t *testing.T) {
 GraphQL: api.NewClient(cfg.BuildkiteToken),
 }.Init()
 ctx := context.Background()
-pipelineID, cleanup := tc.CreatePipeline(ctx)
-t.Cleanup(cleanup)
+pipelineID := tc.PrepareQueueAndPipelineWithCleanup(ctx)
 tc.StartController(ctx, cfg)
 build := tc.TriggerBuild(ctx, pipelineID)
 
@@ -62,8 +60,7 @@ func TestPodSpecPatchInStepFailsWhenPatchingContainerCommands(t *testing.T) {
 }.Init()
 
 ctx := context.Background()
-pipelineID, cleanup := tc.CreatePipeline(ctx)
-t.Cleanup(cleanup)
+pipelineID := tc.PrepareQueueAndPipelineWithCleanup(ctx)
 
 tc.StartController(ctx, cfg)
 build := tc.TriggerBuild(ctx, pipelineID)
@@ -80,8 +77,7 @@ func TestPodSpecPatchInController(t *testing.T) {
 GraphQL: api.NewClient(cfg.BuildkiteToken),
 }.Init()
 ctx := context.Background()
-pipelineID, cleanup := tc.CreatePipeline(ctx)
-t.Cleanup(cleanup)
+pipelineID := tc.PrepareQueueAndPipelineWithCleanup(ctx)
 cfg := cfg
 cfg.PodSpecPatch = &corev1.PodSpec{
 Containers: []corev1.Container{
@@ -113,8 +109,7 @@ func TestControllerPicksUpJobsWithSubsetOfAgentTags(t *testing.T) {
 }.Init()
 
 ctx := context.Background()
-pipelineID, cleanup := tc.CreatePipeline(ctx)
-t.Cleanup(cleanup)
+pipelineID := tc.PrepareQueueAndPipelineWithCleanup(ctx)
 
 cfg := cfg
 cfg.Tags = append(cfg.Tags, "foo=bar") // job has queue=<something>, agent has queue=<something> and foo=bar
@@ -133,8 +128,7 @@ func TestControllerSetsAdditionalRedactedVars(t *testing.T) {
 }.Init()
 
 ctx := context.Background()
-pipelineID, cleanup := tc.CreatePipeline(ctx)
-t.Cleanup(cleanup)
+pipelineID := tc.PrepareQueueAndPipelineWithCleanup(ctx)
 
 cfg := cfg
 cfg.AdditionalRedactedVars = []string{"ELEVEN_HERBS_AND_SPICES"}
@@ -157,8 +151,7 @@ func TestPrePostCheckoutHooksRun(t *testing.T) {
 }.Init()
 
 ctx := context.Background()
-pipelineID, cleanup := tc.CreatePipeline(ctx)
-t.Cleanup(cleanup)
+pipelineID := tc.PrepareQueueAndPipelineWithCleanup(ctx)
 
 tc.StartController(ctx, cfg)
 build := tc.TriggerBuild(ctx, pipelineID)
@@ -176,8 +169,7 @@ func TestChown(t *testing.T) {
 GraphQL: api.NewClient(cfg.BuildkiteToken),
 }.Init()
 ctx := context.Background()
-pipelineID, cleanup := tc.CreatePipeline(ctx)
-t.Cleanup(cleanup)
+pipelineID := tc.PrepareQueueAndPipelineWithCleanup(ctx)
 tc.StartController(ctx, cfg)
 build := tc.TriggerBuild(ctx, pipelineID)
 tc.AssertSuccess(ctx, build)
@@ -198,8 +190,7 @@ func TestSSHRepoClone(t *testing.T) {
 Get(ctx, "agent-stack-k8s", metav1.GetOptions{})
 require.NoError(t, err, "agent-stack-k8s secret must exist")
 
-pipelineID, cleanup := tc.CreatePipeline(ctx)
-t.Cleanup(cleanup)
+pipelineID := tc.PrepareQueueAndPipelineWithCleanup(ctx)
 tc.StartController(ctx, cfg)
 build := tc.TriggerBuild(ctx, pipelineID)
 tc.AssertSuccess(ctx, build)
@@ -215,8 +206,7 @@ func TestPluginCloneFailsTests(t *testing.T) {
 
 ctx := context.Background()
 
-pipelineID, cleanup := tc.CreatePipeline(ctx)
-t.Cleanup(cleanup)
+pipelineID := tc.PrepareQueueAndPipelineWithCleanup(ctx)
 tc.StartController(ctx, cfg)
 build := tc.TriggerBuild(ctx, pipelineID)
 tc.AssertFail(ctx, build)
@@ -232,8 +222,7 @@ func TestMaxInFlightLimited(t *testing.T) {
 
 ctx := context.Background()
 
-pipelineID, cleanup := tc.CreatePipeline(ctx)
-t.Cleanup(cleanup)
+pipelineID := tc.PrepareQueueAndPipelineWithCleanup(ctx)
 cfg := cfg
 cfg.MaxInFlight = 1
 tc.StartController(ctx, cfg)
@@ -271,8 +260,7 @@ func TestMaxInFlightUnlimited(t *testing.T) {
 
 ctx := context.Background()
 
-pipelineID, cleanup := tc.CreatePipeline(ctx)
-t.Cleanup(cleanup)
+pipelineID := tc.PrepareQueueAndPipelineWithCleanup(ctx)
 cfg := cfg
 cfg.MaxInFlight = 0
 tc.StartController(ctx, cfg)
@@ -315,8 +303,7 @@ func TestSidecars(t *testing.T) {
 GraphQL: api.NewClient(cfg.BuildkiteToken),
 }.Init()
 ctx := context.Background()
-pipelineID, cleanup := tc.CreatePipeline(ctx)
-t.Cleanup(cleanup)
+pipelineID := tc.PrepareQueueAndPipelineWithCleanup(ctx)
 tc.StartController(ctx, cfg)
 build := tc.TriggerBuild(ctx, pipelineID)
 tc.AssertSuccess(ctx, build)
@@ -331,8 +318,7 @@ func TestExtraVolumeMounts(t *testing.T) {
 GraphQL: api.NewClient(cfg.BuildkiteToken),
 }.Init()
 ctx := context.Background()
-pipelineID, cleanup := tc.CreatePipeline(ctx)
-t.Cleanup(cleanup)
+pipelineID := tc.PrepareQueueAndPipelineWithCleanup(ctx)
 tc.StartController(ctx, cfg)
 build := tc.TriggerBuild(ctx, pipelineID)
 tc.AssertSuccess(ctx, build)
@@ -346,8 +332,7 @@ func TestInvalidPodSpec(t *testing.T) {
 GraphQL: api.NewClient(cfg.BuildkiteToken),
 }.Init()
 ctx := context.Background()
-pipelineID, cleanup := tc.CreatePipeline(ctx)
-t.Cleanup(cleanup)
+pipelineID := tc.PrepareQueueAndPipelineWithCleanup(ctx)
 tc.StartController(ctx, cfg)
 build := tc.TriggerBuild(ctx, pipelineID)
 tc.AssertFail(ctx, build)
@@ -365,8 +350,7 @@ func TestInvalidPodJSON(t *testing.T) {
 GraphQL: api.NewClient(cfg.BuildkiteToken),
 }.Init()
 ctx := context.Background()
-pipelineID, cleanup := tc.CreatePipeline(ctx)
-t.Cleanup(cleanup)
+pipelineID := tc.PrepareQueueAndPipelineWithCleanup(ctx)
 tc.StartController(ctx, cfg)
 build := tc.TriggerBuild(ctx, pipelineID)
 tc.AssertFail(ctx, build)
@@ -384,8 +368,7 @@ func TestEnvVariables(t *testing.T) {
 GraphQL: api.NewClient(cfg.BuildkiteToken),
 }.Init()
 ctx := context.Background()
-pipelineID, cleanup := tc.CreatePipeline(ctx)
-t.Cleanup(cleanup)
+pipelineID := tc.PrepareQueueAndPipelineWithCleanup(ctx)
 tc.StartController(ctx, cfg)
 build := tc.TriggerBuild(ctx, pipelineID)
 tc.AssertSuccess(ctx, build)
@@ -400,8 +383,7 @@ func TestImagePullBackOffCancelled(t *testing.T) {
 GraphQL: api.NewClient(cfg.BuildkiteToken),
 }.Init()
 ctx := context.Background()
-pipelineID, cleanup := tc.CreatePipeline(ctx)
-t.Cleanup(cleanup)
+pipelineID := tc.PrepareQueueAndPipelineWithCleanup(ctx)
 tc.StartController(ctx, cfg)
 build := tc.TriggerBuild(ctx, pipelineID)
 tc.AssertFail(ctx, build)
@@ -416,8 +398,7 @@ func TestArtifactsUploadFailedJobs(t *testing.T) {
 GraphQL: api.NewClient(cfg.BuildkiteToken),
 }.Init()
 ctx := context.Background()
-pipelineID, cleanup := tc.CreatePipeline(ctx)
-t.Cleanup(cleanup)
+pipelineID := tc.PrepareQueueAndPipelineWithCleanup(ctx)
 tc.StartController(ctx, cfg)
 build := tc.TriggerBuild(ctx, pipelineID)
 tc.AssertFail(ctx, build)
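Every test above now goes through the shared `PrepareQueueAndPipelineWithCleanup` helper, so exercising any single one of them against a clustered org only needs the environment described in `DEVELOPMENT.md` above; the exported values below are placeholders:

```bash
export BUILDKITE_TOKEN="bkua_**************"
export ORG="your-cool-org-slug"
export CLUSTER_UUID="UUID-UUID-UUID-UUID"
just test -run TestWalkingSkeleton
```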

internal/integration/interrupt_test.go

+1
@@ -40,6 +40,7 @@ func CleanupOnInterrupt(cleanup func()) {
 
 // EnsureCleanup will run the provided cleanup function when the test ends,
 // either via t.Cleanup or on interrupt via CleanupOnInterrupt.
+// Note that this can't cover the test-timeout case.
 func EnsureCleanup(t *testing.T, cleanup func()) {
 t.Cleanup(cleanup)
 CleanupOnInterrupt(cleanup)

internal/integration/main_test.go

+5 −3
@@ -20,9 +20,11 @@ const (
 )
 
 var (
 branch string
 cfg config.Config
 cleanupPipelines bool
+// Preserve pipelines even if the test passes.
+// By default, failed pipelines will always be kept.
 preservePipelines bool
 
 //go:embed fixtures/*
