Commit 5b41985

e2e/features/tracing: reduce flakiness of entire suite, capture technical debt (#10409)
1 parent 3bb702e commit 5b41985

6 files changed, +110 -23 lines changed

+17
@@ -0,0 +1,17 @@
+changelog:
+  - type: NON_USER_FACING
+    issueLink: https://github.com/solo-io/gloo/issues/10365
+    resolvesIssue: false
+    description: >-
+      Remove the chance of a flake occurring due to an UpstreamNotFound error
+  - type: NON_USER_FACING
+    issueLink: https://github.com/k8sgateway/k8sgateway/issues/10327
+    resolvesIssue: false
+    description: >-
+      Remove the chance of a flake occurring due to an UpstreamNotFound error
+  - type: NON_USER_FACING
+    issueLink: https://github.com/k8sgateway/k8sgateway/issues/10293
+    resolvesIssue: false
+    description: >-
+      Mask a product UX issue by enabling some retry logic in a test. The attached
+      issue should resolve the bad UX -and- remove the retry logic introduced here.

test/kubernetes/e2e/debugging.md

+28 -1
@@ -8,7 +8,34 @@ The entry point for an e2e test is a Go test function of the form `func TestXyz(
 
 Each feature suite is invoked as a subtest of the top level suite. The subtests use [testify](https://github.com/stretchr/testify) to structure the tests in the feature's test suite and make use of the library's assertions.
 
-## Workflows
+## Step 1: Setting Up A Cluster
+### Using a previously released version
+It is possible to run these tests against a previously released version of Gloo Gateway. This is useful for testing a release candidate, or a nightly build.
+
+There is no setup required for this option, as the test suite will download the helm chart archive and `glooctl` binary from the specified release. You will use the `RELEASED_VERSION` environment variable when running the tests. See the [variable definition](/test/testutils/env.go) for more details.
+
+### Using a locally built version
+For these tests to run, we require the following conditions:
+- Gloo Gateway Helm chart archive is present in the `_test` folder,
+- `glooctl` is built in the `_output` folder
+- A KinD cluster is set up and loaded with the images to be installed by the helm chart
+
+[ci/kind/setup-kind.sh](/ci/kind/setup-kind.sh) gets run in CI to setup the test environment for the above requirements.
+It accepts a number of environment variables, to control the creation of a kind cluster and deployment of Gloo resources to that kind cluster. Please refer to the script itself to see what variables are available.
+
+Example:
+```bash
+CLUSTER_NAME=solo-test-cluster CLUSTER_NODE_VERSION=v1.30.0 VERSION=v1.0.0-solo-test ci/kind/setup-kind.sh
+```
+
+## Step 2: Running Tests
+_To run the regression tests, your kubeconfig file must point to a running Kubernetes cluster:_
+```
+kubectl config current-context`
+```
+_should run `kind-<CLUSTER_NAME>`_
+
+> Note: If you are running tests against a previously released version, you must set RELEASED_VERSION when invoking the tests
 
 ### Running a single feature's suite
 
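
Reviewer note on the Step 2 precondition added to debugging.md: the docs ask the reader to confirm by hand that the current kubeconfig context is `kind-<CLUSTER_NAME>`. A minimal sketch of the same check in Go might look like the following. The helpers `expectedContext` and `checkContext` are hypothetical and not part of this commit; the cluster name is taken from the setup example in the docs.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// expectedContext returns the kubeconfig context name that kind assigns to a
// cluster: the cluster name prefixed with "kind-".
func expectedContext(clusterName string) string {
	return "kind-" + clusterName
}

// checkContext reports an error when the current context does not belong to
// the named kind cluster. Hypothetical helper, not part of the test suite.
func checkContext(current, clusterName string) error {
	want := expectedContext(clusterName)
	if got := strings.TrimSpace(current); got != want {
		return fmt.Errorf("current context is %q, want %q", got, want)
	}
	return nil
}

func main() {
	// Ask kubectl for the active context; this needs kubectl on the PATH.
	out, err := exec.Command("kubectl", "config", "current-context").Output()
	if err != nil {
		fmt.Println("could not read current context:", err)
		return
	}
	// "solo-test-cluster" is the CLUSTER_NAME used in the setup example.
	if err := checkContext(string(out), "solo-test-cluster"); err != nil {
		fmt.Println("not ready to run tests:", err)
		return
	}
	fmt.Println("kubeconfig points at the expected kind cluster")
}
```

The comparison is kept in a pure function so it can be exercised without a cluster; only `main` shells out to `kubectl`.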

test/kubernetes/e2e/features/tracing/suite.go

+41 -3
@@ -75,6 +75,20 @@ func (s *testingSuite) SetupSuite() {
 			LabelSelector: "app.kubernetes.io/name=http-echo",
 		},
 	)
+
+	// Previously, we would create/delete the Service for each test. However, this would occasionally lead to:
+	// * Hostname gateway-proxy-tracing.gloo-gateway-edge-test.svc.cluster.local was found in DNS cache
+	//* Trying 10.96.181.139:18080...
+	//* Connection timed out after 3001 milliseconds
+	//
+	// The suspicion is that the rotation of the Service meant that the DNS cache became out of date,
+	// and we would curl the old IP.
+	// The workaround to that is to create the service just once at the beginning of the suite.
+	// This mirrors how Services are typically managed in Gloo Gateway, where they are tied
+	// to an installation, and not dynamically updated
+	err = s.testInstallation.Actions.Kubectl().ApplyFile(s.ctx, gatewayProxyServiceManifest,
+		"-n", s.testInstallation.Metadata.InstallNamespace)
+	s.NoError(err, "can apply service/gateway-proxy-tracing")
 }
 
 func (s *testingSuite) TearDownSuite() {
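
Reviewer note on the hunk above: the fix moves the Service from the per-test hooks to the suite-level hooks, so it is applied once and keeps one ClusterIP for the whole run. The sketch below illustrates that hook ordering only. The `lifecycle` type and `run` function are hypothetical stand-ins; they are not testify's runner and they do not call kubectl.

```go
package main

import "fmt"

// lifecycle records which hooks ran, in order. It is a hypothetical stand-in
// for the testify suite and for the kubectl apply/delete calls in the diff.
type lifecycle struct {
	events []string
}

func (l *lifecycle) SetupSuite()    { l.events = append(l.events, "apply service") }
func (l *lifecycle) TearDownSuite() { l.events = append(l.events, "delete service") }
func (l *lifecycle) BeforeTest()    { l.events = append(l.events, "apply gateway config") }
func (l *lifecycle) AfterTest()     { l.events = append(l.events, "delete gateway config") }

// run mimics the hook order of a suite runner: suite-level hooks wrap the
// whole run, test-level hooks wrap each individual test.
func run(l *lifecycle, tests []string) {
	l.SetupSuite()
	defer l.TearDownSuite()
	for _, name := range tests {
		l.BeforeTest()
		l.events = append(l.events, "run "+name)
		l.AfterTest()
	}
}

// count returns how many times want was recorded.
func count(events []string, want string) int {
	n := 0
	for _, e := range events {
		if e == want {
			n++
		}
	}
	return n
}

func main() {
	l := &lifecycle{}
	run(l, []string{"TestA", "TestB"})
	for _, e := range l.events {
		fmt.Println(e)
	}
	fmt.Println("service applied", count(l.events, "apply service"), "time(s)")
}
```

With two tests, the gateway config is applied and deleted twice while the Service is applied once, which is the property the commit relies on to avoid a stale DNS entry.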
@@ -85,6 +99,10 @@ func (s *testingSuite) TearDownSuite() {
 
 	err = s.testInstallation.Actions.Kubectl().DeleteFile(s.ctx, testdefaults.HttpEchoPodManifest)
 	s.Assertions.NoError(err, "can delete echo server")
+
+	err = s.testInstallation.Actions.Kubectl().DeleteFile(s.ctx, gatewayProxyServiceManifest,
+		"-n", s.testInstallation.Metadata.InstallNamespace)
+	s.NoError(err, "can delete service/gateway-proxy-tracing")
 }
 
 func (s *testingSuite) BeforeTest(string, string) {
@@ -98,8 +116,19 @@ func (s *testingSuite) BeforeTest(string, string) {
 		otelcolSelector,
 	)
 
-	err = s.testInstallation.Actions.Kubectl().ApplyFile(s.ctx, tracingConfigManifest)
-	s.NoError(err, "can apply gloo tracing resources")
+	// Technical Debt!!
+	// https://github.com/k8sgateway/k8sgateway/issues/10293
+	// There is a bug in the Control Plane that results in an Error reported on the status
+	// when the Upstream of the Tracing Collector is not found. This results in the VirtualService
+	// that references that Upstream being rejected. What should occur is a Warning is reported,
+	// and the resource is accepted since validation.allowWarnings=true is set.
+	// We have plans to fix this in the code itself. But for a short-term solution, to reduce the
+	// noise in CI/CD of this test flaking, we perform some simple retry logic here.
+	s.EventuallyWithT(func(c *assert.CollectT) {
+		err = s.testInstallation.Actions.Kubectl().ApplyFile(s.ctx, tracingConfigManifest)
+		assert.NoError(c, err, "can apply gloo tracing resources")
+	}, time.Second*5, time.Second*1, "can apply tracing resources")
+
 	// accept the upstream
 	// Upstreams no longer report status if they have not been translated at all to avoid conflicting with
 	// other syncers that have translated them, so we can only detect that the objects exist here
@@ -109,6 +138,7 @@ func (s *testingSuite) BeforeTest(string, string) {
 				otelcolUpstream.Namespace, otelcolUpstream.Name, clients.ReadOpts{Ctx: s.ctx})
 		},
 	)
+
 	// accept the virtual service
 	s.testInstallation.Assertions.EventuallyResourceStatusMatchesState(
 		func() (resources.InputResource, error) {
@@ -142,7 +172,7 @@ func (s *testingSuite) AfterTest(string, string) {
 
 	err = s.testInstallation.Actions.Kubectl().DeleteFile(s.ctx, gatewayConfigManifest,
 		"-n", s.testInstallation.Metadata.InstallNamespace)
-	s.Assertions.NoError(err, "can delete gloo tracing config")
+	s.Assertions.NoError(err, "can delete gateway config")
 }
 
 func (s *testingSuite) TestSpanNameTransformationsWithoutRouteDecorator() {
@@ -156,6 +186,10 @@ func (s *testingSuite) TestSpanNameTransformationsWithoutRouteDecorator() {
 			curl.WithHostHeader(testHostname),
 			curl.WithPort(gatewayProxyPort),
 			curl.WithPath(pathWithoutRouteDescriptor),
+			// We are asserting that a request is consistent. To prevent flakes with that assertion,
+			// we should have some basic retries built into the request
+			curl.WithRetryConnectionRefused(true),
+			curl.WithRetries(3, 0, 10),
 			curl.Silent(),
 		},
 		&matchers.HttpResponse{
@@ -183,6 +217,10 @@ func (s *testingSuite) TestSpanNameTransformationsWithRouteDecorator() {
 			curl.WithHostHeader("example.com"),
 			curl.WithPort(gatewayProxyPort),
 			curl.WithPath(pathWithRouteDescriptor),
+			// We are asserting that a request is consistent. To prevent flakes with that assertion,
+			// we should have some basic retries built into the request
+			curl.WithRetryConnectionRefused(true),
+			curl.WithRetries(3, 0, 10),
 			curl.Silent(),
 		},
 		&matchers.HttpResponse{
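
Reviewer note on the curl options added in both tests: the sketch below shows how such options could translate into curl command-line flags. It assumes that the three integers in `curl.WithRetries(3, 0, 10)` mean retry count, retry delay, and maximum total retry time in seconds; that mapping is an assumption and is not confirmed by this diff. The flags themselves (`--retry`, `--retry-delay`, `--retry-max-time`, `--retry-connrefused`) are standard curl options, and `retryArgs` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strconv"
)

// retryArgs builds curl command-line flags for a retry policy.
// The parameter names are assumptions about the meaning of the arguments to
// curl.WithRetries in the diff; the flag names are standard curl options.
func retryArgs(retry, retryDelay, retryMaxTime int, connRefused bool) []string {
	args := []string{
		"--retry", strconv.Itoa(retry),
		"--retry-delay", strconv.Itoa(retryDelay),
		"--retry-max-time", strconv.Itoa(retryMaxTime),
	}
	if connRefused {
		// Treat "connection refused" as a transient error worth retrying.
		args = append(args, "--retry-connrefused")
	}
	return args
}

func main() {
	// Values mirror the options added in the two tracing tests.
	fmt.Println(retryArgs(3, 0, 10, true))
}
```

Under that assumption, a delay of 0 leaves curl on its default backoff between attempts, and the max-time value bounds how long retries may continue.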

test/kubernetes/e2e/features/tracing/testdata/gateway.yaml

-16
@@ -23,20 +23,4 @@ spec:
 collectorUpstreamRef:
   name: opentelemetry-collector
   namespace: default
----
-apiVersion: v1
-kind: Service
-metadata:
-  name: gateway-proxy-tracing
-  labels:
-    app.kubernetes.io/name: gateway-proxy-tracing-service
-spec:
-  ports:
-  - name: gateway-proxy-tracing
-    port: 18080
-    protocol: TCP
-    targetPort: 18080
-  selector:
-    gateway-proxy: live
-    gateway-proxy-id: gateway-proxy
 

@@ -0,0 +1,20 @@
+apiVersion: v1
+kind: Service
+metadata:
+  name: gateway-proxy-tracing
+  labels:
+    app.kubernetes.io/name: gateway-proxy-tracing-service
+spec:
+  type: LoadBalancer
+  ports:
+    # This service exposes the Port 18080, used by the Gateway defined in ./gateway.yaml
+    - name: gateway-proxy-tracing
+      port: 18080
+      protocol: TCP
+      targetPort: 18080
+  # This selector is meant to match the Selector of the deployed gateway-proxy Service
+  # We intend to route traffic to the gateway-proxy pod(s) that are deployed at install time
+  selector:
+    gateway-proxy-id: gateway-proxy
+    gateway-proxy: live
+---
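
Reviewer note on the selector in the new Service manifest: a Service selector is a set of key/value pairs, and a pod is selected only if its labels contain every pair. The sketch below shows that equality-based matching. The `matches` function and the pod label sets are hypothetical; the actual labels on the gateway-proxy pods are not shown in this diff.

```go
package main

import "fmt"

// matches reports whether every key/value pair in selector is present in
// labels, which is how an equality-based Service selector picks pods.
func matches(selector, labels map[string]string) bool {
	for key, want := range selector {
		if got, ok := labels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	// Selector copied from the gateway-proxy-tracing Service in the diff.
	selector := map[string]string{
		"gateway-proxy-id": "gateway-proxy",
		"gateway-proxy":    "live",
	}

	// Hypothetical pod labels, for illustration only.
	proxyPod := map[string]string{
		"gloo":             "gateway-proxy",
		"gateway-proxy-id": "gateway-proxy",
		"gateway-proxy":    "live",
	}
	echoPod := map[string]string{"app.kubernetes.io/name": "http-echo"}

	fmt.Println("proxy pod selected:", matches(selector, proxyPod))
	fmt.Println("echo pod selected:", matches(selector, echoPod))
}
```

Extra labels on a pod do not prevent a match; a missing or different value for any selector key does.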

test/kubernetes/e2e/features/tracing/types.go

+4 -3
@@ -17,9 +17,10 @@ const (
 )
 
 var (
-	setupOtelcolManifest  = filepath.Join(util.MustGetThisDir(), "testdata", "setup-otelcol.yaml")
-	tracingConfigManifest = filepath.Join(util.MustGetThisDir(), "testdata", "tracing.yaml")
-	gatewayConfigManifest = filepath.Join(util.MustGetThisDir(), "testdata", "gateway.yaml")
+	setupOtelcolManifest        = filepath.Join(util.MustGetThisDir(), "testdata", "setup-otelcol.yaml")
+	tracingConfigManifest       = filepath.Join(util.MustGetThisDir(), "testdata", "tracing.yaml")
+	gatewayConfigManifest       = filepath.Join(util.MustGetThisDir(), "testdata", "gateway.yaml")
+	gatewayProxyServiceManifest = filepath.Join(util.MustGetThisDir(), "testdata", "gw-proxy-tracing-service.yaml")
 
 	otelcolPod = &corev1.Pod{
 		ObjectMeta: metav1.ObjectMeta{Name: "otel-collector", Namespace: "default"},
