Commit 05f45c1

Improve longevity tests (nginx#4769)
- Disable access logging to reduce noise and cost
- Scale the controller to multiple replicas to verify leader election works
- Reduce runtime from 4 days to 3 days
- Fix workflow results collection path
1 parent 90a1cec commit 05f45c1

10 files changed: +73 -10 lines changed

.github/workflows/longevity-start.yml

Lines changed: 0 additions & 1 deletion
@@ -132,6 +132,5 @@ jobs:
           echo "Longevity Test Run ID: ${{ github.run_id }}" >> $GITHUB_STEP_SUMMARY

       - name: Start Longevity Tests
-        continue-on-error: true
         working-directory: ./tests
         run: make start-longevity-test

.github/workflows/longevity-stop.yml

Lines changed: 1 addition & 2 deletions
@@ -117,15 +117,14 @@ jobs:
         run: make update-firewall-with-local-ip

       - name: Stop Longevity Tests
-        continue-on-error: true
         working-directory: ./tests
         run: make stop-longevity-test

       - name: Upload Artifacts
         uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f # v6.0.0
         with:
           name: results-${{ matrix.type }}
-          path: tests/results/longevity/*-${{ matrix.type }}.*
+          path: tests/results/longevity/**/*-${{ matrix.type }}.*

       - name: Cleanup
         working-directory: ./tests
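
The change to the artifact `path` can be illustrated with a rough sketch. It assumes the results files are written into a subdirectory of `tests/results/longevity/`, which the commit message ("Fix workflow results collection path") suggests but does not state. The `2.3.0` directory and the `oss` value standing in for `${{ matrix.type }}` are hypothetical, and `find` only approximates the glob matching done by `actions/upload-artifact`.

```shell
#!/bin/sh
# Sketch only: "oss" and the "2.3.0" directory are hypothetical stand-ins for
# ${{ matrix.type }} and a per-version results folder.
set -e

root="$(mktemp -d)"
mkdir -p "$root/tests/results/longevity/2.3.0"
touch "$root/tests/results/longevity/2.3.0/2.3.0-oss.md"

# Old pattern (longevity/*-oss.*): only files directly inside longevity/.
old_count="$(find "$root/tests/results/longevity" -maxdepth 1 -type f -name '*-oss.*' | wc -l | tr -d ' ')"

# New pattern (longevity/**/*-oss.*): files at any depth below longevity/.
new_count="$(find "$root/tests/results/longevity" -type f -name '*-oss.*' | wc -l | tr -d ' ')"

echo "old pattern matches: $old_count"
echo "new pattern matches: $new_count"

rm -rf "$root"
```

With this layout the old pattern finds nothing and the new pattern finds the nested file, which is the kind of miss the `**/` segment avoids.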

.yamllint.yaml

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,6 @@ ignore:
   - charts/nginx-gateway-fabric/templates
   - config/crd/bases/
   - deploy
-  - site/static

 rules:
   braces: enable
@@ -32,6 +31,7 @@ rules:
       .github/workflows/redhat-certification.yml
       examples/proxy-settings-policy/app.yaml
       tests/suite/manifests/proxy-settings-policy/app.yaml
+      tests/suite/manifests/longevity
   key-duplicates: enable
   key-ordering: disable
   line-length:

docs/developer/release-process.md

Lines changed: 1 addition & 1 deletion
@@ -42,7 +42,7 @@ To create a new release, follow these steps:
 3. Create a release branch following the `release-X.Y` naming convention.
    - Once the release branch is created, reach out to the infra team to get it added to the runner permissions.
 4. Once the release branch pipeline completes, run tests using the `release-X.X-rc` images that are pushed to Github (for example, `release-1.3-rc`).
-   1. Follow the [longevity testing](https://github.com/nginx/nginx-gateway-fabric/blob/main/tests/README.md#longevity-testing) steps to start the tests in the pipeline. For `image_tag`, use `release-X.X-rc`, and for `version`, use the upcoming `X.Y.Z` NGF version. Run the workflow on the new release branch. This process takes 4 days, where you'll collect results and tear down at the end.
+   1. Follow the [longevity testing](https://github.com/nginx/nginx-gateway-fabric/blob/main/tests/README.md#longevity-testing) steps to start the tests in the pipeline. For `image_tag`, use `release-X.X-rc`, and for `version`, use the upcoming `X.Y.Z` NGF version. Run the workflow on the new release branch. This process takes 3 days, where you'll collect results and tear down at the end.
    2. Kick off the [NFR workflow](https://github.com/nginx/nginx-gateway-fabric/actions/workflows/nfr.yml) in the browser. For `image_tag`, use `release-X.X-rc`, and for `version`, use the upcoming `X.Y.Z` NGF version. Run the workflow on the new release branch. This will run all of the NFR tests which are automated and open a PR with the results files when it is complete. Review this PR and make any necessary changes before merging. Once merged, be sure to cherry-pick the commit to the release branch (set the `needs cherry pick` label on the main PR).
    3. Run the IPv6 tests using the `make ipv6-tests` target. This must be run from within the `tests` directory. An example of running this script for release 2.1.0 would look like this: `make ipv6-test TAG=release-2.1-rc`
 5. Run the [Release PR](https://github.com/nginx/nginx-gateway-fabric/actions/workflows/release-pr.yml) workflow to update the repo files for the release. Then there are a few manual steps to complete:

tests/Makefile

Lines changed: 1 addition & 1 deletion
@@ -160,7 +160,7 @@ nfr-test: check-for-plus-usage-endpoint build-crossplane-image ## Run the NFR t

 .PHONY: start-longevity-test
 start-longevity-test: export START_LONGEVITY=true
-start-longevity-test: nfr-test ## Start the longevity test to run for 4 days in GKE
+start-longevity-test: nfr-test ## Start the longevity test to run for 3 days in GKE

 .PHONY: stop-longevity-test
 stop-longevity-test: export STOP_LONGEVITY=true

tests/README.md

Lines changed: 2 additions & 2 deletions
@@ -345,12 +345,12 @@ make nfr-test

 ##### Longevity testing

-This test is run on its own due to its long-running nature. It will run for 4 days (as defined in `suite/scripts/longevity-wrk.sh`) before
+This test is run on its own due to its long-running nature. It will run for 3 days (as defined in `suite/scripts/longevity-wrk.sh`) before
 the tester must collect the results and complete the test.

 To run in the pipeline, [run the workflow](https://github.com/nginx/nginx-gateway-fabric/actions/workflows/longevity-start.yml) to start the tests. Once the workflow completes, the job ID will be included in the summary. This must be used as input when stopping the longevity tests.

-After 4 days (96h), visit the [GCP Monitoring Dashboards](https://console.cloud.google.com/monitoring/dashboards) page and select the `NGF Longevity Test` dashboard. Update the `cluster_name` filter to the names of the longevity clusters. Take PNG screenshots of each chart for the time period in which your test ran, and save those to be added to the results file. Then you can [stop the longevity tests](https://github.com/nginx/nginx-gateway-fabric/actions/workflows/longevity-stop.yml). If done too early, the traffic will still be flowing and results may not be collected properly, so be sure to wait the full time period.
+After 3 days (72h) from the time that the startup workflow **finished**, visit the [GCP Monitoring Dashboards](https://console.cloud.google.com/monitoring/dashboards) page and select the `NGF Longevity Test` dashboard. Update the `cluster_name` filter to the names of the longevity clusters. Take PNG screenshots of each chart for the time period in which your test ran, and save those to be added to the results file. Then you can [stop the longevity tests](https://github.com/nginx/nginx-gateway-fabric/actions/workflows/longevity-stop.yml). If done too early, the traffic will still be flowing and results may not be collected properly, so be sure to wait the full time period.

 The final workflow will tear down the test and open a PR with the results. The PNGs you took should be added, and any summaries as well. Combine any results files if necessary. If you don't want to open a PR, you can toggle it off in the input when running the workflow.

tests/suite/longevity_test.go

Lines changed: 5 additions & 0 deletions
@@ -53,6 +53,11 @@ var _ = Describe("Longevity", Label("longevity-setup", "longevity-teardown"), fu
 			Skip("'longevity-setup' label not specified; skipping...")
 		}

+		// scale controller to test leader election
+		ngfDeployment, err := resourceManager.GetNGFDeployment(ngfNamespace, "ngf-longevity")
+		Expect(err).ToNot(HaveOccurred())
+		Expect(resourceManager.ScaleDeployment(ngfNamespace, ngfDeployment.GetName(), 2)).To(Succeed())
+
 		Expect(resourceManager.Apply([]client.Object{&ns})).To(Succeed())
 		Expect(resourceManager.ApplyFromFiles(files, ns.Name)).To(Succeed())
 		Expect(resourceManager.ApplyFromFiles(promFile, ngfNamespace)).To(Succeed())

tests/suite/manifests/longevity/cafe-routes.yaml

Lines changed: 27 additions & 0 deletions
@@ -13,6 +13,12 @@ spec:
     - path:
         type: PathPrefix
         value: /coffee
+    filters:
+    - type: ExtensionRef
+      extensionRef:
+        group: gateway.nginx.org
+        kind: SnippetsFilter
+        name: loggable-snippet
     backendRefs:
     - name: coffee
       port: 80
@@ -32,6 +38,27 @@ spec:
     - path:
         type: PathPrefix
         value: /tea
+    filters:
+    - type: ExtensionRef
+      extensionRef:
+        group: gateway.nginx.org
+        kind: SnippetsFilter
+        name: loggable-snippet
     backendRefs:
     - name: tea
       port: 80
+---
+# Only log non-200 responses
+apiVersion: gateway.nginx.org/v1alpha1
+kind: SnippetsFilter
+metadata:
+  name: loggable-snippet
+spec:
+  snippets:
+  - context: http
+    value: |
+      map $status $loggable {
+        ~^[2] 0;
+        default 1;
+      }
+      access_log /dev/stdout combined if=$loggable;
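
The effect of the `map` block in the `SnippetsFilter` can be sketched as follows. The regex `~^[2]` matches any status code that starts with `2`, so every 2xx response maps to `0` and is skipped by `access_log ... if=$loggable`, not only status 200. The `loggable` shell function is a hypothetical stand-in for the NGINX variable and is not part of the commit.

```shell
#!/bin/sh
# Sketch only: emulates `map $status $loggable { ~^[2] 0; default 1; }`.
# `loggable` is a hypothetical helper, not something NGINX or NGF provides.
loggable() {
  case "$1" in
    2*) echo 0 ;;  # status starts with 2: access_log skips the request
    *)  echo 1 ;;  # anything else: the request is logged
  esac
}

for status in 200 204 301 404 502; do
  echo "$status -> loggable=$(loggable "$status")"
done
```

In this sketch 200 and 204 yield `0`, while 301, 404 and 502 yield `1`, so only non-2xx responses would reach the access log.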

tests/suite/manifests/longevity/cafe.yaml

Lines changed: 33 additions & 0 deletions
@@ -25,6 +25,14 @@ spec:
           preStop:
             exec:
               command: ["/bin/sleep", "15"]
+        volumeMounts:
+        - name: nginx-config
+          mountPath: /etc/nginx/nginx.conf
+          subPath: nginx.conf
+      volumes:
+      - name: nginx-config
+        configMap:
+          name: nginx-no-access-log
 ---
 apiVersion: v1
 kind: Service
@@ -66,6 +74,14 @@ spec:
           preStop:
             exec:
               command: ["/bin/sleep", "15"]
+        volumeMounts:
+        - name: nginx-config
+          mountPath: /etc/nginx/nginx.conf
+          subPath: nginx.conf
+      volumes:
+      - name: nginx-config
+        configMap:
+          name: nginx-no-access-log
 ---
 apiVersion: v1
 kind: Service
@@ -79,3 +95,20 @@ spec:
     name: http
   selector:
     app: tea
+---
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: nginx-no-access-log
+data:
+  nginx.conf: |
+    events {}
+    http {
+      access_log off;
+      server {
+        listen 8080;
+        location / {
+          return 200;
+        }
+      }
+    }

tests/suite/scripts/longevity-wrk.sh

Lines changed: 2 additions & 2 deletions
@@ -44,5 +44,5 @@ if ((ELAPSED >= MAX_WAIT)); then
     exit 1
 fi

-nohup wrk -t2 -c100 -d96h http://cafe.example.com/coffee &>~/coffee.txt &
-nohup wrk -t2 -c100 -d96h https://cafe.example.com/tea &>~/tea.txt &
+nohup wrk -t2 -c100 -d72h http://cafe.example.com/coffee &>~/coffee.txt &
+nohup wrk -t2 -c100 -d72h https://cafe.example.com/tea &>~/tea.txt &
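
The `-d72h` duration and the "3 days (72h) from the time that the startup workflow finished" wording in `tests/README.md` describe the same window. A minimal sketch of that arithmetic follows. `finished_at` is a made-up epoch timestamp for when the start workflow completed, and the helper names are hypothetical.

```shell
#!/bin/sh
# Sketch only: finished_at is a hypothetical epoch timestamp (seconds) for the
# moment the longevity-start workflow finished; it is not read from anywhere.
WRK_HOURS=72

# Convert a duration in hours to whole days.
hours_to_days() { echo $(( $1 / 24 )); }

# Earliest time (epoch seconds) at which the wrk runs have completed.
earliest_stop() { echo $(( $1 + WRK_HOURS * 3600 )); }

finished_at=1700000000
echo "wrk runs for $(hours_to_days "$WRK_HOURS") days"
echo "earliest stop (epoch seconds): $(earliest_stop "$finished_at")"
```

Running the stop workflow before that time would interrupt traffic that is still flowing, which is what the README warns against.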
