Description
The acceptance test server runs the tests that promote dev to nightly on success. It is now functional, but here's a list of improvements that could be made to make it smoother and more robust.
Repo that the server runs is at https://github.com/Sovereign-Labs/acceptance-test-runner
Test runner TODOs
Quick(ish) wins
- Some of the scripts have been edited inline in the server. Update the repository, and in an ideal world find a way to access the private git repo from the server to be able to update it there.
- The failure Slack message gives the wrong directory (`less -R /opt/acceptance-server/...` when it should be `/opt/acceptance-tests-runner`)
- Do not crash the run if there's an existing `test-runs/` subfolder for that commit hash. Ideal behaviour would be moving the previous run to a backup folder. Maybe start crashing only once there are already multiple backup folders for the commit (to avoid exploding disk usage if a failed commit is repeatedly re-run)
- Set up the second storage disk on the server (currently unmounted) so it has 2TB of storage instead of 1TB
- Add a strategy to clean up existing failed runs when a new run succeeds OR to upload previous failed runs somewhere in case future investigation is necessary - either way avoiding disk space exhaustion over time. (Could be as simple as deleting runs older than X days)
- Add proper error reporting to failure notifications. Usually the most interesting parts of the logs are a) a slot snapshot mismatch, b) an explicit `ERROR` log, or c) a compilation failure. The script should try to extract these intelligently and make them available on Slack (perhaps in a thread message under the main failure notification, to avoid taking up space)
- Add success notifications, e.g. sent to the #github-updates channel.
- When checking out the rollup-starter repo, the script should attempt to check out a branch with a deterministic name, in case a starter upgrade is necessary to accompany the SDK update. If none exists, it can proceed with `main`.
- Add telemetry to the acceptance test box so we can check logs and metrics from Grafana
- Move the performance throughput JSON to a different folder, and do not wipe it every time the data is regenerated
- Auto-regenerate acceptance data in a new location on test failure, so that if a breaking change is manually identified we can quickly promote rather than manually having to wait for regeneration
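The backup-folder behaviour described above could look something like the following minimal sketch. `RUNS_DIR`, `COMMIT`, and `MAX_BACKUPS` are hypothetical names for illustration, not existing variables in the runner scripts:

```shell
#!/usr/bin/env bash
# Sketch: instead of crashing when test-runs/<commit> already exists, move the
# old run to a numbered backup, and only bail out once MAX_BACKUPS is reached.
set -euo pipefail

RUNS_DIR="${RUNS_DIR:-$(mktemp -d)/test-runs}"   # placeholder location
COMMIT="${1:-abc1234}"                            # placeholder commit hash
MAX_BACKUPS=3

run_dir="$RUNS_DIR/$COMMIT"
if [ -d "$run_dir" ]; then
  # Count existing backups for this commit.
  n=$(find "$RUNS_DIR" -maxdepth 1 -name "$COMMIT.bak.*" | wc -l)
  if [ "$n" -ge "$MAX_BACKUPS" ]; then
    echo "too many backups for $COMMIT; refusing to run" >&2
    exit 1
  fi
  # Preserve the previous run instead of crashing.
  mv "$run_dir" "$run_dir.bak.$((n + 1))"
fi
mkdir -p "$run_dir"
echo "run dir ready: $run_dir"
```

The cap keeps a repeatedly re-run failing commit from filling the disk with backups.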
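For the error-extraction item, a rough sketch of pulling the interesting failure lines out of a log for the Slack thread message. The grep patterns, the log path, and the demo log content are illustrative assumptions, not the actual log format:

```shell
#!/usr/bin/env bash
# Sketch: extract compilation failures, explicit ERROR logs, and slot snapshot
# mismatches from a test log, capped so the Slack message stays small.
set -euo pipefail

LOG="${1:-$(mktemp)}"
# Demo content so the sketch is self-contained:
printf '%s\n' "INFO starting" \
  "ERROR slot snapshot mismatch at height 42" \
  "INFO done" >> "$LOG"

# Match rustc compile errors (error[E...]), ERROR log lines, and slot mismatches.
summary=$(grep -E 'error\[E[0-9]+\]|ERROR|slot .*mismatch' "$LOG" | head -n 20 || true)

if [ -n "$summary" ]; then
  echo "Failure summary:"
  echo "$summary"
fi
```

Posting this as a thread reply under the main failure notification keeps the channel readable while still surfacing the root cause.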
More work
- Look into identifying performance degradations on every commit: either schedule short test runs on every merge to `dev`, or set up a procedure to bisect when a nightly test flags slow performance
- Separate the soak test to run several targeted performance loads: CPU-bounded transactions only, I/O-bounded transactions only (`synthentic-load` module?), and blob-size-bounded transactions only. Save throughput results independently for each. Design work: have short targeted load tests with a mixed load as the main long-running soak, or divide the long-running test into 3 equal parts? Or 4 equal parts, with the mixed load being just a 4th load type?
- Add start-stop testing under load, both graceful and ungraceful. (Design: during the main soak or in an extended section? How often - at random?)
- If we add intelligent error identification, then on a slot mismatch error we could automatically trigger regenerating the test data, keeping it in an auxiliary directory ready to be swapped in manually once the failure is investigated
- If we add categories to breaking changes in the changelog, the test could intelligently expect failure cases (e.g. chain hash change = expect slot mismatch + expect transactions to fail with invalid signature), abort early, and trigger the appropriate remedy (e.g. regenerate the data).
- Integrate web3 SDK integration testing into the acceptance testing workflow. Design work: handled by github action or directly on the server? Initial thoughts are to add it as a step in the trigger workflow, and avoid triggering the test run if the web3 sdk doesn't work. OR can be added as part of the promotion workflow, after the test run succeeds.
- Diversify runtimes and configurations: trigger the script with more different runtimes for better edge case coverage. Consider integrating customer runtimes.
- This would be particularly useful with the previous point (web3 integration testing)
- Integrate customer EVM benchmark binary
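The graceful/ungraceful start-stop testing above could be sketched as below. `NODE_CMD` is a hypothetical stand-in for however the rollup node is actually launched; the point is the SIGTERM vs SIGKILL distinction:

```shell
#!/usr/bin/env bash
# Sketch: exercise both stop modes against a running process under load.
set -euo pipefail

NODE_CMD="${NODE_CMD:-sleep 300}"   # placeholder workload, not the real node

# Graceful stop: SIGTERM gives the node a chance to flush state and exit cleanly.
$NODE_CMD & pid=$!
kill -TERM "$pid"
wait "$pid" 2>/dev/null || true

# Ungraceful stop: SIGKILL simulates a crash; the restarted node must recover
# from whatever was on disk at the moment of death.
$NODE_CMD & pid=$!
kill -KILL "$pid"
wait "$pid" 2>/dev/null || true

echo "both stop modes exercised"
```

In a real run the restart after each stop would be followed by a consistency check (e.g. comparing state roots against a node that was never stopped).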
Documentation
A few scenarios and common tasks that need to be documented, at least in the README of the project.
- Steps to debug: 1) check the logs, 2) verify the failure locally, 3) regenerate the data or fix in a branch
- Where to find the logs: journalctl for the server, and the `test-runs/` subdirectory for actual test run output
- Editing the branch in the `run-tests.sh` script, if code changes in the starter are required (this won't be necessary once the automatic branch selection from above is implemented, but for now it is)
- Use `screen`/`tmux` for data regeneration, because it's a long-running foreground process
- Add a script (makefile?) to automate regenerating the data and swapping it in once generation completes?
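The regenerate-and-swap script mentioned above might look like this minimal sketch. `DATA_DIR`, the staging path, and the manifest file are hypothetical placeholders; the real regeneration command would replace the `echo` and run under `screen`/`tmux`:

```shell
#!/usr/bin/env bash
# Sketch: regenerate acceptance data into a staging directory, then swap it in
# only after generation completes, so a half-finished run never replaces good data.
set -euo pipefail

BASE="${BASE:-$(mktemp -d)}"          # placeholder base directory
DATA_DIR="$BASE/acceptance-data"
STAGING="$BASE/acceptance-data.new"

mkdir -p "$DATA_DIR"
rm -rf "$STAGING" && mkdir -p "$STAGING"

# The long-running regeneration would go here:
echo "regenerated-at $(date -u +%FT%TZ)" > "$STAGING/manifest.txt"

# Swap, keeping the previous data as .old for quick rollback.
rm -rf "$DATA_DIR.old"
mv "$DATA_DIR" "$DATA_DIR.old"
mv "$STAGING" "$DATA_DIR"
echo "data swapped in at $DATA_DIR"
```

Keeping the previous data as `.old` means a bad regeneration can be rolled back with a single `mv` instead of waiting hours for another run.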