
Acceptance test runner TODOs #2095

@theodorebugnet

Description

The acceptance test server runs the tests that promote dev to nightly on success. It is now functional, but here is a list of improvements that would make it smoother and more robust.

The repo that the server runs is at https://github.com/Sovereign-Labs/acceptance-test-runner

Test runner TODOs

Quick(ish) wins

  • Some of the scripts have been edited inline on the server. Update the repository to match, and ideally find a way to access the private git repo from the server so it can be updated there directly.
  • The failure Slack message gives the wrong directory (less -R /opt/acceptance-server/... when it should be /opt/acceptance-tests-runner)
  • Do not crash the run if there's an existing test-runs/ subfolder for that commit hash. Ideal behaviour would be moving the previous run to a backup folder. Maybe start crashing once there are already multiple backup folders for this commit, to avoid exploding disk usage if a failed commit keeps being re-run. (A rough sketch of this, together with the cleanup point below, follows this list.)
  • Set up the second storage disk on the server (currently unmounted) to have 2TB of storage instead of 1TB
  • Add a strategy to clean up existing failed runs when a new run succeeds OR to upload previous failed runs somewhere in case future investigation is necessary - either way avoiding disk space exhaustion over time. (Could be as simple as deleting runs older than X days; see the sketch after this list.)
  • Add proper error reporting to failure notifications. Usually the most interesting parts of the logs are either a) a slot snapshot mismatch, b) an explicit ERROR log, or c) a compilation failure. The script should try to extract these intelligently and make them available on Slack, perhaps in a thread message under the main failure notification to avoid taking up space. (A grep-based sketch follows this list.)
  • Add success notifications, e.g. sent to the #github-updates channel.
  • The script, when checking out the rollup-starter repo, should attempt to check out a branch with a deterministic name, in case a starter upgrade is necessary to accompany the SDK update. If no such branch exists it can proceed with main. (See the sketch after this list.)
  • Add telemetry to the acceptance test box so we can check logs and metrics from Grafana
  • Move the performance throughput JSON to a different folder, and do not wipe it every time the data is regenerated
  • Auto-regenerate acceptance data in a new location on test failure, so that if a breaking change is manually identified we can promote quickly rather than having to wait for regeneration
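
A minimal sketch of the backup-and-cleanup behaviour, assuming runs live under test-runs/<commit-hash>; the paths, the backup cap, and the retention window are all placeholders:

```bash
#!/usr/bin/env bash
# Hypothetical sketch: back up a pre-existing run directory instead of crashing,
# and prune old run directories. Paths, limits and naming are assumptions.
set -euo pipefail

RUNS_ROOT="/opt/acceptance-tests-runner/test-runs"   # assumed layout
COMMIT="$1"                                          # commit hash passed by the trigger
MAX_BACKUPS=3                                        # arbitrary cap to bound disk usage
RETENTION_DAYS=14                                    # the "X days" from the TODO; value is a guess

run_dir="${RUNS_ROOT}/${COMMIT}"

if [ -d "${run_dir}" ]; then
    backup_count=$(find "${RUNS_ROOT}" -maxdepth 1 -type d -name "${COMMIT}.bak.*" | wc -l)
    if [ "${backup_count}" -ge "${MAX_BACKUPS}" ]; then
        echo "Too many backups for ${COMMIT}; refusing to run again" >&2
        exit 1
    fi
    # Move the previous run aside instead of failing the whole run.
    mv "${run_dir}" "${run_dir}.bak.$(date +%s)"
fi

# Age-based cleanup: delete run directories older than the retention window.
find "${RUNS_ROOT}" -mindepth 1 -maxdepth 1 -type d -mtime "+${RETENTION_DAYS}" -exec rm -rf {} +
```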
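
For the error-extraction point, a rough sketch assuming the run log sits at a known path and the Slack webhook URL comes from the environment; the patterns, paths and variable names are placeholders, and a real threaded reply would need chat.postMessage with thread_ts rather than a bare webhook:

```bash
#!/usr/bin/env bash
# Hypothetical sketch: pull the most useful failure context out of the run log
# and post it to Slack. Log path, patterns and webhook handling are assumptions.
set -euo pipefail

LOG_FILE="$1"                                   # e.g. test-runs/<commit>/run.log (assumed)
SLACK_WEBHOOK_URL="${SLACK_WEBHOOK_URL:?set a webhook URL}"

extract_snippet() {
    # Try the interesting failure classes in rough priority order.
    grep -m1 -A5 -i "slot snapshot mismatch" "${LOG_FILE}" && return
    grep -m5 "ERROR" "${LOG_FILE}" && return
    grep -m1 -B2 -A10 "error\[E" "${LOG_FILE}" && return   # rustc compilation errors
    echo "No known error pattern found; see the full log on the server."
}

snippet="$(extract_snippet || true)"

curl -sS -X POST -H 'Content-type: application/json' \
    --data "$(jq -n --arg text "${snippet}" '{text: $text}')" \
    "${SLACK_WEBHOOK_URL}"
```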
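
For the deterministic starter branch, a sketch where the branch naming scheme (acceptance/<sdk-version>) and the checkout location are only suggestions:

```bash
#!/usr/bin/env bash
# Hypothetical sketch: prefer a companion rollup-starter branch named after the
# SDK version under test, falling back to main. The naming convention is an assumption.
set -euo pipefail

STARTER_DIR="rollup-starter"               # assumed checkout location
SDK_VERSION="$1"                           # e.g. the tag or commit being promoted
CANDIDATE_BRANCH="acceptance/${SDK_VERSION}"

cd "${STARTER_DIR}"
git fetch origin

if git ls-remote --exit-code --heads origin "${CANDIDATE_BRANCH}" >/dev/null; then
    git checkout -B "${CANDIDATE_BRANCH}" "origin/${CANDIDATE_BRANCH}"
else
    git checkout main
    git pull --ff-only origin main
fi
```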

More work

  • Look into identifying performance degradations on every commit: either schedule short test runs on every merge to dev, or set up a procedure to bisect when a nightly test flags slow performance
  • Separate the soak test to run several targeted performance loads: CPU-bound transactions only, I/O-bound transactions only (synthetic-load module?), and blob-size-bound transactions only. Save throughput results independently for each. Design work: have short targeted load tests with the mixed load as the main long-running soak, or divide the long-running test into 3 equal parts? Or 4 equal parts, with the mixed load being just a 4th load type?
  • Add start-stop testing under load, both graceful and ungraceful. (Design: during the main soak or an extended section? How often - random?) A rough restart-loop sketch follows this list.
  • If we add intelligent error identification, then on a slot mismatch error we could automatically trigger re-generating the test data and keep it in an auxiliary directory, ready to be swapped in manually once the failure is investigated
  • If we add categories to breaking changes in the changelog, the test could intelligently expect failure cases (e.g. chain hash change = expect slot mismatch + expect transactions to fail with invalid signature), abort early, and trigger the appropriate remedy (e.g. regenerate the data).
  • Integrate web3 SDK integration testing into the acceptance testing workflow. Design work: handled by a GitHub Action or directly on the server? Initial thoughts are to add it as a step in the trigger workflow and avoid triggering the test run if the web3 SDK doesn't work. OR it can be added as part of the promotion workflow, after the test run succeeds.
  • Diversify runtimes and configurations: trigger the script with more different runtimes for better edge case coverage. Consider integrating customer runtimes.
    • This would be particularly useful with the previous point (web3 integration testing)
    • Integrate customer EVM benchmark binary
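
For the start-stop testing item above, a very rough sketch of an interleaved graceful/ungraceful restart loop, assuming the node runs as a systemd unit; the unit name, iteration count and timings are placeholders:

```bash
#!/usr/bin/env bash
# Hypothetical sketch: restart the node at random intervals while the soak load
# runs, alternating graceful stops and hard kills. Unit name and timings are assumptions.
set -euo pipefail

UNIT="rollup-node.service"     # assumed systemd unit name
ITERATIONS=10

for i in $(seq 1 "${ITERATIONS}"); do
    sleep $(( (RANDOM % 300) + 60 ))        # wait 1-6 minutes between restarts

    if (( i % 2 == 0 )); then
        echo "Iteration ${i}: graceful stop"
        systemctl stop "${UNIT}"
    else
        echo "Iteration ${i}: ungraceful kill"
        systemctl kill --signal=SIGKILL "${UNIT}"
    fi

    sleep 10
    systemctl start "${UNIT}"
    # A real version would then check that the node re-syncs and that the load
    # generator's error rate stays within bounds.
done
```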

Documentation

A few scenarios and common tasks need to be documented, at least in the project README.

  • Steps to debug: 1) check the logs, 2) verify the failure locally, 3) regenerate the data or fix in a branch
  • Where to find the logs: journalctl for the server and the test-runs/ subdirectory for actual test run output
  • Editing the branch in the run-tests.sh script, if code changes in the starter are required (will not be necessary once the automatic branch selection from above is implemented, but for now it's required)
  • Use screen/tmux for data regeneration because it's a long-running foreground process
  • Add a script (makefile?) to automate regenerating the data and swapping it in once generation completes? (A possible sketch follows.)
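
The regeneration helper could look something like the following, assuming the data lives in acceptance-data/ and is produced by an existing generate-data.sh script (both names are placeholders); it could equally be a Makefile target:

```bash
#!/usr/bin/env bash
# Hypothetical sketch: regenerate acceptance data into a staging directory and
# swap it in once generation completes. Paths and the generator script are assumptions.
set -euo pipefail

DATA_DIR="/opt/acceptance-tests-runner/acceptance-data"   # assumed location
STAGING_DIR="${DATA_DIR}.staging"
BACKUP_DIR="${DATA_DIR}.previous"

# Long-running: run this under screen/tmux, as noted above.
rm -rf "${STAGING_DIR}"
mkdir -p "${STAGING_DIR}"
./generate-data.sh --output "${STAGING_DIR}"              # placeholder generator invocation

# Swap the new data in, keeping the previous set around for one generation.
rm -rf "${BACKUP_DIR}"
if [ -d "${DATA_DIR}" ]; then
    mv "${DATA_DIR}" "${BACKUP_DIR}"
fi
mv "${STAGING_DIR}" "${DATA_DIR}"
```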
