Testing updates: soundness and regressions; minor bug fixes #286
ttj merged 27 commits into verivital:master
…ed on testing status
… these may get overwritten and changed by new runs and differing MATLAB versions, OSes, etc., but they are added for sanity checking and can safely be overwritten if modified
Pull request overview
This PR enhances the testing infrastructure for NNV by adding comprehensive test files, soundness verification utilities, regression testing capabilities, and minor bug fixes. The changes focus on improving test coverage, enabling regression detection, and establishing a framework for automated testing.
Key changes:
- Addition of 40+ new test files for utilities, soundness verification, and layer testing
- Implementation of regression testing infrastructure with baseline comparison
- Creation of test utilities for soundness verification and data management
- Minor bug fix in VolumeStar test (reduced image size to prevent OOM errors)
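The regression workflow listed above (save results, then compare later runs against stored baselines) can be sketched in plain MATLAB. This is a minimal illustration under assumptions, not the code in `save_test_data.m` or `compare_regression_data.m`: the field names `lb`/`ub`, the tolerance, and the temporary baseline file are all made up.

```matlab
% Hypothetical sketch of baseline-based regression detection (not NNV's implementation)
tol = 1e-6;                                   % assumed absolute tolerance

% Baseline results saved by an earlier run (made-up values)
baseline = struct('lb', [0; -1; 2], 'ub', [1; 0.5; 3]);
baselineFile = [tempname '.mat'];
save(baselineFile, '-struct', 'baseline');    % stores one variable per field

% Results from the current run: differ from the baseline only by floating-point noise
current = struct('lb', [0; -1; 2] + 1e-9, 'ub', [1; 0.5; 3]);

stored = load(baselineFile);                  % struct with fields lb and ub
fields = fieldnames(stored);
regressions = {};
for k = 1:numel(fields)
    f = fields{k};
    % Flag a field that is missing, changed shape, or moved beyond the tolerance
    if ~isfield(current, f) || ~isequal(size(current.(f)), size(stored.(f))) || ...
            any(abs(current.(f)(:) - stored.(f)(:)) > tol)
        regressions{end+1} = f; %#ok<AGROW>
    end
end
delete(baselineFile);                         % clean up the temporary baseline

if isempty(regressions)
    disp('No regressions detected');
else
    fprintf('Regression detected in: %s\n', strjoin(regressions, ', '));
end
```

Comparing with a tolerance rather than `isequal` keeps results that differ only by floating-point noise (for example across MATLAB versions or operating systems) from being reported as regressions; setting `current.ub(2) = 0.6` would instead list `ub` in `regressions`.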
Reviewed changes
Copilot reviewed 160 out of 351 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `test_onnx2nnv.m` | Tests ONNX to NNV conversion functionality |
| `test_matlab2nnv.m` | Tests MATLAB to NNV network conversion |
| `test_lpsolver.m` | Tests LP solver interface across different solvers |
| `test_load_vnnlib.m` | Tests VNN-LIB specification parsing |
| `run_all_utils_tests.m` | Runner script for utility function tests |
| `MatMul_To_MatMulLayer1009.m` | Auto-generated ONNX layer for single pendulum controller |
| `FlattenLayer1009.m` (cartpole) | Auto-generated ONNX Flatten layer for cartpole |
| `FlattenLayer1018.m` (ACASXU) | Auto-generated ONNX Flatten layer for ACAS Xu |
| `test_tutorial_figures.m` | Runs tutorial examples and saves generated figures |
| `verify_soundness.m` | Utility for verifying soundness of set transformations |
| `save_test_figure.m` | Saves test figures to results directory |
| `save_test_data.m` | Saves test workspace data for regression testing |
| `run_tests_with_regression.m` | Runs tests with regression detection |
| `manage_baselines.m` | Manages baseline files for regression detection |
| `get_test_config.m` | Global test configuration |
| `compare_regression_data.m` | Compares test output with baselines |
| `test_soundness_*.m` (35 files) | Comprehensive soundness tests for layers and set representations |
| `test_VolumeStar.m` | Fixed OOM issue by using smaller test images |
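The idea behind a soundness test for a set transformation can be illustrated with a sampling check: draw concrete points from the input set, push them through the exact function, and confirm that every result lies inside the computed output set. The sketch below uses plain interval bounds and an elementwise ReLU instead of NNV's set classes, so it is an assumption-laden illustration, not how `verify_soundness.m` works. It relies on implicit expansion (R2016b or later).

```matlab
% Sampling-based soundness check sketch (plain MATLAB, no NNV classes)
rng(0);                                   % reproducible samples
lb = [-1; -0.5; 0.2];                     % input box lower bounds (made up)
ub = [ 0.5;  1.0; 0.8];                   % input box upper bounds (made up)
relu = @(x) max(x, 0);

% Computed output bounds for ReLU over the box (exact for an elementwise ReLU)
out_lb = relu(lb);
out_ub = relu(ub);

nSamples = 1000;
X = lb + (ub - lb) .* rand(3, nSamples);  % uniform samples inside the input box
Y = relu(X);                              % exact outputs for each sample

tol = 1e-9;                               % small slack for floating-point error
inside = all(Y >= out_lb - tol & Y <= out_ub + tol, 1);
assert(all(inside), 'Sampled output escaped the computed bounds (unsound)');
fprintf('All %d samples contained in the output bounds\n', nSamples);
```

A check like this can expose an unsound over-approximation, but it cannot prove soundness, since it only tests finitely many points.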
…ency checking; the root cause of the GitHub Actions failures was missing toolbox dependencies, specifically Deep Learning Toolbox and Optimization Toolbox, so messaging was improved to warn the user if critical functions like dlarray, nnet parts, linprog, etc. are missing
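A dependency check of this kind can be sketched with `which`, which returns an empty character vector when a function or class is not on the MATLAB path. The function-to-toolbox pairs below are assumptions drawn from the commit message, and this is not necessarily how NNV's own check is written.

```matlab
% Hypothetical sketch: warn early if functions from required toolboxes are missing
required = {'dlarray', 'Deep Learning Toolbox';   % assumed function/toolbox pairs
            'linprog', 'Optimization Toolbox'};
missing = {};
for k = 1:size(required, 1)
    % which returns '' when the function or class cannot be found on the path
    if isempty(which(required{k, 1}))
        missing{end+1} = sprintf('%s (%s)', required{k, 1}, required{k, 2}); %#ok<AGROW>
    end
end
if ~isempty(missing)
    warning('Missing functions required by NNV: %s', strjoin(missing, ', '));
end
```

Checking for the functions themselves, rather than only the toolbox names, targets the failure mode described: a CI runner where calls like `dlarray` or `linprog` cannot be resolved.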
…ably cross-platform now; added documentation for it and its setup. Since it calls Python inline, it needs further testing, but it works locally at least
Pull request overview
Copilot reviewed 179 out of 370 changed files in this pull request and generated 5 comments.
```matlab
offDiag1 = IS_concat5.C(1:4, 3:5); % rows from IS5a, cols from IS5b
offDiag2 = IS_concat5.C(5:10, 1:2); % rows from IS5b, cols from IS5a
assert(all(offDiag1(:) == 0), 'Off-diagonal block 1 should be zero');
assert(all(offDiag2(:) == 0), 'Off-diagonal block 2 should be zero');
```
[nitpick] Variable names 'offDiag1' and 'offDiag2' are ambiguous. Consider renaming to 'offDiagonalBlock1' and 'offDiagonalBlock2' for clarity.
```diff
- offDiag1 = IS_concat5.C(1:4, 3:5); % rows from IS5a, cols from IS5b
- offDiag2 = IS_concat5.C(5:10, 1:2); % rows from IS5b, cols from IS5a
- assert(all(offDiag1(:) == 0), 'Off-diagonal block 1 should be zero');
- assert(all(offDiag2(:) == 0), 'Off-diagonal block 2 should be zero');
+ offDiagonalBlock1 = IS_concat5.C(1:4, 3:5); % rows from IS5a, cols from IS5b
+ offDiagonalBlock2 = IS_concat5.C(5:10, 1:2); % rows from IS5b, cols from IS5a
+ assert(all(offDiagonalBlock1(:) == 0), 'Off-diagonal block 1 should be zero');
+ assert(all(offDiagonalBlock2(:) == 0), 'Off-diagonal block 2 should be zero');
```
```matlab
assert(size(output, 2) == 4, 'Width should be preserved');

%% Test 4: DepthConcatenationLayer reach_single_input
% NOTE: reach_multipleInputs_Star has a library bug (nI undefined)
```
[nitpick] This note references a library bug. Consider creating a TODO or issue tracker reference to ensure this gets addressed, rather than just commenting about it in tests.
```diff
- % NOTE: reach_multipleInputs_Star has a library bug (nI undefined)
+ % TODO: Fix library bug in reach_multipleInputs_Star (nI undefined)
+ % Consider referencing or creating an issue in the tracker for this bug.
```
```matlab
% NOTE: LeakyReLU approx-zono has a library bug in LeakyReLU.reach_zono_approx
% line 1595: V1(map1, :) = gamma*V1(map1) has dimension mismatch
% This should be fixed in the NNV library
```
The comment identifies a specific bug with line number reference, but no test exists to verify or track this bug. Consider adding a test case with try/catch that documents the expected failure until the bug is fixed.
```matlab
%% Test 6: LeakyReLU with approx-zono (expected failure due to known bug)
try
    rng(42);
    L6 = LeakyReLULayer('Name', 'leakyrelu_zono', 'Gamma', 0.1);
    lb6 = rand(3, 3) - 0.5;
    ub6 = lb6 + rand(3, 3) * 0.2;
    input_iz6 = ImageZono(lb6, ub6);
    output_iz6 = L6.reach(input_iz6, 'approx-zono');
    error('LeakyReLU approx-zono did not fail as expected (bug may be fixed)');
catch ME
    assert(contains(ME.message, 'dimension mismatch') || ...
        contains(ME.message, 'Matrix dimensions must agree') || ...
        contains(ME.message, 'Unable to perform assignment'), ...
        'LeakyReLU approx-zono failed, but not due to expected dimension mismatch bug');
    disp('LeakyReLU approx-zono test: expected failure due to known bug (line 1595 in LeakyReLU.reach_zono_approx)');
end
```
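One caveat with the suggested test: the `error(...)` raised when `reach` succeeds is itself caught by the `catch` block, so a fixed bug would surface as the misleading "failed, but not due to expected dimension mismatch bug" message. A sketch of an alternative pattern records the failure in a flag and checks it after the `try`/`catch`. It reproduces the reported assignment with made-up values rather than calling NNV, so `V1`, `map1`, and `gamma` here are illustrative only.

```matlab
% Standalone reproduction of the reported pattern (values are made up)
V1 = rand(4, 3);
map1 = [1; 3];
gamma = 0.1;

didFail = false;
try
    V1(map1, :) = gamma * V1(map1);   % left side is 2-by-3, right side is 2-by-1
catch ME
    didFail = true;
    % Older releases say "dimension mismatch"; newer ones "Unable to perform assignment"
    isExpected = contains(ME.message, 'dimension mismatch') || ...
                 contains(ME.message, 'Unable to perform assignment');
    assert(isExpected, 'Failed, but not due to the expected dimension mismatch');
end
% Checked outside the try block, so this assertion cannot be swallowed by the catch
assert(didFail, 'Did not fail as expected (bug may be fixed)');
```

With this layout, fixing the library bug produces the "did not fail as expected" message directly, which signals that the expected-failure test should be converted into a normal passing test.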
```matlab
function result = ternary(condition, if_true, if_false)
    if condition
        result = if_true;
    else
        result = if_false;
    end
end
```
[nitpick] This ternary helper function is only used once. Consider inlining it or moving it to a shared utilities file if it will be used across multiple test files.
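MATLAB has no conditional (ternary) operator, so inlining the helper at its single call site leaves two common options, sketched below. The condition and the `'PASS'`/`'FAIL'` values are hypothetical; the actual call site is not shown in this review.

```matlab
% Hypothetical condition standing in for the one used at the real call site
condition = 3 > 2;

% Option 1: inline the if/else directly
if condition
    status = 'PASS';
else
    status = 'FAIL';
end

% Option 2: index a two-element cell array with the logical (false -> 1, true -> 2).
% Both values are built up front, exactly as when they are passed to the helper.
choices = {'FAIL', 'PASS'};
status2 = choices{condition + 1};
```

Option 1 is usually the clearer choice in a test file; option 2 is compact but only reads well when both values are cheap constants.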
```matlab
if ~contains(full_path, 'test') && ...
   ~contains(full_path, 'cora') && ...
   ~contains(full_path, 'tbxmanager')
```
[nitpick] The exclusion patterns are hardcoded. Consider defining these as configuration constants at the top of the function or in a separate config file for easier maintenance.
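A sketch of the suggested refactor: define the patterns once and pass the whole cell array to `contains`, which returns true if the text contains any of the patterns. The constant name and the example path are hypothetical; `get_test_config.m` from this PR could be one possible home for such a list.

```matlab
% Exclusion patterns defined once (hypothetical name), e.g. at the top of the function
EXCLUDED_PATH_PATTERNS = {'test', 'cora', 'tbxmanager'};

full_path = fullfile('nnv', 'engine', 'nn', 'layers', 'ReluLayer.m');  % example path
% contains with a cell array is true if ANY pattern matches, so this negation is
% equivalent to the chained ~contains(...) && ~contains(...) && ~contains(...) checks
if ~contains(full_path, EXCLUDED_PATH_PATTERNS)
    fprintf('Including %s\n', full_path);
end
```

Adding or removing an exclusion then means editing one list instead of the chain of conditions.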
Merging the test updates and CI/CD improvements back now; next, I will add Usama's weight perturbation pull request and test it.
This is a fairly comprehensive update to the testing infrastructure. It saves the figures generated by existing tests (many tests were only based on creating figures), modifies those tests to check assertions, and adds some regression testing against prior results.
A few minor bugs were also fixed, namely in the concatenation layer, along with some other minor updates (e.g., inference fixes where the bias was being ignored).
A new base GitHub Action for running regressions was created; we now need to test and debug it live on GitHub.