Testing updates: soundness and regressions; minor bug fixes by ttj · Pull Request #286 · verivital/nnv

ttj · 2025-12-01T04:42:30Z

This is a fairly comprehensive update to the testing infrastructure. It saves the figures that existing tests generate (many tests were based only on creating figures), modifies those tests to check assertions, and adds some regression testing against prior results.

A few minor bugs were also fixed, namely in the concatenation layer, along with some other minor updates (e.g., some inference fixes for bias being ignored).

A new base GitHub Action for running regressions was created; we will need to test and debug it live on GitHub now.


Copilot

Pull request overview

This PR enhances the testing infrastructure for NNV by adding comprehensive test files, soundness verification utilities, regression testing capabilities, and minor bug fixes. The changes focus on improving test coverage, enabling regression detection, and establishing a framework for automated testing.

Key changes:

Addition of 40+ new test files for utilities, soundness verification, and layer testing
Implementation of regression testing infrastructure with baseline comparison
Creation of test utilities for soundness verification and data management
Minor bug fix in VolumeStar test (reduced image size to prevent OOM errors)
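One plausible way to compare new results against stored baselines is to check saved outputs field by field within a tolerance. The sketch below assumes each test's results are stored as a struct of numeric arrays; the struct contents, field names, and the `1e-6` tolerance are hypothetical and are not taken from `compare_regression_data.m` or `manage_baselines.m`.

```matlab
% Hypothetical baseline and current results; in practice these would be
% loaded from saved .mat files (the storage layout is an assumption).
baseline = struct('lb', [0, 0.1, -0.5], 'ub', [1, 2, 0.5], 'verified', 1);
current  = struct('lb', [0, 0.1 + 1e-12, -0.5], 'ub', [1, 2.3, 0.5], 'verified', 1);

tol = 1e-6;                    % absolute tolerance (assumed value)
fields = fieldnames(baseline);
mismatched = {};               % names of fields that regressed

for i = 1:numel(fields)
    name = fields{i};
    % A field missing from the current run counts as a regression
    if ~isfield(current, name)
        mismatched{end+1} = name;
        continue;
    end
    b = baseline.(name);
    c = current.(name);
    % Shapes must agree before an elementwise comparison makes sense;
    % || short-circuits so the subtraction only runs on matching sizes
    if ~isequal(size(b), size(c)) || any(abs(b(:) - c(:)) > tol)
        mismatched{end+1} = name;
    end
end

if isempty(mismatched)
    disp('No regressions detected');
else
    fprintf('Regression in field(s): %s\n', strjoin(mismatched, ', '));
end
```

With these values only `ub` exceeds the tolerance, so a single regressed field is reported. An absolute tolerance is the simplest choice; outputs that drift across MATLAB versions or operating systems might warrant a relative tolerance instead.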

Reviewed changes

Copilot reviewed 160 out of 351 changed files in this pull request and generated no comments.


| File | Description |
|---|---|
| `test_onnx2nnv.m` | Tests ONNX to NNV conversion functionality |
| `test_matlab2nnv.m` | Tests MATLAB to NNV network conversion |
| `test_lpsolver.m` | Tests LP solver interface across different solvers |
| `test_load_vnnlib.m` | Tests VNN-LIB specification parsing |
| `run_all_utils_tests.m` | Runner script for utility function tests |
| `MatMul_To_MatMulLayer1009.m` | Auto-generated ONNX layer for single pendulum controller |
| `FlattenLayer1009.m` (cartpole) | Auto-generated ONNX Flatten layer for cartpole |
| `FlattenLayer1018.m` (ACASXU) | Auto-generated ONNX Flatten layer for ACAS XU |
| `test_tutorial_figures.m` | Runs tutorial examples and saves generated figures |
| `verify_soundness.m` | Utility for verifying soundness of set transformations |
| `save_test_figure.m` | Saves test figures to results directory |
| `save_test_data.m` | Saves test workspace data for regression testing |
| `run_tests_with_regression.m` | Runs tests with regression detection |
| `manage_baselines.m` | Manages baseline files for regression detection |
| `get_test_config.m` | Global test configuration |
| `compare_regression_data.m` | Compares test output with baselines |
| `test_soundness_*.m` (35 files) | Comprehensive soundness tests for layers and set representations |
| `test_VolumeStar.m` | Fixed OOM issue by using smaller test images |
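A soundness test for a set transformation checks that the computed output set contains every concrete output reachable from the input set. One common way to exercise this, sketched here without any NNV classes, is to sample points from an input box, evaluate the layer concretely, and assert that each output lies inside the computed bounds. The ReLU layer, sample count, and tolerance are illustrative assumptions; this is not the interface of `verify_soundness.m`.

```matlab
% Hypothetical sampling-based soundness check for an elementwise ReLU
% over a box input set [lb, ub]. Does not use the NNV API.
rng(42);                          % reproducible samples
lb = rand(3, 3) - 0.5;            % lower bounds of the input box
ub = lb + rand(3, 3) * 0.2;       % upper bounds (ub >= lb)

% Reachable bounds of ReLU over a box: ReLU is monotone, so it maps
% [lb, ub] onto [max(lb, 0), max(ub, 0)] elementwise.
out_lb = max(lb, 0);
out_ub = max(ub, 0);

nSamples = 1000;
tol = 1e-9;                       % slack for floating-point error (assumed)
for k = 1:nSamples
    x = lb + rand(size(lb)) .* (ub - lb);   % uniform sample from the box
    y = max(x, 0);                          % concrete ReLU evaluation
    assert(all(y(:) >= out_lb(:) - tol) && all(y(:) <= out_ub(:) + tol), ...
        'Sample %d escapes the reachable bounds', k);
end
disp('All samples contained in the reachable set');
```

Because ReLU is monotone the bounds here are exact; for an over-approximate reachability method the same loop still applies, since soundness only requires containment, not tightness.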



…ency checking; the root cause of the GitHub Action failures was missing toolbox dependencies, specifically the Deep Learning Toolbox and the Optimization Toolbox, so messaging was improved to warn the user if critical functions like dlarray, nnet parts, linprog, etc. are missing


Copilot

Pull request overview

Copilot reviewed 179 out of 370 changed files in this pull request and generated 5 comments.


Copilot · 2025-12-02T00:38:27Z

code/nnv/tests/soundness/test_soundness_ConcatenationLayer.m

+offDiag1 = IS_concat5.C(1:4, 3:5);  % rows from IS5a, cols from IS5b
+offDiag2 = IS_concat5.C(5:10, 1:2); % rows from IS5b, cols from IS5a
+assert(all(offDiag1(:) == 0), 'Off-diagonal block 1 should be zero');
+assert(all(offDiag2(:) == 0), 'Off-diagonal block 2 should be zero');


[nitpick] Variable names 'offDiag1' and 'offDiag2' are ambiguous. Consider renaming to 'offDiagonalBlock1' and 'offDiagonalBlock2' for clarity.

Suggested change

-offDiag1 = IS_concat5.C(1:4, 3:5);  % rows from IS5a, cols from IS5b
-offDiag2 = IS_concat5.C(5:10, 1:2); % rows from IS5b, cols from IS5a
-assert(all(offDiag1(:) == 0), 'Off-diagonal block 1 should be zero');
-assert(all(offDiag2(:) == 0), 'Off-diagonal block 2 should be zero');
+offDiagonalBlock1 = IS_concat5.C(1:4, 3:5);  % rows from IS5a, cols from IS5b
+offDiagonalBlock2 = IS_concat5.C(5:10, 1:2); % rows from IS5b, cols from IS5a
+assert(all(offDiagonalBlock1(:) == 0), 'Off-diagonal block 1 should be zero');
+assert(all(offDiagonalBlock2(:) == 0), 'Off-diagonal block 2 should be zero');

Copilot · 2025-12-02T00:38:28Z

code/nnv/tests/soundness/test_soundness_DepthConcatenationLayer.m

+assert(size(output, 2) == 4, 'Width should be preserved');
+
+%% Test 4: DepthConcatenationLayer reach_single_input
+% NOTE: reach_multipleInputs_Star has a library bug (nI undefined)


[nitpick] This note references a library bug. Consider creating a TODO or issue tracker reference to ensure this gets addressed, rather than just commenting about it in tests.

Suggested change

-% NOTE: reach_multipleInputs_Star has a library bug (nI undefined)
+% TODO: Fix library bug in reach_multipleInputs_Star (nI undefined)
+% Consider referencing or creating an issue in the tracker for this bug.

Copilot · 2025-12-02T00:38:29Z

code/nnv/tests/soundness/test_soundness_approx_zono.m

+% NOTE: LeakyReLU approx-zono has a library bug in LeakyReLU.reach_zono_approx
+% line 1595: V1(map1, :) = gamma*V1(map1) has dimension mismatch
+% This should be fixed in the NNV library
+


The comment identifies a specific bug with line number reference, but no test exists to verify or track this bug. Consider adding a test case with try/catch that documents the expected failure until the bug is fixed.

Suggested change

+%% Test 6: LeakyReLU with approx-zono (expected failure due to known bug)
+try
+    rng(42);
+    L6 = LeakyReLULayer('Name', 'leakyrelu_zono', 'Gamma', 0.1);
+    lb6 = rand(3, 3) - 0.5;
+    ub6 = lb6 + rand(3, 3) * 0.2;
+    input_iz6 = ImageZono(lb6, ub6);
+    output_iz6 = L6.reach(input_iz6, 'approx-zono');
+    error('LeakyReLU approx-zono did not fail as expected (bug may be fixed)');
+catch ME
+    assert(contains(ME.message, 'dimension mismatch') || ...
+           contains(ME.message, 'Matrix dimensions must agree') || ...
+           contains(ME.message, 'Unable to perform assignment'), ...
+           'LeakyReLU approx-zono failed, but not due to expected dimension mismatch bug');
+    disp('LeakyReLU approx-zono test: expected failure due to known bug (line 1595 in LeakyReLU.reach_zono_approx)');
+end

Copilot · 2025-12-02T00:38:29Z

code/nnv/tests/test_utils/compare_regression_data.m

+function result = ternary(condition, if_true, if_false)
+    if condition
+        result = if_true;
+    else
+        result = if_false;
+    end
+end


[nitpick] This ternary helper function is only used once. Consider inlining it or moving it to a shared utilities file if it will be used across multiple test files.

Copilot · 2025-12-02T00:38:29Z

code/nnv/tests/test_utils/track_coverage.m

+                if ~contains(full_path, 'test') && ...
+                   ~contains(full_path, 'cora') && ...
+                   ~contains(full_path, 'tbxmanager')


[nitpick] The exclusion patterns are hardcoded. Consider defining these as configuration constants at the top of the function or in a separate config file for easier maintenance.
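One way to act on this suggestion is to hoist the patterns into a single cell array and pass it straight to `contains`, which returns true when any pattern matches. The constant name `EXCLUDE_PATTERNS` and the example paths are hypothetical; only the three patterns come from the diff.

```matlab
% Hypothetical refactor of the hardcoded checks in track_coverage.m:
% the exclusion patterns live in one constant near the top of the function.
EXCLUDE_PATTERNS = {'test', 'cora', 'tbxmanager'};

% Example paths (illustrative only, not actual NNV file locations)
paths = {fullfile('engine', 'nn', 'layers', 'ReLULayer.m'), ...
         fullfile('tests', 'soundness', 'test_soundness_ReLULayer.m'), ...
         fullfile('tbxmanager', 'toolboxes', 'mpt', 'startup.m')};

% contains() with a cell array of patterns is true if ANY pattern matches,
% so one call replaces the chain of ~contains(...) && ... checks
keep = ~contains(paths, EXCLUDE_PATTERNS);
disp(paths(keep));
```

For a single `full_path`, `~contains(full_path, EXCLUDE_PATTERNS)` is equivalent to the original three-way `&&` chain, and adding a new exclusion only touches the constant.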

ttj · 2025-12-02T00:46:28Z

Merging the test updates and CI/CD improvements back now; next, I will add Usama's weight perturbation pull request and test it.

ttj added 5 commits November 30, 2025 21:44

Updating existing testing infrastructure and some minor bug fixes bas…

33a51b8

…ed on testing status

Adding new soundness tests, regression test infrastructure, and test …

e42c4ff

…utilities

Adding regression test baselines and last run results

070ad48

Adding testing exemplar figures from existing testing infrastructure;…

d1005bc

… these may get overwritten and changed with new runs and differing MATLAB versions, OS, etc., but they are added for sanity checking and will be fine to overwrite if modified

Adding regression test github actions yaml

e744ca4

ttj requested a review from Copilot December 1, 2025 04:44

Copilot AI reviewed Dec 1, 2025


ttj added 22 commits November 30, 2025 22:45

Fixing github action regression yaml

1690530

Fixing github actions for CP

cadfd63

Continue fixing github actions for CP

50ae112

Continue fixing github actions

f927ec3

Continue fixing github actions, setting matlab 2024b

b3efa4d

Continue fixing github actions, removing networks

8cb2d05

Continue fixing github actions, debugging lpsolver with string vs fun…

7a935ed

…ction handle

Continue fixing github actions, debugging lpsolver installation

67844e2

Continue fixing github actions, debugging lpsolver typo

6c261f7

Continue fixing github actions, debugging lpsolver typo

f00041d

Continue fixing github actions, checking toolboxes

ba8be88

Continue fixing github actions, added toolboxes

c2cace6

Remove linprog full status checking for performance, moved to startup

438bd28

Merge branch 'master' of https://github.com/verivital/nnv into test-r…

0354148

…egression

Refine setup and error checking to avoid toolbox failures; update tests

de19c2c

Add coverage tests and quickstart

5aa7082

Update readmes and minor platform changes

0166920

Update readmes for NNV 3

77d2c1f

Update readmes for NNV 3 and other minor updates

a0bed70

Update readme with missing papers

207b1d1

Fix conformal prediction (CP) parts to run also on Windows and presum…

65abf3c

…ably cross-platform now; added documentation for it and its setup, as it calls Python inline; it needs further testing but works locally at least

ttj requested a review from Copilot December 2, 2025 00:37

Copilot AI reviewed Dec 2, 2025


ttj marked this pull request as ready for review December 2, 2025 00:44

ttj merged commit 036a521 into verivital:master Dec 2, 2025
2 checks passed

ttj merged 27 commits into verivital:master from ttj:test-regression

1 participant
