Testing updates: soundness and regressions; minor bug fixes #286

Merged
ttj merged 27 commits into verivital:master from ttj:test-regression
Dec 2, 2025

@ttj ttj commented Dec 1, 2025

This is a fairly comprehensive update to the testing infrastructure. It saves the figures produced by existing tests (many tests were based only on creating figures), modifies those tests to check assertions, and adds some regression testing against prior results.

A few minor bugs were also fixed, namely in the concatenation layer, along with some other minor updates (e.g., some inference fixes for bias being ignored).

A new base GitHub Action for running regressions was created; we will need to test and debug it live on GitHub.

@ttj ttj requested a review from Copilot December 1, 2025 04:44

Copilot AI left a comment

Pull request overview

This PR enhances the testing infrastructure for NNV by adding comprehensive test files, soundness verification utilities, regression testing capabilities, and minor bug fixes. The changes focus on improving test coverage, enabling regression detection, and establishing a framework for automated testing.

Key changes:

  • Addition of 40+ new test files for utilities, soundness verification, and layer testing
  • Implementation of regression testing infrastructure with baseline comparison
  • Creation of test utilities for soundness verification and data management
  • Minor bug fix in VolumeStar test (reduced image size to prevent OOM errors)
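Soundness checks of the kind listed above usually follow a sample-and-contain pattern: compute an over-approximated output set, push concrete samples through the same operation, and confirm that every sample lands inside the set. The following is a minimal sketch of that idea, assuming plain box sets and an affine map. It does not use NNV's set classes or the PR's verify_soundness.m (whose signature is not shown in this conversation), and every variable name is illustrative.

```matlab
% Soundness sketch for an affine map y = W*x + b over a box [lb, ub].
% All names are illustrative; this is not the PR's verify_soundness.m.
rng(0);
W  = [1 -2; 0.5 3];
b  = [0.1; -0.2];
lb = [-1; -1];
ub = [ 1;  1];

% Sound output bounds via interval arithmetic: split W into its
% nonnegative and nonpositive parts.
Wp = max(W, 0);
Wn = min(W, 0);
out_lb = Wp*lb + Wn*ub + b;
out_ub = Wp*ub + Wn*lb + b;

% Sample concrete inputs from the box and push them through the map.
nSamples = 1000;
X = lb + (ub - lb) .* rand(2, nSamples);
Y = W*X + b;

% Every concrete output must fall inside the computed bounds.
tol = 1e-12;
isSound = all(Y >= out_lb - tol, 'all') && all(Y <= out_ub + tol, 'all');
assert(isSound, 'Sampled output escaped the over-approximation');
```

Sampling can only expose unsoundness, not prove soundness, so a check like this complements exact reasoning and does not replace it.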

Reviewed changes

Copilot reviewed 160 out of 351 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| test_onnx2nnv.m | Tests ONNX to NNV conversion functionality |
| test_matlab2nnv.m | Tests MATLAB to NNV network conversion |
| test_lpsolver.m | Tests LP solver interface across different solvers |
| test_load_vnnlib.m | Tests VNN-LIB specification parsing |
| run_all_utils_tests.m | Runner script for utility function tests |
| MatMul_To_MatMulLayer1009.m | Auto-generated ONNX layer for single pendulum controller |
| FlattenLayer1009.m (cartpole) | Auto-generated ONNX Flatten layer for cartpole |
| FlattenLayer1018.m (ACASXU) | Auto-generated ONNX Flatten layer for ACAS XU |
| test_tutorial_figures.m | Runs tutorial examples and saves generated figures |
| verify_soundness.m | Utility for verifying soundness of set transformations |
| save_test_figure.m | Saves test figures to results directory |
| save_test_data.m | Saves test workspace data for regression testing |
| run_tests_with_regression.m | Runs tests with regression detection |
| manage_baselines.m | Manages baseline files for regression detection |
| get_test_config.m | Global test configuration |
| compare_regression_data.m | Compares test output with baselines |
| test_soundness_*.m (35 files) | Comprehensive soundness tests for layers and set representations |
| test_VolumeStar.m | Fixed OOM issue by using smaller test images |
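Baseline comparison for regression detection can be pictured as a field-by-field check with a numeric tolerance. The sketch below is an assumption about the general technique, not the contents of compare_regression_data.m; the struct fields, values, and tolerance are hypothetical.

```matlab
% Regression sketch: compare current results with a stored baseline.
% Field names, values, and tolerance are hypothetical, not from the PR.
baseline = struct('volume', 2.5, 'bounds', [0.1 0.9; -0.3 0.4]);
current  = struct('volume', 2.5 + 1e-10, 'bounds', [0.1 0.9; -0.3 0.4]);
tol = 1e-6;

names = fieldnames(baseline);
mismatches = {};
for k = 1:numel(names)
    name = names{k};
    if ~isfield(current, name)
        mismatches{end+1} = sprintf('%s: missing from current results', name); %#ok<AGROW>
        continue;
    end
    a = baseline.(name);
    c = current.(name);
    if ~isequal(size(a), size(c))
        mismatches{end+1} = sprintf('%s: size changed', name); %#ok<AGROW>
    elseif max(abs(a(:) - c(:))) > tol
        mismatches{end+1} = sprintf('%s: values differ by more than %g', name, tol); %#ok<AGROW>
    end
end

% Report every mismatch at once so one run shows the full picture.
if ~isempty(mismatches)
    error('Regression detected:\n%s', strjoin(mismatches, newline));
end
disp('No regression detected');
```

Collecting all mismatches before failing is a design choice: it avoids fixing one field only to discover the next on the following run.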

ttj added 22 commits November 30, 2025 22:45
…ency checking; root cause of github action failures were missing toolbox dependencies, specifically deep learning toolbox and optimization toolbox; so, messaging improved to warn user if critical functions like dlarray, nnet parts, linprog, etc are missing
…ably cross-platform now; added documentation for it and its setup, as it calls python inline, needs to be tested further, but working locally at least
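The dependency warning described in the first commit message above can be sketched with MATLAB's `exist` function. This is a minimal illustration, assuming a simple name lookup is enough; the function list comes from the commit message, while the warning identifier and wording are hypothetical and not taken from the PR.

```matlab
% Dependency check sketch. The names come from the commit message;
% the warning identifier and message text are hypothetical.
required = {'dlarray', 'linprog'};
missing = {};
for k = 1:numel(required)
    % exist returns 0 when the name is not found on the path.
    if exist(required{k}) == 0 %#ok<EXIST>
        missing{end+1} = required{k}; %#ok<AGROW>
    end
end

if ~isempty(missing)
    warning('NNVTest:missingDependency', ...
        'Missing functions: %s. Check that the required toolboxes are installed.', ...
        strjoin(missing, ', '));
end
```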
@ttj ttj requested a review from Copilot December 2, 2025 00:37

Copilot AI left a comment

Pull request overview

Copilot reviewed 179 out of 370 changed files in this pull request and generated 5 comments.


Comment on lines +183 to +186
offDiag1 = IS_concat5.C(1:4, 3:5); % rows from IS5a, cols from IS5b
offDiag2 = IS_concat5.C(5:10, 1:2); % rows from IS5b, cols from IS5a
assert(all(offDiag1(:) == 0), 'Off-diagonal block 1 should be zero');
assert(all(offDiag2(:) == 0), 'Off-diagonal block 2 should be zero');
Copilot AI Dec 2, 2025

[nitpick] Variable names 'offDiag1' and 'offDiag2' are ambiguous. Consider renaming to 'offDiagonalBlock1' and 'offDiagonalBlock2' for clarity.

Suggested change
- offDiag1 = IS_concat5.C(1:4, 3:5); % rows from IS5a, cols from IS5b
- offDiag2 = IS_concat5.C(5:10, 1:2); % rows from IS5b, cols from IS5a
- assert(all(offDiag1(:) == 0), 'Off-diagonal block 1 should be zero');
- assert(all(offDiag2(:) == 0), 'Off-diagonal block 2 should be zero');
+ offDiagonalBlock1 = IS_concat5.C(1:4, 3:5); % rows from IS5a, cols from IS5b
+ offDiagonalBlock2 = IS_concat5.C(5:10, 1:2); % rows from IS5b, cols from IS5a
+ assert(all(offDiagonalBlock1(:) == 0), 'Off-diagonal block 1 should be zero');
+ assert(all(offDiagonalBlock2(:) == 0), 'Off-diagonal block 2 should be zero');
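For context, the assertions in this thread check that the concatenated constraint matrix has zero off-diagonal blocks. A minimal sketch of why that holds, assuming the concatenation stacks the two constraint matrices block-diagonally; the 4-by-2 and 6-by-3 shapes are inferred from the index ranges in the snippet, and `Ca` and `Cb` are stand-in names, not NNV objects:

```matlab
% Stand-in constraint matrices with the shapes implied by the indices:
% 4 constraints on 2 predicates, and 6 constraints on 3 predicates.
Ca = ones(4, 2);
Cb = 2 * ones(6, 3);

% Block-diagonal stacking keeps the two predicate sets independent.
C = blkdiag(Ca, Cb);                 % 10-by-5

offDiagonalBlock1 = C(1:4, 3:5);     % rows from Ca, cols from Cb
offDiagonalBlock2 = C(5:10, 1:2);    % rows from Cb, cols from Ca
assert(all(offDiagonalBlock1(:) == 0), 'Off-diagonal block 1 should be zero');
assert(all(offDiagonalBlock2(:) == 0), 'Off-diagonal block 2 should be zero');
```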

Copilot uses AI. Check for mistakes.
assert(size(output, 2) == 4, 'Width should be preserved');

%% Test 4: DepthConcatenationLayer reach_single_input
% NOTE: reach_multipleInputs_Star has a library bug (nI undefined)
Copilot AI Dec 2, 2025

[nitpick] This note references a library bug. Consider creating a TODO or issue tracker reference to ensure this gets addressed, rather than just commenting about it in tests.

Suggested change
- % NOTE: reach_multipleInputs_Star has a library bug (nI undefined)
+ % TODO: Fix library bug in reach_multipleInputs_Star (nI undefined)
+ % Consider referencing or creating an issue in the tracker for this bug.

% NOTE: LeakyReLU approx-zono has a library bug in LeakyReLU.reach_zono_approx
% line 1595: V1(map1, :) = gamma*V1(map1) has dimension mismatch
% This should be fixed in the NNV library

Copilot AI Dec 2, 2025

The comment identifies a specific bug with line number reference, but no test exists to verify or track this bug. Consider adding a test case with try/catch that documents the expected failure until the bug is fixed.

Suggested change
+ %% Test 6: LeakyReLU with approx-zono (expected failure due to known bug)
+ try
+     rng(42);
+     L6 = LeakyReLULayer('Name', 'leakyrelu_zono', 'Gamma', 0.1);
+     lb6 = rand(3, 3) - 0.5;
+     ub6 = lb6 + rand(3, 3) * 0.2;
+     input_iz6 = ImageZono(lb6, ub6);
+     output_iz6 = L6.reach(input_iz6, 'approx-zono');
+     error('LeakyReLU approx-zono did not fail as expected (bug may be fixed)');
+ catch ME
+     assert(contains(ME.message, 'dimension mismatch') || ...
+         contains(ME.message, 'Matrix dimensions must agree') || ...
+         contains(ME.message, 'Unable to perform assignment'), ...
+         'LeakyReLU approx-zono failed, but not due to expected dimension mismatch bug');
+     disp('LeakyReLU approx-zono test: expected failure due to known bug (line 1595 in LeakyReLU.reach_zono_approx)');
+ end
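The indexing pattern quoted in the note, `V1(map1, :) = gamma*V1(map1)`, can be reproduced on a toy matrix to show where the size mismatch comes from. This is a sketch under assumptions: `V1`, `map1`, and `gamma` below are stand-in values, and indexing both sides with `(map1, :)` is only the likely intended form, not a confirmed fix from the NNV source.

```matlab
% Toy reproduction of the indexing pattern quoted in the note.
% V1, map1, and gamma are stand-ins, not values from NNV.
V1 = [ -1  2  3;
        4 -5  6;
       -7  8 -9 ];
map1 = [1; 3];      % rows to scale
gamma = 0.1;

% V1(map1) uses linear indexing and returns a 2-by-1 vector, while
% V1(map1, :) selects a 2-by-3 block, so the assignment cannot succeed.
failed = false;
try
    V1(map1, :) = gamma * V1(map1);
catch ME
    failed = true;
    disp(ME.message);
end

% Indexing both sides the same way scales the selected rows in full.
V2 = V1;
V2(map1, :) = gamma * V1(map1, :);
```

If this pattern is wrapped in a known-failure test, raising the "did not fail" error after the try block (for example, based on a flag like `failed`) keeps that error from being swallowed by the same catch.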

Comment on lines +165 to +171
function result = ternary(condition, if_true, if_false)
    if condition
        result = if_true;
    else
        result = if_false;
    end
end
Copilot AI Dec 2, 2025

[nitpick] This ternary helper function is only used once. Consider inlining it or moving it to a shared utilities file if it will be used across multiple test files.
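As a rough illustration of the inlining option, a single call site could use a plain if/else in place of the helper. The flag and labels below are hypothetical, since the actual call site is not shown in this conversation.

```matlab
% Hypothetical call site; the flag and labels are illustrative.
passed = true;

% With the helper this would read:
%   label = ternary(passed, 'PASS', 'FAIL');
% Inlined equivalent:
if passed
    label = 'PASS';
else
    label = 'FAIL';
end
disp(label);
```

One difference worth noting: a function-based ternary evaluates both branch arguments before the call, whereas the inlined if/else evaluates only the branch that is taken.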

Comment on lines +65 to +67
if ~contains(full_path, 'test') && ...
        ~contains(full_path, 'cora') && ...
        ~contains(full_path, 'tbxmanager')
Copilot AI Dec 2, 2025

[nitpick] The exclusion patterns are hardcoded. Consider defining these as configuration constants at the top of the function or in a separate config file for easier maintenance.
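One way to act on this suggestion is to collect the patterns in a single array and pass it to `contains`, which returns true when any pattern occurs. A minimal sketch, assuming string arrays are acceptable here; the constant name and the sample paths are hypothetical, while the three patterns come from the snippet under review.

```matlab
% Exclusion patterns collected in one place (values from the reviewed snippet).
EXCLUDE_PATTERNS = ["test", "cora", "tbxmanager"];

% Hypothetical paths used only to exercise the filter.
paths = ["code/nnv/engine/nn/NN.m", ...
         "code/nnv/tests/test_star.m", ...
         "code/nnv/engine/cora/zonotope.m"];

% contains returns true when any pattern occurs in a path, so its negation
% matches the three chained ~contains(...) checks.
keep = ~contains(paths, EXCLUDE_PATTERNS);
disp(paths(keep));
```

Adding or removing an exclusion then means editing one array and leaving the condition untouched.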

@ttj ttj marked this pull request as ready for review December 2, 2025 00:44

ttj commented Dec 2, 2025

Merging the test updates and CI/CD improvements back now; next, I will add Usama's weight perturbation pull request and test it.

@ttj ttj merged commit 036a521 into verivital:master Dec 2, 2025
2 checks passed