
ModelStar merge; needs review due to divergent histories #284

Draft
m-usama-z wants to merge 28 commits into verivital:master from cirens-utd:master

Conversation

@m-usama-z
Contributor

No description provided.

@ttj ttj requested a review from Copilot November 9, 2025 01:00
Contributor

Copilot AI left a comment


Pull Request Overview

This PR adds significant new functionality for neural network verification and weight perturbation analysis, particularly focused on RUL (Remaining Useful Life) prediction for Collins turbofan engines. The changes include new verification capabilities, memory management improvements, GPU support enhancements, and parallelization of verification tasks.

  • Added weight perturbation support for robustness analysis of neural networks
  • Implemented parallelized verification with memory-aware LP solver management
  • Enhanced GPU/CPU device switching and memory handling for large star sets

Reviewed Changes

Copilot reviewed 50 out of 54 changed files in this pull request and generated 3 comments.

  • monotonicity_CI_shift10_w20.vnnlib: VNN-LIB specification file for RUL monotonicity property verification with 400 input variables
  • collins_RUL_test.m: Test script for verifying the RUL CNN with weight perturbations
  • verify_specification.m: Added parallelized verification with memory management and LP scheduling
  • lpsolver.m: Changed the default solver to Gurobi with fallback logic and improved error handling
  • Star.m: Added static helper methods and optimized range estimation for unconstrained neurons
  • ImageStar.m: Optimized range computation using vectorized operations and improved GPU handling
  • NN.m: Major additions: weight perturbation, sampling, memory management, and evaluation improvements
  • Layer files: Added toCPU methods across all layers for CPU/GPU switching
  • FeatureInputLayer.m: Contains a typo in an affineMap call


Comment on lines 98 to 100
if ~any(strcmp(varargin{1}, ["valid"]))
error("Should not need linprog; did gurobi cause a problem, or was it not used at all? Maybe adjust gurobi's feasibility tolerance...")
end

Copilot AI Nov 9, 2025


The error check assumes varargin{1} exists, but varargin could be empty. This will cause an indexing error if lpsolver is called without the optional 10th argument. Add a length check before accessing varargin{1}.
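A minimal sketch of such a guard, assuming the flag arrives as varargin{1} as in the snippet above, might be:

```matlab
% Sketch: check that the optional argument exists before indexing into
% varargin, so an empty varargin no longer causes an indexing error.
if ~isempty(varargin) && ~any(strcmp(varargin{1}, "valid"))
    error("Should not need linprog; did gurobi cause a problem, or was it not used at all?")
end
```

Whether an empty varargin should instead raise the same error (rather than silently skip the check, as in this sketch) is a design decision for the authors.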


% solve using glpk (linprog as backup)
elseif strcmp(lp_solver, 'glpk')
error("Should not need to use glpk. Did gurobi and linprog both fail?")

Copilot AI Nov 9, 2025


Error message is overly restrictive. The glpk solver path may be legitimately needed in some cases, and this unconditional error prevents its use entirely. Consider making this a warning or removing it if glpk is meant to be disabled.

@ttj
Contributor

ttj commented Nov 9, 2025

first comment, @m-usama-z please make sure the tests pass (the action failed), so please update on your end until they pass (or let us know more about why they don't); also, please put all your email text here in the request so we don't have a combination of emails and other discussions happening outside of git

once that is done, @sammsaski @atumlin @HCWDavid (and @Navidhashemicodes and @mldiego if you have any cycles in your off time as we prepare the NNV 3 paper), please review the changes, especially the points raised by @m-usama-z regarding hard-to-merge parts

m-usama-z and others added 3 commits November 8, 2025 23:53
fixed typo

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@m-usama-z
Contributor Author

Thank you for initiating the review process, Professor @ttj.

Failed Workflow

Regarding the failed workflow action, it was due to the error:

matlab-actions/setup-matlab@v1 is no longer supported. Update your workflow to use the most recent version of MATLAB actions.

I updated the CI file to use v2 and re-ran the workflow in the fork. After that, the workflow failed due to lack of space during Python package installation. I am not sure about the best way to deal with this, so any advice would be appreciated.

Code Review Before Merging

The differences between my code and NNV include, besides the ModelStar additions for weight perturbation analysis, many differences that arose from divergent development in scripts that are not directly concerned with the base implementation of ModelStar. Hence, while the number of affected files is large, the number of changes per file is small for most files. Only a few files are changed extensively:

  1. lpsolver.m: I set gurobi as default since I was observing accuracy problems with linprog.

  2. verify_specification.m: I parallelized it using parfeval with a specified free-memory margin, so it uses as much system memory as possible while preserving that margin. The margin can be increased to reduce memory usage. I configured the verification to parallelize only if a parallel pool already exists.

  3. NN.m: I added many functions to it. Some are directly useful; some (I think two) are specific to ModelStar experiments (JAIR draft) for comparison with other approaches and can be removed if we do not want those particular experiments in this repository; and some are utility helpers. One existing function, reach_withConns, has been heavily changed; this is where I did a fair amount of my inter-layer memory management. This management activates via an optional reachOptions field named "layer_specific_numCores".

  4. FullyConnectedLayer.m: Added support for modifying the input set according to weight perturbations in the layer, including support for augmenting an input set that already contains perturbations.

  5. Conv2D.m: I parallelized the computation of generators resulting from weight perturbations, with a check for whether a parallel pool already exists to determine whether to use parfor. However, it is important to note that, unlike the fully-connected case, multi-layer convolutional perturbations have NOT been implemented, i.e., the Conv2D reachability function will not account for the multiplicative interactions between the existing perturbations in the star set input to the perturbed convolutional layer and the perturbations within the layer. This could be added with a bit of time, but I am not sure if it is desirable. Ideally, both single-layer and multi-layer perturbations would be handled for every linear layer, but if we are moving to Python, it may be better to implement this at an abstract level in a parent linear-layer class in the Python version of the framework.

  6. collins_RUL_test.m: I added a tutorial based on a Collins RUL CNN where a single layer is perturbed and a single vnnlib is checked for satisfaction.
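The pool-gated parfeval pattern described in item 2 could look roughly like the following (a sketch with hypothetical names such as tasks and solve_lp_task, not the actual verify_specification code):

```matlab
% Sketch: parallelize only if a pool already exists; otherwise fall back
% to a sequential loop. The free-memory-margin scheduling is elided.
pool = gcp('nocreate');                      % returns [] instead of creating a pool
results = cell(1, numel(tasks));
if isempty(pool)
    for i = 1:numel(tasks)
        results{i} = solve_lp_task(tasks{i});    % sequential fallback
    end
else
    for i = 1:numel(tasks)
        % a real implementation would first wait for free memory to
        % exceed the configured margin before scheduling each task
        futures(i) = parfeval(pool, @solve_lp_task, 1, tasks{i});
    end
    for i = 1:numel(tasks)
        [idx, r] = fetchNext(futures);           % collect results as they finish
        results{idx} = r;
    end
end
```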

I would appreciate it if members of the team could spare some time to look at these changes and coordinate with me where further changes are required. I have not resolved a number of differences that appeared due to divergent development because I am not sure what to do with them; I am not listing all the minor differences here as it is more convenient to just look at them. However, it is important to note that my pull request will "appear mergeable" because it is a simple change of code devoid of the past history of the repository, which would have assisted merging through pull-and-push (for my development, I had cloned/copied the code instead of forking and keeping the original git history). It would therefore be necessary for a member of your team who is familiar with the recent development history to eyeball, at the very least, the deleted/modified functions, without relying on an automatic merge. The functions that are entirely new should be safer to process/prune later.

Some Miscellaneous Points

  1. One common category of many small changes in the updated NNV code is the removal of the toCPU functions. If these functions are unnecessary now, I can simply go ahead and remove them from my code; that will reduce clutter in the review pane.
  2. In ImageStar.m, the function estimateRange appeared twice with the same content, but different variable names. I have removed one instance.
  3. I used the polyfitn toolbox for polynomial fitting (for the purpose of LP memory usage estimation using constraint matrix size as the input). I would appreciate advice on the desirable way to automate the setup of polyfitn along with other toolboxes in the install script.
  4. I have not yet added tests for my additions. I will work on adding some basic tests in the coming week.

Thank you very much for your time and support, and I look forward to working with you.

@ttj
Contributor

ttj commented Nov 9, 2025

thanks @m-usama-z ! I guess the CI stuff is currently broken due to some environment things. So, don't worry about that, we'll have to look into it on our end. However, can you please make sure all the tests run fine locally and/or confirm you've done that? If any problems, please let us know.

We don't need a ton of tests, just testing anything new you've added ideally (the example you've added on RUL of course helps as a regression test), as well as to ensure none of the existing tests break (especially given the number of changes).

@m-usama-z
Contributor Author

Thank you for letting me know about the CI problem, Professor @ttj. I ran the existing tests locally and found a few issues that I could not resolve because they already exist in a fresh clone of NNV:

  1. test_CP fails because the file code/nnv/engine/nn/Prob_reach/Temp_files_mid_run/trained_Linear_weights_norm.mat is missing.

  2. test_NonLinearODE_stepReachStar and test10_NNCS_ACC fail with the following error message (putting the relevant common part here):

Unrecognized method, property, or field 'c' for class 'zonotope'.

Error in NonLinearODE/stepReachStar (line 359)
                    Zz = Zono(Z1.c, Z1.G);
  3. test_conversion_concatenatePolydedron fails its assertion. However, it appears to be a simple mismatch in the order of half-planes between targetV and P.V (as shown below), and I could change the test to accommodate that if desirable.
>> targetV

targetV =

     1     0     1
     1     0     0
    -1     0     1
    -1     0     0
    -1     2     0
    -1     2     1
     1     2     0
     1     2     1

>> P.V

ans =

     1     0     1
     1     0     0
     1     2     1
     1     2     0
    -1     2     0
    -1     2     1
    -1     0     0
    -1     0     1

I will work on adding tests for the new features in parallel with my other deadlines.

@m-usama-z
Contributor Author

Hello Professor @ttj, I have added tests for the three major functionalities added to NNV:

  1. test_fc_layer_perturbation for perturbation of fully-connected layer. Essentially, I create a toy NN with a small fully-connected layer, perturb it, get ranges of the output ImageStar set and compare those with ranges obtained from sampling, i.e., sampling of outputs of weight-perturbed NNs by plugging all combinations of the extreme values of the perturbed weights into copies of the NN. The toy NN has no non-linear layers for this test, which makes the test a simple equality comparison.

  2. test_conv_layer_perturbation for perturbation of Conv2D layer. The test is similar to the one for the fully-connected layer but here, I use an existing MNIST NN and remove all layers after the first Conv2D layer for the test. Then I compare ranges of the ImageStar set that is output from the perturbed Conv2D layer with the ranges from sampling.

  3. test_verify_specification_parallelization for testing the parallelization of verify_specification. This parallelization is done across properties, so I use an MNIST NN, which translates to 9 properties. I perturb a fully-connected layer in the MNIST NN and test whether the property verification results are the same for the single-threaded case and the parallelized case. This comparison is done for a robust case (small perturbation in the FC layer) as well as a non-robust case (large perturbation in the FC layer) for a single image. Essentially, the test exercises the feature of verify_specification that cancels the parallelized LP solution launches as soon as any property is found to not be satisfied.
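The extreme-combination sampling check described in item 1 can be sketched as follows for a single linear layer (a toy illustration with hypothetical variable names W, b, x, delta, not the actual test code):

```matlab
% Toy sketch: enumerate every combination of the extreme values of the
% perturbed weights and record the layer outputs, then compare their
% min/max against the reachable-set ranges.
lo = W - delta;  hi = W + delta;        % elementwise perturbation bounds
n  = numel(W);                          % feasible only for small layers (2^n cases)
outs = zeros(2^n, size(W, 1));
for k = 0:2^n - 1
    bits = bitget(k, 1:n);              % choose lo or hi for each weight
    Wk = reshape(lo(:) .* (1 - bits(:)) + hi(:) .* bits(:), size(W));
    outs(k + 1, :) = (Wk * x + b)';     % purely linear layer, no activation
end
sample_min = min(outs, [], 1);          % for a linear-only network these
sample_max = max(outs, [], 1);          % should equal the reachable ranges
```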

I hope this covers the major changes. I would be grateful if a member of your team could take a quick look at the changes in this PR and let me know whether further tests or modifications are needed. Thank you for your time and consideration.

@m-usama-z
Contributor Author

With regards to an example for inclusion in the paper: we could use an MNIST CNN. I do not know of any existing framework for verification against convolutional-layer weight perturbations, so we should not need to compare these results with any other approach. The code for the Conv2D test, test_conv_layer_perturbation, can be extended (it already uses an MNIST CNN), or a more extensive experiment like the plots in the ModelStar draft (layer-wise perturbation trends) is also possible (I have code for managing such an experiment linked in the JAIR draft). Please let me know what sort of example is desirable and what is necessary from my side. Thank you again for your feedback and support.

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 51 out of 55 changed files in this pull request and generated no new comments.



@ttj
Contributor

ttj commented Nov 27, 2025

thanks Usama!

Re some of the questions:

  1. For all the toCPU functions: yes, if they are no longer used, I would get rid of them to avoid all the changes; it will simplify review
  2. For polyfitn: I looked at the license; you can just include it in a subfolder with the license, as it is allowed to redistribute with acknowledgment, so that will be the easiest solution unless it is on GitHub somewhere, since downloading from the MathWorks site seems to require a login and will be a pain
  3. For other changes and high-level questions: I would suggest that, for whatever you have added, you ensure things still work as they would have before for everything not using the weight perturbation aspects, unless there were bugs you needed to resolve. This should simplify things and avoid unintended side effects.

Hopefully Samuel/Diego/Anne will have some time to look over the other changes and see if any concerns or whether we can just merge.

I am working on a new set of tests that will also help us with the merge and soundness regression checking, but probably we will merge your changes first, then I will merge those and see if any problems are introduced. But it will be good to have the other set of eyes on it first.

@ttj
Contributor

ttj commented Nov 27, 2025

testing review @claude

@claude

claude bot commented Nov 27, 2025

Claude encountered an error (View job)


I'll analyze this and get back to you.

@m-usama-z
Contributor Author

I have removed the toCPU functions for all the cases where I found them to be absent in NNV's code, except the Conv2D layer: layer-wide perturbations in Conv2D require many convolutions, and GPU usage makes them faster, but parallelizing them can cause GPU memory to run out, so my code switches its parameters to CPU if GPU memory runs low. I also deleted some utility function files that were absent from the NNV code and were not used by me either. I have also added polyfitn in the utils folder.

I have, in general, checked logical coherence of changes to avoid breaking existing functionality. There are some changes, however, that I would suggest at least a quick review of before merging:

  1. I would appreciate a review of lpsolver.m. I found some time ago that linprog can cause critical accuracy issues that gurobi does not. Unfortunately, I do not have the example on hand (I forgot which example it was), but perhaps the usage of linprog can be reconsidered at some point after this upcoming deadline. There are some other related changes: I changed the default lpsolver in the ImageStar/getRanges function to gurobi, but I can change it back to linprog. I made a similar change in Star/isEmptySet (that reminds me, one of the issues may have been wrong results about sets being empty), but I can also reverse this to linprog if desirable.
  2. There is a change at ImageStar.m line 1203 (new_points(i, :) = [p, channel_id];) that made more sense to me; kindly check that.

Other than these, I do not have any logical confusions or any observations to make about things breaking.

@ttj
Contributor

ttj commented Nov 28, 2025

thanks Usama, sounds great! We will try to check over soon, I have been working on an improved test suite that also will hopefully help us identify if any problems are introduced. But hopefully @mldiego @sammsaski @atumlin can help check some of the details. For the solver, I thought we had made this globally configurable, but will check on it.

one quick question as we start to try to merge: can you please point me at a simple weight perturbation example I can run, to see if everything is looking appropriate? I think you have set this up in tests or examples from the conversation, but direct pointer(s) to what we should run will help make sure things are working properly for your additions

@m-usama-z
Contributor Author

Sure, Professor. Running the tests in tests/weightPerturb should be enough to test the major additions. You may also run examples/Tutorial/NN/Collins_RUL/collins_RUL_test.m.

Regarding the lpsolver being globally configurable, I was unaware of that, in part because linprog was hardcoded at some locations. I appreciate your team checking up on that. Let me list the hardcoded functions I know of here:

  1. ImageStar/getRanges: linprog is set as default, and there are function calls in Tutorials that don't pass any lpsolver configuration so this could be an issue.
  2. Star/getRange: no lpsolver option is passed by Star.m to lpsolver here, and this will use linprog with the original lpsolver.m (I have the default set to gurobi in my lpsolver.m for now).
  3. Star/isEmptySet: no lpsolver option is passed here, so I hardcoded gurobi instead of linprog.
  4. Star/isEmptySet_static: This is an addition by me. I only used it in verify_specification for memory efficiency during parallelization. Might have to pass an lpsolver option to verify_specification for configurability here.
  5. NN.m: I changed the default lpsolver to gurobi here.

I will make whatever changes are advised with regards to global configurability of lpsolver.
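One hypothetical way to centralize the default, so individual call sites no longer hardcode 'linprog' or 'gurobi' (a sketch, not the existing lpsolver.m interface):

```matlab
% Sketch: a single resolver every call site could use, so switching the
% project-wide default happens in one place.
function solver = resolve_lp_solver(requested)
    persistent default_solver
    if isempty(default_solver)
        default_solver = 'linprog';     % project-wide default; could be 'gurobi'
    end
    if nargin >= 1 && ~isempty(requested)
        solver = requested;             % explicit per-call override
    else
        solver = default_solver;
    end
end
```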

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 29 out of 34 changed files in this pull request and generated no new comments.



@sammsaski
Contributor

For the upcoming tool paper, will using Gurobi introduce any issues for the artifact submission part of it? That is another license we will be responsible for.

It might make the most sense to just use linprog for now and test with Gurobi later on. Were you able to reproduce the accuracy issues you were seeing?

@m-usama-z
Contributor Author

Hello @sammsaski,

Regarding linprog: unfortunately, I would have to spend some time reconstructing a case where linprog was failing, so I went ahead and changed the defaults back to linprog for the time being. Later on, it should be sufficient to be able to change the solver globally as needed (which is not the case right now, as it is hardcoded at some locations); the user of NNV could then use gurobi for better accuracy. This global-configurability update could be done after the submission.

As for the experiment, I have added the multi-layer MNIST perturbation experiment you requested in the Tutorial/NN/MNIST/weightPerturb directory. Please change the number of images per class to be verified in line 7 (the NPerClass variable, you may run for 100 images by setting this to 10), and the results are plotted after line 220 using the following four variables in each iteration (one iteration for each of the three plots):
percentages_verif_modelstar
percentages_verif_towards
times_modelstar
times_towards
Please feel free to print this information as desirable for tables, etc. I can help further with that if there are any confusions. I suggest working on the printing code first while NPerClass is 1, and then changing NPerClass to 10.

Last but not least, this code should work as is if the machine has 120GB of free memory (the code appears to use 104GB in my testing); otherwise, please change the memory management configuration in lines 96-101. Using the commented config (lines 96-98) instead of the current config (lines 99-101) enables automatic memory management, but that can result in frequent recreation of the parallel pool to adjust the number of workers, which can make the code rather slow. Hopefully, we should not need to worry about it: the current setting (64 workers throughout, without automatic memory management) should work because this is a rather small, fully-connected network. The reason automatic memory management can make the code slow is that I stayed conservative in the data points used for polynomial estimation of memory usage (from the size of the inequalities matrix), in order to be on the safe side and avoid crashing MATLAB. Also, memory management has only been implemented in the NN/reach_withConns function, but it can be extended to other reachability functions.

The simpler alternative to automatic memory management on low-memory machines is to reduce the number of workers from 64 to a quantity appropriate for the available memory, in lines 99-101 (if parallelization is not desired at all, lines 96-101 should all be commented out). However, I thought I would take this chance to explain some important characteristics of the automatic memory management implementation.

@m-usama-z
Contributor Author

@sammsaski, apologies, one more important thing I should have mentioned (although you would notice it soon during testing): the experiment as it stands is going to take hours to verify 100 images (maybe around 4 hours). If a smaller compute time is desirable while keeping the number of images large, the option to perturb only a single layer instead of 2 layers is still there (I provided this code because you selected the two-layer perturbation experiment, but the single-layer experiment would be faster). However, if a few hours of compute time is not a concern, then there should be nothing to worry about. Please let me know if a different experiment is required.

@ttj
Contributor

ttj commented Nov 29, 2025

thanks Usama and Samuel! for the purpose of repeatability, I would suggest we definitely have something with lower compute requirements (both memory and total runtime) as a demonstration in the paper. the artifact evaluation reviewers are not going to be able to run that; generally it is reasonable to assume they have a laptop and will spend maybe a few hours on this in total. if any other existing example is sufficient for that, great, but needing >32GB RAM (and maybe even >16GB) is a stretch, and for total runtime we should aim for an hour max (as there will also be other experiments that take some time). basically, we just need a demo and sanity check to illustrate the capabilities of the weight perturbations, not a full experimental analysis

@m-usama-z
Contributor Author

Thank you for that explanation, Professor @ttj. In that case, one option is to simply compare output ranges of our framework with ranges from an existing work for a single image; this works as a demo as well as a check. Another option is layer-wise perturbation of ACAS Xu and verification of vnnlib properties. Since it is a small NN, this quickly covers many layers (the caveat is that these properties' input specifications are not properly used; rather, the input upper and lower bounds are averaged to get a single input for which the property is verified in the presence of weight perturbations). Last but not least, we could perturb only the last 2 or 3 layers of the MNIST MLP, one at a time, if image classification is preferred over ACAS Xu. @sammsaski, please let me know which is preferable, and I can help in setting it up.

@sammsaski
Contributor

I think only perturbing the last 2 or 3 layers for the MNIST MLP would be best

@m-usama-z
Contributor Author

I have added code for single-layer perturbation in the directory examples/Tutorial/NN/MNIST/weightPerturb/single_layer. The main script is conv_expt_any_layer, and the driver script is run_expt_for_compute. The experimental results will be stored in the file MNIST_MLP.yaml in the results directory in the same folder; please feel free to change the path for this results file in the driver script (the results_file variable, without extension). The yaml file will contain layer-wise results, with layers stored in reverse order (last layer first, and so on). You can change the number of layers to run and plot for in the driver script (the n_layers_to_run_for_from_yaml_file variable). There is an existing yaml file, results/MNIST_MLP_empty_results_file.yaml, with no results in it; please do not change that, except if you want to change the number of images to run for, i.e., n_images in line 3. Changing this is fine, but it should be a multiple of 10 to keep it balanced for MNIST. After adjusting these settings, you may run the driver script, which will create a copy of this empty file at the specified path, run the experiment, update the results in that file, and plot after processing every layer.

The code for plotting is in the EXPT.m file (the plot_results function is called towards the end of conv_expt_any_layer); you may copy and reuse that function for printing the results in tables, etc., if needed, but I hope we do not need tables given the extensive figure. Also, please feel free to paraphrase the experimental description from the conference paper, which I can help with as needed. Thank you for your time, and please ask me about any confusions as needed.

@ttj
Contributor

ttj commented Dec 1, 2025

great, thanks Usama!

I will plan to merge things tonight or tomorrow morning. The test suite I've made is about ready, so I'll try to run your perturbation examples and the regression test suite to identify any issues, but @sammsaski please let me know if you notice anything that needs further review. I have fixed some minor issues while creating the test suite as well, so it would be good if my changes are also reviewed, as there are quite a few updates.

As the paper deadline is coming up quickly, please focus on finalizing that and any further updates it needs based on the status; the artifact evaluation will come afterward with more time, and I think we'll be in reasonable shape for that.

@sammsaski
Contributor

sammsaski commented Dec 1, 2025

Sounds good. I've been taking a look and am set up to run tests tonight to verify that no soundness issues were introduced by the changes as well.

end
end

function p = fit_poly_to_mem_consumption_of_LP(obj)
Contributor


seems like this maybe shouldn't belong in this file...?

end

% for ModelStar experiments. Can be removed.
function net = acas_combine_affine_layer_with_fc(obj, args)
Contributor


yes, this definitely doesn't belong here, this is for an example, not the general NN class

Contributor


I've gone over this one in a bit of detail. I think quite a number of these changes may be problematic, unfortunately, and I am not sure we should merge in the current state. Quite a few things appear to have been temporary changes or don't make sense in this file, along with some other exemplars (e.g. the modifications to some layers), as I've gotten into the details.

As the FM paper is going to be due shortly, I would suggest we focus just on the paper right now and include the high-level for the perturbations, and then we will need to refactor significantly the weight perturbation parts before the artifact evaluation.

I would suggest that, for layers that have weight perturbations, a new layer type is introduced that is a subclass or extension of the existing layer types (e.g., Conv2DLayerWP.m or similar, which would subclass from Conv2DLayer and allow more easily adding the weight perturbation fields). I think this will allow cleaner handling of the weight-perturbation-specific parts versus modifying the existing codebase all over to handle this specific use case. Further, a lot of the parallel processing handling, I think, needs to be handled better, or appears to have been for some temporary experimentation. But maybe I am missing something.

If all the tests are passing with all these changes, though, maybe it is okay, so @sammsaski please weigh in

@m-usama-z
Contributor Author

Thank you for the feedback, Professor @ttj. I had a similar idea early on: the weight-perturbation-specific changes could be moved over, and the rest, like memory management, can be worked out later, especially if it is not going to be mentioned in the paper. The memory usage prediction for LPs using a polynomial is indeed experimental; it is not a finalized feature, and it has been implemented in one variant of the reachability function only.

If the only experiment in the paper is the aforementioned MNIST MLP experiment, it appears that all that would have to be finalized for now is the fully-connected (FC) layer weight perturbation processing for the single-layer case, which is relatively simple (the multi-layer perturbation part of the code is a bit more involved). If I'm not mistaken, the artifact submission deadline is Feb 13? That should give quite some time to integrate the rest of the desirable changes more cleanly, perhaps over multiple PRs, to streamline the review and integration process. Even for now, though, the code for the MNIST MLP experiment discussed above would have to be changed in sync with any changes in the implementation of the FC layer weight perturbations.

I'm not sure if creating subclasses is better (I don't have much experience with inheritance in MATLAB), but if your team finds it easier to modify the NN layers in that fashion, that may be fine. One option is to try implementing just the FC layer single-layer perturbation case first, and testing it. If that change can be made quickly, the experiment code, conv_expt_any_layer, can then be changed/stripped accordingly. The EXPT.m file is primarily for the experiment's yaml management and plotting, so that file is not a concern. I can coordinate in implementing this option, if this is desirable.

@ttj
Contributor

ttj commented Dec 1, 2025

thanks Usama! Yes, I think for the paper given the deadline, if it is simple to include just a small bit of changes for the example we include in the paper, that would be ideal, then we can focus on merging and refactoring afterward the full set of changes. as all the proof of concept is there and implemented, I think it is fine to include and highlight this new support, but we should refactor before incorporating all these changes.

I will go ahead and finalize my test suite and merge that first. this will also hopefully help with your changes when we merge the full ones, as I have at least some actual regression tests now (previously most of the tests were visual with not so many checking results).

for how to implement, I think anything is fine, but the subclass suggestion was partly to hopefully help keep the code separate: it seems there are special purpose things that need to be done for handling the weight perturbations, which as of now, are getting mixed in with the rest of the analysis code, but maybe I have missed something

similarly, for all examples and things that are not general features, definitely those should be in example/test files for those tied to specific benchmarks like acas, mnist, etc., and likewise utilities separate where appropriate, and not in the main classes

@m-usama-z
Contributor Author

Noted, Professor @ttj. I will try to create a separate pull request with changes for the fully-connected layer only by tonight. However, the modification of the experiment's code will likely get pushed back; I hope to finish that by tomorrow. I will keep the FC layer's code in the same class (FullyConnectedLayer.m) for now. The base code for its weight perturbation analysis can be mostly cleanly separated from the rest of the analysis code within the reachability function; it is mostly under an if condition that only activates when there are perturbations. Although it uses an object property to keep track of the layer's perturbation specifications, this should not interfere with any other functionality, to the best of my understanding.

For the cleaning up of the rest of the code of NN.m into appropriate files, I will be happy to coordinate in that after the JAIR draft deadline.
