This project contains the source code for our paper "Seeing the Many: Exploring Parameter Distributions Conditioned on Features in Surrogates".
The code allows one to:
- Run a set of simulations covering a predefined parameter space for use in surrogate modeling.
- Build surrogate models.
- Evaluate the surrogates.
- Serve the surrogate models to a web-based client for visual exploration.
We provide scripts for running WaterLily simulations, for a circular obstacle whose position varies, and Gerris simulations, for NACA airfoils and the Rayleigh-Taylor instability.
To run the circle-obstacle simulations, go to the directory data/WaterLily/circle and run:
./run.sh
This runs the WaterLily simulator for 1,000 parameter configurations, each a randomly sampled position of the circular obstacle. Each simulation is stored in a directory named sim_id, where id is an integer that uniquely identifies the parameter configuration.
The resulting time-varying vector field of each simulation is stored as a sequence of NumPy arrays, one per time step, and metadata.json contains the parameter information.
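For downstream scripts it can help to load one simulation directory into a single array. The sketch below is a minimal, hypothetical loader: it assumes each time-step file name contains an integer time index (the actual file naming is not specified here) and that metadata.json is plain JSON. The helper names are illustrative and not part of the repository.

```python
import json
import re
from pathlib import Path

import numpy as np


def load_simulation(sim_dir):
    """Load one sim_<id> directory into a (T, ...) array plus its parameters.

    Assumption (hypothetical naming): every time step is a separate .npy file
    whose name contains an integer time index.
    """
    sim_dir = Path(sim_dir)

    def step_index(path):
        # Sort numerically on the last integer in the file name, so that
        # step_10 comes after step_2 rather than after step_1.
        numbers = re.findall(r"\d+", path.stem)
        return int(numbers[-1]) if numbers else -1

    frames = sorted(sim_dir.glob("*.npy"), key=step_index)
    field = np.stack([np.load(f) for f in frames])  # shape (T, ...)
    with open(sim_dir / "metadata.json") as fh:
        params = json.load(fh)
    return field, params


def list_simulations(root):
    """Return the sim_<id> directories under root, ordered by integer id."""
    dirs = [d for d in Path(root).iterdir()
            if d.is_dir() and re.fullmatch(r"sim_\d+", d.name)]
    return sorted(dirs, key=lambda d: int(d.name.split("_")[1]))
```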
Note: depending on the versions of Julia and WaterLily, some simulations may fail to run. The check.py script identifies failed simulations, which can then be rerun with run.sh.
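As an illustration of what such a check might look like, here is a hypothetical sketch. The failure criterion (a missing metadata.json or fewer time steps than expected) is an assumption; check.py may use a different test, so treat this only as a starting point for your own diagnostics.

```python
from pathlib import Path


def find_failed(root, expected_steps):
    """Return names of sim_<id> directories that look incomplete.

    Hypothetical criterion, not necessarily the one in check.py: a run is
    treated as failed if metadata.json is missing or fewer than
    expected_steps .npy time steps were written.
    """
    failed = []
    for sim_dir in sorted(Path(root).glob("sim_*")):
        if not sim_dir.is_dir():
            continue
        n_steps = len(list(sim_dir.glob("*.npy")))
        if not (sim_dir / "metadata.json").exists() or n_steps < expected_steps:
            failed.append(sim_dir.name)
    return failed
```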
To run simulations for NACA airfoils of varying shape, go to the directory data/gerris/assym_airfoil.
To set up the directories containing parameter-specific information, first run:
python setup_sims.py all_data.json
This creates 1,000 directories named sim_id, as above.
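The directory setup step can be pictured as follows. This is a hypothetical sketch that assumes all_data.json holds a JSON list with one parameter dict per configuration; the real file layout may differ, and any solver-specific input that setup_sims.py writes for Gerris is omitted here.

```python
import json
from pathlib import Path


def setup_sim_dirs(param_file, out_root="."):
    """Create one sim_<id> directory per parameter configuration.

    Assumption: param_file is a JSON list of parameter dicts. Only
    metadata.json is written; solver input files are not generated.
    """
    with open(param_file) as fh:
        configs = json.load(fh)
    created = []
    for sim_id, params in enumerate(configs):
        sim_dir = Path(out_root) / f"sim_{sim_id}"
        sim_dir.mkdir(parents=True, exist_ok=True)
        # Keep the parameters next to the (future) simulation output.
        with open(sim_dir / "metadata.json", "w") as fh:
            json.dump(params, fh)
        created.append(sim_dir)
    return created
```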
Once the directories have been created, simply run:
./run_sims.sh
This runs Gerris for each parameter configuration. The resulting time-varying vector fields and parameters are stored in the same way.
Running the Rayleigh-Taylor instability simulations follows a similar process. First, go to the directory data/gerris/rt and run:
python populate_sims.py
This will create 1,700 directories, each directory corresponding to a parameter configuration.
Next, run:
./run_sims.sh
The data format is identical to what is produced in the NACA airfoil simulations.
We support two types of neural field surrogates: a coordinate- and parameter-conditioned SIREN, and an ensemble of SIRENs. The ensemble can be used to compare uncertainty estimates, but only the single conditioned SIREN is used in our visualization.
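To make the two surrogate types concrete, the following NumPy sketch shows a SIREN that takes space-time coordinates concatenated with simulation parameters, plus an ensemble whose per-point spread serves as an uncertainty estimate. It is an illustration, not the repository's network: the layer sizes, omega0 = 30 and the weight initialization follow the original SIREN paper (Sitzmann et al., 2020), the bias initialization mimics PyTorch's default, and the ensemble members here are only initialized differently rather than trained.

```python
import numpy as np


def init_siren(layer_sizes, omega0=30.0, rng=None):
    """SIREN initialization: first-layer weights U(-1/fan_in, 1/fan_in),
    later weights U(-sqrt(6/fan_in)/omega0, sqrt(6/fan_in)/omega0).
    Biases use a PyTorch-style U(-1/sqrt(fan_in), 1/sqrt(fan_in))."""
    rng = np.random.default_rng() if rng is None else rng
    layers = []
    for i, (fan_in, fan_out) in enumerate(zip(layer_sizes[:-1], layer_sizes[1:])):
        bound = 1.0 / fan_in if i == 0 else np.sqrt(6.0 / fan_in) / omega0
        W = rng.uniform(-bound, bound, size=(fan_in, fan_out))
        b_bound = 1.0 / np.sqrt(fan_in)
        b = rng.uniform(-b_bound, b_bound, size=fan_out)
        layers.append((W, b))
    return layers


def siren_forward(layers, coords, params, omega0=30.0):
    """Predict one vector per query point, conditioned on parameters.

    coords: (N, d_coord) space-time coordinates; params: (N, d_param).
    Hidden layers apply sin(omega0 * (x @ W + b)); the last layer is linear.
    """
    h = np.concatenate([coords, params], axis=1)
    for W, b in layers[:-1]:
        h = np.sin(omega0 * (h @ W + b))
    W, b = layers[-1]
    return h @ W + b


def ensemble_predict(ensemble, coords, params):
    """Mean prediction and per-point standard deviation across members."""
    preds = np.stack([siren_forward(m, coords, params) for m in ensemble])
    return preds.mean(axis=0), preds.std(axis=0)
```

For example, a 2D time-varying field with two simulation parameters would use an input width of 3 + 2 = 5 and an output width of 2 (the u and v components).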
To optimize a SIREN for modeling a simulation, run the following:
python optimize.py --data_dir path_to_data --out path_to_output --simulation_data sim_type
Replace path_to_data with the directory containing all simulations; e.g. for the WaterLily circle simulations this is data/WaterLily/circle. path_to_output is the directory that will hold the outputs of the surrogate model; it is created if it does not exist. sim_type is either "wl" (for WaterLily) or "gerris".
By default, a SIREN is optimized. To instead use a deep ensemble, pass in --net_type EnsembleParamSIREN.
You can specify the train/test split over parameter configurations with --n_params, the number of simulations used as training data, and --total_param, the total number of parameter configurations for the simulation. By default, the training set contains 100 simulations and the total number of simulations is 1,000. Set --total_param to the number of simulations you have run (e.g. 1,700 for the Rayleigh-Taylor instability if you ran all simulations). Finally, --data_seed is an integer that uniquely identifies a specific train/test split.
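One plausible way these three arguments could define a reproducible split is a seeded permutation of the simulation ids. The sketch below is a hypothetical reconstruction for intuition only; optimize.py may select training ids differently.

```python
import numpy as np


def split_params(n_params=100, total_param=1000, data_seed=0):
    """Return sorted (train_ids, test_ids) over ids 0..total_param-1.

    Hypothetical: a seeded permutation, so the same data_seed always yields
    the same split and different seeds give different splits.
    """
    if not 0 < n_params < total_param:
        raise ValueError("need 0 < n_params < total_param")
    rng = np.random.default_rng(data_seed)
    perm = rng.permutation(total_param)
    return np.sort(perm[:n_params]), np.sort(perm[n_params:])
```

With the Rayleigh-Taylor data, split_params(100, 1700, 3) would give 100 training ids and 1,600 held-out ids.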
See optimize.py for additional arguments related to the SIREN model.
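When training several splits, for instance to compare seeds or a SIREN against an ensemble, it can be convenient to build the optimize.py invocation programmatically. This sketch uses only the flags described above; the per-seed output directory layout (out_root/seed_N) is a hypothetical choice.

```python
import subprocess


def optimize_command(data_dir, out_dir, sim_type, net_type=None,
                     n_params=100, total_param=1000, data_seed=0):
    """Build the optimize.py argument list from the documented flags."""
    if sim_type not in ("wl", "gerris"):
        raise ValueError("sim_type must be 'wl' or 'gerris'")
    cmd = ["python", "optimize.py",
           "--data_dir", str(data_dir),
           "--out", str(out_dir),
           "--simulation_data", sim_type,
           "--n_params", str(n_params),
           "--total_param", str(total_param),
           "--data_seed", str(data_seed)]
    if net_type is not None:
        cmd += ["--net_type", net_type]
    return cmd


def run_seeds(data_dir, out_root, sim_type, seeds, **kwargs):
    """Train one surrogate per seed; each gets its own output directory."""
    for seed in seeds:
        cmd = optimize_command(data_dir, f"{out_root}/seed_{seed}", sim_type,
                               data_seed=seed, **kwargs)
        subprocess.run(cmd, check=True)
```

For example, run_seeds("data/gerris/rt", "out/rt", "gerris", [0, 1, 2], total_param=1700) would train three surrogates on the Rayleigh-Taylor data.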
To evaluate the model, run the following:
python evaluate.py --data_dir path_to_data --net path_to_net
path_to_net is the directory that contains the trained network produced by the optimizer. The evaluation will produce the file evaluation.json within path_to_net, logging the MSE and PSNR of each withheld simulation.
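For reference, the two logged metrics are related by PSNR = 10 log10(R^2 / MSE), where R is the data range. The sketch below computes both for one field. Using the ground truth's max minus min as R is an assumption; evaluate.py may normalize the data differently, so absolute PSNR values need not match.

```python
import numpy as np


def mse_psnr(pred, target, data_range=None):
    """MSE and PSNR (dB) between predicted and ground-truth fields.

    PSNR = 10 * log10(data_range**2 / MSE). If data_range is None, the
    ground truth's max - min is used (an assumption).
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = float(np.mean((pred - target) ** 2))
    if data_range is None:
        data_range = float(target.max() - target.min())
    if mse == 0:
        return mse, float("inf")
    return mse, float(10.0 * np.log10(data_range ** 2 / mse))
```

For example, mse_psnr([0.0, 0.5], [0.0, 1.0]) gives an MSE of 0.125 and a PSNR of 10 log10(8), roughly 9.03 dB.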
To use the surrogate model for visual exploration, run the following to create a Flask server locally on your machine:
python server.py --data_dir path_to_data --net path_to_net --simulation_name name
The first two arguments are the same as those passed to evaluate.py. The simulation_name argument is a name unique to the simulation, used to retrieve simulation-specific configurations. See server.py for details, in particular the names assigned to the three simulations.
When the server is first run for a newly optimized surrogate, the information needed for the density estimate is precomputed and stored alongside the network. Subsequent server runs skip this precomputation and load the stored information instead.
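The precompute-once behavior follows a common caching pattern, sketched below. The cache file name and the compute function are hypothetical placeholders, not what server.py actually stores; the point is that the expensive step runs only when no cached file exists next to the network.

```python
from pathlib import Path

import numpy as np


def load_or_compute(net_dir, compute_fn, filename="density_cache.npz"):
    """Return cached arrays from net_dir, computing and saving them once.

    filename and compute_fn are hypothetical; compute_fn must return a dict
    of NumPy arrays.
    """
    cache = Path(net_dir) / filename
    if cache.exists():
        with np.load(cache) as data:
            return {k: data[k] for k in data.files}
    result = compute_fn()
    np.savez(cache, **result)
    return result
```

If the network is retrained in place, the stale cache file would need to be deleted so that it is recomputed.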
Once the server is running, open the web-based visualization client at:
https://observablehq.com/@mattberger/seeing-the-many
In the interface you can choose the spatial resolution of the vector glyphs, the hexbin radius in the binned heatmap plots, the type of analysis (single feature vs. feature comparison), and whether to subtract the background flow from the vector fields. Background subtraction assumes a constant, unit horizontal flow; you may wish to modify this depending on the simulation.
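Background subtraction itself is a per-vector offset. The Python sketch below illustrates the operation, not the Observable client's code. The default matches the constant unit horizontal flow described above. The background=None option, which subtracts the field's mean vector, is an alternative we are suggesting for simulations without a uniform inflow (e.g. the Rayleigh-Taylor instability) and is not what the client does.

```python
import numpy as np


def subtract_background(field, background=(1.0, 0.0)):
    """Remove a background flow from a vector field of shape (..., 2).

    Default: constant unit horizontal flow (u=1, v=0). If background is None,
    the field's mean vector over all points is used instead (an alternative,
    not the client's behavior).
    """
    field = np.asarray(field, dtype=np.float64)
    if background is None:
        background = field.reshape(-1, field.shape[-1]).mean(axis=0)
    return field - np.asarray(background, dtype=np.float64)
```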