matthewberger/seeing-the-many

Seeing the Many: Exploring Parameter Distributions Conditioned on Features in Surrogates

This project contains the source code for our paper "Seeing the Many: Exploring Parameter Distributions Conditioned on Features in Surrogates".

The code allows one to:

  1. Run a set of simulations covering a predefined parameter space for use in surrogate modeling.
  2. Build surrogate models.
  3. Evaluate the surrogates.
  4. Serve the surrogate models to a web-based client for visual exploration.

Running simulations

We provide scripts for two simulators: WaterLily, for a circle obstacle that varies in position, and gerris, for NACA airfoil and Rayleigh-Taylor instability simulations.

circle

To run simulations for the circle obstacle, go to directory data/WaterLily/circle, and run the following:

./run.sh

This will run the WaterLily simulator for 1,000 different parameter configurations, each a randomly sampled position of the circle obstacle. Each simulation will be stored in a directory named sim_id, where id is an integer that uniquely identifies the parameter configuration.

The resulting time-varying vector field for each simulation will be stored as a sequence of NumPy arrays for individual time steps, while metadata.json will contain the parameter information.
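As a rough illustration of reading one simulation back in, the sketch below loads metadata.json and stacks the per-time-step arrays. The function name load_simulation is hypothetical, and the sketch assumes each time step is a separate .npy file whose sorted file names follow time order; check the actual file naming produced by run.sh before relying on it.

```python
import json
from pathlib import Path

import numpy as np


def load_simulation(sim_dir):
    """Load one simulation directory into (parameters, stacked time steps)."""
    sim_dir = Path(sim_dir)

    # metadata.json holds the parameter information for this run.
    with open(sim_dir / "metadata.json") as f:
        params = json.load(f)

    # ASSUMPTION: one .npy file per time step, and sorting the file names
    # lexicographically yields time order. Adjust the glob if this differs.
    step_files = sorted(sim_dir.glob("*.npy"))
    if not step_files:
        raise FileNotFoundError(f"no .npy time steps found in {sim_dir}")

    # Resulting shape: (n_steps, ...) with the per-step array shape appended.
    steps = np.stack([np.load(p) for p in step_files], axis=0)
    return params, steps
```

For example, load_simulation("data/WaterLily/circle/sim_0") would return the parameters and a single array with time as its first axis, provided the directory follows the assumed layout.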

Note: depending on the versions of Julia and WaterLily, some simulations might fail to run. The check.py script identifies the failed simulations, which can then be rerun using run.sh.

NACA airfoil

To run simulations for NACA airfoils of varying shape, go to directory data/gerris/assym_airfoil.

To set up the directories containing parameter-specific information, first run the following:

python setup_sims.py all_data.json

This will create 1,000 directories with names sim_id, in analogy to the above.

Once the directories have been created, simply run:

./run_sims.sh

This will run gerris for each parameter configuration. The resulting time-varying vector field and parameters will be stored in a similar manner.

Rayleigh-Taylor instability

This follows a similar format. First, go to directory data/gerris/rt and run:

python populate_sims.py

This will create 1,700 directories, each directory corresponding to a parameter configuration.

Next, run:

./run_sims.sh

The data format is identical to what is produced in the NACA airfoil simulations.

Building surrogates

We allow for the construction of two types of neural field surrogates: a coordinate- and parameter-conditioned SIREN, and an ensemble of SIRENs. The latter can be used to compare uncertainty estimates, but note that only the former is used for our visualization.
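To make the two surrogate types concrete, here is a minimal NumPy sketch of a SIREN forward pass and of an ensemble built from several such networks. It is not the architecture in this repository: conditioning by concatenating coordinates and parameters, the layer sizes, the zero bias initialization, and the function names are all assumptions made for illustration. Only the sine activation with frequency scale w0 and the weight initialization bounds follow the original SIREN formulation.

```python
import numpy as np


def init_siren(rng, in_dim, hidden_dim, out_dim, n_hidden=3, w0=30.0):
    """Create weights for a small SIREN (sizes here are illustrative)."""
    dims = [in_dim] + [hidden_dim] * n_hidden + [out_dim]
    layers = []
    for i, (fan_in, fan_out) in enumerate(zip(dims[:-1], dims[1:])):
        # First layer: U(-1/fan_in, 1/fan_in); later layers are scaled by w0.
        bound = 1.0 / fan_in if i == 0 else np.sqrt(6.0 / fan_in) / w0
        W = rng.uniform(-bound, bound, size=(fan_in, fan_out))
        b = np.zeros(fan_out)  # simplification: biases start at zero
        layers.append((W, b))
    return layers


def siren_forward(layers, coords, params, w0=30.0):
    """Evaluate the field at coords for the given simulation parameters."""
    # ASSUMPTION: conditioning is done by concatenating inputs.
    x = np.concatenate([coords, params], axis=-1)
    for W, b in layers[:-1]:
        x = np.sin(w0 * (x @ W + b))  # sine activation
    W, b = layers[-1]
    return x @ W + b  # linear output layer


def ensemble_predict(members, coords, params):
    """Mean prediction and per-output spread across ensemble members."""
    preds = np.stack([siren_forward(m, coords, params) for m in members])
    return preds.mean(axis=0), preds.std(axis=0)
```

With coords of shape (N, 3) for (x, y, t) and params of shape (N, 2), the output has shape (N, 2), one vector per query point. The standard deviation returned by ensemble_predict is one common way to read an uncertainty estimate off a deep ensemble.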

To optimize a SIREN for modeling a simulation, run the following:

python optimize.py --data_dir path_to_data --out path_to_output --simulation_data sim_type

path_to_data should be replaced with the directory containing all simulations, e.g. for the WaterLily circle simulation this would be data/WaterLily/circle. path_to_output is a directory that will contain the outputs of the surrogate model - this will be created if it doesn't exist. sim_type is either "wl" (for WaterLily) or "gerris".

By default, a SIREN is optimized. To instead use a deep ensemble, pass in --net_type EnsembleParamSIREN.

You may specify the train/test split along parameter configurations via --n_params, the number of simulations to use as training data, and --total_param, the total number of parameter configurations for the simulation. By default, the training data size is set to 100 and the total number of simulations is set to 1,000. Set --total_param to the number of simulations you have run (e.g. for the Rayleigh-Taylor instability this would be 1,700 if you ran all simulations). Moreover, --data_seed is an integer that uniquely identifies the specific train/test split.
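The following sketch shows one way a seeded split over parameter configurations can work. It is an illustration under assumptions, not the logic in optimize.py: the function name split_params is hypothetical, and the actual sampling scheme may differ.

```python
import numpy as np


def split_params(total_param=1000, n_params=100, data_seed=0):
    """Return (train_ids, test_ids) over simulation ids 0..total_param-1."""
    if not 0 < n_params <= total_param:
        raise ValueError("n_params must be in (0, total_param]")

    # The same seed always produces the same permutation, so a seed
    # identifies one specific train/test split.
    rng = np.random.default_rng(data_seed)
    order = rng.permutation(total_param)

    train_ids = np.sort(order[:n_params])
    test_ids = np.sort(order[n_params:])  # withheld simulations
    return train_ids, test_ids
```

With the defaults this yields 100 training simulations and 900 withheld ones; calling it again with the same data_seed reproduces the same split.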

See optimize.py for additional arguments related to the SIREN model.

Evaluating surrogates

To evaluate the model, run the following:

python evaluate.py --data_dir path_to_data --net path_to_net

path_to_net is the directory that contains the trained network produced by the optimizer. The evaluation will produce the file evaluation.json within path_to_net, logging the MSE and PSNR of each withheld simulation.
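For reference, the two reported metrics can be computed as in the sketch below. The choice of peak value for PSNR is an assumption: here it is the value range of the ground-truth field, which may not match what evaluate.py uses. The function name mse_psnr is hypothetical.

```python
import numpy as np


def mse_psnr(pred, target):
    """Mean squared error and peak signal-to-noise ratio (in dB)."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)

    mse = float(np.mean((pred - target) ** 2))

    # ASSUMPTION: the peak is the ground-truth value range.
    peak = float(target.max() - target.min())
    if peak == 0.0:
        raise ValueError("target is constant; PSNR peak is undefined")
    if mse == 0.0:
        return mse, float("inf")  # perfect reconstruction

    psnr = 10.0 * np.log10(peak ** 2 / mse)
    return mse, float(psnr)
```

Lower MSE and higher PSNR both indicate that the surrogate reproduces the withheld simulation more closely.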

Serving surrogates for visual exploration

To use the surrogate model for visual exploration, run the following to create a Flask server locally on your machine:

python server.py --data_dir path_to_data --net path_to_net --simulation_name name

The first two arguments are identical to what is passed in to evaluate.py. The simulation_name argument is a name unique to the simulation. This is used to retrieve simulation-specific configurations. Please see server.py for more information, specifically the names given to the three simulations.

Upon running the server for a newly-optimized surrogate, information necessary for the density estimate is precomputed, and stored where the network exists. Upon subsequent server runs, no precomputation will be performed, and the stored information will instead be referenced.
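The compute-once, reuse-afterwards behavior described above follows a common caching pattern, sketched here. The cache file name, the pickle format, and the function names are hypothetical; see server.py for what is actually stored and where.

```python
import pickle
from pathlib import Path


def load_or_precompute(net_dir, compute_fn, cache_name="density_cache.pkl"):
    """Return cached data from net_dir, computing and storing it if absent."""
    cache_path = Path(net_dir) / cache_name  # hypothetical file name

    # Subsequent runs: reuse the stored result and skip the precomputation.
    if cache_path.exists():
        with open(cache_path, "rb") as f:
            return pickle.load(f)

    # First run for this network: precompute, then store next to the network.
    result = compute_fn()
    with open(cache_path, "wb") as f:
        pickle.dump(result, f)
    return result
```

Under this pattern, deleting the cached file forces the precomputation to run again, which is useful after re-optimizing a surrogate in the same directory.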

Once the server is running, go here for the web-based visualization client:

https://observablehq.com/@mattberger/seeing-the-many

In the interface you can choose the spatial resolution of the vector glyphs, the radius of a hexbin in the binned heatmap plots, the type of analysis to perform (single feature vs. feature comparison), and whether or not to subtract the background flow in the vector fields. The background subtraction assumes a constant, unit horizontal motion; you might wish to modify this depending on the simulation.
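As a small illustration of the background subtraction, the sketch below removes a constant flow from a vector field. It assumes the last array axis holds the (horizontal, vertical) components, which may not match the layout used by the server, and the function name subtract_background is hypothetical.

```python
import numpy as np


def subtract_background(field, background=(1.0, 0.0)):
    """Subtract a constant background flow from a 2D vector field."""
    # ASSUMPTION: field has shape (..., 2) with components (u, v).
    field = np.asarray(field, dtype=np.float64)
    background = np.asarray(background, dtype=np.float64)

    # Broadcasting applies the same background vector at every grid point.
    return field - background
```

The default corresponds to constant, unit horizontal motion; for a simulation with a different inflow, pass another background vector, e.g. subtract_background(field, background=(0.5, 0.0)).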
