User Manual
PyBioNetFit is a tool for parameter fitting of models written in the BioNetGen language (BNGL).
PyBNF requires an installation of Python version 3.5 or higher. Many recent Linux and Mac operating systems come with Python 3 preinstalled. If you are running Windows or an older operating system, you may need to download the latest release at https://www.python.org/downloads/.
In addition, you will need a utility for installing Python packages, including PyBNF itself. If you don’t have one already, we recommend pip. Note that on systems with both Python 2 and Python 3 installed, the pip command may target Python 2 while pip3 targets Python 3; use the command that corresponds to your Python 3 installation.
PyBNF requires the Python packages dask, pyparsing, and numpy. If you have pip, you can install these by running the command

sudo pip3 install dask pyparsing numpy

If you are on a shared computing cluster or another machine on which you do not have root access, you can install the packages for only the current user:

pip3 install dask pyparsing numpy --user
After installing Python >= 3.5 and the package dependencies, navigate to the PyBNF folder and run the following command to install PyBNF:

pip3 install -e . --user
To confirm that the installation was successful, you should be able to run the command pybnf to launch PyBNF.
If you have any trouble with the installation process, please contact [somebody].
PyBNF is designed primarily to work with the simulator BioNetGen, version 2. The current BioNetGen distribution includes support for both network-based simulations and network-free simulations (via the NFSim software). BioNetGen can be installed from https://www.bionetgen.org.
PyBNF will need to know the location of BioNetGen – specifically the location of the script BNG2.pl within the BioNetGen installation. This path can be included in the PyBNF configuration file (see below). A convenient alternative is to set the environment variable BNGPATH to the BioNetGen directory with the following command, where /path/to/bng2 is the path of the folder that contains BNG2.pl:

export BNGPATH=/path/to/bng2

This setting can be made permanent (as of your next login) by copying the above command into the file .bash_profile in your home directory.
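In a script, the same lookup can be sketched as follows; this is an illustrative helper, not part of PyBNF’s API:

```python
import os

def find_bng2(bngpath=None):
    """Resolve the path to BNG2.pl from an explicit argument or the
    BNGPATH environment variable (hypothetical helper for illustration)."""
    root = bngpath or os.environ.get("BNGPATH", "")
    if not root:
        raise RuntimeError("BNGPATH is not set and no path was given")
    return os.path.join(root, "BNG2.pl")
```

An explicit path passed to the helper takes precedence over the environment variable, mirroring how a path given in the configuration file overrides BNGPATH.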
Models for fitting in PyBNF are plain text files written in BioNetGen language (BNGL). Documentation for BNGL can be found at http://www.csb.pitt.edu/Faculty/Faeder/?page_id=409.
Two small modifications of a BioNetGen-compatible BNGL file are necessary to use the file with PyBNF:
- Replace each value to be fit with a parameter name that ends in the string “__FREE__”.
For example, if the parameters block in our original file was the following:
begin parameters
v1 17
v2 42
v3 37
NA 6.02e23
end parameters
the revised version for PyBNF should look like:
begin parameters
v1 v1__FREE__
v2 v2__FREE__
v3 v3__FREE__
NA 6.02e23
end parameters
We have replaced each fixed parameter value in the original file with a “FREE” parameter to be fit. Parameters that we do not want to fit (such as the physical constant NA) are left as is.
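This substitution is mechanical, so it can be scripted. The following sketch (an illustration, not a PyBNF feature) rewrites a parameters block given the names of the parameters to fit:

```python
def mark_free(block, fit_names):
    """Replace the value of each named parameter with <name>__FREE__,
    leaving all other parameters (and non-parameter lines) untouched."""
    out = []
    for line in block.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in fit_names:
            out.append(f"{parts[0]} {parts[0]}__FREE__")
        else:
            out.append(line)
    return "\n".join(out)
```

Applied to the example above with fit_names = {"v1", "v2", "v3"}, this produces the revised parameters block while leaving NA unchanged.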
- Use the “suffix” argument to create a correspondence between your simulation command and your experimental data file.
For example, if your simulation call simulate({method=>"ode"}) generates data to be fit using the data file data1.exp, you should edit the call to simulate({method=>"ode", suffix=>"data1"})
Experimental Data Files
Experimental data files are plain text files with the extension “.exp” that contain whitespace-delimited tables of data to be used for fitting.
The first line of the .exp file is the header: it should contain the character #, followed by the names of each column. The first column name should be the name of the independent variable (e.g. “time” for a time course simulation). The remaining column names should match the names of observables in the model file. The following lines should contain the data, with numbers separated by whitespace. Use “nan” to indicate missing data. Here is a simple example of an .exp file; in this case, the corresponding BNGL file should contain observables named X and Y:
# time X Y
0 5 1e4
5 7 1.5e4
10 9 4e4
15 nan 6.5e4
20 15 1.1e5
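Because .exp files are plain whitespace-delimited tables, they can be inspected with standard tools. For example, using numpy (already installed as a PyBNF dependency), the header line beginning with # is treated as a comment and nan entries parse as floating-point NaN:

```python
import io
import numpy as np

exp_text = """# time X Y
0 5 1e4
5 7 1.5e4
10 9 4e4
15 nan 6.5e4
20 15 1.1e5
"""

# np.loadtxt skips the '#' header line as a comment, so only the
# numeric rows are parsed; the literal 'nan' becomes float('nan').
data = np.loadtxt(io.StringIO(exp_text))
time, X, Y = data[:, 0], data[:, 1], data[:, 2]
```

The same call works on a file path in place of the StringIO object.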
If you are fitting with the chi-squared objective function, you also need to provide a standard deviation for each experimental data point. To do so, include a column in the .exp file with "_SD" appended to the variable name. For example:
# time X Y X_SD Y_SD
0 5 1e4 1 2e2
5 7 1.5e4 1.2 2e2
10 9 4e4 1.4 4e2
15 nan 6.5e4 nan 5e2
20 15 1.1e5 0.9 5e2
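To illustrate how the SD columns enter the fit, here is a sketch of a chi-squared computation of the kind described above. PyBNF’s exact handling of missing points may differ; in this sketch, any point with a nan value or nan SD is simply skipped:

```python
import math

def chi_squared(sim, exp, sd):
    """Sum of squared, SD-weighted residuals over the data points;
    points where the experimental value or SD is nan are skipped."""
    total = 0.0
    for s, e, d in zip(sim, exp, sd):
        if math.isnan(e) or math.isnan(d):
            continue
        total += ((s - e) / d) ** 2
    return total
```

A smaller standard deviation makes a point count more heavily in the objective, so well-measured points constrain the fit more tightly.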
Configuration Files
The configuration file is a plain text file with the extension “.conf” that specifies all of the information that PyBNF needs to perform the fitting: the location of the model and data files, and the details of the fitting algorithm to be run.
Several examples of .conf files are included in the examples/ folder.
Each line of a conf file has the general format config_key=value, which assigns the configuration key “config_key” to the value “value”.
The available configuration keys to be specified are detailed in the sections below.
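As an illustration of the key=value format (a minimal sketch, not PyBNF’s actual parser), the following reads such lines into a dictionary; the model and fit_type keys in the test are hypothetical examples:

```python
def parse_conf(text):
    """Parse config_key=value lines into a dict, skipping blank lines
    and lines starting with '#' (treated here as comments)."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    return config
```

Real .conf files may support additional syntax (such as repeated keys); see the configuration key documentation for details.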
A full list of configuration keys is given in config_documentation.txt in the repository.
PyBNF contains seven fitting algorithms, described in the sections below.
Differential Evolution
How it works
Differential evolution maintains a population of candidate parameter sets. In each iteration, a trial parameter set is generated for each member of the population by adding a scaled difference between two other members to a third and mixing the result with the current member; the trial replaces the current member if it achieves a better objective value.
Running in Parallel
PyBNF offers parallel, synchronous differential evolution: in each iteration, n simulations are run in parallel, but all must complete before moving on to the next iteration. It also offers parallel, asynchronous differential evolution, in which the current population consists of m islands; each island can move on to the next iteration even if other islands are still in progress. If m is set to the number of available processors, processors never sit idle, though this is not necessarily the most efficient configuration.
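As a sketch of the synchronous scheme, the following toy implementation proposes a trial parameter set for every population member and evaluates all trials before any selection occurs; in PyBNF, those evaluations are the parallel simulations. The mutation and crossover constants are illustrative, and the objective function is a stand-in for a real fitting problem:

```python
import random

def de_iteration(pop, objective, f=0.8, cr=0.9):
    """One synchronous differential-evolution iteration: build a trial
    vector for each member, evaluate every trial, then select greedily."""
    n, dim = len(pop), len(pop[0])
    trials = []
    for i in range(n):
        # Pick three distinct other members for the mutation step.
        a, b, c = random.sample([p for j, p in enumerate(pop) if j != i], 3)
        j_rand = random.randrange(dim)  # guarantee at least one mutated dim
        trial = [b[k] + f * (a[k] - c[k])
                 if (random.random() < cr or k == j_rand) else pop[i][k]
                 for k in range(dim)]
        trials.append(trial)
    # Selection happens only after every trial has been evaluated.
    return [t if objective(t) <= objective(p) else p
            for p, t in zip(pop, trials)]
```

Because selection is greedy, the best objective value in the population never worsens from one iteration to the next.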
When to use it
In our experience, differential evolution tends to be the best general-purpose algorithm, and we suggest it as a starting point for a new fitting problem if you are unsure which algorithm to choose.
Configuration options
Scatter Search
How it works
Scatter search functions similarly to differential evolution, but maintains a current population that is smaller than the number of available processors. In each iteration, every possible pair of individuals is combined to propose a new individual.
Particle Swarm Optimization
How it works
In particle swarm optimization, each parameter set is represented by a particle moving through parameter space at some velocity. Each particle accelerates toward the best position it has found so far and the best position found by the swarm as a whole, so the swarm converges on promising regions of parameter space.
Running in Parallel
Particle swarm optimization is fundamentally an asynchronous, parallel algorithm. As soon as one simulation completes, that particle can calculate its next parameter set and begin a new simulation. Processors never remain idle, and adding more processors may continue to improve the performance of the algorithm.
When to use it
Particle swarm optimization becomes advantageous over the other available algorithms when many processors are available (>100). Be warned that if your problem is under-constrained, this algorithm tends to choose parameters that sit on the edge of box constraints. Such a solution is arguably fine, but it makes it very obvious to a reader that your model is under-constrained.
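The particle update described above can be sketched as follows. This is a generic particle swarm update with typical inertia (w) and acceleration (c1, c2) constants, not PyBNF’s exact implementation:

```python
import random

def update_particle(pos, vel, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
    """Standard particle-swarm update: inertia plus random attraction
    toward the particle's own best position (pbest) and the swarm's
    best position (gbest)."""
    new_vel = [w * v
               + c1 * random.random() * (pb - x)
               + c2 * random.random() * (gb - x)
               for x, v, pb, gb in zip(pos, vel, pbest, gbest)]
    new_pos = [x + v for x, v in zip(pos, new_vel)]
    return new_pos, new_vel
```

Because the update for one particle depends only on its own state and the current swarm best, each particle can be advanced as soon as its simulation finishes, which is what makes the algorithm naturally asynchronous.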
Simulated Annealing
Markov Chain Monte Carlo
Parallel Tempering
Simplex
Note: the code has since been updated for more streamlined running on SLURM clusters, so some of the manual setup below may no longer be required in newer versions.
PyBNF is designed to run on computing clusters, regardless of which cluster manager is used (Slurm, Torque, etc.). The user is expected to interact with the cluster manager to allocate cluster nodes for the job, and then tell PyBNF which nodes to run on.
The package dask, which you installed as a dependency of PyBNF, is responsible for handling distribution of simulations on a cluster. Currently, the user is responsible for setting up a job scheduler with dask, and passing this information to pybnf (sorry).
The following is a guide for this setup process. This guide assumes that your cluster is running Slurm; for other cluster managers, the same setup should be possible by replacing the Slurm-specific commands (salloc, squeue, slogin) with the corresponding commands for your cluster manager.
1. Allocate some number of nodes for the job:
salloc -N 3
2. Make note of the names of the nodes you have (replace "yourname" with your user name):
squeue -u yourname
Suppose in this example, the command tells you that you have nodes cn101, cn102, and cn103.
3. Log in to one of your nodes:
slogin
4. Verify that the node is running Python 3.5 or higher (for example, by running the command python3). If not, you will need to load or install Python 3.5+ using methods specific to your cluster.
5. Start a screen session so that you can run multiple commands simultaneously. This screen will run the dask scheduler while your main command line runs PyBNF:
screen
6. On the new command line within the screen, run the following command to start up dask, replacing the node names with the ones you noted in step 2:
dask-ssh cn{101,102,103}
This will generate a lot of output as dask starts up. The key information to note is the IP address and port of the scheduler, which appears toward the start of the output:
Dask.distributed v1.20.2
Worker nodes:
0: cn101
1: cn102
2: cn103
[ scheduler cn117:8786 ] : /projects/opt/centos7/python/3.5.1/bin/python3.5 -m distributed.cli.dask_scheduler --port 8786
[ worker cn117 ] : /projects/opt/centos7/python/3.5.1/bin/python3.5 -m distributed.cli.dask_worker cn117:8786 --nthreads 0 --nprocs 1 --host cn117
[ worker cn613 ] : /projects/opt/centos7/python/3.5.1/bin/python3.5 -m distributed.cli.dask_worker cn117:8786 --nthreads 0 --nprocs 1 --host cn613
[ scheduler cn117:8786 ] : distributed.scheduler - INFO - -----------------------------------------------
[ scheduler cn117:8786 ] : distributed.scheduler - INFO - Scheduler at: tcp://192.168.100.82:8786
[ scheduler cn117:8786 ] : distributed.scheduler - INFO - Local Directory: /tmp/scheduler-fpue0k6p
[ scheduler cn117:8786 ] : distributed.scheduler - INFO - -----------------------------------------------
[ worker cn117 ] : distributed.nanny - INFO - Start Nanny at: 'tcp://192.168.100.82:35223'
[ worker cn613 ] : distributed.nanny - INFO - Start Nanny at: 'tcp://192.168.100.166:45623'
[ scheduler cn117:8786 ] : distributed.scheduler - INFO - Register tcp://192.168.100.166:40537
[ scheduler cn117:8786 ] : distributed.scheduler - INFO - Register tcp://192.168.100.82:41377
[ scheduler cn117:8786 ] : distributed.scheduler - INFO - Starting worker compute stream, tcp://192.168.100.166:40537
[ scheduler cn117:8786 ] : distributed.scheduler - INFO - Starting worker compute stream, tcp://192.168.100.82:41377
[ worker cn117 ] : distributed.worker - INFO - Start worker at: tcp://192.168.100.82:41377
[ worker cn117 ] : distributed.worker - INFO - Listening to: tcp://192.168.100.82:41377
[ worker cn117 ] : distributed.worker - INFO - nanny at: 192.168.100.82:35223
[ worker cn117 ] : distributed.worker - INFO - Waiting to connect to: tcp://cn117:8786
[ worker cn117 ] : distributed.worker - INFO - -------------------------------------------------
7. Press Ctrl-A, D to detach from the screen and return to your original terminal.
8. Run pybnf! Use the -a flag to pass the IP address and port that you noted in step 6:
pybnf -c path/to/config_file.conf -a 192.168.100.82:8786
9. Shutdown: after you are finished running PyBNF, shut down the scheduler. Reattach to the screen containing the scheduler with screen -r, press Ctrl-C to stop dask, and then run exit to close the screen. Finally, relinquish your allocated cluster nodes; in Slurm [or at least on Darwin], this is accomplished with two more exit commands.