The master configuration file (.config/masterconfig.yaml) is the entry point for running mspangenome simulations. It defines which demographic models to run and how many replicates to generate for each.
The configuration file uses a simple YAML format with two main sections:

    # Named sample configurations
    samples:
      sample_name:
        model: "path/to/demographic_model.json"
        replicates: int

    # Global settings
    output_dir: "results/"
    memory_multiplier: 1

Each entry in the samples section defines a simulation run:
| Parameter | Type | Description | Example |
|---|---|---|---|
| `model` | string | Path to the demographic model JSON file | `"simulation_data/Panmictic_Model.json"` |
| `replicates` | integer | Number of simulation replicates to run | `10` |
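
As a rough illustration of the two fields above, the following sketch checks an already-parsed `samples` mapping. The function name `validate_samples` is hypothetical and not part of mspangenome. The rules it enforces (a `.json` extension and `replicates` of at least 1) are assumptions inferred from the table, not documented behaviour.

```python
def validate_samples(samples):
    """Return a list of problems found in a parsed `samples` mapping.

    Hypothetical helper: the checks below are assumptions based on the
    parameter table, not the workflow's own validation.
    """
    problems = []
    for name, entry in samples.items():
        model = entry.get("model")
        replicates = entry.get("replicates")
        if not isinstance(model, str) or not model.endswith(".json"):
            problems.append(f"{name}: 'model' must be a path to a JSON file")
        # bool is a subclass of int in Python, so reject it explicitly
        if (
            isinstance(replicates, bool)
            or not isinstance(replicates, int)
            or replicates < 1
        ):
            problems.append(f"{name}: 'replicates' must be an integer >= 1")
    return problems


samples = {
    "baseline": {"model": "simulation_data/Panmictic_Model.json", "replicates": 1},
    "broken": {"model": "simulation_data/Island_Model.json", "replicates": 0},
}
print(validate_samples(samples))
```

Only the `broken` entry is reported here, because its replicate count is below 1.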

Output directory names depend on the number of replicates:

- Single replicate (`replicates: 1`): The sample name is used as-is
  - Config: `baseline` → Output directory: `results/baseline/`
- Multiple replicates (`replicates` greater than 1): A `_repN` suffix is added
  - Config: `test_run` with `replicates: 3`
  - Output directories: `results/test_run_rep1/`, `results/test_run_rep2/`, `results/test_run_rep3/`
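
The naming rule can be sketched in a few lines of Python. The helper `output_dirs` is a hypothetical name used only for illustration. It reproduces the directory names described in this section and is not taken from the workflow's source.

```python
def output_dirs(sample_name, replicates, output_dir="results/"):
    """Return the output directories implied by the naming rule.

    Hypothetical helper mirroring the documented convention:
    one replicate keeps the sample name, several get a _repN suffix.
    """
    base = output_dir.rstrip("/")
    if replicates == 1:
        return [f"{base}/{sample_name}/"]
    return [f"{base}/{sample_name}_rep{i}/" for i in range(1, replicates + 1)]


print(output_dirs("baseline", 1))
print(output_dirs("test_run", 3))
```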

A minimal configuration with a single sample:

    samples:
      quick_test:
        model: "simulation_data/Test_Model.json"
        replicates: 1

Several models with different replicate counts:

    samples:
      panmictic_baseline:
        model: "simulation_data/Panmictic_Model.json"
        replicates: 1
      island_analysis:
        model: "simulation_data/Island_Model.json"
        replicates: 5
      complex_demography:
        model: "simulation_data/Complex_Model.json"
        replicates: 10

Fixed parameters versus parameter ranges:

    samples:
      # Fixed parameters - good for baseline
      fixed_params:
        model: "simulation_data/Fixed_Model.json"
        replicates: 1
      # Range parameters - explores uncertainty
      parameter_uncertainty:
        model: "simulation_data/Model_With_Ranges.json"
        replicates: 100  # Each replicate samples different values from ranges

Global settings:

| Parameter | Type | Description | Default |
|---|---|---|---|
| `output_dir` | string | Base directory for all simulation outputs | `"results/"` |
| `memory_multiplier` | float | Memory scaling factor for cluster jobs (increase it if jobs fail with out-of-memory errors) | `1` |
| `succint` | bool | If `True`, skip the visualization rule | `False` |
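
To make the defaults concrete, here is a small sketch of how they might be merged with a parsed configuration. The names `DEFAULTS`, `global_settings` and `scaled_memory_mb` are hypothetical, as is the 4000 MB base request. Multiplying a base memory request by `memory_multiplier` is an assumption about how the factor is applied.

```python
import math

# Defaults as listed in the global settings table
DEFAULTS = {"output_dir": "results/", "memory_multiplier": 1, "succint": False}


def global_settings(config):
    """Overlay user-supplied global settings on the documented defaults."""
    settings = dict(DEFAULTS)
    settings.update({k: v for k, v in config.items() if k in DEFAULTS})
    return settings


def scaled_memory_mb(base_mb, multiplier):
    """Assumed scaling rule: base request times multiplier, rounded up."""
    return math.ceil(base_mb * multiplier)


settings = global_settings({"samples": {}, "memory_multiplier": 1.5})
print(settings)
print(scaled_memory_mb(4000, settings["memory_multiplier"]))
```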

Complete example:

    # .config/masterconfig.yaml

    # Named sample configurations
    samples:
      # Baseline simulation with fixed parameters
      baseline:
        model: "simulation_data/Fixed_Baseline.json"
        replicates: 1

      # Test different parameter values
      mutation_rate_test:
        model: "simulation_data/Mutation_Range_Model.json"
        replicates: 20  # Each rep samples different mutation rate

      # Compare different demographic scenarios
      panmictic:
        model: "simulation_data/Panmictic_Model.json"
        replicates: 10
      island:
        model: "simulation_data/Island_Model.json"
        replicates: 10

      # Production run with uncertainty quantification
      production_analysis:
        model: "simulation_data/Production_Model_Ranges.json"
        replicates: 100

    # Global settings
    output_dir: "results/"
    memory_multiplier: 1.5
    succint: False

When your demographic model contains parameter ranges (see Demographic Model Configuration):
- Fixed parameters: All replicates use identical values
- Range parameters: Each replicate randomly samples from the specified ranges
- Mixed models: Fixed parameters stay constant, ranges are sampled
Example: If Model_With_Ranges.json contains:

    {
      "evolutionary_params": {
        "mutation_rate": {"min": 1e-8, "max": 1e-6},  // Range
        "recombination_rate": 1e-8,                   // Fixed
        "generation_time": 25                         // Fixed
      }
    }

With `replicates: 10`, you'll get:
- 10 simulations with the SAME recombination rate (1e-8) and generation time (25)
- 10 simulations with DIFFERENT mutation rates (randomly sampled between 1e-8 and 1e-6)
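
The behaviour described above can be sketched as follows. This is an illustration only: `sample_params` is a hypothetical helper, and drawing uniformly between `min` and `max` is an assumption, since the text only says that values are sampled randomly from the range. The seed is fixed purely to make the example repeatable.

```python
import random


def sample_params(params, rng):
    """Return a copy of params with every {"min", "max"} range sampled.

    Assumption: ranges are sampled uniformly; fixed values pass through.
    """
    sampled = {}
    for key, value in params.items():
        if isinstance(value, dict) and "min" in value and "max" in value:
            sampled[key] = rng.uniform(value["min"], value["max"])
        else:
            sampled[key] = value
    return sampled


evolutionary_params = {
    "mutation_rate": {"min": 1e-8, "max": 1e-6},  # range
    "recombination_rate": 1e-8,                   # fixed
    "generation_time": 25,                        # fixed
}

rng = random.Random(42)
replicates = [sample_params(evolutionary_params, rng) for _ in range(10)]
for rep in replicates[:3]:
    print(rep)
```

Every replicate keeps the same recombination rate and generation time, while the mutation rate differs from one replicate to the next.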
When you run the workflow:
- Expansion phase: The `sample_ranges.py` script processes your configuration
  - Creates individual configurations for each replicate
  - Samples values from any parameter ranges
  - Saves expanded demographic files to `.config/expanded_demographics/`
- Simulation phase: Each expanded sample runs through the full pipeline
  - Coalescent simulation with msprime
  - Variant generation
  - Graph construction
- Output organization: Results are saved to named directories
  - Single replicate: `results/sample_name/`
  - Multiple replicates: `results/sample_name_rep1/`, `results/sample_name_rep2/`, etc.
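
The expansion phase can be pictured with the sketch below. It is not the actual `sample_ranges.py`: the function `expand_samples`, the one-file-per-replicate layout and the `<name>.json` file names are all assumptions made for illustration. Range sampling is left out to keep the focus on how one sample entry turns into several expanded files.

```python
import json
import tempfile
from pathlib import Path


def expand_samples(samples, out_dir):
    """Write one demographic JSON per replicate and return the written paths.

    Hypothetical stand-in for the expansion step; file naming is assumed.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, entry in samples.items():
        model = json.loads(Path(entry["model"]).read_text())
        n = entry["replicates"]
        expanded = [name] if n == 1 else [f"{name}_rep{i}" for i in range(1, n + 1)]
        for expanded_name in expanded:
            path = out_dir / f"{expanded_name}.json"
            # A real expansion step would sample any parameter ranges here.
            path.write_text(json.dumps(model, indent=2))
            written.append(path)
    return written


# Usage with a throwaway model file in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "Island_Model.json"
    model_path.write_text(json.dumps({"evolutionary_params": {"generation_time": 25}}))
    samples = {"island": {"model": str(model_path), "replicates": 3}}
    paths = expand_samples(samples, Path(tmp) / "expanded_demographics")
    file_names = sorted(p.name for p in paths)
    print(file_names)
```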
Q: Where are the simulation parameters defined?
A: All simulation parameters (mutation rate, population sizes, SV distributions, etc.) are defined in the demographic model JSON files, not in masterconfig.yaml.
Q: How do I run the same model with different parameters?
A: Use parameter ranges in your demographic model JSON and set replicates > 1.