
Reproducibility test

revema edited this page Aug 6, 2025 · 8 revisions

Testing InstaNexus reproducibility

First check: conda environment

Run the script_dbg.py and script_greedy.py scripts, which use all of the packages required by the code in the src folder.

  1. Are there any problems with any of the packages?
  2. Do the scripts run to completion without errors?

Note: at the moment some errors appear in the terminal, probably related to the kaleido library used for saving images
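A quick way to run the first check is to verify that every required package imports cleanly before launching the scripts. This is a minimal sketch; the package list below is a placeholder and should be replaced with the actual imports used under src/.

```python
import importlib

# Hypothetical package list -- substitute the real dependencies of
# script_dbg.py and script_greedy.py.
REQUIRED = ["numpy", "pandas", "plotly", "kaleido"]

def check_environment(packages):
    """Try to import each package; return the names that fail."""
    missing = []
    for name in packages:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing

if __name__ == "__main__":
    failed = check_environment(REQUIRED)
    if failed:
        print("Problem packages:", ", ".join(failed))
    else:
        print("All required packages import cleanly")
```

If `kaleido` shows up in the failure list, that would be consistent with the image-saving errors noted above.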

Second check: script_dbg.py and script_greedy.py scripts

Consider a specific combination of parameter values for a specific dataset (already specified in the script)

  1. Do we obtain the same statistical results? Check the "statistic" folder inside the output folder: one JSON file per scaffold (dbg and greedy)
  2. Do we obtain the same number of outputs? Check the "scaffold" folder inside the output folder: for example, the cluster FASTA files
  3. Check a few scaffolds and verify that the sequences are the same
  • checking cluster step representatives -->
    >scaffold_1 length: 56
    RFDRPFLLALALKAWSVARLSQKFPKAEFVEVTKFPKAEFVEVTKLVTDLTKVHSQ
    >scaffold_48 length: 26
    RLSQKFPKAEFVEVTKLVTDLTKVHL
    >scaffold_54 length: 25
    TRKFPKAEFVEVTKLVTDLTGKVHK
  • checking consensus step representative (greedy) -->
    >scaffold_1_out Consensus sequence
    RFDRPFLLALALKAWSVARLSQKFPKAEFVE--KFPKAEFVEVTKLVTDLTKVH--
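The scaffold comparison above can be automated rather than eyeballed. Below is a small sketch, assuming the statistics are plain JSON files and the scaffolds are standard FASTA; the file paths are hypothetical and should be pointed at the actual output folders of two independent runs.

```python
import json
from pathlib import Path

def compare_stats(run_a: Path, run_b: Path) -> bool:
    """Return True if two statistics JSON files hold identical content."""
    return json.loads(run_a.read_text()) == json.loads(run_b.read_text())

def read_fasta(path: Path) -> dict:
    """Parse a scaffold FASTA file into {header: sequence}."""
    records, header, seq = {}, None, []
    for line in path.read_text().splitlines():
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(seq)
            header, seq = line[1:].strip(), []
        elif line.strip():
            seq.append(line.strip())
    if header is not None:
        records[header] = "".join(seq)
    return records

def same_scaffolds(fasta_a: Path, fasta_b: Path) -> bool:
    """True if both runs produced identical headers and sequences."""
    return read_fasta(fasta_a) == read_fasta(fasta_b)
```

Running `same_scaffolds` on the cluster FASTA from two runs answers checks 2 and 3 at once: it fails if either the number of scaffolds or any sequence differs.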

Third check: check grid search optimization

Verify that specific samples and parameter combinations yield the same results:

  1. gridsearch.py on greedy
  2. gridsearch.py on DBG
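A reproducible grid search should assign the same score to every parameter combination on repeated runs. A minimal sketch of that comparison, assuming each run can be summarized as a mapping from parameter combination to score (the mapping format is an assumption, not the actual gridsearch.py output):

```python
def same_grid_results(results_a: dict, results_b: dict, tol: float = 1e-9) -> bool:
    """Return True if two grid-search runs cover the same parameter
    combinations and agree on every score within a numeric tolerance.

    The {combination_label: score} layout is a hypothetical summary of
    the gridsearch.py output, used here only for illustration.
    """
    if results_a.keys() != results_b.keys():
        return False  # different parameter grids were explored
    return all(abs(results_a[k] - results_b[k]) <= tol for k in results_a)
```

Using a small tolerance instead of exact equality guards against harmless floating-point noise while still flagging genuine non-determinism.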
