
Reproducibility test

revema edited this page Aug 6, 2025 · 8 revisions

Testing InstaNexus reproducibility

First check: conda environment

Run the script_dbg.py and script_greedy.py scripts, which use all of the packages required by the code in the src folder.

  1. Are there any problems with any of the packages?
  2. Do the scripts run to completion without errors?

Note: at the moment some errors appear in the terminal, probably related to the kaleido library used for saving images
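A quick way to run the first check is to verify that every required package imports cleanly before launching the scripts. This is a minimal sketch; the package list below is a placeholder and should be replaced with the actual imports used under src/.

```python
import importlib

# Hypothetical package list -- substitute the real dependencies of
# script_dbg.py and script_greedy.py.
REQUIRED = ["numpy", "pandas", "plotly", "kaleido"]

def check_environment(packages):
    """Try to import each package; return the names that fail."""
    missing = []
    for name in packages:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing

if __name__ == "__main__":
    failed = check_environment(REQUIRED)
    if failed:
        print("Problem packages:", ", ".join(failed))
    else:
        print("All required packages import cleanly")
```

If `kaleido` shows up in the failure list, that would be consistent with the image-saving errors noted above.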

Second check: script_dbg.py and script_greedy.py scripts

Consider a specific combination of parameter values for a specific dataset (already specified in the script)

  1. Do we obtain the same statistical results? Check the "statistic" folder inside the output folder: one JSON file per scaffold (dbg and greedy)
  2. Do we obtain the same number of outputs? Check the "scaffold" folder inside the output folder: for example, the cluster FASTA files
  3. Check a few scaffolds and verify that the sequences are the same
  • checking cluster step representatives -->
    >scaffold_1 length: 56
    RFDRPFLLALALKAWSVARLSQKFPKAEFVEVTKFPKAEFVEVTKLVTDLTKVHSQ
    >scaffold_48 length: 26
    RLSQKFPKAEFVEVTKLVTDLTKVHL
    >scaffold_54 length: 25
    TRKFPKAEFVEVTKLVTDLTGKVHK
  • checking consensus step representative (greedy) -->
    >scaffold_1_out Consensus sequence
    RFDRPFLLALALKAWSVARLSQKFPKAEFVE--KFPKAEFVEVTKLVTDLTKVH--
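The scaffold comparison above can be automated rather than eyeballed. Below is a small sketch, assuming the statistics are plain JSON files and the scaffolds are standard FASTA; the file paths are hypothetical and should be pointed at the actual output folders of two independent runs.

```python
import json
from pathlib import Path

def compare_stats(run_a: Path, run_b: Path) -> bool:
    """Return True if two statistics JSON files hold identical content."""
    return json.loads(run_a.read_text()) == json.loads(run_b.read_text())

def read_fasta(path: Path) -> dict:
    """Parse a scaffold FASTA file into {header: sequence}."""
    records, header, seq = {}, None, []
    for line in path.read_text().splitlines():
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(seq)
            header, seq = line[1:].strip(), []
        elif line.strip():
            seq.append(line.strip())
    if header is not None:
        records[header] = "".join(seq)
    return records

def same_scaffolds(fasta_a: Path, fasta_b: Path) -> bool:
    """True if both runs produced identical headers and sequences."""
    return read_fasta(fasta_a) == read_fasta(fasta_b)
```

Running `same_scaffolds` on the cluster FASTA from two runs answers checks 2 and 3 at once: it fails if either the number of scaffolds or any sequence differs.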

Third check: check grid search optimization

Verify that specific samples and parameter combinations yield the same results:

  1. gridsearch.py on greedy
  2. gridsearch.py on DBG
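A reproducible grid search should assign the same score to every parameter combination on repeated runs. A minimal sketch of that comparison, assuming each run can be summarized as a mapping from parameter combination to score (the mapping format is an assumption, not the actual gridsearch.py output):

```python
def same_grid_results(results_a: dict, results_b: dict, tol: float = 1e-9) -> bool:
    """Return True if two grid-search runs cover the same parameter
    combinations and agree on every score within a numeric tolerance.

    The {combination_label: score} layout is a hypothetical summary of
    the gridsearch.py output, used here only for illustration.
    """
    if results_a.keys() != results_b.keys():
        return False  # different parameter grids were explored
    return all(abs(results_a[k] - results_b[k]) <= tol for k in results_a)
```

Using a small tolerance instead of exact equality guards against harmless floating-point noise while still flagging genuine non-determinism.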
