Currently KOMB relies on igraph for k-core decomposition. Hence, although all the adjacency information for the graph construction is done in the functions Kgraph::readSAM, Kgraph::getEdgeInfo, and Kgraph::generateGraph we still need to create and initialize an igraph graph object. This step takes a significant amount of time, and the resulting graph data structure requires a large amount of RAM to be stored. This is limiting in the following ways:
- Large RAM usage prevents KOMB from being run on extremely large datasets (e.g. the whole human genome data as exemplified by HG002 300x Illumina data from GIAB consortium).
- Redundancy of converting already stored information into a different object comes at a large runtime cost. Current KOMB run on HG002 chr11 spends half of all processing time initializing the igraph object.
- Currently igraph only implements serial graph construction, and serial k-core decomposition. Recent work showed promising parallelization options for k-core decomposition, and hence it might be useful to implement these algorithms as an alternative for very large graphs.
Thus, it would be a good idea to implement an alternative graph data structure that is optimized for KOMB and can support parallel construction and k-core decomposition.
Currently KOMB relies on igraph for k-core decomposition. Hence, although all the adjacency information for the graph construction is done in the functions
Kgraph::readSAM,Kgraph::getEdgeInfo, andKgraph::generateGraphwe still need to create and initialize an igraph graph object. This step takes a significant amount of time, and the resulting graph data structure requires a large amount of RAM to be stored. This is limiting in the following ways:Thus, it would be a good idea to implement an alternative graph data structure that is optimized for KOMB and can support parallel construction and k-core decomposition.