
BuildConsensusReference Error when no clusters contain all datasets #20

@JTumulty

Description


Calling cluster_cnmf_results() on an instance of BuildConsensusReference appears to fail when no single cluster contains a gene program from every source dataset (that is, when the largest cluster is smaller than the number of datasets). For example, if you have run cNMF on 4 datasets and every cluster contains 3 or fewer programs, you get the error:
ValueError: 4 columns passed, passed data had 3 columns

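This appears to come from line 300 of build_consensus_reference.py. There, the number of column names is set by the number of datasets (self.num_results below), while the number of columns in the generated DataFrame is set by the size of the largest cluster. When the largest cluster has fewer members than there are datasets, the two counts disagree and pandas raises the ValueError above.

Lines 300-301: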
clus_df = pd.DataFrame.from_dict(clus_dict_all, orient='index', columns = ['GEP%d' % x for x in range(1, self.num_results+1)])
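A minimal sketch of the mismatch and one possible fix, assuming that clus_dict_all maps each cluster ID to a list of the programs it contains (the real structure in build_consensus_reference.py may differ). The dictionary contents and the helper build_cluster_frame are hypothetical, not part of the package. The sketch pads every cluster's list to num_results entries so the GEP1..GEPn column layout stays fixed; naming the columns after the largest cluster instead would be an alternative.

```python
import pandas as pd

# Hypothetical stand-in for clus_dict_all: cluster ID -> programs in that cluster.
# Four datasets were clustered, but no cluster has more than three members,
# which is the situation described in this issue.
num_results = 4
clus_dict_all = {
    "cluster1": ["ds1_GEP1", "ds2_GEP3", "ds3_GEP2"],
    "cluster2": ["ds1_GEP2", "ds4_GEP1"],
    "cluster3": ["ds2_GEP1"],
}

column_names = ["GEP%d" % x for x in range(1, num_results + 1)]

# Reproduce the failure: pandas pads the ragged lists to the longest one
# (3 entries), so 4 column names do not match the 3 data columns.
error_message = None
try:
    pd.DataFrame.from_dict(clus_dict_all, orient="index", columns=column_names)
except ValueError as err:
    error_message = str(err)
print("ValueError:", error_message)


def build_cluster_frame(clus_dict, num_results):
    """Hypothetical fix: pad each cluster's list to num_results entries first."""
    padded = {
        cluster: list(programs) + [None] * (num_results - len(programs))
        for cluster, programs in clus_dict.items()
    }
    columns = ["GEP%d" % x for x in range(1, num_results + 1)]
    return pd.DataFrame.from_dict(padded, orient="index", columns=columns)


clus_df = build_cluster_frame(clus_dict_all, num_results)
print(clus_df)
```

Empty slots in the padded frame show up as missing values, so downstream code can use pd.isna() to tell which datasets a cluster lacks.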

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests