
Allow >2 files as input (concatenate as 1 file) #50

@LonnekeScheffer


This request came from Brian Corrie (who is integrating CompAIRR into the iReceptor platform):

"What I would like to do is give compairr a list of files and have it compute the overlap between all repertoires across all files.
For example, if I have N AIRR TSV files, one per repertoire, I could run compairr with the N files and get an NxN matrix of the comparison. In addition, if each file happened to have M repertoires in it (say M samples from N subjects) then I would get an NM x NM matrix. This is quite a common way to have data represented, rather than a single large file. This would essentially be the same as concatenating all of the AIRR TSV files together, with the caveat that AIRR TSV files don't have the columns in the same order, so you can't just do a naive concatenation of the files."
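To make the column-order caveat concrete, here is a minimal sketch of a header-aware concatenation done outside CompAIRR. It is not part of CompAIRR; the function name `concat_airr_tsv` is hypothetical, and it assumes well-formed tab-separated files whose first line is the header. Columns are matched by name rather than position, and a column missing from a file is left empty.

```python
import csv
from typing import Iterable, List


def concat_airr_tsv(paths: Iterable[str], out_path: str) -> List[str]:
    """Concatenate TSV files by column name (hypothetical helper, not CompAIRR code)."""
    paths = list(paths)

    # First pass: collect the union of column names in first-seen order.
    columns: List[str] = []
    for path in paths:
        with open(path, newline="") as handle:
            header = next(csv.reader(handle, delimiter="\t"), [])
            for name in header:
                if name not in columns:
                    columns.append(name)

    # Second pass: rewrite every row under the shared header.
    # restval="" fills columns that a given file does not have.
    with open(out_path, "w", newline="") as out:
        writer = csv.DictWriter(
            out, fieldnames=columns, delimiter="\t",
            restval="", lineterminator="\n",
        )
        writer.writeheader()
        for path in paths:
            with open(path, newline="") as handle:
                for row in csv.DictReader(handle, delimiter="\t"):
                    writer.writerow(row)
    return columns
```

The merged file could then be passed to CompAIRR as a single input. Reading each file twice keeps memory use flat, which matters for large repertoires.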

I think the easiest solution would be to add a `--concat-input` flag: in `--matrix`, `--cluster` and `--deduplicate` modes, an arbitrary number of input files would be concatenated and treated as one long input (which means that `--matrix` with 2 files would behave differently depending on whether this flag is set). For `--existence` mode, the presumption can remain that the first file is the sequence file and the rest are the repertoire files (which are concatenated).
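The proposed grouping rule can be sketched as follows. This only illustrates the semantics described above; it is not CompAIRR code. The function `plan_inputs` and its return shape are hypothetical: each inner list is a group of files that would be treated as one logical input.

```python
from typing import List


def plan_inputs(mode: str, files: List[str], concat_input: bool) -> List[List[str]]:
    """Group input files into logical inputs (hypothetical sketch of the proposal)."""
    if not files:
        raise ValueError("at least one input file is required")

    if not concat_input:
        # Without the flag, every file stays its own input.
        return [[path] for path in files]

    if mode == "existence":
        # First file is the sequence file; the rest are repertoire files
        # that get concatenated into one input.
        if len(files) < 2:
            raise ValueError("existence mode needs a sequence file and a repertoire file")
        return [[files[0]], files[1:]]

    if mode in ("matrix", "cluster", "deduplicate"):
        # All files are concatenated and treated as one long input.
        return [list(files)]

    raise ValueError(f"unknown mode: {mode}")
```

For example, `plan_inputs("matrix", ["a.tsv", "b.tsv"], True)` gives a single group with both files, while the same call with `False` gives two separate inputs. That is the behavioural difference for `--matrix` with 2 files noted above.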


Labels: enhancement
