Report
Hello,
I tried running Scirpy with the following code to define BCR clonotypes from my BCR data:
(Rules: same V gene, same J gene, and at least 85% nucleotide-level sequence similarity in the junction region)
    import scirpy as ir

    # mdata and sirpy_receptor_arms are defined elsewhere (not shown here)
    ir.pp.ir_dist(mdata, metric="normalized_hamming", cutoff=15, sequence="nt", histogram=False)
    ir.tl.define_clonotype_clusters(
        mdata,
        sequence="nt",
        metric="normalized_hamming",
        receptor_arms=sirpy_receptor_arms,
        dual_ir="all",
        same_v_gene=True,
        same_j_gene=True,
        partitions="fastgreedy",
        key_added="clone_id_85_similarity",
    )
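For reference, the 85% rule can be written out as a plain-Python check. This is a sketch of the rule as stated above, not Scirpy's implementation. It assumes that `metric="normalized_hamming"` expresses the distance as a percentage of junction length (so `cutoff=15` means at most 15% mismatching positions, inclusive), and it treats junctions of different lengths as never within the cutoff, since a Hamming distance is only defined for equal-length sequences.

```python
from typing import Optional


def normalized_hamming(a: str, b: str) -> Optional[float]:
    """Percentage of mismatched positions between two junctions.

    Returns None when the lengths differ, because a Hamming distance
    is only defined for sequences of equal length.
    """
    if len(a) != len(b):
        return None
    if not a:
        return 0.0
    mismatches = sum(x != y for x, y in zip(a, b))
    return 100.0 * mismatches / len(a)


def within_cutoff(a: str, b: str, cutoff: float = 15.0) -> bool:
    """True if the junctions are at least (100 - cutoff)% identical.

    Assumes an inclusive cutoff (distance <= cutoff); unequal lengths never pass.
    """
    d = normalized_hamming(a, b)
    return d is not None and d <= cutoff
```

Under this reading, two junctions of different nucleotide length can never be direct neighbors, for example `within_cutoff("TGTGCGAGATGG", "TGTGCGAGATGGTAC")` returns `False`.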
However, I obtained a clonotype containing sequences with different junction lengths (at both the amino acid and nucleotide level) and also different V genes.
Here is the AIRR file of the incorrectly assigned clonotype (clone 24):
test.airr.tsv
I have also tried running it on only this subset of the AIRR data.
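To make the inconsistency easy to reproduce, the V genes, J genes and junction lengths of each clone can be tabulated directly from an AIRR TSV. This is a minimal sketch, assuming the file has the AIRR Rearrangement columns `locus`, `v_call`, `j_call` and `junction` plus a clone-ID column. The name `clone_id` is an assumption; pass whichever column holds the Scirpy clone IDs. Rows are grouped per (clone, locus) so heavy- and light-chain V genes are not mixed. Note that `v_call` may hold several comma-separated ambiguous calls, which this sketch keeps as one string.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, str]  # (clone ID, locus)


def summarize_clones(lines: Iterable[str], clone_col: str = "clone_id") -> Dict[Key, dict]:
    """Collect V genes, J genes and nucleotide junction lengths per (clone, locus).

    `lines` is any iterable of TSV lines, e.g. an open AIRR file.
    `clone_col` is an assumed column name; use the one holding the clone IDs.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    summary: Dict[Key, dict] = defaultdict(
        lambda: {"v_call": set(), "j_call": set(), "junction_len": set(), "n": 0}
    )
    for row in reader:
        entry = summary[(row[clone_col], row["locus"])]
        entry["v_call"].add(row["v_call"])
        entry["j_call"].add(row["j_call"])
        entry["junction_len"].add(len(row["junction"]))
        entry["n"] += 1
    return dict(summary)


def inconsistent_clones(summary: Dict[Key, dict]) -> List[Key]:
    """(clone, locus) groups that mix V genes, J genes or junction lengths."""
    return sorted(
        key
        for key, entry in summary.items()
        if len(entry["v_call"]) > 1
        or len(entry["j_call"]) > 1
        or len(entry["junction_len"]) > 1
    )
```

Usage, assuming the attached file contains the clone-ID column: `with open("test.airr.tsv", newline="") as fh: print(inconsistent_clones(summarize_clones(fh)))`.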
It is confusing that contigs with different V genes were never compared (I verified in the distance matrix that the junctions at indices 0 and 1 were not compared with those at indices 2 and 3), yet they were assigned the same clone_id.
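One way to test this observation: if clone IDs come from partitioning a graph of neighbor pairs, two contigs can end up in the same cluster without being compared directly, but only if some chain of neighbors links them. The sketch below runs a union-find over a list of neighbor index pairs and reports whether two indices are connected at all. The pair list is hypothetical here; it would have to be extracted from the distance matrix (for example, the entries within the cutoff), which is not shown. If no path links index 0 or 1 to index 2 or 3, transitive linking cannot explain the shared clone_id.

```python
from typing import Dict, Hashable, Iterable, Tuple


def are_connected(pairs: Iterable[Tuple[Hashable, Hashable]], a: Hashable, b: Hashable) -> bool:
    """True if a path of neighbor pairs links `a` to `b` (union-find)."""
    parent: Dict[Hashable, Hashable] = {}

    def find(x: Hashable) -> Hashable:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for x, y in pairs:
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[rx] = ry  # merge the two components

    return find(a) == find(b)
```

For instance, if the only within-cutoff pairs among the four junctions were (0, 1) and (2, 3), `are_connected([(0, 1), (2, 3)], 0, 2)` would return `False`.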
Thank you for your help in debugging this issue.
Yuyu