Skip to content

IEDB database cdr3_aa stored as junction_aa #469

@racng

Description

@racng

Describe the bug
IEDB database provides cdr3_aa sequences instead of junction_aa sequences. scirpy database import code puts this in the slot for junction_aa. The sequence would then be missing the flanking amino acids. This makes it inaccurate to do identity string matching with ir_dist/ir_query.

To Reproduce
https://github.com/scverse/scirpy/blob/d862cf35740a79e91f95c49ce6da1fbe280f8c1c/src/scirpy/datasets/__init__.py#L338C1-L371C10

Expected behaviour
Is there a way to format cdr3 into junction sequence? like add the missing "C" at the beginning based on V call? Do you have advice on how to do this?

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    Status

    In progress

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions