Open
21 commits
c9fd520
added tcrBLOSUM matrices as selectable option for the TCRdist metric
felixpetschko Apr 4, 2026
098daa5
renamed matrices
felixpetschko Apr 4, 2026
3d59750
fixed chain_type parameter handling
felixpetschko Apr 4, 2026
334db8c
Add docstrings for TCRdist base_matrix and chain_type
felixpetschko Apr 4, 2026
7974307
Add typing Literal import
felixpetschko Apr 4, 2026
2ef7f18
test: add coverage for TCRdist base_matrix selection
felixpetschko Apr 4, 2026
efb8d5b
Fix tcrdist trcblosum test case results
felixpetschko Apr 4, 2026
7fb41c4
Improve test cases for tcrdist with tcrblosum matrix
felixpetschko Apr 6, 2026
ba817c5
Improve docstring of chain_type parameter of TCRdistDistanceCalculator
felixpetschko Apr 6, 2026
ca80ef1
Improve docstring of TCRdistDistanceCalculator
felixpetschko Apr 6, 2026
719461b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 6, 2026
4b1ed85
Compute distance matrices from original base substitution matrices
felixpetschko May 11, 2026
f4b2acf
Update test cases according to new distance matrix computations
felixpetschko May 11, 2026
39a2e7a
Rename function that converts substitution to distance matrices
felixpetschko May 11, 2026
b7cf71b
Add a distance_cap parameter to TCRdistDistanceCalculator while prese…
felixpetschko May 11, 2026
dcff54f
Update changelog
felixpetschko May 11, 2026
c78ed49
Document TCRBLOSUM option for tcrdist metrix
felixpetschko May 11, 2026
887cf04
Cite TCRBLOSUM in tcrdist docs
felixpetschko May 11, 2026
6c4c5e6
Test ir_dist TCRBLOSUM chain routing
felixpetschko May 11, 2026
8a7640e
Update tutorial to mention tcrblosum usage
felixpetschko May 13, 2026
86f4514
Document shared TCRBLOSUM distance offset
felixpetschko May 13, 2026
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -10,6 +10,11 @@ and this project adheres to [Semantic Versioning][].

 ## Unreleased

+### Additions
+
+- Add support for TCRBLOSUM alpha/beta substitution matrices in the `tcrdist` distance metric via
+`base_matrix="tcrblosum"`, and allow configuring the substitution-to-distance cap with `distance_cap`.
+
 ### Performance improvements

 - Speed up identity distance metric computation for comparisons between two different sequence arrays.
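
As a reading aid for the changelog entry above, here is a minimal sketch of how a substitution-to-distance cap could behave. It assumes the usual tcrdist convention that a substitution with score `s` costs `min(cap, cap - s)` and that identical residues cost 0. The helper name `substitution_to_distance` and the 3x3 BLOSUM62 excerpt are illustrative and not taken from this PR; the actual conversion in `TCRdistDistanceCalculator` may differ, in particular for the TCRBLOSUM matrices.

```python
import numpy as np


def substitution_to_distance(scores: np.ndarray, distance_cap: int = 4) -> np.ndarray:
    """Hypothetical helper: map substitution scores to capped distances.

    Assumes distance = min(cap, cap - score) off the diagonal and 0 on it.
    """
    dist = np.minimum(distance_cap, distance_cap - scores)
    np.fill_diagonal(dist, 0)  # identical residues never contribute
    return dist


# BLOSUM62 excerpt for the residues R, K, N (rows and columns in that order).
blosum62_rkn = np.array(
    [
        [5, 2, 0],
        [2, 5, 0],
        [0, 0, 6],
    ]
)

dist_cap4 = substitution_to_distance(blosum62_rkn, distance_cap=4)
dist_cap6 = substitution_to_distance(blosum62_rkn, distance_cap=6)
print(dist_cap4)
print(dist_cap6)
```

Under these assumptions a larger cap widens the gap between conservative substitutions (R/K) and unfavourable ones (R/N), so a `cutoff` tuned for one cap does not carry over to another.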
14 changes: 14 additions & 0 deletions docs/references.bib
@@ -394,6 +394,20 @@ @article{TCRdist
 journal = {Nature}
 }

+@article{TCRBLOSUM,
+doi = {10.1093/bib/bbae602},
+url = {https://doi.org/10.1093/bib/bbae602},
+year = {2024},
+month = nov,
+publisher = {Oxford University Press ({OUP})},
+volume = {26},
+number = {1},
+pages = {bbae602},
+author = {Anna Postovskaya and Koen Vercauteren and Pieter Meysman and Kris Laukens},
+title = {{tcrBLOSUM}: an amino acid substitution matrix for sensitive alignment of distant epitope-specific {TCRs}},
+journal = {Briefings in Bioinformatics}
+}
+
 @article{Brady2010-gh,
 title = {Antigen receptor allelic exclusion: an update and reappraisal},
 author = {Brady, Brenna L and Steinel, Natalie C and Bassing, Craig H},
8 changes: 5 additions & 3 deletions docs/tutorials/tutorial_3k_tcr.ipynb
@@ -1423,10 +1423,11 @@
 "\n",
 "To this end, we need to set `metric=\"tcrdist\"` and specify a `cutoff` parameter.\n",
 "The distance is based on the [BLOSUM62](https://en.wikipedia.org/wiki/BLOSUM) matrix.\n",
-"For instance, a distance of `10` is equivalent to 2 Rs mutating into N.\n",
-"This appoach was initially proposed by Dash et al. {cite}`TCRdist` and is based on the [tcrdist3](https://github.com/kmayerb/tcrdist3) implementation.\n",
+"This approach was initially proposed by Dash et al. {cite}`TCRdist` and is based on the [tcrdist3](https://github.com/kmayerb/tcrdist3) implementation.\n",
+"For instance, a distance of `12` is equivalent to two `R`s mutating into `K`s with the default parameters.\n",
+"Alternatively, [TCRBLOSUM](https://doi.org/10.1093/bib/bbae602) alpha/beta substitution matrices can be selected with `base_matrix=\"tcrblosum\"`.\n",
 "\n",
-"All cells with a distance between their CDR3 sequences lower than `cutoff` will be connected in the network.\n"
+"Cells are connected in the network when their CDR3 sequence distances are at or below `cutoff`. In this case, this must hold for both receptor arms (`receptor_arms=\"all\"`, i.e. both VJ and VDJ chains), considering any possible dual immune receptor chain pairing (`dual_ir=\"any\"`).\n"
 ]
 },
 {
@@ -1464,6 +1465,7 @@
 "    metric=\"tcrdist\",\n",
 "    sequence=\"aa\",\n",
 "    cutoff=15,\n",
+"    # base_matrix=\"tcrblosum\",\n",
 ")"
 ]
 },
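
The updated tutorial sentence ("a distance of `12` is equivalent to two `R`s mutating into `K`s") can be checked with a short worked example. This is a sketch under stated assumptions, not the library code: it assumes a distance cap of 4, a CDR3 weight of 3 (named `dist_weight` here), the tcrdist convention `min(cap, cap - score)` per substitution, and that both substitutions fall in the compared part of the CDR3 with no gaps. The function `cdr3_distance` is hypothetical; the two BLOSUM62 scores are the standard values for R/K and R/N.

```python
# BLOSUM62 substitution scores for the two residue pairs mentioned in the tutorial.
BLOSUM62 = {("R", "K"): 2, ("R", "N"): 0}


def cdr3_distance(pair, n_substitutions, dist_weight=3, distance_cap=4):
    """Hypothetical helper: distance contributed by n identical substitutions.

    Assumes per-substitution distance = min(cap, cap - score), scaled by the
    CDR3 weight. Gaps and trimming are ignored in this sketch.
    """
    per_substitution = min(distance_cap, distance_cap - BLOSUM62[pair])
    return n_substitutions * dist_weight * per_substitution


two_r_to_k = cdr3_distance(("R", "K"), 2)  # 2 substitutions * weight 3 * distance 2
two_r_to_n = cdr3_distance(("R", "N"), 2)  # 2 substitutions * weight 3 * distance 4
print(two_r_to_k, two_r_to_n)
```

With these assumptions two R-to-K substitutions give 12, matching the new tutorial text, while two R-to-N substitutions would give 24.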
10 changes: 7 additions & 3 deletions src/scirpy/ir_dist/__init__.py
@@ -54,6 +54,8 @@ def IrNeighbors(*args, **kwargs):
 * `levenshtein` -- Levenshtein edit distance.
 See :class:`~scirpy.ir_dist.metrics.LevenshteinDistanceCalculator`.
 * `tcrdist` -- Distance based on pairwise sequence alignments between TCR CDR3 sequences based on the tcrdist metric.
+Uses the BLOSUM62 substitution matrix by default. TCRBLOSUM alpha/beta substitution matrices
+(:cite:`TCRBLOSUM`) can be selected with `base_matrix="tcrblosum"`.
 See :class:`~scirpy.ir_dist.metrics.TCRdistDistanceCalculator`.
 * `hamming` -- Hamming distance for CDR3 sequences of equal length.
 See :class:`~scirpy.ir_dist.metrics.HammingDistanceCalculator`.
@@ -85,7 +87,9 @@ def _get_metric_key(metric: MetricType) -> str:
     return "custom" if isinstance(metric, metrics.DistanceCalculator) else metric # type: ignore


-def _get_distance_calculator(metric: MetricType, cutoff: int | None, *, n_jobs=-1, **kwargs):
+def _get_distance_calculator(
+    metric: MetricType, cutoff: int | None, *, n_jobs=-1, chain_type: Literal["VJ", "VDJ"] | None = None, **kwargs
+):
     """Returns an instance of :class:`~scirpy.ir_dist.metrics.DistanceCalculator`
     given a metric.

@@ -116,7 +120,7 @@ def _get_distance_calculator(metric: MetricType, cutoff: int | None, *, n_jobs=-
     elif metric == "gpu_hamming":
         dist_calc = metrics.GPUHammingDistanceCalculator(**kwargs)
     elif metric == "tcrdist":
-        dist_calc = metrics.TCRdistDistanceCalculator(n_jobs=n_jobs, **kwargs)
+        dist_calc = metrics.TCRdistDistanceCalculator(n_jobs=n_jobs, chain_type=chain_type, **kwargs)
     else:
         raise ValueError("Invalid distance metric.")

@@ -252,9 +256,9 @@ def _get_unique_seqs(tmp_adata, chain_type):
result[chain_type][tmp_key] = unique_seqs

     # compute distance matrices
-    dist_calc = _get_distance_calculator(metric, cutoff, n_jobs=n_jobs, **kwargs)
     for chain_type in ["VJ", "VDJ"]:
         logging.info(f"Computing sequence x sequence distance matrix for {chain_type} sequences.") # type: ignore
+        dist_calc = _get_distance_calculator(metric, cutoff, n_jobs=n_jobs, chain_type=chain_type, **kwargs)
         result[chain_type]["distances"] = dist_calc.calc_dist_mat(
             result[chain_type]["seqs"], result[chain_type].get("seqs2", None)
         ).tocsr()
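
The hunk above moves the calculator construction inside the chain loop so that each chain type gets its own instance. A minimal, self-contained sketch of why that matters is given below. All names here (`ToyTCRdistCalculator`, `build_calculators`, the `matrix_name` attribute, and the matrix labels) are hypothetical stand-ins rather than the scirpy API, and the sketch assumes that VJ chains map to the alpha matrix and VDJ chains to the beta matrix, as is the case for alpha/beta T cells.

```python
from typing import Literal, Optional

ChainType = Literal["VJ", "VDJ"]


class ToyTCRdistCalculator:
    """Hypothetical stand-in for a chain-aware distance calculator."""

    def __init__(self, base_matrix: str = "blosum62", chain_type: Optional[ChainType] = None):
        if base_matrix == "tcrblosum":
            if chain_type is None:
                raise ValueError("chain_type is required when base_matrix='tcrblosum'")
            # Assumption: VJ chains (TRA) use the alpha matrix, VDJ chains (TRB) the beta matrix.
            self.matrix_name = "tcrblosum_alpha" if chain_type == "VJ" else "tcrblosum_beta"
        else:
            # A chain-independent matrix ignores chain_type.
            self.matrix_name = base_matrix


def build_calculators(base_matrix: str) -> dict:
    """Create one calculator per chain type, mirroring the per-chain loop in the diff."""
    return {chain: ToyTCRdistCalculator(base_matrix, chain) for chain in ("VJ", "VDJ")}


per_chain = build_calculators("tcrblosum")
shared = build_calculators("blosum62")
print({chain: calc.matrix_name for chain, calc in per_chain.items()})
```

A single calculator created before the loop, as in the removed line, could not select a different matrix per chain, which is the situation this sketch is meant to illustrate.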