Obtaining Transcript-Level Expression Matrices from scRNA-seq Data #288

Open
@MariosGvr

Describe the issue
I'm working with 10X Genomics single-cell RNA-seq data (28 bp R1 reads containing the barcodes/UMIs and 90 bp R2 reads containing the transcript sequences), and I'm trying to process it with kallisto/bustools.

My issue is that I want transcript-level expression matrices (transcript abundances per cell), but the command I'm currently running with the --tcc flag only produces transcript-compatibility counts (TCCs) in the counts_unfiltered directory. TCCs are an intermediate format that records which set of transcripts each UMI could have originated from; they do not resolve to per-transcript expression levels. I was expecting to also get a quant_unfiltered directory containing that information. I want the full transcript expression matrices so that I can capture isoform-level information in my single-cell data.

Is there a way to do that?
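
To make the difference between TCCs and transcript abundances concrete, here is a minimal sketch of how counts per equivalence class could be resolved into per-transcript estimates for a single cell with a plain EM loop. This is only an illustration under simplifying assumptions (no effective-length correction, toy data, and the names `em_transcript_counts`, `ec_counts` and `ec_members` are hypothetical); it is not the kallisto or bustools implementation.

```python
import numpy as np

def em_transcript_counts(ec_counts, ec_members, n_transcripts, n_iter=200):
    """Split one cell's TCCs across transcripts with a plain EM loop.

    ec_counts[i]  : UMI count for equivalence class i (toy input)
    ec_members[i] : indices of the transcripts compatible with class i
    """
    ec_counts = np.asarray(ec_counts, dtype=float)
    # Start from equal relative abundance for every transcript.
    theta = np.full(n_transcripts, 1.0 / n_transcripts)
    for _ in range(n_iter):
        expected = np.zeros(n_transcripts)
        for count, members in zip(ec_counts, ec_members):
            weights = theta[members]
            total = weights.sum()
            if total > 0:
                # E-step: share the class count in proportion to current abundances.
                expected[members] += count * weights / total
        # M-step: renormalise expected counts into relative abundances.
        theta = expected / expected.sum()
    # Scale back so the estimates sum to the cell's total UMI count.
    return theta * ec_counts.sum()

# Toy cell: three transcripts, four equivalence classes.
ec_members = [[0], [1], [0, 1], [2]]   # class 2 is ambiguous between t0 and t1
ec_counts = [10, 5, 30, 5]
estimates = em_transcript_counts(ec_counts, ec_members, n_transcripts=3)
print(np.round(estimates, 2))
```

In this toy case the 30 ambiguous UMIs end up split 2:1 between the first two transcripts, so the estimates approach 30, 15 and 5. The TCC matrix only stores the four class counts; a transcript-level matrix would store the three resolved values per cell.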

What is the exact command that was run?

kb count -i path/to/kallisto/index.idx \
    -g path/to/kallisto/t2g.txt \
    -o ${OUTPUT_BASE_DIR}/${sample} \
    -x 10XV3 --tcc --h5ad -t 16 \
    ${all_fastq_files}

Command output (with --verbose flag)

[2025-03-18 13:54:29,325]   DEBUG [main] Printing verbose output
[2025-03-18 13:54:31,543]   DEBUG [main] kallisto binary located at path/to/.local/lib/python3.8/site-packages/kb_python/bins/linux/kallisto/kallisto
[2025-03-18 13:54:31,544]   DEBUG [main] bustools binary located at path/to/.local/lib/python3.8/site-packages/kb_python/bins/linux/bustools/bustools
[2025-03-18 13:54:31,545]   DEBUG [main] Creating `/path/to/kallisto_bustools/secondary/Sample_Name/tmp` directory
[2025-03-18 13:54:31,546]   DEBUG [main] Namespace(N=None, aa=False, batch_barcodes=False, bootstraps=None, bustools='path/to/.local/lib/python3.8/site-packages/kb_python/bins/linux/bustools/bustools', c1=None, c2=None, cellranger=False, chromosomes=None, command='count', dry_run=False, em=False, error_rate=None, fastqs=['path/to/kallisto_bustools/primary//Sample_Name_S6_L001_R1_001.fastq.gz', 'path/to/kallisto_bustools/primary//Sample_Name_S6_L002_R1_001.fastq.gz', 'path/to/kallisto_bustools/primary//Sample_Name_S6_L003_R1_001.fastq.gz', 'path/to/kallisto_bustools/primary//Sample_Name_S6_L004_R1_001.fastq.gz', 'path/to/kallisto_bustools/primary//Sample_Name_S6_L001_R2_001.fastq.gz', 'path/to/kallisto_bustools/primary//Sample_Name_S6_L002_R2_001.fastq.gz', 'path/to/kallisto_bustools/primary//Sample_Name_S6_L003_R2_001.fastq.gz', 'path/to/kallisto_bustools/primary//Sample_Name_S6_L004_R2_001.fastq.gz'], filter=None, filter_threshold=None, fragment_l=None, fragment_s=None, g='path/to/kallisto/t2g.txt', gene_names=False, genomebam=False, gtf=None, h5ad=True, i='path/to/kallisto/index.idx', inleaved=False, k=31, kallisto='path/to/.local/lib/python3.8/site-packages/kb_python/bins/linux/kallisto/kallisto', keep_flags=False, keep_tmp=True, list=False, long=False, loom=False, loom_names='barcode,target_name', m='2G', matrix_to_directories=False, matrix_to_files=False, mm=False, no_fragment=False, no_inspect=False, no_jump=False, no_validate=False, num=False, o='path/to/kallisto_bustools/secondary/Sample_Name', opt_off=False, overwrite=False, parity=None, platform='ONT', quant_umis=False, r=None, report=False, strand=None, sum='none', t=16, tcc=True, threshold=0.8, tmp=None, union=False, verbose=True, w=None, workflow='standard', x='10XV3')
[2025-03-18 13:54:35,421]    INFO [count] Using index path/to/kallisto/index.idx to generate BUS file to path/to/kallisto_bustools/secondary/Sample_Name from
[2025-03-18 13:54:35,421]    INFO [count]        path/to/kallisto_bustools/primary//Sample_Name_S6_L001_R1_001.fastq.gz
[2025-03-18 13:54:35,421]    INFO [count]        path/to/kallisto_bustools/primary//Sample_Name_S6_L002_R1_001.fastq.gz
[2025-03-18 13:54:35,421]    INFO [count]        path/to/kallisto_bustools/primary//Sample_Name_S6_L003_R1_001.fastq.gz
[2025-03-18 13:54:35,421]    INFO [count]        path/to/kallisto_bustools/primary//Sample_Name_S6_L004_R1_001.fastq.gz
[2025-03-18 13:54:35,421]    INFO [count]        path/to/kallisto_bustools/primary//Sample_Name_S6_L001_R2_001.fastq.gz
[2025-03-18 13:54:35,421]    INFO [count]        path/to/kallisto_bustools/primary//Sample_Name_S6_L002_R2_001.fastq.gz
[2025-03-18 13:54:35,421]    INFO [count]        path/to/kallisto_bustools/primary//Sample_Name_S6_L003_R2_001.fastq.gz
[2025-03-18 13:54:35,421]    INFO [count]        path/to/kallisto_bustools/primary//Sample_Name_S6_L004_R2_001.fastq.gz
[2025-03-18 13:54:35,421]   DEBUG [count] kallisto bus -i path/to/tools/kallisto/index.idx -o path/to/kallisto_bustools/secondary/Sample_Name -x 10XV3 -t 16 path/to/kallisto_bustools/primary//Sample_Name_S6_L001_R1_001.fastq.gzpath/to/kallisto_bustools/primary//Sample_Name_S6_L002_R1_001.fastq.gzpath/to/kallisto_bustools/primary//Sample_Name_S6_L003_R1_001.fastq.gzpath/to/kallisto_bustools/primary//Sample_Name_S6_L004_R1_001.fastq.gzpath/to/kallisto_bustools/primary//Sample_Name_S6_L001_R2_001.fastq.gzpath/to/kallisto_bustools/primary//Sample_Name_S6_L002_R2_001.fastq.gzpath/to/kallisto_bustools/primary//Sample_Name_S6_L003_R2_001.fastq.gzpath/to/kallisto_bustools/primary//Sample_Name_S6_L004_R2_001.fastq.gz
[2025-03-18 13:54:35,530]   DEBUG [count]
[2025-03-18 13:54:35,530]   DEBUG [count] [bus] Note: Strand option was not specified; setting it to --fr-stranded for specified technology
[2025-03-18 13:54:38,835]   DEBUG [count] [index] k-mer length: 31
[2025-03-18 13:54:42,843]   DEBUG [count] [index] number of targets: 385,659
[2025-03-18 13:54:42,843]   DEBUG [count] [index] number of k-mers: 176,942,277
[2025-03-18 13:54:42,843]   DEBUG [count] [index] number of D-list k-mers: 6,427,244
[2025-03-18 13:54:42,843]   DEBUG [count] [quant] running in paired-end mode
[2025-03-18 13:54:42,843]   DEBUG [count] [quant] will process sample 1: path/to/kallisto_bustools/primary//Sample_Name_S6_L001_R1_001.fastq.gz
[2025-03-18 13:54:42,843]   DEBUG [count] path/to/kallisto_bustools/primary//Sample_Name_S6_L002_R1_001.fastq.gz
[2025-03-18 13:54:42,843]   DEBUG [count] [quant] will process sample 2: path/to/kallisto_bustools/primary//Sample_Name_S6_L003_R1_001.fastq.gz
[2025-03-18 13:54:42,843]   DEBUG [count] path/to/kallisto_bustools/primary//Sample_Name_S6_L004_R1_001.fastq.gz
[2025-03-18 13:54:42,843]   DEBUG [count] [quant] will process sample 3: path/to/kallisto_bustools/primary//Sample_Name_S6_L001_R2_001.fastq.gz
[2025-03-18 13:54:42,843]   DEBUG [count] path/to/kallisto_bustools/primary//Sample_Name_S6_L002_R2_001.fastq.gz
[2025-03-18 13:54:42,843]   DEBUG [count] [quant] will process sample 4: path/to/kallisto_bustools/primary//Sample_Name_S6_L003_R2_001.fastq.gz
[2025-03-18 13:54:42,843]   DEBUG [count] path/to/kallisto_bustools/primary//Sample_Name_S6_L004_R2_001.fastq.gz
[2025-03-18 13:54:43,845]   DEBUG [count] [quant] finding pseudoalignments for the reads ...
[2025-03-18 13:54:44,146]   DEBUG [count] [progress] 1M reads processed (0.0% mapped)
...
[2025-03-18 13:59:36,979]   DEBUG [count] [progress] 331M reads processed (15.4% mapped)
[2025-03-18 13:59:38,181]   DEBUG [count] [progress] 332M reads processed (15.4% mapped)              done
[2025-03-18 13:59:38,181]   DEBUG [count] [quant] processed 332,722,184 reads, 51,414,736 reads pseudoaligned
[2025-03-18 13:59:39,082]   DEBUG [count]
[2025-03-18 13:59:41,186]    INFO [count] Sorting BUS file path/to/kallisto_bustools/secondary/Sample_Name/output.bus to path/to/kallisto_bustools/secondary/Sample_Name/tmp/output.s.bus
[2025-03-18 13:59:41,186]   DEBUG [count] bustools sort -o path/to/kallisto_bustools/secondary/Sample_Name/tmp/output.s.bus -T path/to/kallisto_bustools/secondary/Sample_Name/tmp -t 16 -m 2Gpath/to/kallisto_bustools/secondary/Sample_Name/output.bus
[2025-03-18 13:59:43,408]   DEBUG [count] partition time: 1.40556s
[2025-03-18 13:59:43,909]   DEBUG [count] all fits in buffer
[2025-03-18 13:59:46,413]   DEBUG [count] Read in 51414736 BUS records
[2025-03-18 13:59:46,414]   DEBUG [count] reading time 0.339685s
[2025-03-18 13:59:46,414]   DEBUG [count] sorting time 8.20086s
[2025-03-18 13:59:46,414]   DEBUG [count] writing time 1.03736s
[2025-03-18 13:59:46,414]    INFO [count] On-list not provided
[2025-03-18 13:59:46,414]    INFO [count] Copying pre-packaged 10XV3 on-list topath/to/kallisto_bustools/secondary/Sample_Name
[2025-03-18 13:59:47,184]    INFO [count] Inspecting BUS file path/to/kallisto_bustools/secondary/Sample_Name/tmp/output.s.bus
[2025-03-18 13:59:47,184]   DEBUG [count] bustools inspect -o path/to/kallisto_bustools/secondary/Sample_Name/inspect.json -w path/to/kallisto_bustools/secondary/Sample_Name/10x_version3_whitelist.txtpath/to/kallisto_bustools/secondary/Sample_Name/tmp/output.s.bus
[2025-03-18 13:59:58,414]    INFO [count] Correcting BUS records in path/to/kallisto_bustools/secondary/Sample_Name/tmp/output.s.bus to path/to/kallisto_bustools/secondary/Sample_Name/tmp/output.s.c.bus with on-listpath/to/kallisto_bustools/secondary/Sample_Name/10x_version3_whitelist.txt
[2025-03-18 13:59:58,414]   DEBUG [count] bustools correct -o path/to/kallisto_bustools/secondary/Sample_Name/tmp/output.s.c.bus -w path/to/kallisto_bustools/secondary/Sample_Name/10x_version3_whitelist.txtpath/to/kallisto_bustools/secondary/Sample_Name/tmp/output.s.bus
[2025-03-18 14:00:05,136]   DEBUG [count] Found 6794880 barcodes in the on-list
[2025-03-18 14:00:15,856]   DEBUG [count] Processed 39783177 BUS records
[2025-03-18 14:00:15,856]   DEBUG [count] In on-list = 50211
[2025-03-18 14:00:15,856]   DEBUG [count] Corrected    = 443032
[2025-03-18 14:00:15,856]   DEBUG [count] Uncorrected  = 39289934
[2025-03-18 14:00:17,359]    INFO [count] Sorting BUS file path/to/kallisto_bustools/secondary/Sample_Name/tmp/output.s.c.bus to path/to/kallisto_bustools/secondary/Sample_Name/output.unfiltered.bus
[2025-03-18 14:00:17,359]   DEBUG [count] bustools sort -o path/to/kallisto_bustools/secondary/Sample_Name/output.unfiltered.bus -T path/to/kallisto_bustools/secondary/Sample_Name/tmp -t 16 -m 2Gpath/to/kallisto_bustools/secondary/Sample_Name/tmp/output.s.c.bus
[2025-03-18 14:00:17,869]   DEBUG [count] partition time: 0.003896s
[2025-03-18 14:00:17,869]   DEBUG [count] all fits in buffer
[2025-03-18 14:00:18,970]   DEBUG [count] Read in 493243 BUS records
[2025-03-18 14:00:18,971]   DEBUG [count] reading time 0.003955s
[2025-03-18 14:00:18,971]   DEBUG [count] sorting time 0.033978s
[2025-03-18 14:00:18,971]   DEBUG [count] writing time 0.014626s
[2025-03-18 14:00:18,972]    INFO [count] Generating count matrix path/to/kallisto_bustools/secondary/Sample_Name/counts_unfiltered/cells_x_tcc from BUS file path/to/kallisto_bustools/secondary/Sample_Name/output.unfiltered.bus
[2025-03-18 14:00:18,973]   DEBUG [count] bustools count -o path/to/kallisto_bustools/secondary/Sample_Name/counts_unfiltered/cells_x_tcc -g path/to/kallisto/t2g.txt -epath/to/kallisto_bustools/secondary/Sample_Name/matrix.ec -tpath/to/kallisto_bustools/secondary/Sample_Name/transcripts.txt --multimapping --umi-genepath/to/kallisto_bustools/secondary/Sample_Name/output.unfiltered.bus
[2025-03-18 14:00:23,134]   DEBUG [count] path/to/kallisto_bustools/secondary/Sample_Name/counts_unfiltered/cells_x_tcc.mtx passed validation
[2025-03-18 14:00:23,134]    INFO [count] Reading matrix path/to/kallisto_bustools/secondary/Sample_Name/counts_unfiltered/cells_x_tcc.mtx
[2025-03-18 14:00:26,723]    INFO [count] Writing matrix to h5adpath/to/kallisto_bustools/secondary/Sample_Name/counts_unfiltered/adata.h5ad
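
As a quick sanity check on the numbers in the log above, the pseudoalignment rate and the share of BUS records that survive barcode correction can be recomputed from the reported totals. This is a small sketch that takes the log labels at face value; the variable names are hypothetical and the figures are copied from the log.

```python
# Totals copied from the kallisto/bustools log output.
reads_processed = 332_722_184
reads_pseudoaligned = 51_414_736
bus_records = 39_783_177      # "Processed ... BUS records" in bustools correct
in_on_list = 50_211
corrected = 443_032
uncorrected = 39_289_934

# Fraction of reads that pseudoaligned to the index.
mapping_rate = reads_pseudoaligned / reads_processed

# Records whose barcode was on the on-list or could be corrected to it.
kept_records = in_on_list + corrected
kept_fraction = kept_records / bus_records

# The three categories should account for every processed record.
assert in_on_list + corrected + uncorrected == bus_records

print(f"pseudoalignment rate: {mapping_rate:.1%}")
print(f"records kept after correction: {kept_records} ({kept_fraction:.2%})")
```

The kept-record total matches the 493243 BUS records read in by the second sort step.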
