Description
Describe the issue
I'm working with 10X Genomics single-cell RNA-seq data (28bp R1 files containing barcodes/UMIs and 90bp R2 files with transcript sequences), and I'm trying to process this data using kallisto/bustools.
My issue is that I want transcript-level expression matrices (transcript abundances per cell), but the command I'm currently running with the --tcc flag only produces transcript-compatibility counts (TCCs) in the counts_unfiltered directory. TCCs are an intermediate format: they record which set of transcripts each UMI could have originated from, but they are not resolved into per-transcript expression levels. I was expecting to also get a quant_unfiltered directory containing that information. I want the full transcript expression matrices so that I can capture isoform-level information in my single-cell data.
Is there a way to do that?
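To make the distinction concrete, here is a minimal sketch of the kind of step I mean: taking counts per equivalence class (the TCCs) for a single cell and distributing them over the compatible transcripts with a simple EM loop. This is a toy illustration only. It is not how kb-python, kallisto, or bustools implement quantification; it ignores transcript effective lengths, and the function name `em_from_tcc` and the example data are made up for this sketch.

```python
import numpy as np


def em_from_tcc(ec_counts, ec_to_transcripts, n_transcripts, n_iter=200):
    """Toy EM: split equivalence-class counts among compatible transcripts.

    ec_counts[i] is the number of UMIs in equivalence class i, and
    ec_to_transcripts[i] lists the transcript indices compatible with it.
    Assumption: effective lengths are ignored, so this is only a sketch.
    """
    total = float(sum(ec_counts))
    # Start from a uniform guess of relative abundance.
    abundance = np.full(n_transcripts, 1.0 / n_transcripts)
    for _ in range(n_iter):
        expected = np.zeros(n_transcripts)
        for count, transcripts in zip(ec_counts, ec_to_transcripts):
            if count == 0:
                continue
            # E-step: share this class's UMIs in proportion to current abundance.
            weights = abundance[transcripts]
            expected[transcripts] += count * weights / weights.sum()
        # M-step: renormalise expected counts to relative abundances.
        abundance = expected / expected.sum()
    # Report estimated UMI counts per transcript rather than fractions.
    return abundance * total


# Hypothetical single cell with three transcripts and three equivalence classes.
ec_to_transcripts = [[0], [0, 1], [2]]
ec_counts = [30, 20, 10]

estimates = em_from_tcc(ec_counts, ec_to_transcripts, n_transcripts=3)
print(estimates)
```

In this example the 20 ambiguous UMIs in class `[0, 1]` end up attributed almost entirely to transcript 0, because only transcript 0 has unique evidence. That per-transcript vector (one per cell) is what I would like to obtain instead of the per-class counts.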
What is the exact command that was run?
kb count -i path/to/kallisto/index.idx \
-g path/to/kallisto/t2g.txt \
-o ${OUTPUT_BASE_DIR}/${sample} \
-x 10XV3 --tcc --h5ad -t 16 \
${all_fastq_files}
Command output (with --verbose flag)
[2025-03-18 13:54:29,325] DEBUG [main] Printing verbose output
[2025-03-18 13:54:31,543] DEBUG [main] kallisto binary located at path/to/.local/lib/python3.8/site-packages/kb_python/bins/linux/kallisto/kallisto
[2025-03-18 13:54:31,544] DEBUG [main] bustools binary located at path/to/.local/lib/python3.8/site-packages/kb_python/bins/linux/bustools/bustools
[2025-03-18 13:54:31,545] DEBUG [main] Creating `/path/to/kallisto_bustools/secondary/Sample_Name/tmp` directory
[2025-03-18 13:54:31,546] DEBUG [main] Namespace(N=None, aa=False, batch_barcodes=False, bootstraps=None, bustools='path/to/.local/lib/python3.8/site-packages/kb_python/bins/linux/bustools/bustools', c1=None, c2=None, cellranger=False, chromosomes=None, command='count', dry_run=False, em=False, error_rate=None, fastqs=['path/to/kallisto_bustools/primary//Sample_Name_S6_L001_R1_001.fastq.gz', 'path/to/kallisto_bustools/primary//Sample_Name_S6_L002_R1_001.fastq.gz', 'path/to/kallisto_bustools/primary//Sample_Name_S6_L003_R1_001.fastq.gz', 'path/to/kallisto_bustools/primary//Sample_Name_S6_L004_R1_001.fastq.gz', 'path/to/kallisto_bustools/primary//Sample_Name_S6_L001_R2_001.fastq.gz', 'path/to/kallisto_bustools/primary//Sample_Name_S6_L002_R2_001.fastq.gz', 'path/to/kallisto_bustools/primary//Sample_Name_S6_L003_R2_001.fastq.gz', 'path/to/kallisto_bustools/primary//Sample_Name_S6_L004_R2_001.fastq.gz'], filter=None, filter_threshold=None, fragment_l=None, fragment_s=None, g='path/to/kallisto/t2g.txt', gene_names=False, genomebam=False, gtf=None, h5ad=True, i='path/to/kallisto/index.idx', inleaved=False, k=31, kallisto='path/to/.local/lib/python3.8/site-packages/kb_python/bins/linux/kallisto/kallisto', keep_flags=False, keep_tmp=True, list=False, long=False, loom=False, loom_names='barcode,target_name', m='2G', matrix_to_directories=False, matrix_to_files=False, mm=False, no_fragment=False, no_inspect=False, no_jump=False, no_validate=False, num=False, o='path/to/kallisto_bustools/secondary/Sample_Name', opt_off=False, overwrite=False, parity=None, platform='ONT', quant_umis=False, r=None, report=False, strand=None, sum='none', t=16, tcc=True, threshold=0.8, tmp=None, union=False, verbose=True, w=None, workflow='standard', x='10XV3')
[2025-03-18 13:54:35,421] INFO [count] Using index path/to/kallisto/index.idx to generate BUS file to path/to/kallisto_bustools/secondary/Sample_Name from
[2025-03-18 13:54:35,421] INFO [count] path/to/kallisto_bustools/primary//Sample_Name_S6_L001_R1_001.fastq.gz
[2025-03-18 13:54:35,421] INFO [count] path/to/kallisto_bustools/primary//Sample_Name_S6_L002_R1_001.fastq.gz
[2025-03-18 13:54:35,421] INFO [count] path/to/kallisto_bustools/primary//Sample_Name_S6_L003_R1_001.fastq.gz
[2025-03-18 13:54:35,421] INFO [count] path/to/kallisto_bustools/primary//Sample_Name_S6_L004_R1_001.fastq.gz
[2025-03-18 13:54:35,421] INFO [count] path/to/kallisto_bustools/primary//Sample_Name_S6_L001_R2_001.fastq.gz
[2025-03-18 13:54:35,421] INFO [count] path/to/kallisto_bustools/primary//Sample_Name_S6_L002_R2_001.fastq.gz
[2025-03-18 13:54:35,421] INFO [count] path/to/kallisto_bustools/primary//Sample_Name_S6_L003_R2_001.fastq.gz
[2025-03-18 13:54:35,421] INFO [count] path/to/kallisto_bustools/primary//Sample_Name_S6_L004_R2_001.fastq.gz
[2025-03-18 13:54:35,421] DEBUG [count] kallisto bus -i path/to/tools/kallisto/index.idx -o path/to/kallisto_bustools/secondary/Sample_Name -x 10XV3 -t 16 path/to/kallisto_bustools/primary//Sample_Name_S6_L001_R1_001.fastq.gzpath/to/kallisto_bustools/primary//Sample_Name_S6_L002_R1_001.fastq.gzpath/to/kallisto_bustools/primary//Sample_Name_S6_L003_R1_001.fastq.gzpath/to/kallisto_bustools/primary//Sample_Name_S6_L004_R1_001.fastq.gzpath/to/kallisto_bustools/primary//Sample_Name_S6_L001_R2_001.fastq.gzpath/to/kallisto_bustools/primary//Sample_Name_S6_L002_R2_001.fastq.gzpath/to/kallisto_bustools/primary//Sample_Name_S6_L003_R2_001.fastq.gzpath/to/kallisto_bustools/primary//Sample_Name_S6_L004_R2_001.fastq.gz
[2025-03-18 13:54:35,530] DEBUG [count]
[2025-03-18 13:54:35,530] DEBUG [count] [bus] Note: Strand option was not specified; setting it to --fr-stranded for specified technology
[2025-03-18 13:54:38,835] DEBUG [count] [index] k-mer length: 31
[2025-03-18 13:54:42,843] DEBUG [count] [index] number of targets: 385,659
[2025-03-18 13:54:42,843] DEBUG [count] [index] number of k-mers: 176,942,277
[2025-03-18 13:54:42,843] DEBUG [count] [index] number of D-list k-mers: 6,427,244
[2025-03-18 13:54:42,843] DEBUG [count] [quant] running in paired-end mode
[2025-03-18 13:54:42,843] DEBUG [count] [quant] will process sample 1: path/to/kallisto_bustools/primary//Sample_Name_S6_L001_R1_001.fastq.gz
[2025-03-18 13:54:42,843] DEBUG [count] path/to/kallisto_bustools/primary//Sample_Name_S6_L002_R1_001.fastq.gz
[2025-03-18 13:54:42,843] DEBUG [count] [quant] will process sample 2: path/to/kallisto_bustools/primary//Sample_Name_S6_L003_R1_001.fastq.gz
[2025-03-18 13:54:42,843] DEBUG [count] path/to/kallisto_bustools/primary//Sample_Name_S6_L004_R1_001.fastq.gz
[2025-03-18 13:54:42,843] DEBUG [count] [quant] will process sample 3: path/to/kallisto_bustools/primary//Sample_Name_S6_L001_R2_001.fastq.gz
[2025-03-18 13:54:42,843] DEBUG [count] path/to/kallisto_bustools/primary//Sample_Name_S6_L002_R2_001.fastq.gz
[2025-03-18 13:54:42,843] DEBUG [count] [quant] will process sample 4: path/to/kallisto_bustools/primary//Sample_Name_S6_L003_R2_001.fastq.gz
[2025-03-18 13:54:42,843] DEBUG [count] path/to/kallisto_bustools/primary//Sample_Name_S6_L004_R2_001.fastq.gz
[2025-03-18 13:54:43,845] DEBUG [count] [quant] finding pseudoalignments for the reads ...
[2025-03-18 13:54:44,146] DEBUG [count] [progress] 1M reads processed (0.0% mapped)
...
[2025-03-18 13:59:36,979] DEBUG [count] [progress] 331M reads processed (15.4% mapped)
[2025-03-18 13:59:38,181] DEBUG [count] [progress] 332M reads processed (15.4% mapped) done
[2025-03-18 13:59:38,181] DEBUG [count] [quant] processed 332,722,184 reads, 51,414,736 reads pseudoaligned
[2025-03-18 13:59:39,082] DEBUG [count]
[2025-03-18 13:59:41,186] INFO [count] Sorting BUS file path/to/kallisto_bustools/secondary/Sample_Name/output.bus to path/to/kallisto_bustools/secondary/Sample_Name/tmp/output.s.bus
[2025-03-18 13:59:41,186] DEBUG [count] bustools sort -o path/to/kallisto_bustools/secondary/Sample_Name/tmp/output.s.bus -T path/to/kallisto_bustools/secondary/Sample_Name/tmp -t 16 -m 2Gpath/to/kallisto_bustools/secondary/Sample_Name/output.bus
[2025-03-18 13:59:43,408] DEBUG [count] partition time: 1.40556s
[2025-03-18 13:59:43,909] DEBUG [count] all fits in buffer
[2025-03-18 13:59:46,413] DEBUG [count] Read in 51414736 BUS records
[2025-03-18 13:59:46,414] DEBUG [count] reading time 0.339685s
[2025-03-18 13:59:46,414] DEBUG [count] sorting time 8.20086s
[2025-03-18 13:59:46,414] DEBUG [count] writing time 1.03736s
[2025-03-18 13:59:46,414] INFO [count] On-list not provided
[2025-03-18 13:59:46,414] INFO [count] Copying pre-packaged 10XV3 on-list topath/to/kallisto_bustools/secondary/Sample_Name
[2025-03-18 13:59:47,184] INFO [count] Inspecting BUS file path/to/kallisto_bustools/secondary/Sample_Name/tmp/output.s.bus
[2025-03-18 13:59:47,184] DEBUG [count] bustools inspect -o path/to/kallisto_bustools/secondary/Sample_Name/inspect.json -w path/to/kallisto_bustools/secondary/Sample_Name/10x_version3_whitelist.txtpath/to/kallisto_bustools/secondary/Sample_Name/tmp/output.s.bus
[2025-03-18 13:59:58,414] INFO [count] Correcting BUS records in path/to/kallisto_bustools/secondary/Sample_Name/tmp/output.s.bus to path/to/kallisto_bustools/secondary/Sample_Name/tmp/output.s.c.bus with on-listpath/to/kallisto_bustools/secondary/Sample_Name/10x_version3_whitelist.txt
[2025-03-18 13:59:58,414] DEBUG [count] bustools correct -o path/to/kallisto_bustools/secondary/Sample_Name/tmp/output.s.c.bus -w path/to/kallisto_bustools/secondary/Sample_Name/10x_version3_whitelist.txtpath/to/kallisto_bustools/secondary/Sample_Name/tmp/output.s.bus
[2025-03-18 14:00:05,136] DEBUG [count] Found 6794880 barcodes in the on-list
[2025-03-18 14:00:15,856] DEBUG [count] Processed 39783177 BUS records
[2025-03-18 14:00:15,856] DEBUG [count] In on-list = 50211
[2025-03-18 14:00:15,856] DEBUG [count] Corrected = 443032
[2025-03-18 14:00:15,856] DEBUG [count] Uncorrected = 39289934
[2025-03-18 14:00:17,359] INFO [count] Sorting BUS file path/to/kallisto_bustools/secondary/Sample_Name/tmp/output.s.c.bus to path/to/kallisto_bustools/secondary/Sample_Name/output.unfiltered.bus
[2025-03-18 14:00:17,359] DEBUG [count] bustools sort -o path/to/kallisto_bustools/secondary/Sample_Name/output.unfiltered.bus -T path/to/kallisto_bustools/secondary/Sample_Name/tmp -t 16 -m 2Gpath/to/kallisto_bustools/secondary/Sample_Name/tmp/output.s.c.bus
[2025-03-18 14:00:17,869] DEBUG [count] partition time: 0.003896s
[2025-03-18 14:00:17,869] DEBUG [count] all fits in buffer
[2025-03-18 14:00:18,970] DEBUG [count] Read in 493243 BUS records
[2025-03-18 14:00:18,971] DEBUG [count] reading time 0.003955s
[2025-03-18 14:00:18,971] DEBUG [count] sorting time 0.033978s
[2025-03-18 14:00:18,971] DEBUG [count] writing time 0.014626s
[2025-03-18 14:00:18,972] INFO [count] Generating count matrix path/to/kallisto_bustools/secondary/Sample_Name/counts_unfiltered/cells_x_tcc from BUS file path/to/kallisto_bustools/secondary/Sample_Name/output.unfiltered.bus
[2025-03-18 14:00:18,973] DEBUG [count] bustools count -o path/to/kallisto_bustools/secondary/Sample_Name/counts_unfiltered/cells_x_tcc -g path/to/kallisto/t2g.txt -epath/to/kallisto_bustools/secondary/Sample_Name/matrix.ec -tpath/to/kallisto_bustools/secondary/Sample_Name/transcripts.txt --multimapping --umi-genepath/to/kallisto_bustools/secondary/Sample_Name/output.unfiltered.bus
[2025-03-18 14:00:23,134] DEBUG [count] path/to/kallisto_bustools/secondary/Sample_Name/counts_unfiltered/cells_x_tcc.mtx passed validation
[2025-03-18 14:00:23,134] INFO [count] Reading matrix path/to/kallisto_bustools/secondary/Sample_Name/counts_unfiltered/cells_x_tcc.mtx
[2025-03-18 14:00:26,723] INFO [count] Writing matrix to h5adpath/to/kallisto_bustools/secondary/Sample_Name/counts_unfiltered/adata.h5ad