Releases: pachterlab/kb_python
v0.25.0
ref
- Progress bar is now displayed when downloading pre-packaged reference files.
- Added checks to provide more useful outputs for common errors, including: 1) when FASTA and GTF chromosomes do not match, 2) when a GTF entry is not parsable, and 3) when either the `transcript` or `exon` entry for a transcript is missing in the GTF (both are required).
- Added `-k` option to override the default (or calculated optimal) k-mer length for the kallisto index.
- Added functionality to generate a feature barcode reference for use with the KITE feature-barcoding workflow. To use this option, supply `--workflow kite` and a feature-barcode to cell-barcode mapping.
- Added `-n` option to split indices into `n` parts. This reduces the maximum memory used at any given time, which is useful in memory-limited environments. When the `-n` option is used, the `-i` argument is used as the prefix for the `n` generated indices. Each index is suffixed with `.i`, where `i` is the index number, starting from `i=0`. When `-n` is used, the built indices must be passed to `kb count` as a comma-delimited list (NOTE: this feature is EXPERIMENTAL. See `count` for more details). When `-n` is used with `--workflow lamanno` or `--workflow nucleus`, only the intron FASTA is split into `n-1` parts, which are then each indexed separately. The cDNA FASTA is indexed in its entirety and is never split.
- Added functionality to build a single index using multiple references. Useful for mixed-species experiments. The `fasta` argument should be a comma-delimited list of genome FASTAs, and the `gtf` argument should be a comma-delimited list of GTFs, corresponding in position to each genome FASTA.
- Added `--tmp` option to manually specify the temporary directory. Otherwise, behavior is identical to the previous version (`tmp` directory at the location where `kb` is executed).
- Added support for IUPAC nucleotide codes. Note that `kallisto` replaces non-ACGUT nucleotides with pseudorandom ones. Thanks @Maarten-vd-Sande
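
The naming scheme described for the `-n` option can be sketched in a few lines of Python. This is only an illustration of the stated convention (prefix plus `.i`, starting at `i=0`, joined with commas for `kb count`), not code from `kb` itself; the function name `split_index_names` and the prefix `index.idx` are hypothetical.

```python
from typing import List


def split_index_names(prefix: str, n: int) -> List[str]:
    """Return the n index paths implied by a prefix: prefix.0 ... prefix.(n-1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [f"{prefix}.{i}" for i in range(n)]


# Hypothetical prefix, as it might be given to the -i argument.
names = split_index_names("index.idx", 3)

# The comma-delimited form expected when passing several indices at once.
index_argument = ",".join(names)
print(index_argument)
```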
count
- Added support for the KITE feature-barcoding workflow. The `bustools` binary was updated to support this feature.
- DEPRECATION: The `--lamanno` and `--nucleus` flags will be deprecated in the next release. These have been replaced with `--workflow lamanno` and `--workflow nucleus`.
- All BUS files that are inputs/outputs are validated before/after running `kallisto` or `bustools`. A BUS file is considered valid if it is read with `bustools` without error and it has a positive number of BUS records. This should prevent `bustools` from trying to sort empty BUS files and crashing (#31).
- Added functionality to generate TCC matrices with the `--tcc` flag.
- Added `--tcc` flag to include reads that pseudoalign to multiple genes.
- When running in verbose mode (`--verbose`), commands are no longer printed with the full path to the `bustools` and `kallisto` binaries. These paths are printed once at the start of the program.
- Added `--dry-run` flag, which prints the entire workflow to standard output as shell commands, without actually running them.
- EXPERIMENTAL: Added support for multiple indices by passing a comma-delimited list of indices to `-i`. `kb` will align the reads to each of these indices and merge the BUS files with `bustools mash` and `bustools merge`. This feature is currently EXPERIMENTAL, and there are known issues that cause the loss of reads. This feature will be fully supported in a future release. In the meantime, use at your own risk!
- Added `--tmp` option to manually specify the temporary directory. The default behavior has also changed: the default `tmp` directory is created IN THE OUTPUT FOLDER (specified by `-o`). Previously, the `tmp` directory was created where `kb` was run, which caused issues when running multiple instances of `kb` from the same location. Thanks to @Munfred and @kokitsuyuzaki for the suggestion.
- `kb` now outputs a `kb_info.json`, which includes useful run information, such as the commands run and their runtimes.
- Added functionality to generate a brief standalone HTML report that includes basic statistics (`run_info.json`, `inspect.json`) and quality-control plots (knee plot, elbow plot, PCA, genes detected). This feature is available with the `--report` flag. Using this flag on velocity matrices may cause `kb` to crash due to high memory usage, and a corresponding warning is printed at the start. Plots for TCC matrices are not supported.
- When the matrix is converted to H5AD or Loom format (using the `--h5ad` or `--loom` options), the gene/feature names are included as a column in the `var` of the anndata. Related to #52
- Added a `--cellranger` option, which converts the raw gene matrices to cellranger-compatible format in a separate `cellranger` directory for the `standard` workflow (and `cellranger_spliced` and `cellranger_unspliced` for the `velocity` and `nucleus` workflows). Note that cellranger outputs matrices with genes as rows and cells (barcodes) as columns.
- Added `--mm` flag to include BUS records that pseudoalign to multiple genes, via the `--multimapping` flag in `bustools count` (#57).
- `None` can be provided as the whitelist, which will force `kb` to use the `bustools whitelist` command, even if a pre-packaged whitelist exists.
- Added support for Smart-seq reads with `-x smartseq`. FASTQs are paired by first sorting the list of FASTQ paths in lexicographical order, and taking every two to be a pair. For instance, if `1.fastq 3.fastq 2.fastq 4.fastq` is provided, `1.fastq` and `2.fastq` will be a pair, and `3.fastq` and `4.fastq` will be another pair. The FASTQ argument now supports glob expressions to make it easier to provide a long list of FASTQs.
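
The Smart-seq pairing rule above (sort lexicographically, then take every two paths as a pair) can be sketched as follows. This is a minimal illustration of the described rule, not the `kb` implementation; the function name `pair_fastqs` is hypothetical, and rejecting an odd number of files is an assumption of this sketch.

```python
from typing import List, Tuple


def pair_fastqs(paths: List[str]) -> List[Tuple[str, str]]:
    """Sort paths lexicographically and group consecutive entries into pairs."""
    if len(paths) % 2 != 0:
        # Assumption of this sketch: an odd number of files cannot be paired.
        raise ValueError("expected an even number of FASTQ paths")
    ordered = sorted(paths)
    return [(ordered[i], ordered[i + 1]) for i in range(0, len(ordered), 2)]


# The example from the release note.
pairs = pair_fastqs(["1.fastq", "3.fastq", "2.fastq", "4.fastq"])
print(pairs)
```

Because the sort is lexicographical rather than numeric, a name such as `10.fastq` sorts before `2.fastq`, so zero-padded file names are the safer choice when relying on this ordering.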
v0.24.4
--info
- Fix typo with `indropsv3`
ref
- If any input (FASTA or GTF) files are provided as gzip files, they are uncompressed to the temporary directory instead of being streamed directly. This is because `ref` relies on being able to access arbitrary locations of the files quickly. Working with decompressed files results in a considerable speedup.
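
The idea of decompressing once and then seeking freely can be sketched with the Python standard library. This is an illustration under stated assumptions, not the code `kb` uses; the helper name `decompress_to_dir` and the file name `example.fa.gz` are hypothetical.

```python
import gzip
import os
import shutil
import tempfile


def decompress_to_dir(gz_path: str, out_dir: str) -> str:
    """Write the decompressed contents of gz_path into out_dir and return the new path."""
    name = os.path.basename(gz_path)
    if name.endswith(".gz"):
        name = name[:-3]
    out_path = os.path.join(out_dir, name)
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny gzipped FASTA purely for demonstration.
    gz_path = os.path.join(tmp, "example.fa.gz")
    with gzip.open(gz_path, "wb") as f:
        f.write(b">chr1\nACGT\n")

    plain = decompress_to_dir(gz_path, tmp)

    # On the plain file, jumping to an arbitrary byte offset is a direct seek.
    with open(plain, "rb") as f:
        f.seek(6)
        tail = f.read()
```

Seeking within a gzip stream generally requires decompressing everything before the target offset, which is why repeated random access is much cheaper on the uncompressed copy.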
count
- For `--lamanno`: spliced and unspliced BUS files no longer contain the `.s` suffix. This was done to make the output consistent with the normal (non-`--lamanno`) command.
- Implemented `--filter` with `--lamanno`.
- Support for single-nuclei RNA-seq with `--nuclei`. The only difference between `--nuclei` and `--lamanno` is how the spliced and unspliced matrices are combined. Specifically, `--nuclei` sums the matrices. Using `--nuclei` with neither `--loom` nor `--h5ad` results in behavior identical to `--lamanno`.
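
Summing the spliced and unspliced matrices, as `--nuclei` is described as doing, amounts to an element-wise addition. The sketch below uses small dense lists for readability and assumes both matrices share the same barcodes and genes in the same order; the function name `sum_matrices` and the example counts are hypothetical, and this is not the `kb` implementation.

```python
from typing import List

Matrix = List[List[int]]


def sum_matrices(spliced: Matrix, unspliced: Matrix) -> Matrix:
    """Add two count matrices element-wise; both must have identical shapes."""
    same_rows = len(spliced) == len(unspliced)
    if not same_rows or any(len(a) != len(b) for a, b in zip(spliced, unspliced)):
        raise ValueError("matrices must have the same shape")
    return [
        [s + u for s, u in zip(row_s, row_u)]
        for row_s, row_u in zip(spliced, unspliced)
    ]


# Made-up counts: rows are barcodes, columns are genes.
spliced = [[1, 0, 2], [0, 3, 0]]
unspliced = [[0, 1, 1], [2, 0, 0]]
total = sum_matrices(spliced, unspliced)
print(total)
```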
v0.24.3
v0.24.2
v0.24.1
ref
- `kb` now provides a pre-built human index for RNA velocity (`linnarsson`).
- The intronic FASTA with the `--lamanno` option now includes 30-base flanking regions.
count
- Unfiltered count matrices will always be placed in the `counts_unfiltered` folder.
- If the `--filter` option is specified, the filtered count matrices will be placed in the `counts_filtered` folder.