QCatch: Quality Control downstream of alevin-fry / simpleaf
View the complete QCatch documentation with interactive examples, FAQs, and detailed usage guides.
You need to have Python 3.11 or 3.12 installed on your system.
There are several alternative options to install QCatch:
You can install using Conda from Bioconda.
conda install -c bioconda qcatch
You can also install from PyPI using pip:
pip install qcatch
Tip: If you run into environment issues, you can instead use the provided Conda .yml file, which pins the exact versions of all dependencies to ensure consistency.
conda env create -f qcatch_conda_env.yml
Provide the path to the parent folder for quantification results, or the direct path to a .h5ad file generated by alevin-fry or simpleaf. QCatch will automatically scan the input path, assess data quality, and generate an interactive HTML report that can be viewed directly in your browser.
# Pass --output only if you want results written to a separate folder
qcatch \
--input path/to/your/quantification/result \
--output path/to/desired/QC/output/folder \
--chemistry 10X_3p_v3 \
--save_filtered_h5ad
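When QCatch is launched from a Python pipeline, the same command can be assembled as an argument list. The sketch below is only an illustration: build_qcatch_command is a hypothetical helper, not part of QCatch, and it treats the output path as optional, following the description of the default behavior in this section.

```python
import shlex


def build_qcatch_command(input_path, output_path=None, chemistry=None,
                         save_filtered_h5ad=False):
    """Assemble a qcatch argument list from the options shown above.

    Hypothetical helper for illustration; it is not part of QCatch.
    """
    cmd = ["qcatch", "--input", str(input_path)]
    if output_path is not None:
        cmd += ["--output", str(output_path)]
    if chemistry is not None:
        cmd += ["--chemistry", chemistry]
    if save_filtered_h5ad:
        cmd.append("--save_filtered_h5ad")
    return cmd


cmd = build_qcatch_command(
    "path/to/your/quantification/result",
    chemistry="10X_3p_v3",
    save_filtered_h5ad=True,
)
# Print a copy-pasteable shell command; to execute it, one could call
# subprocess.run(cmd, check=True) in an environment where qcatch is installed.
print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.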
1- Input path:
Provide either:
- the path to the parent directory containing quantification results, or
- the direct path to a .h5ad file generated by alevin-fry or simpleaf.
QCatch will automatically detect the input type:
- If a .h5ad file is provided, QCatch will process it directly.
- If a directory is provided, QCatch will first look for an existing .h5ad file inside. If not found, it will fall back to processing the mtx-based quantification results.
See the example directory structures at the end of the Tips section for reference.
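The lookup order described above can be sketched in Python. This is an illustration of the documented behavior, not QCatch's actual implementation: the function name detect_input_type and the recursive search for .h5ad files are assumptions.

```python
from pathlib import Path


def detect_input_type(input_path):
    """Return 'h5ad' or 'mtx' following the documented lookup order.

    Hypothetical helper for illustration; not part of the QCatch API.
    """
    path = Path(input_path)
    # Case 1: the user pointed directly at a .h5ad file.
    if path.is_file() and path.suffix == ".h5ad":
        return "h5ad"
    if path.is_dir():
        # Case 2: the directory already contains a .h5ad file
        # (assumption: search recursively, since the file may be nested).
        if any(path.rglob("*.h5ad")):
            return "h5ad"
        # Case 3: fall back to the mtx-based quantification results.
        if any(path.rglob("quants_mat.mtx")):
            return "mtx"
    raise FileNotFoundError(
        f"No .h5ad file or mtx-based results found under {input_path}"
    )
```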
2- Output path:
By default, QCatch saves the QC report and all output files in your input directory, so specifying an output path is optional. If you do not want your input folder or files to be modified, specify an output path; all new results and the QC HTML report will be saved there instead. Without an output path:
- If QCatch finds a .h5ad file in the input path, it modifies the original .h5ad file in place by appending the cell filtering results to anndata.obs, and creates a separate HTML QC report in the input folder.
- For mtx-based results, QCatch generates text files for the cell calling results, as well as the QC report, in the input folder.
3- Chemistry:
We highly recommend specifying the chemistry used in your experiment. By default, QCatch assumes the settings for 10X 3' v2 and v3 chemistry. If you use a custom chemistry that is not listed in the predefined chemistry options, you can specify --n_partitions instead.
3- Gene ID-to-name mapping file:
If you are using simpleaf v0.19.3 or later, the generated .h5ad file already includes gene names. In this case, you do not need to specify the --gene_id2name_file option.
To supply a gene ID-to-name mapping, provide a TSV file with two columns, 'gene_id' (e.g., ENSG00000284733) and 'gene_name' (e.g., OR4F29), and no header row. If no file is provided, QCatch will attempt to retrieve the mapping from a remote registry. If that lookup fails, the mitochondria plots will not be displayed, but the rest of the QC report is unaffected.
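A mapping file in the format described above can be written and sanity-checked with a few lines of Python. This is a minimal sketch: the helper names write_gene_id2name and check_gene_id2name are hypothetical, and the header check is a simple heuristic rather than anything QCatch itself performs.

```python
import csv


def write_gene_id2name(pairs, out_path):
    """Write (gene_id, gene_name) pairs as a headerless two-column TSV."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for gene_id, gene_name in pairs:
            writer.writerow([gene_id, gene_name])


def check_gene_id2name(tsv_path):
    """Return the row count; raise ValueError on a malformed line or a header row."""
    n_rows = 0
    with open(tsv_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        for line_no, row in enumerate(reader, start=1):
            if len(row) != 2:
                raise ValueError(
                    f"line {line_no}: expected 2 columns, got {len(row)}"
                )
            # Heuristic: a first field literally named 'gene_id' suggests a header.
            if line_no == 1 and row[0].lower() == "gene_id":
                raise ValueError("file appears to start with a header row")
            n_rows += 1
    return n_rows
```

For example, write_gene_id2name([("ENSG00000284733", "OR4F29")], "gene_id2name.tsv") produces a one-line file whose path can then be passed with --gene_id2name_file (the file name here is a placeholder).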
4- Save filtered h5ad file:
If you want to save the filtered h5ad file separately, specify --save_filtered_h5ad. This option only applies when QCatch detects a .h5ad file as the input.
5- Specify your desired cell list:
If you want to use a specified list of valid cell barcodes, provide the file path with --valid_cell_list. QCatch will then skip the default cell calling step and use the supplied list instead. The updated .h5ad file will include only one additional column, 'is_retained_cells', containing boolean values based on the specified list.
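The barcode list is a plain text file with one barcode per line and no header row. A minimal sketch for producing such a file from barcodes already held in Python follows; write_valid_cell_list is a hypothetical helper, and dropping blank or duplicate entries is a choice made here, not a documented QCatch requirement.

```python
def write_valid_cell_list(barcodes, out_path):
    """Write unique barcodes, one per line, with no header row.

    Returns the number of barcodes written.
    """
    seen = set()
    with open(out_path, "w") as handle:
        for barcode in barcodes:
            barcode = barcode.strip()
            # Skip blanks and duplicates so each barcode appears exactly once.
            if barcode and barcode not in seen:
                seen.add(barcode)
                handle.write(barcode + "\n")
    return len(seen)
```

The resulting path is what --valid_cell_list expects.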
6- Skip clustering plots:
To reduce runtime, you may enable the --skip_umap_tsne option to bypass dimensionality reduction and visualization steps.
7- Export the summary metrics:
To export the summary metrics, enable the --export_summary_table flag. The summary table will be saved as a separate CSV file in the output directory.
8- Debug-level messages:
To get debug-level messages and more detail on the intermediate computations in the cell calling step, specify --verbose.
9- Re-run QCatch on a modified h5ad file:
If you re-run QCatch on a modified .h5ad file (i.e., a .h5ad file to which columns with cell calling results have already been added), the existing cell calling-related columns will be removed and replaced with the new results. The new cell calling can be generated either through QCatch's internal method or from a user-specified list of valid cell barcodes.
Example directory structures:
# simpleaf
parent_quant_dir/
├── af_map/
├── af_quant/
│ ├── alevin/
│ │ ├── quants_mat_cols.txt
│ │ ├── quants_mat_rows.txt
│ │ ├── quants_mat.mtx
│ │ └── quants.h5ad (available if you use simpleaf after v0.19.3)
│ │ ...
│ ├── featureDump.txt
│ └── quant.json
└── simpleaf_quant_log.json
# alevin-fry
parent_quant_dir/
├── alevin/
│ ├── quants_mat_cols.txt
│ ├── quants_mat_rows.txt
│ └── quants_mat.mtx
├── featureDump.txt
└── quant.json
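Before running QCatch on mtx-based results, it can be handy to confirm that a directory matches one of the layouts above. The following is a rough sketch using only the file names shown in the trees; find_alevin_dir and missing_mtx_files are hypothetical helpers, and QCatch's own checks may differ.

```python
from pathlib import Path

# File names taken from the directory trees above.
MTX_FILES = ("quants_mat.mtx", "quants_mat_rows.txt", "quants_mat_cols.txt")


def find_alevin_dir(parent_dir):
    """Return the alevin/ folder for either layout, or None if neither exists."""
    parent = Path(parent_dir)
    # simpleaf nests alevin/ under af_quant/; alevin-fry puts it at the top level.
    for candidate in (parent / "af_quant" / "alevin", parent / "alevin"):
        if candidate.is_dir():
            return candidate
    return None


def missing_mtx_files(parent_dir):
    """List the expected mtx-based files that are absent under parent_dir."""
    alevin_dir = find_alevin_dir(parent_dir)
    if alevin_dir is None:
        return list(MTX_FILES)
    return [name for name in MTX_FILES if not (alevin_dir / name).is_file()]
```

An empty list from missing_mtx_files means all three matrix files were found.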
For more advanced options and usage details, see the sections below.
Flag | Short | Type | Description |
---|---|---|---|
--input | -i | str (Required) | Path to the input directory containing the quantification output files, or to the HDF5 file itself. |
--output | -o | str (Required) | Path to the output directory. |
--chemistry | -c | str (Optional but recommended) | Specifies the chemistry used in the experiment, determining the range for the empty_drops step. Options: '10X_3p_v2', '10X_3p_v3', '10X_3p_v4', '10X_3p_LT', '10X_5p_v3', '10X_HT'. Default: uses the range for '10X_3p_v2' and '10X_3p_v3'. |
--save_filtered_h5ad | -s | flag (Optional) | If enabled, qcatch will save a separate .h5ad file containing only the retained cells. |
--gene_id2name_file | -g | str (Optional) | File providing a mapping from gene IDs to gene names. The file must be a TSV containing two columns, 'gene_id' (e.g., ENSG00000284733) and 'gene_name' (e.g., OR4F29), without a header row. If not provided, the program will attempt to retrieve the mapping from a remote registry. If that lookup fails, mitochondria plots will not be displayed. |
--valid_cell_list | -l | str (Optional) | File providing a user-specified list of valid cell barcodes. The file must be a TSV containing one column of cell barcodes without a header row. If provided, qcatch will skip the internal cell calling steps and use the supplied list instead. |
--n_partitions | -n | int (Optional) | Number of partitions (max number of barcodes to consider for ambient estimation). Skip this option if you already specified --chemistry. Only use --n_partitions when your experiment uses a custom chemistry not listed in the predefined chemistry options. |
--skip_umap_tsne | -u | flag (Optional) | If provided, skips generation of UMAP and t-SNE plots. |
--export_summary_table | -x | flag (Optional) | If enabled, QCatch will export the summary metrics as a separate CSV file. |
--verbose | -b | flag (Optional) | Enable verbose logging with debug-level messages. |
--version | -v | flag (Optional) | Display the installed version of qcatch. |