|
| 1 | +Converting quantification results |
| 2 | +================================= |
| 3 | + |
| 4 | +The ``convert`` sub-command of ``pyroe`` converts the output of ``alevin-fry`` into several common formats, such as
| 5 | +the native ``AnnData`` format (``h5ad``). When performing this conversion, it can also organize the unspliced,
| 6 | +spliced, and ambiguous counts as desired by the user.
| 7 | + |
| 8 | +The sub-command takes as input a quantification directory produced by ``alevin-fry`` and an output location.
| 9 | +Additionally, the user should pass in command-line parameters to describe the desired output structure and
| 10 | +output format. The output structure defines how the ``U``, ``S``, and ``A`` layers of the input quantification should
| 11 | +be represented in the converted matrix. The syntax for the ``--output-structure`` flag exactly mimics the ``output_format`` argument of
| 12 | +the ``load_fry`` function, which you can read about `here <https://pyroe.readthedocs.io/en/latest/building_splici_index.html#load-fry-notes>`_.
| 13 | +Note that, if you pass in a custom output structure, you should enclose your format description in quotes. For |
| 14 | +example, to output to an object where the "main" layer (``X``) contains the sum of ``U``, ``S``, and ``A``, and where |
| 15 | +there is an additional layer named ``unspliced`` holding only the unspliced counts, you would pass
| 16 | +``--output-structure '{ "X" : ["U", "S", "A"], "unspliced" : ["U"]}'``. |
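As a rough illustration of the quoting rule, the custom structure can be built as a Python dictionary, serialized, and shell-quoted before being handed to ``pyroe convert``. This is a sketch and not part of ``pyroe``: the helper name ``build_convert_command`` and the paths ``quant_dir`` and ``out.h5ad`` are hypothetical, and it assumes the flag accepts the JSON-style dictionary shown in the example above. Only flags documented on this page are used.

```python
import json
import shlex


def build_convert_command(quant_dir, output, structure, output_format="h5ad"):
    """Return an argument list for `pyroe convert` (hypothetical helper)."""
    return [
        "pyroe", "convert",
        # Serialize the dictionary so it matches the JSON-style syntax above.
        "--output-structure", json.dumps(structure),
        "--output-format", output_format,
        quant_dir, output,
    ]


# "X" holds the sum of U, S, and A; "unspliced" holds only U.
structure = {"X": ["U", "S", "A"], "unspliced": ["U"]}
cmd = build_convert_command("quant_dir", "out.h5ad", structure)

# shlex.join quotes the structure so a POSIX shell passes it as one argument.
print(shlex.join(cmd))
```

Passing ``cmd`` as a list to ``subprocess.run`` would avoid shell quoting altogether; the quoted string is only needed when the command is typed or pasted into a shell.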
| 17 | + |
| 18 | +If you do not explicitly provide an ``--output-format``, the default of ``h5ad`` will be used. |
| 19 | + |
| 20 | +The *optional* ``--geneid-to-name`` parameter allows you to pass in a 2-column tab-separated file mapping gene identifiers to gene names.
| 21 | +If this is provided, then gene IDs will be converted to gene names in the output matrix. Gene names will be made unique using the ``var_names_make_unique()`` function of `ScanPy <https://scanpy-tutorials.readthedocs.io/en/latest/index.html>`_. |
| 22 | +It is also possible that some gene IDs do not have a mapped name. In this case, the ``convert`` sub-command will also write out a JSON file, at the provided output path, with the additional suffix ``_unmapped_ids.json``.
| 23 | +This file contains a list of the gene IDs that could not successfully be mapped to a name given the provided mapping. |
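The mapping step described above can be pictured with a small, self-contained sketch. It is not ``pyroe``'s implementation: the function names ``load_id_to_name`` and ``project_ids`` and the gene IDs are hypothetical, and keeping an unmapped ID unchanged in the output is an assumption made for illustration.

```python
import csv
import io
import json


def load_id_to_name(handle):
    """Parse a 2-column tab-separated mapping of gene ID to gene name."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) >= 2:
            mapping[row[0]] = row[1]
    return mapping


def project_ids(gene_ids, mapping):
    """Return (names, unmapped); IDs without a name are kept as-is (assumption)."""
    names, unmapped = [], []
    for gene_id in gene_ids:
        if gene_id in mapping:
            names.append(mapping[gene_id])
        else:
            names.append(gene_id)
            unmapped.append(gene_id)
    return names, unmapped


# Hypothetical mapping file contents and gene IDs, for illustration only.
tsv = io.StringIO("ENSG01\tGeneA\nENSG02\tGeneB\n")
mapping = load_id_to_name(tsv)
names, unmapped = project_ids(["ENSG01", "ENSG02", "ENSG03"], mapping)

# The unmapped IDs form a plain list, which serializes directly to JSON.
print(json.dumps(unmapped))
```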
| 24 | + |
| 25 | +``convert`` command full usage |
| 26 | +------------------------------ |
| 27 | + |
| 28 | +.. code:: bash |
| 29 | +
|
| 30 | + usage: pyroe convert [-h] [--output-structure OUTPUT_STRUCTURE] [--output-format OUTPUT_FORMAT] [--geneid-to-name GENEID_TO_NAME] quant_dir output |
| 31 | +
|
| 32 | + positional arguments: |
| 33 | + quant_dir The input quantification directory containing the matrix to be converted. |
| 34 | + output The output name where the quantification matrix should be written. For `csvs` output format, this will be a directory. For all others, it will be a file. |
| 35 | +
|
| 36 | + optional arguments: |
| 37 | + -h, --help show this help message and exit |
| 38 | + --output-structure OUTPUT_STRUCTURE |
| 39 | + The structure that U,S and A counts should occupy in the output matrix. |
| 40 | + --output-format OUTPUT_FORMAT |
| 41 | + The format in which the output should be written, one of {'zarr', 'loom', 'csvs', 'h5ad'}. |
| 42 | + --geneid-to-name GENEID_TO_NAME |
| 43 | + A 2 column tab-separated list of gene ID to gene name mappings. Providing this file will project gene IDs to gene names in the output. |