Merge

Sebastian Birk · Sebastian Birk · commit 093c45117edc · 2025-04-11T11:59:29.000+01:00
diff --git a/README.md b/README.md
@@ -1,13 +1,13 @@
-# inflow
+# MintFlow
 
 [![Tests][badge-tests]][link-tests]
 [![Documentation][badge-docs]][link-docs]
 
-[badge-tests]: https://img.shields.io/github/actions/workflow/status/sebastianbirk/inflow/test.yaml?branch=main
-[link-tests]: https://github.com/sebastianbirk/inflow/actions/workflows/test.yml
-[badge-docs]: https://img.shields.io/readthedocs/inflow
+[badge-tests]: https://img.shields.io/github/actions/workflow/status/sebastianbirk/mintflow/test.yaml?branch=main
+[link-tests]: https://github.com/sebastianbirk/mintflow/actions/workflows/test.yml
+[badge-docs]: https://img.shields.io/readthedocs/mintflow
 
-Cellular decomposition of intrinsic and neighborhood-induced omic effects
+Microenvironment-induced and INtrinsic Transcriptomic FLOWs
 
 ## Installing the Python Environment
  **SANGER INTERNAL**: The environment is already available on farm.
@@ -20,8 +20,8 @@ conda activate /nfs/team361/aa36/PythonEnvs_2/envinflowdec27/
 
 Alternatively, you can create the python environment yourself:
 ```commandline
-git clone https://github.com/Lotfollahi-lab/inflow.git  # clone the repo
-cd ./inflow/
+git clone https://github.com/Lotfollahi-lab/mintflow.git  # clone the repo
+cd ./mintflow/
 conda env create -f environment.yml --prefix SOME_EMPTY_PATH
 ```
 
@@ -30,24 +30,24 @@ It's highly recommended to setup wandb before proceeding.
 
 To do so:
 - Go to https://wandb.ai/ and create an account.
-- Create a project called "inFlow".
+- Create a project called "MintFlow".
 
 ## Quick Start
-You can use inflow as a local package, because it's not pip installable at the moment.
+You can use mintflow as a local package, because it's not pip installable at the moment.
 
 To do so:
 ```commandline
-git clone https://github.com/Lotfollahi-lab/inflow.git  # clone the repo
-cd ./inflow/
+git clone https://github.com/Lotfollahi-lab/mintflow.git  # clone the repo
+cd ./mintflow/
 ```
-The easiest way to run inflow is through the command line interface (CLI).
+The easiest way to run MintFlow is through the command line interface (CLI).
 This involves two steps
 1. Creating four config files (you duplicate/modify template config files).
-2. Running inflow with a single command line.
+2. Running mintflow with a single command line.
 
 ### Rule of thumbs §1 for modifying the config files
 In the template config files, there are `TODO`-s of different types that you may need to modify
-- Category 1: `TODO:ESSENTIAL:TUNE`: the basic/essential parts to run inflow.
+- Category 1: `TODO:ESSENTIAL:TUNE`: the basic/essential parts to run mintflow.
 - Category 2: `TODO:TUNE`: less essneitial and/or technical details.
 - Category 3: `TODO:check`: parameters of even less importance compared to category 1 and category 2.
 
@@ -58,7 +58,7 @@ If you are, for example, a biologist with no interest/experience in computationa
 Please follow these steps
 - Training data config file:
     - Make a copy of `./cli/SampleConfigFiles/config_data_train.yml` and rename it to `YOUR_CONFIG_DATA_TRAIN.yml`
-    - Read the block of comments tarting with *"# Inflow expects a list of .h5ad files stored on disk, ..."*.
+    - Read the block of comments starting with *"# MintFlow expects a list of .h5ad files stored on disk, ..."*.
     - Modify some parts marked by `TODO:...` and according to *"Rule of thumbs §1"* explained above.
 
 
@@ -76,29 +76,29 @@ Please follow these steps
     - Make a copy of `./cli/SampleConfigFiles/config_training.yml` and rename it to `YOUR_CONFIG_TRAINING.yml`.
     - Modify some parts marked by `TODO:...` and according to *"Rule of thumbs §1"* explained above.
 
-### Step 2 of Using the CLI: Running inflow
+### Step 2 of Using the CLI: Running MintFlow
 
 ```commandline
-cd ./inflow/  # if you haven't already done it above.
+cd ./mintflow/  # if you haven't already done it above.
 cd ./cli/
 
-python inflow_cli.py \
+python mintflow_cli.py \
 --file_config_data_train YOUR_CONFIG_DATA_TRAIN.yml \
 --file_config_data_test YOUR_CONFIG_DATA_TEST.yml \
 --file_config_model YOUR_CONFIG_MODEL.yml \
 --file_config_training YOUR_CONFIG_TRAINING.yml \
 --path_output "./Your/Output/Path/ToDump/Results/" \
 --flag_verbose "True" \
 ```
-The recommended way of accessing inflow predictions is by `adata_inflowOutput_norm.h5ad` and `adata_inflowOutput_unnorm.h5ad` created in the provided `--path_output`and `adata.obsm` and `adata.uns` in these files.
-In the former file `..._norm.h5ad` the readcount matrix `adata.X` as well as inflow predictions Xint and Xspl are row normalised, while in the latter file `_unnorm.h5ad` they are not.
+The recommended way of accessing MintFlow predictions is by `adata_mintflowOutput_norm.h5ad` and `adata_mintflowOutput_unnorm.h5ad` created in the provided `--path_output` and `adata.obsm` and `adata.uns` in these files.
+In the former file `..._norm.h5ad` the readcount matrix `adata.X` as well as MintFlow predictions Xint and Xspl are row normalised, while in the latter file `_unnorm.h5ad` they are not.
 
-Inflow dumps a README file in the provided `--path_output`, as well as each subfolder therein.
+MintFlow dumps a README file in the provided `--path_output`, as well as each subfolder therein.
 
 ## Common Issues
-- Use absolute paths (and not relative paths like `../../some/path/`) in the config files, as well as when running `python inflow_cli.py ...`.
+- Use absolute paths (and not relative paths like `../../some/path/`) in the config files, as well as when running `python mintflow_cli.py ...`.
 - TODO: intro to the script for tune window width.
-- It's common to face out of memory issue in the very last step where the big anndata objects `adata_inflowOutput_norm.h5ad` and `adata_inflowOutput_unnorm.h5ad` are created and dumped.
+- It's common to face out of memory issue in the very last step where the big anndata objects `adata_mintflowOutput_norm.h5ad` and `adata_mintflowOutput_unnorm.h5ad` are created and dumped.
 If that step fails, the results are still accesible in the output path the subfolder `CheckpointAndPredictions/`.
 One can laod the `.pt` files by
 ```python
diff --git a/cli/SampleConfigFiles/config_training.yml b/cli/SampleConfigFiles/config_training.yml
@@ -69,3 +69,7 @@ flag_finaleval_createanndata_alltissuescombined: "True"  # TODO:check
 #   - adata_inflowOutput_unnorm: the anndata "before" applying `sc.pp.normalize_total` where `adata.X` is not row normalised --> inflow predictions `Xint` and `Xspl` sum up to the unnormalised version of `adata.X`.
 #   - adata_inflowOutput_norm: the anndata "after" applying `sc.pp.normalize_total` where `adata.X` is row normalised --> inflow predictions `Xint` and `Xspl` sum up to the normalised version of `adata.X`.
 
+
+method_ODE_solver: "dopri5"
+# The ODE solver, i.e. the `method` argument passed to the function `torchdiffeq.odeint`.
+# TODO: report the effect on running time.
diff --git a/cli/inflow_cli.py b/cli/inflow_cli.py
@@ -594,7 +594,8 @@ def _convert_TrueFalse_to_bool(dict_input):
     'coef_zinb_spl_loglik': 1.0,
     'dict_config_batchtoken': {
         'flag_enable_batchtoken_flowmodule': config_model['flag_enable_batchtoken_flowmodule']
-    }
+    },
+    'method_ODE_solver':config_training['method_ODE_solver']
 }
 
 # create a list of `AdjMatPredLoss`-s ====
@@ -1450,7 +1451,7 @@ def _convert_TrueFalse_to_bool(dict_input):
         if issparse(vects_sl['muxspl']):
             vects_sl['muxspl'] = vects_sl['muxspl'].toarray()  # TODO:implement visualizations directly for sparse Xspl.
 
-        list_predXspl.append(vects_sl['muxspl'])
+        list_predXspl.append(vects_sl['muxspl_before_sc_pp_normalize_total'])
 
         del vects_sl
         gc.collect()
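
Review note on the `'method_ODE_solver': config_training['method_ODE_solver']` lookup added in this file: assuming `config_training` is a plain dict loaded from the YAML file, training config files written before this commit do not contain the key, so the lookup would raise `KeyError`. A minimal sketch of a backward-compatible lookup follows. The helper name `get_ode_solver_method` is hypothetical and not part of the repository, and using `"dopri5"` (the value set in the sample config) as the fallback is an assumption.

```python
def get_ode_solver_method(config_training, default="dopri5"):
    """Return the ODE solver name from a training config dict.

    Falls back to `default` when the key is absent, e.g. for config
    files written before `method_ODE_solver` was introduced.
    (Hypothetical helper; the default value is an assumption.)
    """
    method = config_training.get("method_ODE_solver", default)
    if not isinstance(method, str) or not method.strip():
        raise ValueError(
            "method_ODE_solver must be a non-empty string, got {!r}".format(method)
        )
    return method.strip()


# An older config without the key falls back to the default.
old_config = {"flag_finaleval_createanndata_alltissuescombined": "True"}
# A newer config selects a solver explicitly.
new_config = {"method_ODE_solver": "rk4"}

solver_old = get_ode_solver_method(old_config)
solver_new = get_ode_solver_method(new_config)
```

Whether a silent fallback is preferable to failing fast on an outdated config file is a design choice for the maintainers; the sketch only shows one option.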
diff --git a/src/inflow/cli/analresults/disentanglement_violinplot.py b/src/inflow/cli/analresults/disentanglement_violinplot.py
@@ -11,6 +11,7 @@
 import seaborn as sns
 import pandas as pd
 from tqdm.autonotebook import tqdm
+from scipy.sparse import issparse
 
 
 def func_eqeq(a, b):
@@ -50,25 +51,37 @@ def vis(
     ]
     list_geneindex_inLR.sort()
 
+    np_X = adata_unnorm.X
+    if issparse(np_X):
+        np_X = np_X.toarray()
+
     for cnt_vertical_slice in tqdm(range(min_cnt_vertical_slice, max_cnt_vertical_slice), desc="Creating violin plots for tissue {}".format(idx_slplus1)):
 
         for nameop, op_eqorbiggerthaneq, func_operator in zip(['eq', 'biggerthaneq'], ['==', '>='], [func_eqeq, func_biggerthaneq]):
 
-            mask_inLR = func_operator(adata_unnorm.X.toarray()[:, list_geneindex_inLR], cnt_vertical_slice)
+            mask_inLR = func_operator(np_X[:, list_geneindex_inLR], cnt_vertical_slice)
 
-            mask_notinLR = func_operator(adata_unnorm.X.toarray()[:, list(set(range(adata_unnorm.shape[1])) - set(list_geneindex_inLR))], cnt_vertical_slice)
+            mask_notinLR = func_operator(np_X[:, list(set(range(adata_unnorm.shape[1])) - set(list_geneindex_inLR))], cnt_vertical_slice)
 
-            mask_all = func_operator(adata_unnorm.X.toarray(), cnt_vertical_slice)
+            mask_all = func_operator(np_X, cnt_vertical_slice)
 
 
             slice_pred_inLR = pred_Xspl_rownormcorrected[:, list_geneindex_inLR][mask_inLR].flatten()
             slice_pred_notinLR = pred_Xspl_rownormcorrected[:, list(set(range(adata_unnorm.shape[1])) - set(list_geneindex_inLR))][mask_notinLR].flatten()
 
+            # make the denominators `denum_notinLRDB` and `denum_inLRDB`
+            if op_eqorbiggerthaneq == '==':
+                denum_notinLRDB = cnt_vertical_slice
+                denum_inLRDB = cnt_vertical_slice
+            else:
+                denum_notinLRDB = np_X[:, list(set(range(adata_unnorm.shape[1])) - set(list_geneindex_inLR))][mask_notinLR].flatten()
+                denum_inLRDB = np_X[:, list_geneindex_inLR][mask_inLR].flatten()
+
             plt.figure()
             sns.violinplot(
                 data={
-                    'not in LR-DB': slice_pred_notinLR / ((cnt_vertical_slice + 0.0) if(op_eqorbiggerthaneq == '==') else adata_unnorm.X.toarray()[:, list(set(range(adata_unnorm.shape[1])) - set(list_geneindex_inLR))][mask_notinLR].flatten()),
-                    'in LR-DB': slice_pred_inLR / ((cnt_vertical_slice + 0.0) if(op_eqorbiggerthaneq == '==') else adata_unnorm.X.toarray()[:, list_geneindex_inLR][mask_inLR].flatten()),
+                    'not in LR-DB': slice_pred_notinLR / denum_notinLRDB,
+                    'in LR-DB': slice_pred_inLR / denum_inLRDB,
                 },
                 cut=0
             )
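
Review note: the hunk above densifies `adata_unnorm.X` once (guarded by `issparse`) and then builds the denominators from that single dense array, instead of calling `.toarray()` inside the loop. The same pattern can be sketched on toy data. All arrays, the index list `idx_in`, and the helper `ratios` below are illustrative assumptions, not repository code or data.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse

# Toy stand-ins (assumptions): a sparse count matrix and dense predictions.
X = csr_matrix(np.array([[0, 2, 3], [2, 0, 5], [4, 2, 2]]))
pred = np.array([[0.0, 1.0, 1.5], [0.5, 0.0, 2.5], [1.0, 2.0, 1.0]])
idx_in = [0, 2]  # hypothetical gene indices "in LR-DB"
cnt = 2          # plays the role of `cnt_vertical_slice`

# Densify once, and only if the matrix is actually sparse.
np_X = X.toarray() if issparse(X) else np.asarray(X)


def ratios(op_name):
    """Predicted value divided by the observed count, for the selected entries."""
    block = np_X[:, idx_in]
    mask = (block == cnt) if op_name == "==" else (block >= cnt)
    numer = pred[:, idx_in][mask]  # boolean indexing already returns a 1-D array
    # For '==', every selected count equals `cnt`; otherwise use the actual counts.
    denom = cnt if op_name == "==" else block[mask]
    return numer / denom


ratios_eq = ratios("==")
ratios_ge = ratios(">=")
```

As in the diff, a threshold of zero with the `==` operator would divide by zero; the sketch avoids that case by using `cnt = 2`.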
diff --git a/src/inflow/generativemodel.py b/src/inflow/generativemodel.py