Add a parallel_integration workflow
#15
Merged
15 commits
047d471 add multi_integration workflow (jakubmajercik)
fdc3922 parametrize output slot names per integration method (jakubmajercik)
794dc71 add nextflow.config for multi_integration (jakubmajercik)
80a1105 rename multi_integration to parallel_integration; clarify move_anndat… (jakubmajercik)
96b5b75 run integration methods in true parallel via forked channels (jakubmajercik)
63753ab add layer params for each method (jakubmajercik)
866ec3d bump base requirements to match openpipeline v4.1.0 (jakubmajercik)
2d44746 prefix scVI training arguments with scvi_ (jakubmajercik)
46a5365 pin newest releases of openpipeline packages (jakubmajercik)
e3a0915 consolidate per-method input args into shared arguments (jakubmajercik)
f9ac9c2 add test component asserting expected output slots (jakubmajercik)
a0dbba8 verify both test cases complete in parallel_integration test (jakubmajercik)
979094c add scVI covariate args and expose trained scVI model output (jakubmajercik)
4e21a85 add changelog entry for parallel_integration workflow (PR #15) (jakubmajercik)
350c33e add changelog entry for move_anndata_slots component (PR #15) (jakubmajercik)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| arguments: | ||
| - name: "--output_compression" | ||
| description: | | ||
| Compression format to use for the output AnnData and/or MuData H5 files. | ||
| By default no compression is applied. | ||
| type: string | ||
| choices: ["gzip", "lzf"] | ||
| required: false | ||
| example: "gzip" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| packages: | ||
| - anndata~=0.12.16 | ||
| - awkward # Required for reading VDJ data stored in AIRR format | ||
| - scipy~=1.17.1 # Exclude scipy 1.17.0 because https://github.com/scverse/anndata/issues/339 |
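The `~=` operator used in these pins is the PEP 440 "compatible release" clause: `scipy~=1.17.1` accepts 1.17.1 and later 1.17.x patch releases, which is how 1.17.0 ends up excluded. The sketch below re-implements that rule for plain dotted numeric versions only. `compatible_release` is a hypothetical helper written for illustration, not part of pip or `packaging`, and it ignores pre-releases, epochs, and other PEP 440 details.

```python
def parse(version: str) -> tuple[int, ...]:
    """Turn '1.17.1' into (1, 17, 1). Only plain numeric segments are handled."""
    return tuple(int(part) for part in version.split("."))


def compatible_release(candidate: str, spec: str) -> bool:
    """Approximate 'candidate ~= spec' for simple dotted versions.

    '~=X.Y.Z' means '>= X.Y.Z' and '== X.Y.*': the candidate must be at least
    the pinned version and share every segment except the last one.
    """
    cand, base = parse(candidate), parse(spec)
    if len(base) < 2:
        raise ValueError("'~=' needs at least two version segments")
    prefix = base[:-1]
    return cand >= base and cand[: len(prefix)] == prefix


# The scipy pin from the requirements file above:
print(compatible_release("1.17.0", "1.17.1"))  # False: older than the pin
print(compatible_release("1.17.4", "1.17.1"))  # True: later patch release
print(compatible_release("1.18.0", "1.17.1"))  # False: next minor series
```

For real dependency resolution, the `packaging` library's specifier handling is the safer choice; this sketch only shows why the pin has the effect described in the comment.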
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| __merge__: [/src/base/requirements/anndata.yaml, .] | ||
| packages: | ||
| - mudata~=0.3.8 | ||
| script: | | ||
| exec("try:\n import zarr; from importlib.metadata import version\nexcept ModuleNotFoundError:\n exit(0)\nelse: assert int(version(\"zarr\").partition(\".\")[0]) > 2") |
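The one-line `exec(...)` script above is dense, so here is an unrolled sketch of the same idea: if zarr is not installed there is nothing to check, otherwise its major version must be greater than 2. The function names are hypothetical, and unlike the original (which imports `zarr` and calls `exit(0)` or `assert`), this version looks up the installed distribution's metadata and returns a boolean.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string: str) -> int:
    """Return the leading integer of a dotted version string ('3.0.8' -> 3)."""
    return int(version_string.partition(".")[0])


def zarr_is_acceptable() -> bool:
    """True when zarr is absent or its major version is greater than 2."""
    try:
        installed = version("zarr")
    except PackageNotFoundError:
        # zarr is not installed, so there is no version to validate.
        return True
    return major_version(installed) > 2


if __name__ == "__main__":
    print("zarr check passed:", zarr_is_acceptable())
```

Returning a boolean rather than asserting keeps the check easy to reuse in tests. A setup script would still need to turn a `False` result into a non-zero exit.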
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| github: | ||
| - openpipelines-bio/core#subdirectory=packages/python/openpipeline_testutils |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,8 @@ | ||
| test_setup: | ||
| - type: apt | ||
| packages: | ||
| - git | ||
| - type: python | ||
| __merge__: | ||
| - /src/base/requirements/viashpy.yaml | ||
| - /src/base/requirements/openpipeline_testutils.yaml |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| packages: | ||
| - viashpy==0.10.0 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,142 @@ | ||
| name: move_anndata_slots | ||
| namespace: "dataflow" | ||
| scope: "public" | ||
| description: | | ||
| Move slots (.obs, .var, .obsm, .varm, .obsp, .varp, .uns) from a modality | ||
| in a source MuData file into a modality in a target MuData file. | ||
| The specified slots are copied from the source modality into the target | ||
| modality. By default, copying to a key that already exists in the target | ||
| raises an error; use --allow_overwrite to overwrite it instead. | ||
|
|
||
| authors: | ||
| - __merge__: /src/authors/jakub_majercik.yaml | ||
| roles: [ author ] | ||
|
|
||
| argument_groups: | ||
| - name: "Source" | ||
| arguments: | ||
| - name: "--input_source" | ||
| type: file | ||
| description: Source h5mu file to read slots from. | ||
| direction: input | ||
| required: true | ||
| example: source.h5mu | ||
| - name: "--source_modality" | ||
| type: string | ||
| description: Modality in the source h5mu file to read slots from. | ||
| default: "rna" | ||
| required: false | ||
|
|
||
| - name: "Target" | ||
| arguments: | ||
| - name: "--input_target" | ||
| type: file | ||
| description: Target h5mu file to write slots into. | ||
| direction: input | ||
| required: true | ||
| example: target.h5mu | ||
| - name: "--target_modality" | ||
| type: string | ||
| description: | | ||
| Modality in the target h5mu file to write slots into. | ||
| Defaults to the value of --source_modality. | ||
| required: false | ||
|
|
||
| - name: "Slots to move" | ||
| arguments: | ||
| - name: "--obs" | ||
| type: string | ||
| description: | | ||
| Column names from .obs to move from the source modality to the | ||
| target modality. If not provided, .obs is not moved. | ||
| multiple: true | ||
| required: false | ||
| - name: "--var" | ||
| type: string | ||
| description: | | ||
| Column names from .var to move from the source modality to the | ||
| target modality. If not provided, .var is not moved. | ||
| multiple: true | ||
| required: false | ||
| - name: "--obsm" | ||
| type: string | ||
| description: | | ||
| Keys from .obsm to move from the source modality to the target | ||
| modality. If not provided, .obsm is not moved. | ||
| multiple: true | ||
| required: false | ||
| - name: "--varm" | ||
| type: string | ||
| description: | | ||
| Keys from .varm to move from the source modality to the target | ||
| modality. If not provided, .varm is not moved. | ||
| multiple: true | ||
| required: false | ||
| - name: "--obsp" | ||
| type: string | ||
| description: | | ||
| Keys from .obsp to move from the source modality to the target | ||
| modality. If not provided, .obsp is not moved. | ||
| multiple: true | ||
| required: false | ||
| - name: "--varp" | ||
| type: string | ||
| description: | | ||
| Keys from .varp to move from the source modality to the target | ||
| modality. If not provided, .varp is not moved. | ||
| multiple: true | ||
| required: false | ||
| - name: "--uns" | ||
| type: string | ||
| description: | | ||
| Keys from .uns to move from the source modality to the target | ||
| modality. If not provided, .uns is not moved. | ||
| multiple: true | ||
| required: false | ||
|
|
||
| - name: "Options" | ||
| arguments: | ||
| - name: "--allow_overwrite" | ||
| type: boolean_true | ||
| description: | | ||
| Allow overwriting keys that already exist in the target modality. | ||
| By default, the component raises an error if a key already exists. | ||
| When enabled, existing keys are overwritten with a warning. | ||
|
|
||
| - name: "Output" | ||
| arguments: | ||
| - name: "--output" | ||
| alternatives: ["-o"] | ||
| type: file | ||
| description: Output h5mu file (the target with slots added from the source). | ||
| direction: output | ||
| required: true | ||
| example: output.h5mu | ||
| __merge__: [., /src/base/h5_compression_argument.yaml] | ||
|
|
||
| resources: | ||
| - type: python_script | ||
| path: script.py | ||
| - path: /src/utils/setup_logger.py | ||
| - path: /src/utils/compress_h5mu.py | ||
|
|
||
| test_resources: | ||
| - type: python_script | ||
| path: test.py | ||
|
|
||
| engines: | ||
| - type: docker | ||
| image: python:3.13-slim | ||
| setup: | ||
| - type: apt | ||
| packages: | ||
| - procps | ||
| - type: python | ||
| __merge__: /src/base/requirements/anndata_mudata.yaml | ||
| __merge__: [/src/base/requirements/python_test_setup.yaml, .] | ||
|
|
||
| runners: | ||
| - type: executable | ||
| - type: nextflow | ||
| directives: | ||
| label: [ singlecpu, lowmem ] |
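The component description and the `--allow_overwrite` help text describe a simple policy: requested keys must exist in the source, keys already present in the target raise an error by default, and with overwriting enabled they are replaced with a warning. A minimal sketch of that policy on plain dictionaries follows. `copy_keys` and the logger name are hypothetical, and real AnnData slots are DataFrames or mapping-like containers rather than dicts.

```python
import logging

logger = logging.getLogger("move_slots_sketch")


def copy_keys(source: dict, target: dict, keys: list[str],
              allow_overwrite: bool = False) -> dict:
    """Copy `keys` from `source` into `target` following the documented policy."""
    # Every requested key has to exist in the source.
    missing = [k for k in keys if k not in source]
    if missing:
        raise ValueError(f"Keys not found in source: {missing}")

    # Keys already in the target are an error unless overwriting is allowed.
    existing = [k for k in keys if k in target]
    if existing and not allow_overwrite:
        raise ValueError(
            f"Keys already exist in target: {existing}. "
            "Pass allow_overwrite=True to replace them."
        )
    if existing:
        logger.warning("Overwriting existing keys: %s", existing)

    for key in keys:
        target[key] = source[key]
    return target


source = {"X_pca": [1, 2], "X_umap": [3, 4]}
target = {"X_pca": [0, 0]}
copy_keys(source, target, ["X_umap"])                      # new key, no conflict
copy_keys(source, target, ["X_pca"], allow_overwrite=True)  # replaced with a warning
```

Validating all keys before copying any of them means a failed call leaves the target untouched.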
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,130 @@ | ||
| import sys | ||
| from mudata import read_h5ad | ||
|
|
||
| ## VIASH START | ||
| par = { | ||
| "input_source": "source.h5mu", | ||
| "source_modality": "rna", | ||
| "input_target": "target.h5mu", | ||
| "target_modality": None, | ||
| "obs": None, | ||
| "var": None, | ||
| "obsm": None, | ||
| "varm": None, | ||
| "obsp": None, | ||
| "varp": None, | ||
| "uns": None, | ||
| "allow_overwrite": False, | ||
| "output": "output.h5mu", | ||
| "output_compression": None, | ||
| } | ||
| meta = {"resources_dir": "src/utils/"} | ||
| ## VIASH END | ||
|
|
||
| sys.path.append(meta["resources_dir"]) | ||
| from setup_logger import setup_logger | ||
| from compress_h5mu import write_h5ad_to_h5mu_with_compression | ||
|
|
||
| logger = setup_logger() | ||
|
|
||
| target_modality = par["target_modality"] or par["source_modality"] | ||
|
|
||
| logger.info( | ||
| "Reading modality '%s' from source file '%s'", | ||
| par["source_modality"], | ||
| par["input_source"], | ||
| ) | ||
| try: | ||
| source_mod = read_h5ad(par["input_source"], mod=par["source_modality"]) | ||
| except KeyError as e: | ||
| raise ValueError( | ||
| f"Modality '{par['source_modality']}' does not exist in source file " | ||
| f"'{par['input_source']}'." | ||
| ) from e | ||
|
|
||
| logger.info( | ||
| "Reading modality '%s' from target file '%s'", | ||
| target_modality, | ||
| par["input_target"], | ||
| ) | ||
| try: | ||
| target_mod = read_h5ad(par["input_target"], mod=target_modality) | ||
| except KeyError as e: | ||
| raise ValueError( | ||
| f"Modality '{target_modality}' does not exist in target file " | ||
| f"'{par['input_target']}'." | ||
| ) from e | ||
|
|
||
| # Validate indices for the axes relevant to the requested slots. | ||
| needs_obs = any(par[s] for s in ("obs", "obsm", "obsp")) | ||
| needs_var = any(par[s] for s in ("var", "varm", "varp")) | ||
|
|
||
| mismatches = [] | ||
| if needs_obs and set(source_mod.obs_names) != set(target_mod.obs_names): | ||
| mismatches.append("obs") | ||
| if needs_var and set(source_mod.var_names) != set(target_mod.var_names): | ||
| mismatches.append("var") | ||
| if mismatches: | ||
| raise ValueError( | ||
| "Index mismatch between source and target modalities: " | ||
| + " and ".join(mismatches) | ||
| + " indices do not match." | ||
| ) | ||
|
|
||
| # Reindex source to match target order if needed. | ||
| if needs_obs and not (source_mod.obs_names == target_mod.obs_names).all(): | ||
| logger.info("Reindexing source observations to match target order.") | ||
| source_mod = source_mod[target_mod.obs_names, :] | ||
| if needs_var and not (source_mod.var_names == target_mod.var_names).all(): | ||
| logger.info("Reindexing source variables to match target order.") | ||
| source_mod = source_mod[:, target_mod.var_names] | ||
|
|
||
| # .obs/.var are DataFrames (column access), .obsm/.varm/.obsp/.varp are array | ||
| # containers, and .uns is a dict. Each slot is fetched with getattr and then | ||
| # supports key-based membership tests and get/set. | ||
| _slots = [ | ||
| ("obs", par["obs"]), | ||
| ("var", par["var"]), | ||
| ("obsm", par["obsm"]), | ||
| ("varm", par["varm"]), | ||
| ("obsp", par["obsp"]), | ||
| ("varp", par["varp"]), | ||
| ("uns", par["uns"]), | ||
| ] | ||
|
|
||
| for slot_name, keys in _slots: | ||
| if not keys: | ||
| continue | ||
| source_slot = getattr(source_mod, slot_name) | ||
| target_slot = getattr(target_mod, slot_name) | ||
| missing = [k for k in keys if k not in source_slot] | ||
| if missing: | ||
| raise ValueError( | ||
| f"The following .{slot_name} keys were not found in source " | ||
| f"modality '{par['source_modality']}': {missing}" | ||
| ) | ||
| existing = [k for k in keys if k in target_slot] | ||
| if existing and not par["allow_overwrite"]: | ||
| raise ValueError( | ||
| f"The following .{slot_name} keys already exist in the target " | ||
| f"modality '{target_modality}': {existing}. " | ||
| f"Use --allow_overwrite to overwrite them." | ||
| ) | ||
| if existing: | ||
| logger.warning("Overwriting existing .%s keys: %s", slot_name, existing) | ||
|
|
||
| logger.info("Moving .%s keys: %s", slot_name, keys) | ||
| for key in keys: | ||
| target_slot[key] = source_slot[key] | ||
|
|
||
| logger.info( | ||
| "Writing output to '%s' with compression '%s'", | ||
| par["output"], | ||
| par["output_compression"], | ||
| ) | ||
| write_h5ad_to_h5mu_with_compression( | ||
| output_file=par["output"], | ||
| h5mu=par["input_target"], | ||
| modality_name=target_modality, | ||
| modality_data=target_mod, | ||
| output_compression=par["output_compression"], | ||
| ) | ||
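The index handling in the script above has two steps: first require that source and target contain the same set of names, then reorder the source to the target's order if the two differ. The sketch below shows that logic on plain Python sequences instead of AnnData objects. `align_to_target` is a hypothetical helper, and the duplicate-name check is an extra assumption that the original script does not make.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def align_to_target(source_names: Sequence[str],
                    source_values: Sequence[T],
                    target_names: Sequence[str]) -> list[T]:
    """Return `source_values` reordered to follow `target_names`."""
    # Extra guard (not in the original script): duplicates make the
    # name-to-position mapping below ambiguous.
    for label, names in (("source", source_names), ("target", target_names)):
        if len(set(names)) != len(names):
            raise ValueError(f"Duplicate names in {label} index.")

    # Same membership test as the script: compare the names as sets.
    if set(source_names) != set(target_names):
        raise ValueError("Index mismatch between source and target.")

    # Nothing to do when the order already matches.
    if list(source_names) == list(target_names):
        return list(source_values)

    position = {name: i for i, name in enumerate(source_names)}
    return [source_values[position[name]] for name in target_names]


aligned = align_to_target(["c1", "c2", "c3"], [10, 20, 30], ["c3", "c1", "c2"])
print(aligned)  # [30, 10, 20]
```

Comparing sets first gives one clear error message for missing or extra names, so the reordering step can assume every target name has a position in the source.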