Add a parallel_integration workflow #15
Merged
Runs harmony, scvi, scanorama, and bbknn integration methods from openpipeline in parallel on a preprocessed h5mu input, then merges each method's annotations (.obs cluster labels, .obsm embeddings + UMAP, .obsp neighbor graphs, .uns neighbor params) into a single output h5mu. Copies openpipeline's dataflow/move_anndata_slots component from PR #1163 (not yet merged/released) as a local dependency. When the PR merges and a new openpipeline tag ships, src/dataflow/, src/utils/, and src/base/ can be removed and the local dep swapped for an openpipeline repository reference in multi_integration's config.
2 tasks
dorien-er requested changes on Jun 1, 2026
jakubmajercik added a commit that referenced this pull request on Jun 2, 2026
…a_slots overwrite behavior
Addresses PR #15 review:
- Rename the single_cell/multi_integration workflow to parallel_integration to disambiguate from "multi" in cellranger_multi and reflect that single-method runs are also supported. Updates the config name, test.nf include path and references, and the integration_test.sh main-script path.
- Fix the move_anndata_slots description: by default an existing target key raises an error; --allow_overwrite opts into overwriting.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
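The corrected overwrite behavior can be illustrated with a minimal Python sketch. This is not the component's code: `transfer_slot` is a hypothetical helper, plain dicts stand in for an AnnData slot such as `.obsm`, and the sketch copies the value rather than removing it from the source, since the removal behavior is not described here.

```python
def transfer_slot(source, target, source_key, target_key, allow_overwrite=False):
    """Copy source[source_key] into target[target_key].

    Hypothetical stand-in for the behavior described above: an existing
    target key is an error unless the caller opts into overwriting.
    """
    if source_key not in source:
        raise KeyError(f"source key {source_key!r} not found")
    if target_key in target and not allow_overwrite:
        raise ValueError(
            f"target key {target_key!r} already exists; "
            "set allow_overwrite=True to replace it"
        )
    target[target_key] = source[source_key]


# Illustrative keys only; the real slot names depend on the workflow config.
integrated = {"X_pca_integrated": [0.1, 0.2]}
merged = {"X_harmony_integrated": [9.9]}

try:
    transfer_slot(integrated, merged, "X_pca_integrated", "X_harmony_integrated")
    default_raised = False
except ValueError:
    default_raised = True  # default: existing target key is rejected

transfer_slot(
    integrated, merged, "X_pca_integrated", "X_harmony_integrated",
    allow_overwrite=True,
)
```

Making the error the default means an accidental key collision between two methods surfaces immediately instead of silently replacing one method's result.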
f85b539 to 80a1105
Addresses PR #15 review: chaining the integration .run() calls with `|` made Nextflow treat each method as dependent on the previous one's output channel, so per sample the four methods ran sequentially even though each only reads the original preprocessed input. Fork `integration_ch` into one branch per method (harmony, scvi, scanorama, bbknn) so they have no mutual dependency and run concurrently, then re-sync the branches with mix + groupTuple(by: 0, size: 4). Each method's `*_output` is picked explicitly from the grouped states (never a blind state merge) so an unset output cannot clobber another branch's path; a method skipped via runIf still passes its event through, keeping the group size at 4. The sequential move_slots merge of per-method annotations is unchanged.
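The re-sync step can be modeled in plain Python to show why the outputs are picked explicitly. This is a rough sketch, not the workflow's Nextflow code: `run_branch`, `group_by_id`, `blind_merge`, and `pick_outputs` are hypothetical names, the file paths are made up, and the sketch models only the grouping and merging, not the concurrent execution.

```python
from collections import defaultdict

METHODS = ("harmony", "scvi", "scanorama", "bbknn")


def run_branch(method, sample_id, state, selected):
    """One forked branch: set only this method's output, or pass through if skipped."""
    new_state = dict(state)
    if method in selected:
        new_state[f"{method}_output"] = f"{sample_id}.{method}.h5mu"
    return sample_id, new_state


def group_by_id(events, size):
    """Rough analogue of groupTuple(by: 0, size: N): emit once N events share an id."""
    pending = defaultdict(list)
    for sample_id, state in events:
        pending[sample_id].append(state)
        if len(pending[sample_id]) == size:
            yield sample_id, pending.pop(sample_id)


def blind_merge(states):
    """Naive merge: later states overwrite earlier ones, including with None."""
    merged = {}
    for state in states:
        merged.update(state)
    return merged


def pick_outputs(states):
    """Explicit pick: each *_output is the first non-None value across branches."""
    merged = dict(states[0])
    for method in METHODS:
        key = f"{method}_output"
        merged[key] = next((s[key] for s in states if s.get(key) is not None), None)
    return merged


# Every state carries all four output keys, unset (None) until a branch fills its own.
base = {"input": "s1.h5mu", **{f"{m}_output": None for m in METHODS}}
selected = {"harmony", "scvi"}

# "mix": all branch events land on one channel; skipped methods still emit an event.
mixed = [run_branch(m, "s1", base, selected) for m in METHODS]
sample_id, states = next(group_by_id(mixed, size=len(METHODS)))

picked = pick_outputs(states)
naive = blind_merge(states)
```

In this model `naive["harmony_output"]` ends up `None`, because the last branch's unset key overwrites the harmony path, while `picked` keeps it. The skipped branches still contribute an event, so the group reaches its expected size of 4.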
parallel_integration workflow
dorien-er requested changes on Jun 2, 2026
Dependency version changes: anndata 0.12.7 -> 0.12.16, scipy !=1.17.* -> ~=1.17.1, mudata 0.3.2 -> 0.3.8, viashpy 0.8.0 -> 0.10.0
Renames --early_stopping*, --max_epochs, --reduce_lr_on_plateau, --lr_factor, --lr_patience to scvi_-prefixed names; updates main.nf fromState and test.nf.
Replace per-method layer/batch/covariate args with shared --layer_log_normalized_counts, --layer_raw_counts, --obs_batch, --obs_covariates, --var_input. Add a validation map that fails early when a selected method's required layer is missing. Mirrors the argument pattern in single_cell/process_integrate_annotate.
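A validation map of this kind might look like the following Python sketch. The workflow itself is Nextflow, and the mapping of methods to required layers here is an assumption made for illustration (for example, that scvi needs raw counts); `REQUIRED_LAYER_ARG` and `validate_layer_args` are hypothetical names.

```python
# Hypothetical mapping from method to the shared layer argument it requires.
# Which method needs which layer is an assumption, not taken from the workflow.
REQUIRED_LAYER_ARG = {
    "harmony": "layer_log_normalized_counts",
    "scanorama": "layer_log_normalized_counts",
    "bbknn": "layer_log_normalized_counts",
    "scvi": "layer_raw_counts",
}


def validate_layer_args(methods, args):
    """Raise before any method runs if a selected method's required layer is unset."""
    problems = []
    for method in methods:
        arg_name = REQUIRED_LAYER_ARG[method]
        if not args.get(arg_name):
            problems.append(f"{method} requires --{arg_name}")
    if problems:
        raise ValueError("; ".join(problems))


# Passes: only harmony is selected and its layer argument is set.
validate_layer_args(["harmony"], {"layer_log_normalized_counts": "log_normalized"})

# Fails early: scvi is selected but no raw-counts layer was given.
try:
    validate_layer_args(
        ["harmony", "scvi"], {"layer_log_normalized_counts": "log_normalized"}
    )
    error_message = None
except ValueError as err:
    error_message = str(err)
```

Collecting all problems before raising reports every missing argument in one message rather than one per run.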
New test_workflows/assert_integration_output component reads the output h5mu and checks the expected .obs/.obsm/.obsp/.uns slots per method are present. Wired into parallel_integration/test.nf with expected slots derived from each case's methods.
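The per-method slot check could be sketched in Python as below. This is not the component's code: a plain dict of key lists stands in for the output h5mu, and the slot names are assumed to follow an `X_<method>_integrated` / `X_<method>_umap` / `<method>_integration_leiden_*` scheme, which may differ from the real one. `expected_patterns` and `missing_slots` are hypothetical names.

```python
from fnmatch import fnmatchcase


def expected_patterns(methods):
    """Derive expected slot patterns from the selected methods (assumed naming scheme)."""
    patterns = {"obs": [], "obsm": []}
    for method in methods:
        patterns["obs"].append(f"{method}_integration_leiden_*")
        patterns["obsm"] += [f"X_{method}_integrated", f"X_{method}_umap"]
    return patterns


def missing_slots(present, patterns):
    """Return every expected pattern that no key in the matching slot satisfies."""
    missing = []
    for slot, slot_patterns in patterns.items():
        keys = present.get(slot, [])
        for pattern in slot_patterns:
            if not any(fnmatchcase(key, pattern) for key in keys):
                missing.append(f".{slot}[{pattern}]")
    return missing


# Stand-in for the keys found in an output file from a harmony-only run.
present = {
    "obs": ["harmony_integration_leiden_1.0"],
    "obsm": ["X_harmony_integrated", "X_harmony_umap"],
}

harmony_only = missing_slots(present, expected_patterns(["harmony"]))
with_scvi = missing_slots(present, expected_patterns(["harmony", "scvi"]))
```

Deriving the expectations from the method list keeps each test case's assertions in step with the methods it actually runs.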
Add a terminal toSortedList/map assertion so the per-event slot checks can't be silently skipped when no events are emitted. Mirrors the openpipeline totalvi_leiden test pattern.
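The reason for the terminal assertion can be shown with a small Python analogue. A check that runs once per event never runs when there are no events, so an empty channel would pass silently. `run_checks` is a hypothetical helper and the event tuples are illustrative.

```python
def run_checks(events, per_event_check, expected_count):
    """Check every event, then fail if the number of events is not the expected one."""
    collected = sorted(events, key=lambda event: event[0])  # analogue of toSortedList
    for event in collected:
        per_event_check(event)
    if len(collected) != expected_count:
        raise AssertionError(
            f"expected {expected_count} output event(s), got {len(collected)}"
        )
    return collected


def output_is_h5mu(event):
    """Per-event check: the output path must end in .h5mu."""
    _, state = event
    if not state["output"].endswith(".h5mu"):
        raise AssertionError(f"unexpected output: {state['output']}")


# Normal case: one event arrives and passes both checks.
ok = run_checks([("s1", {"output": "s1.h5mu"})], output_is_h5mu, expected_count=1)

# Empty case: the per-event check never runs, but the terminal check still fails.
try:
    run_checks([], output_is_h5mu, expected_count=1)
    empty_detected = False
except AssertionError:
    empty_detected = True
```

Without the final count check, the empty case would complete with no error at all.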
dorien-er requested changes on Jun 3, 2026
- Add --obs_categorical_covariates and --obs_numerical_covariates (none by default), mapped to scvi_leiden's obs_categorical_covariate / obs_continuous_covariate.
- Expose --output_scvi_model; carry scvi_model through the branch sync and emit it when scVI is selected.
dorien-er approved these changes on Jun 4, 2026
Summary
- Adds a `single_cell/multi_integration` composed workflow that runs the harmony, scvi, scanorama, and bbknn integration methods from openpipeline in parallel on a preprocessed h5mu input, then merges each method's annotations (`.obs` cluster labels, `.obsm` embeddings + UMAP, `.obsp` neighbor graphs, `.uns` neighbor params) back into a single output h5mu.
- Copies the `dataflow/move_anndata_slots` component from openpipelines-bio/openpipeline#1163 (not yet merged/released) into this repo as a local dependency. Once the PR merges and a new openpipeline tag ships, `src/dataflow/`, `src/utils/`, and `src/base/` can be removed and the local dep swapped for an `openpipeline` repository reference in the workflow config.
Parallel-then-merge pattern
Each integration `.run()` call reads `state.input` (the original preprocessed file), not the previous step's output, so Nextflow's DAG scheduler sees no dependencies between them and executes the four integrations concurrently. Four sequential `move_anndata_slots` calls then accumulate each method's slots into one merged h5mu.
Test plan
- `viash ns build --query "dataflow/move_anndata_slots|single_cell/multi_integration"`: 3/3 configs built clean
- `viash test src/dataflow/move_anndata_slots/config.vsh.yaml`: 18/18 unit tests pass
- `bash src/single_cell/multi_integration/integration_test.sh`: full Nextflow integration test (requires S3 test data + Docker; run locally or via CI)
- Inspect the output `.h5mu` to confirm all four `*_integration_leiden_*` obs columns, `X_*_integrated` / `X_*_umap` obsm keys, and method-specific neighbor graphs are present
- A single-method run (`--integration_methods harmony`) emits only the harmony slots