Add MultiQC plugin for DCQC validation results #18

adamjtaylor · 2025-11-12T12:09:04Z

This pull request adds a new MultiQC reporting step to the DCQC workflow, enabling automated generation of validation reports from DCQC suite results. The update introduces a custom MultiQC plugin (multiqc_dcqc) for parsing and visualizing validation results, integrates its execution into the workflow, and provides configuration and documentation for usage and development.

MultiQC Integration and Workflow Updates

Added a new MULTIQC process to the workflow, which runs MultiQC using a custom plugin and configuration, and stages its output in a dedicated directory. (modules/local/multiqc.nf, conf/modules.config, workflows/dcqc.nf)

MultiQC DCQC Plugin Implementation

Introduced the multiqc_dcqc plugin, including its initialization, configuration, and entry point for MultiQC, to parse and display DCQC validation results in the report. (multiqc/multiqc_dcqc/__init__.py, multiqc/setup.py)
Added a sample MultiQC configuration file for the plugin to ensure correct module loading and search patterns for validation result files. (multiqc/multiqc_config.yaml)

Documentation and Development Support

Provided a comprehensive README for the MultiQC DCQC plugin, detailing features, installation, usage, expected input format, configuration, and output sections. (multiqc/multiqc_dcqc/README.md)
Added a .gitignore for the plugin directory to exclude Python build artifacts and development files. (multiqc/.gitignore)

…e redundant code

…organization

BryanFauble · 2025-11-12T17:33:23Z

multiqc/multiqc_dcqc/dcqc_validation.py

+        num_files = 0
+
+        for f in self.find_log_files("dcqc_validation", filehandles=True):
+            log.debug(f"Processing file: {f['fn']}")


This is a good candidate for an info level log message so it's clear to report back where in the process it currently is

BryanFauble · 2025-11-12T17:39:17Z

multiqc/multiqc_dcqc/dcqc_validation.py

+                log.warning(f"Could not parse JSON from {f['fn']}: {e}")
+                continue
+            except Exception as e:
+                log.warning(f"Error processing {f['fn']}: {e}")


Rather than stringifying the exception into the log message I would recommend leaving the exception out, but setting exc_info to True.

https://docs.python.org/3/library/logging.html#logging.Logger.debug

If exc_info does not evaluate as false, it causes exception information to be added to the logging message. If an exception tuple (in the format returned by sys.exc_info()) or an exception instance is provided, it is used; otherwise, sys.exc_info() is called to get the exception information.

BryanFauble · 2025-11-12T17:41:38Z

multiqc/multiqc_dcqc/dcqc_validation.py

+        for suite_type in sorted(self.suite_types.keys()):
+            self.add_suite_type_sections(suite_type)
+
+    def parse_suites_json(self):


Nit: Adding return typing to all of the method calls makes it a bit easier to understand what's going on when someone looks at the code later on. In this case I think it's just an int return type.

There is also quite a bit of nuance added as in-line comments throughout the code that describe what is going on - That is better served in the method docstring so it's all in one place.

BryanFauble · 2025-11-12T17:55:06Z

multiqc/setup.py

+Setup script for multiqc_dcqc MultiQC plugin
+"""
+
+from setuptools import setup, find_packages


Could we use pyproject.toml instead of setup.py? The latter is now deprecated in favor of the modern PEP 517/518 standards.

Claude gave me this as the replacement file:

[build-system] requires = ["setuptools>=61.0", "wheel"] build-backend = "setuptools.build_meta" [project] name = "multiqc_dcqc" version = "0.1.0" description = "MultiQC plugin for DCQC validation results" readme = {text = """ A MultiQC plugin that parses and visualizes validation results from the DCQC (Data Quality Control) suite. It displays validation status, test results, and detailed error messages for data files validated against various schema requirements (including HTAN and CELLxGENE). """, content-type = "text/plain"} authors = [ {name = "HTAN Data Coordinating Center"} ] license = {text = "MIT"} requires-python = ">=3.8" dependencies = [ "multiqc>=1.14", ] classifiers = [ "Development Status :: 4 - Beta", "Environment :: Console", "Environment :: Web Environment", "Intended Audience :: Science/Research", "License :: OSI Approved :: MIT License", "Natural Language :: English", "Operating System :: OS Independent", "Programming Language :: Python", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.8", "Programming Language :: Python :: 3.9", "Programming Language :: Python :: 3.10", "Programming Language :: Python :: 3.11", "Topic :: Scientific/Engineering", "Topic :: Scientific/Engineering :: Bio-Informatics", ] [project.urls] Homepage = "https://github.com/ncihtan/dcqc" Repository = "https://github.com/ncihtan/dcqc" [project.entry-points."multiqc.modules.v1"] dcqc_validation = "multiqc_dcqc.dcqc_validation:MultiqcModule" [tool.setuptools] include-package-data = true zip-safe = false [tool.setuptools.packages.find] where = ["."] include = ["multiqc_dcqc*"]

https://github.com/ncihtan/dcqc doesn't seem to exist, and should swap over to https://github.com/Sage-Bionetworks-Workflows/nf-dcqc (I assume)

BryanFauble · 2025-11-12T17:58:30Z

modules/local/multiqc.nf

+    # Install the custom MultiQC plugin first
+    cd ${plugin_dir}
+    pip install --user --no-deps .
+    cd -


If this is going to be ran often it would make a lot more sense for the container to be published to GHCR that can be ran without needing to run this setup script each and every time. That way it's already baked into a published image that is used instead of:

container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 'https://depot.galaxyproject.org/singularity/multiqc:1.15--pyhdfd78af_0' : 'quay.io/biocontainers/multiqc:1.15--pyhdfd78af_0' }"

rxu17 · 2025-11-12T18:06:07Z

modules/local/multiqc.nf

+    label 'process_single'
+
+    conda "bioconda::multiqc=1.15"
+    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?


This seems fixed to a specific container version + id? Does it make sense to have it possible to pass a custom docker image so if it updates, can just pass that in over having to update this path, creating a PR and so forth? I was looking at here:

nf-dcqc/docs/usage.md

Lines 98 to 122 in baaf4c1

> We highly recommend the use of Docker or Singularity containers for full pipeline reproducibility, however when this is not possible, Conda is also supported.

The pipeline also dynamically loads configurations from [https://github.com/nf-core/configs](https://github.com/nf-core/configs) when it runs, making multiple config profiles for various institutional clusters available at run time. For more information and to see if your system is available in these configs please see the [nf-core/configs documentation](https://github.com/nf-core/configs#documentation).

Note that multiple profiles can be loaded, for example: `-profile test,docker` - the order of arguments is important!

They are loaded in sequence, so later profiles can overwrite earlier profiles.

If `-profile` is not specified, the pipeline will run locally and expect all software to be installed and available on the `PATH`. This is _not_ recommended, since it can lead to different results on different machines dependent on the computer enviroment.

- `test`

- A profile with a complete configuration for automated testing

- Includes links to test data so needs no other parameters

- `docker`

- A generic configuration profile to be used with [Docker](https://docker.com/)

- `singularity`

- A generic configuration profile to be used with [Singularity](https://sylabs.io/docs/)

- `podman`

- A generic configuration profile to be used with [Podman](https://podman.io/)

- `shifter`

- A generic configuration profile to be used with [Shifter](https://nersc.gitlab.io/development/shifter/how-to-use/)

- `charliecloud`

- A generic configuration profile to be used with [Charliecloud](https://hpc.github.io/charliecloud/)

- `conda`

- A generic configuration profile to be used with [Conda](https://conda.io/docs/). Please only use Conda as a last resort i.e. when it's not possible to run the pipeline with Docker, Singularity, Podman, Shifter or Charliecloud.

rxu17 · 2025-11-12T18:08:21Z

multiqc/multiqc_dcqc/README.md

+
+## Output
+
+The plugin generates three main sections in the MultiQC report:


Do you have a sample output screenshot you could add here?

rxu17 · 2025-11-12T18:11:07Z

multiqc/multiqc_dcqc/README.md

+
+## Data Format
+
+The plugin expects a `suites.json` file with the following structure:


Does it expect all of the following fields you listed to be filled out or which ones are optional?

rxu17 · 2025-11-12T18:21:22Z

multiqc/multiqc_dcqc/dcqc_validation.py

+        for f in self.find_log_files("dcqc_validation", filehandles=True):
+            log.debug(f"Processing file: {f['fn']}")
+            num_files += 1
+            try:


Does it make sense to add deeper validation for the suites.json file here while parsing for the user (e.g: more debugging messages to know where their structure is failing) or perhaps even suggest a simple json tool for validating? Or does the suites.json file input come pre-validated?

rxu17 · 2025-11-14T23:02:33Z

multiqc/multiqc_dcqc/dcqc_validation.py

May be worth adding a unit test suite for this python module and potentially adding it to the CI/CD outside of the usual dcqc integration tests (unless that is enough coverage)

andrewelamb · 2025-11-18T17:18:10Z

@adamjtaylor Thanks for this! That's quite a bit of Python code to live in a Nextflow tool. I think it would make a lot more sense to move most/all of the Python code over to PY-DCQC. This is the Python library that is solely for CLI tools that NF-DCQC uses.

Add MultiQC plugin for DCQC validation results

cf14f69

adamjtaylor requested a review from a team as a code owner November 12, 2025 12:09

adamjtaylor added 2 commits November 12, 2025 12:25

Refactor DCQC validation module: streamline status handling and remov…

8ec7d2d

…e redundant code

Move ModuleNoSamplesFound exception class definition to improve code …

bd823e0

…organization

BryanFauble reviewed Nov 12, 2025

View reviewed changes

BryanFauble requested a review from a team November 12, 2025 17:43

BryanFauble reviewed Nov 12, 2025

View reviewed changes

BryanFauble requested a review from a team November 12, 2025 18:00

rxu17 reviewed Nov 12, 2025

View reviewed changes

rxu17 reviewed Nov 14, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add MultiQC plugin for DCQC validation results #18

Add MultiQC plugin for DCQC validation results #18

Uh oh!

adamjtaylor commented Nov 12, 2025

Uh oh!

BryanFauble Nov 12, 2025

Uh oh!

BryanFauble Nov 12, 2025 •

edited

Loading

Uh oh!

BryanFauble Nov 12, 2025

Uh oh!

BryanFauble Nov 12, 2025

Uh oh!

BryanFauble Nov 12, 2025

Uh oh!

BryanFauble Nov 12, 2025

Uh oh!

rxu17 Nov 12, 2025 •

edited

Loading

Uh oh!

rxu17 Nov 12, 2025

Uh oh!

rxu17 Nov 12, 2025

Uh oh!

rxu17 Nov 12, 2025

Uh oh!

rxu17 Nov 14, 2025

Uh oh!

andrewelamb commented Nov 18, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

	> We highly recommend the use of Docker or Singularity containers for full pipeline reproducibility, however when this is not possible, Conda is also supported.

	The pipeline also dynamically loads configurations from [https://github.com/nf-core/configs](https://github.com/nf-core/configs) when it runs, making multiple config profiles for various institutional clusters available at run time. For more information and to see if your system is available in these configs please see the [nf-core/configs documentation](https://github.com/nf-core/configs#documentation).

	Note that multiple profiles can be loaded, for example: `-profile test,docker` - the order of arguments is important!
	They are loaded in sequence, so later profiles can overwrite earlier profiles.

	If `-profile` is not specified, the pipeline will run locally and expect all software to be installed and available on the `PATH`. This is _not_ recommended, since it can lead to different results on different machines dependent on the computer enviroment.

	- `test`
	- A profile with a complete configuration for automated testing
	- Includes links to test data so needs no other parameters
	- `docker`
	- A generic configuration profile to be used with [Docker](https://docker.com/)
	- `singularity`
	- A generic configuration profile to be used with [Singularity](https://sylabs.io/docs/)
	- `podman`
	- A generic configuration profile to be used with [Podman](https://podman.io/)
	- `shifter`
	- A generic configuration profile to be used with [Shifter](https://nersc.gitlab.io/development/shifter/how-to-use/)
	- `charliecloud`
	- A generic configuration profile to be used with [Charliecloud](https://hpc.github.io/charliecloud/)
	- `conda`
	- A generic configuration profile to be used with [Conda](https://conda.io/docs/). Please only use Conda as a last resort i.e. when it's not possible to run the pipeline with Docker, Singularity, Podman, Shifter or Charliecloud.


		## Output

		The plugin generates three main sections in the MultiQC report:


		## Data Format

		The plugin expects a `suites.json` file with the following structure:

Add MultiQC plugin for DCQC validation results #18

Are you sure you want to change the base?

Add MultiQC plugin for DCQC validation results #18

Uh oh!

Conversation

adamjtaylor commented Nov 12, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

BryanFauble Nov 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rxu17 Nov 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

andrewelamb commented Nov 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

BryanFauble Nov 12, 2025 •

edited

Loading

rxu17 Nov 12, 2025 •

edited

Loading

andrewelamb commented Nov 18, 2025 •

edited

Loading