Added Metamorpheus Datasets - MBR+NoMBR wNormalization to benchmarking #163
base: devel
Conversation
PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

PR Code Suggestions ✨

Explore these optional code suggestions:
```r
input = input %>% filter(`Protein Group` %in% protein_mappings$`Protein Groups`)

output = MetamorpheusToMSstatsFormat(input, annot)
```
The `MetamorpheusToMSstatsFormat` function also has these two parameters, both of which default to `TRUE`:

- `removeFewMeasurements`
- `removeProtein_with1Feature`
Could you double check there aren't major differences in empirical FDR when these two parameters are set to FALSE? I'm thinking due to the absence of PIP, certain proteins may be filtered out altogether, which could explain better empirical FDR with no PIP.
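To make that check concrete, here is a minimal sketch of the side-by-side run (parameter names as listed above; `input`, `annot`, `protein_mappings`, `dataProcess`, and `calculate_Metrics` are the objects and helpers introduced elsewhere in this PR, so treat the exact call shapes as assumptions):

```r
# Sketch only: run the converter with default pre-filtering and with both
# pre-filters disabled, then compare empirical FDR on the two outputs.
output_default <- MetamorpheusToMSstatsFormat(input, annot)
output_nofilter <- MetamorpheusToMSstatsFormat(
  input, annot,
  removeFewMeasurements = FALSE,
  removeProtein_with1Feature = FALSE
)

# Summarize each and compute metrics at the same threshold for comparison:
metrics_default  <- calculate_Metrics(dataProcess(output_default),  protein_mappings, "default-filters")
metrics_nofilter <- calculate_Metrics(dataProcess(output_nofilter), protein_mappings, "no-filters")
```

If the two runs diverge mainly in proteins with few measurements, that would support the PIP-absence explanation above.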
Walkthrough

Retargets CI to a different HPC host and secret, updates remote paths, Slurm job config, and the R environment; adds a MetaMorpheus benchmarking R harness and a metrics helper; and updates dataset entries in the benchmark controller JSON.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor Dev as GitHub Actions
    participant SSH as Explorer SSH
    participant Slurm as Slurm Scheduler
    participant Node as Compute Node
    Dev->>SSH: Setup SSH (new key/host)
    Dev->>SSH: rsync benchmark/ -> /projects/.../benchmark
    Dev->>SSH: sbatch /projects/.../benchmark/config.slurm
    SSH->>Slurm: Submit job
    Slurm-->>Dev: Job ID
    Slurm->>Node: Launch job
    Node->>Node: Setup R env, install deps
    Node->>Node: Run R scripts (line-buffered)
    Node-->>SSH: Write job_output.txt / job_error.txt
    Dev->>SSH: Retrieve outputs from /projects/.../benchmark/
    Dev-->>Dev: Upload artifact
```

```mermaid
sequenceDiagram
    autonumber
    participant R as benchmark_Metamorpheus.R
    participant CFG as scriptController.json
    participant Proc as MSstats/MSstatsConvert
    participant Met as calculate_Metrics
    R->>CFG: Load dataset config
    R->>R: Read QuantifiedPeaks/Proteins + annotation
    R->>R: Filter decoys/multiproteins and map organisms
    R->>Proc: MetamorpheusToMSstatsFormat()
    par Six processing configs
        R->>Proc: dataProcess(...) per config
    and
    end
    loop For each summarized result
        R->>Met: calculate_Metrics(summarized, mappings, label)
    end
    R->>R: Aggregate FDR metrics and print timing
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~40 minutes
Actionable comments posted: 9
♻️ Duplicate comments (5)
metamorpheus_code.R (3)
21-26: Remove commented-out code blocks

These stale comments add noise.

Apply this diff:

```diff
-# input_no_mbr$`Protein Group` = ifelse(
-#   input_no_mbr$`Protein Group` %in% ecoli$`Protein Groups`,
-#   paste(input_no_mbr$`Protein Group`, "|ECOLI", sep = ""),
-#   paste(input_no_mbr$`Protein Group`, "|HUMAN", sep = ""))
-# write.csv(input_no_mbr, "QuantifiedPeaks.csv", row.names = FALSE)
```

and

```diff
-# input$`Protein Group` = ifelse(
-#   input$`Protein Group` %in% ecoli$`Protein Groups`,
-#   paste(input$`Protein Group`, "|ECOLI", sep = ""),
-#   paste(input$`Protein Group`, "|HUMAN", sep = ""))
-# write.csv(input, "QuantifiedPeaks-MBR.csv", row.names = FALSE)
```

Also applies to: 73-78
1-112: Please remove this exploratory script from the PR (env-specific, redundant, and not used in the CI path)

This file hard-codes absolute HPC paths, duplicates logic that now exists in the benchmarking harness, and includes ad-hoc visualization (hist) and commented code. It's not invoked by the workflow, and a prior review already requested its removal.

Recommendation:

- Remove this file from the PR (or move it to a docs/examples location with parameterized paths and no hard-coded env specifics).
- The CI and Slurm pipeline already use benchmark/benchmark_Metamorpheus.R + benchmark/metamorpheus_Process.R.
49-56: Don't hard-code a single comparison; reuse calculate_Metrics and compute robust FDRs

You're filtering only "B-A" and doing one-off FDR logic here. The PR already introduces calculate_Metrics for exactly this. Also guard against division by zero.

Apply this diff:

```diff
-e_group_no_mbr = model_no_mbr$ComparisonResult %>% filter(Label == "B-A") %>% filter(is.na(issue))
-ecoli_no_mbr = e_group_no_mbr %>% filter(ecoli == TRUE)
-hist(ecoli_no_mbr$log2FC)
-
-ecoli_no_mbr = e_group_no_mbr %>% filter(adj.pvalue < 0.05) %>% filter(ecoli == TRUE)
-human_no_mbr = e_group_no_mbr %>% filter(adj.pvalue < 0.05) %>% filter(ecoli == FALSE)
-FDR_no_mbr = nrow(human_no_mbr) / (nrow(ecoli_no_mbr) + nrow(human_no_mbr))
+source("benchmark/metamorpheus_Process.R")
+metrics_no_mbr <- calculate_Metrics(QuantData_no_mbr, protein_mappings, task_label = "NoMBR", alpha = 0.05)
+print(metrics_no_mbr)
```

benchmark/benchmark_Metamorpheus.R (2)
31-35: Prefer using the Organism column in QuantifiedPeaks.tsv instead of joining via QuantifiedProteins.tsv

Per earlier feedback, newer MetaMorpheus outputs include Organism directly in QuantifiedPeaks.tsv. Filtering in-place avoids an extra read and potential mismatches between "Protein Group(s)" columns across files.

Apply this diff to inline the filter with a fallback to the old behavior if Organism is absent:

```diff
-protein_mappings = data.table::fread(file.path(filePath, "QuantifiedProteins.tsv"))
-protein_mappings = protein_mappings %>% filter(Organism %in% c("Escherichia coli (strain K12)", "Homo sapiens"))
-
-input = input %>% filter(`Protein Group` %in% protein_mappings$`Protein Groups`)
+if ("Organism" %in% names(input)) {
+  input <- input %>%
+    dplyr::filter(Organism %in% c("Escherichia coli (strain K12)", "Homo sapiens"))
+} else {
+  protein_mappings <- data.table::fread(file.path(filePath, "QuantifiedProteins.tsv"))
+  protein_mappings <- protein_mappings %>%
+    dplyr::filter(Organism %in% c("Escherichia coli (strain K12)", "Homo sapiens"))
+  input <- input %>%
+    dplyr::filter(`Protein Group` %in% protein_mappings$`Protein Groups`)
+}
```

If organism strings vary across datasets (e.g., "Escherichia coli str. K-12"), consider a regex-based match or drive organism filters from config.
36-36: Double-check MSstatsConvert pre-filtering options for FDR comparability

MetamorpheusToMSstatsFormat likely forwards or reimplements MSstatsConvert behavior. Please verify the effect of:

- removeFewMeasurements
- removeProtein_with1Feature
Defaults are typically TRUE and may change empirical FDR (especially without PIP). Consider exposing these via config or explicitly setting them here for the benchmarking runs to ensure apples-to-apples comparisons.
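For reference, the empirical-FDR quantity being compared here reduces to the fraction of significant calls that are human (i.e., false positives in a spike-in design where only E. coli proteins truly change). A minimal sketch, assuming a `ComparisonResult`-style data frame with `adj.pvalue` and an `ecoli` flag as used in the snippets above:

```r
# Sketch: empirical FDR among significant comparisons, with a
# divide-by-zero guard for the no-significant-hits case.
empirical_fdr <- function(comp, alpha = 0.05) {
  sig <- comp[!is.na(comp$adj.pvalue) & comp$adj.pvalue < alpha, ]
  tp <- sum(sig$ecoli)    # E. coli hits: expected changers
  fp <- sum(!sig$ecoli)   # human hits: false positives
  if (tp + fp == 0) return(NA_real_)
  fp / (tp + fp)
}
```

Running this on both parameterizations at the same alpha gives directly comparable numbers.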
🧹 Nitpick comments (11)
benchmark/metamorpheus_Process.R (2)
19-22: Harden organism matching and avoid reliance on attached packages

Hard-coded exact string matching is brittle. Also, this function assumes dplyr is attached. Prefer explicit namespaces and robust matching.

Apply this diff:

```diff
-  ecoli_ids <- protein_mappings %>%
-    filter(Organism == "Escherichia coli (strain K12)") %>%
-    pull(`Protein Groups`)
+  ecoli_ids <- protein_mappings |>
+    dplyr::filter(grepl("Escherichia coli", Organism, fixed = TRUE)) |>
+    dplyr::pull(`Protein Groups`)
```

If you prefer to keep `%>%`, either import magrittr or switch the file to use `|>` consistently. Using namespaces (dplyr::) keeps this helper self-contained and safer when sourced.
23-26: Namespace dplyr calls to reduce hidden dependencies

This function uses dplyr verbs without importing. Namespace them to avoid runtime errors when the caller hasn't attached dplyr.

Apply this diff:

```diff
-  comp <- model$ComparisonResult %>%
-    mutate(ecoli = Protein %in% ecoli_ids) %>%
-    filter(is.na(issue))
+  comp <- model$ComparisonResult |>
+    dplyr::mutate(ecoli = Protein %in% ecoli_ids) |>
+    dplyr::filter(is.na(issue))
```

.github/workflows/benchmark.yml (4)
24-26: Sanity-check SSH key handling

Using echo for multi-line secrets often works, but printf is safer. Not mandatory, but reduces surprises with shell expansions.

Apply this diff:

```diff
-          echo "${{ secrets.SSH_PRIVATE_KEY_EXPLORER }}" > ~/.ssh/id_rsa
+          printf '%s' "${{ secrets.SSH_PRIVATE_KEY_EXPLORER }}" > ~/.ssh/id_rsa
```
34-37: Fix YAML trailing spaces flagged by linter

There are trailing spaces on these lines.

Apply this diff (removes the trailing whitespace):

```diff
-          ssh [email protected] "cd /projects/VitekLab/Projects/Benchmarking/benchmark && sbatch config.slurm" | tee slurm_job_id.txt
-          slurm_job_id=$(grep -oP '\d+' slurm_job_id.txt)
+          ssh [email protected] "cd /projects/VitekLab/Projects/Benchmarking/benchmark && sbatch config.slurm" | tee slurm_job_id.txt
+          slurm_job_id=$(grep -oP '\d+' slurm_job_id.txt)
           echo "Slurm Job ID is $slurm_job_id"
-          echo "slurm_job_id=$slurm_job_id" >> $GITHUB_ENV
+          echo "slurm_job_id=$slurm_job_id" >> $GITHUB_ENV
```
17-17: Update to checkout v4

Keep GA actions current.

Apply this diff:

```diff
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
```
31-48: Optional: export FDR threshold to Slurm job environment

You define `FDR_THRESHOLD` but never pass it along. If the R scripts read it, export via sbatch.

Apply this diff:

```diff
-          ssh [email protected] "cd /projects/VitekLab/Projects/Benchmarking/benchmark && sbatch config.slurm" | tee slurm_job_id.txt
+          ssh [email protected] "cd /projects/VitekLab/Projects/Benchmarking/benchmark && sbatch --export=ALL,FDR_THRESHOLD=${FDR_THRESHOLD} config.slurm" | tee slurm_job_id.txt
```

benchmark/config.slurm (1)
12-14: Confirm toolchain availability for source builds (nloptr) on the cluster

You removed the explicit gcc module but are forcing a source build of nloptr, which requires a compiler (and NLopt via cmake). If GCC isn't loaded by default on Explorer, this will fail.

Options:

- Load a compiler toolchain, e.g., `module load gcc` (and possibly `gcc/12` or the cluster default).
- Or avoid forcing source with `install.packages("nloptr", repos = "https://cloud.r-project.org")` and let R decide.

Would you like me to adjust the script for a fully pinned toolchain?
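One way to combine both options and fail fast, sketched under the assumption that the cluster exposes compilers on PATH once the relevant module is loaded (the `gcc` module name is an assumption about Explorer's module tree):

```r
# Sketch: prefer letting R choose binary vs source, but fail early with a
# clear message if a source build would be attempted without a compiler.
has_compiler <- nzchar(Sys.which("gcc")) || nzchar(Sys.which("cc"))
if (!has_compiler) {
  stop("No C compiler on PATH; run `module load gcc` (module name assumed) before this script")
}
install.packages("nloptr", repos = "https://cloud.r-project.org")
```

This keeps the Slurm log readable when the toolchain is missing instead of failing deep inside the nloptr build.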
benchmark/benchmark_Metamorpheus.R (4)
28-30: Decoy filtering: use a dedicated Decoy flag when present

If available, prefer the explicit Decoy/IsDecoy boolean column over string matching. It's more robust across naming schemes.

```diff
-input = input %>% filter(!str_detect(`Protein Group`, ";")) # remove multiple protein group in same cell
-input = input %>% filter(!str_detect(`Protein Group`, "DECOY")) # remove decoys
+input <- input %>% dplyr::filter(!str_detect(`Protein Group`, ";")) # remove multiple protein group in same cell
+if ("Decoy" %in% names(input)) {
+  input <- input %>% dplyr::filter(!Decoy)
+} else if ("IsDecoy" %in% names(input)) {
+  input <- input %>% dplyr::filter(!IsDecoy)
+} else {
+  input <- input %>% dplyr::filter(!str_detect(`Protein Group`, "DECOY"))
+}
```
67-72: Guard against zero/oversubscribed cores in mclapply

detectCores() - 1 can be zero on small VMs, and oversubscription can hurt performance. Bound cores to [1, length(tasks)].

```diff
-num_cores <- detectCores() - 1
+num_cores <- max(1L, min(length(data_process_tasks), detectCores(logical = TRUE) - 1L))
 summarized_results <- mclapply(data_process_tasks, function(task) {
   list(label = task$label, summarized = task$result())
 }, mc.cores = num_cores)
```
74-77: Carry dataset name into the metrics output for traceability

Downstream aggregation across datasets benefits from an explicit Dataset column.

```diff
-results_list <- mclapply(summarized_results, function(res) {
-  calculate_Metrics(res$summarized, protein_mappings, res$label)
-}, mc.cores = num_cores)
+results_list <- mclapply(summarized_results, function(res) {
+  df <- calculate_Metrics(res$summarized, protein_mappings, res$label)
+  df$Dataset <- dataset_config$name
+  df
+}, mc.cores = num_cores)
```
89-90: Optional: Drive datasets from the controller instead of hard-coding

To reduce maintenance and keep scripts DRY, consider deriving the targets from config (e.g., those with "Metamorpheus" in the key or a dedicated flag).

Example replacement:

```diff
-runBenchmarkForMetaMorpheusData("DDA-Solivais2024-Metamorpheus_MBR_LFQ", config)
-runBenchmarkForMetaMorpheusData("DDA-Solivais2024-Metamorpheus_NoMBR_LFQ", config)
+mm_keys <- names(config$datasets)
+mm_keys <- mm_keys[grepl("Metamorpheus", mm_keys, ignore.case = TRUE)]
+invisible(lapply(mm_keys, runBenchmarkForMetaMorpheusData, config = config))
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (6)
- .github/workflows/benchmark.yml (3 hunks)
- benchmark/benchmark_Metamorpheus.R (1 hunks)
- benchmark/config.slurm (1 hunks)
- benchmark/metamorpheus_Process.R (1 hunks)
- benchmark/scriptController.json (2 hunks)
- metamorpheus_code.R (1 hunks)
🧰 Additional context used
🪛 YAMLlint (1.37.1)
.github/workflows/benchmark.yml
[error] 34-34: trailing spaces
(trailing-spaces)
[error] 35-35: trailing spaces
(trailing-spaces)
[error] 37-37: trailing spaces
(trailing-spaces)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build
- GitHub Check: Benchmarking-pipeline
🔇 Additional comments (3)
benchmark/scriptController.json (2)
35-49: LGTM: new MetaMorpheus NoMBR dataset entry is coherent and consistent

The `parent` + `data` convention and sample patterns align with the new benchmarking harness. Patterns are properly anchored.
50-64: LGTM: new MetaMorpheus MBR dataset entry is consistent

Matches NoMBR structure; sample patterns and parent/data paths look good.
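For readers without the file open, the shape of these entries is roughly as follows. This is a hypothetical sketch of the `parent` + `data` convention only; the field names other than the dataset key are illustrative placeholders, not copied from scriptController.json:

```json
{
  "DDA-Solivais2024-Metamorpheus_MBR_LFQ": {
    "parent": "DDA-Solivais2024",
    "data": "Metamorpheus/MBR",
    "samplePatterns": ["^A[0-9]+$", "^B[0-9]+$"]
  }
}
```

The point of the convention is that both MBR and NoMBR entries share a parent dataset and differ only in the data subpath and patterns.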
benchmark/config.slurm (1)
43-50: Nice: line-buffered Rscript output for Slurm logs

Line buffering via stdbuf improves observability; loop structure and logging are fine.
```r
source("metamorpheus_Process.R")
config <- fromJSON("scriptController.json", simplifyVector = FALSE)
```
💡 Verification agent
🧩 Analysis chain
Fix: MetamorpheusToMSstatsFormat likely undefined; also make sources/paths robust to working directory
MetamorpheusToMSstatsFormat isn’t defined in this script and isn’t exported by MSstats/MSstatsConvert. It appears to live in metamorpheus_code.R, which isn’t sourced here. Also, both the source() and fromJSON() calls are relative to the current working directory and will break if the script isn’t launched from benchmark/.
Apply this diff to source both helper files and load the controller JSON robustly:

```diff
-source("metamorpheus_Process.R")
-config <- fromJSON("scriptController.json", simplifyVector = FALSE)
+script_args <- commandArgs(trailingOnly = FALSE)
+script_path <- sub("^--file=", "", script_args[grep("^--file=", script_args)])
+base_dir <- if (length(script_path)) dirname(normalizePath(script_path)) else "."
+repo_root <- normalizePath(file.path(base_dir, ".."), mustWork = FALSE)
+
+source(file.path(base_dir, "metamorpheus_Process.R"))
+# Ensure MetamorpheusToMSstatsFormat is available
+if (file.exists(file.path(repo_root, "metamorpheus_code.R"))) {
+  source(file.path(repo_root, "metamorpheus_code.R"))
+}
+config <- fromJSON(file.path(base_dir, "scriptController.json"), simplifyVector = FALSE)
```

🏁 Script executed:
```bash
#!/bin/bash
set -euo pipefail
echo "PWD: $(pwd)"
echo
echo "---- benchmark directory listing ----"
ls -la benchmark || true
echo
echo "---- show benchmark/benchmark_Metamorpheus.R (first 200 lines) ----"
sed -n '1,200p' benchmark/benchmark_Metamorpheus.R || true
echo
echo "---- rg: search for MetamorpheusToMSstatsFormat ----"
rg -n --hidden --no-ignore -S "MetamorpheusToMSstatsFormat" || true
echo
echo "---- rg: search for metamorpheus_code.R references ----"
rg -n --hidden --no-ignore -S "metamorpheus_code\\.R" || true
echo
echo "---- fd: find files named metamorpheus_code.R or metamorpheus_Process.R ----"
fd -I -e R --hidden "metamorpheus_code.R" || true
fd -I -e R --hidden "metamorpheus_Process.R" || true
echo
echo "---- rg: search for source(...) of metamorpheus_Process.R ----"
rg -n --hidden --no-ignore -S "source\\(.*metamorpheus_Process\\.R" || true
echo
echo "---- rg: search for fromJSON\\(.*scriptController.json ----"
rg -n --hidden --no-ignore -S "fromJSON\\(.*scriptController\\.json" || true
echo
echo "---- check for metamorpheus_code.R at repo root ----"
if [ -f "metamorpheus_code.R" ]; then echo "FOUND: metamorpheus_code.R at repo root"; else echo "NOT FOUND: metamorpheus_code.R at repo root"; fi
```

Length of output: 6452
Fix required: source metamorpheus_code.R and make file paths robust to working directory
MetamorpheusToMSstatsFormat is defined in metamorpheus_code.R at the repo root (rg matches). benchmark/benchmark_Metamorpheus.R currently only does source("metamorpheus_Process.R") and fromJSON("scriptController.json"), which will break if the script is run from a different working directory — source the helper that defines MetamorpheusToMSstatsFormat and resolve paths relative to the script.
Files to update:
- benchmark/benchmark_Metamorpheus.R — replace the relative source()/fromJSON() calls (lines ~8–10).
- metamorpheus_code.R (repo root) — contains MetamorpheusToMSstatsFormat and must be sourced by the benchmark script.
- benchmark/metamorpheus_Process.R — keep sourced, but source via computed script path.
Apply this diff to benchmark/benchmark_Metamorpheus.R:
```diff
-source("metamorpheus_Process.R")
-config <- fromJSON("scriptController.json", simplifyVector = FALSE)
+script_args <- commandArgs(trailingOnly = FALSE)
+script_path <- sub("^--file=", "", script_args[grep("^--file=", script_args)])
+base_dir <- if (length(script_path)) dirname(normalizePath(script_path)) else "."
+repo_root <- normalizePath(file.path(base_dir, ".."), mustWork = FALSE)
+
+source(file.path(base_dir, "metamorpheus_Process.R"))
+# Ensure MetamorpheusToMSstatsFormat is available
+if (file.exists(file.path(repo_root, "metamorpheus_code.R"))) {
+  source(file.path(repo_root, "metamorpheus_code.R"))
+}
+config <- fromJSON(file.path(base_dir, "scriptController.json"), simplifyVector = FALSE)
```
+ config <- fromJSON(file.path(base_dir, "scriptController.json"), simplifyVector = FALSE)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| source("metamorpheus_Process.R") | |
| config <- fromJSON("scriptController.json", simplifyVector = FALSE) | |
| script_args <- commandArgs(trailingOnly = FALSE) | |
| script_path <- sub("^--file=", "", script_args[grep("^--file=", script_args)]) | |
| base_dir <- if (length(script_path)) dirname(normalizePath(script_path)) else "." | |
| repo_root <- normalizePath(file.path(base_dir, ".."), mustWork = FALSE) | |
| source(file.path(base_dir, "metamorpheus_Process.R")) | |
| # Ensure MetamorpheusToMSstatsFormat is available | |
| if (file.exists(file.path(repo_root, "metamorpheus_code.R"))) { | |
| source(file.path(repo_root, "metamorpheus_code.R")) | |
| } | |
| config <- fromJSON(file.path(base_dir, "scriptController.json"), simplifyVector = FALSE) |
```r
    label = "Data process with Normalization and MBImpute False",
    result = function() dataProcess(output, featureSubset = "topN", n_top_feature = 20, MBimpute = FALSE)
  ),
  list(
    label = "Data process without Normalization",
    result = function() dataProcess(output, featureSubset = "topN", normalization = "FALSE", n_top_feature = 20)
  ),
  list(
    label = "Data process without Normalization with MBImpute False",
    result = function() dataProcess(output, featureSubset = "topN", normalization = "FALSE", n_top_feature = 20, MBimpute = FALSE)
  ),
  list(
    label = "Data process without Normalization and Imputation On for all features",
    result = function() dataProcess(output, featureSubset = "all", normalization = "FALSE", MBimpute = FALSE)
  ),
  list(
    label = "Data process without Normalization and Imputation On for top3 features",
    result = function() dataProcess(output, featureSubset = "top3", normalization = "FALSE", MBimpute = FALSE)
  )
```
Correct MBimpute argument name and align “Imputation On” task behavior
Two issues:
- Typo: MBImpute (capital I) is not a valid dataProcess() argument; it should be MBimpute.
- Tasks labeled “…Imputation On…” currently pass FALSE, turning imputation off and contradicting the labels.
Apply this diff to fix both problems:
```diff
 list(
   label = "Data process with Normalization and MBImpute False",
   result = function() dataProcess(output, featureSubset = "topN", n_top_feature = 20, MBimpute = FALSE)
 ),
@@
 list(
   label = "Data process without Normalization with MBImpute False",
   result = function() dataProcess(output, featureSubset = "topN", normalization = "FALSE", n_top_feature = 20, MBimpute = FALSE)
 ),
 list(
   label = "Data process without Normalization and Imputation On for all features",
-  result = function() dataProcess(output, featureSubset = "all", normalization = "FALSE", MBImpute = FALSE)
+  result = function() dataProcess(output, featureSubset = "all", normalization = "FALSE", MBimpute = TRUE)
 ),
 list(
   label = "Data process without Normalization and Imputation On for top3 features",
-  result = function() dataProcess(output, featureSubset = "top3", normalization = "FALSE", MBImpute = FALSE)
+  result = function() dataProcess(output, featureSubset = "top3", normalization = "FALSE", MBimpute = TRUE)
 )
```

Note: The first two blocks already use the correct argument spelling (MBimpute). The changes above fix the last two blocks to both correct the spelling and enable imputation.
```sh
Rscript -e '
.libPaths("/home/raina.ans/R/x86_64-pc-linux-gnu-library/4.4")
Sys.unsetenv("R_LIBS")
cat(".libPaths():\n"); print(.libPaths())
if (!requireNamespace("remotes", quietly = TRUE))
  install.packages("remotes", repos = "https://cloud.r-project.org")
install.packages("nloptr", type = "source", repos = "https://cloud.r-project.org")
for (pkg in c("dplyr", "stringr", "ggplot2")) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = "https://cloud.r-project.org")
  }
}
BiocManager::install(c("MSstatsConvert", "preprocessCore"), force = TRUE)
remotes::install_github("Vitek-Lab/MSstats", ref = "devel", force = TRUE)
'
```
This will fail if BiocManager is missing; install it before calling BiocManager::install
BiocManager::install errors if BiocManager isn’t installed. Add a guard to install BiocManager first.
Apply this diff:
```diff
 Rscript -e '
 .libPaths("/home/raina.ans/R/x86_64-pc-linux-gnu-library/4.4")
 Sys.unsetenv("R_LIBS")
 cat(".libPaths():\n"); print(.libPaths())
 if (!requireNamespace("remotes", quietly = TRUE))
   install.packages("remotes", repos = "https://cloud.r-project.org")
 install.packages("nloptr", type = "source", repos = "https://cloud.r-project.org")
 for (pkg in c("dplyr", "stringr", "ggplot2")) {
   if (!requireNamespace(pkg, quietly = TRUE)) {
     install.packages(pkg, repos = "https://cloud.r-project.org")
   }
 }
-BiocManager::install(c("MSstatsConvert", "preprocessCore"), force = TRUE)
+if (!requireNamespace("BiocManager", quietly = TRUE)) {
+  install.packages("BiocManager", repos = "https://cloud.r-project.org")
+}
+BiocManager::install(c("MSstatsConvert", "preprocessCore"), force = TRUE)
 remotes::install_github("Vitek-Lab/MSstats", ref = "devel", force = TRUE)
 '
```
```r
  groups <- levels(QuantData$ProteinLevelData$GROUP)
  colnames(comparison) <- groups[order(as.numeric(groups))]
```
🛠️ Refactor suggestion
Avoid fragile column ordering: stop coercing group labels to numeric
as.numeric(groups) will warn and is unreliable when GROUP levels are non-numeric (e.g., "A","B","C"...). You only need to align by names; just set the column names directly. Optionally validate the expected length.
Apply this diff:
- groups <- levels(QuantData$ProteinLevelData$GROUP)
- colnames(comparison) <- groups[order(as.numeric(groups))]
+ groups <- levels(QuantData$ProteinLevelData$GROUP)
+ if (length(groups) != ncol(comparison)) {
+ stop(sprintf("Contrast matrix expects %d groups but found %d in data: %s",
+ ncol(comparison), length(groups), paste(groups, collapse = ", ")))
+ }
+ colnames(comparison) <- groups
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| groups <- levels(QuantData$ProteinLevelData$GROUP) | |
| colnames(comparison) <- groups[order(as.numeric(groups))] | |
| groups <- levels(QuantData$ProteinLevelData$GROUP) | |
| if (length(groups) != ncol(comparison)) { | |
| stop(sprintf("Contrast matrix expects %d groups but found %d in data: %s", | |
| ncol(comparison), length(groups), paste(groups, collapse = ", "))) | |
| } | |
| colnames(comparison) <- groups |
🤖 Prompt for AI Agents
In benchmark/metamorpheus_Process.R around lines 10 to 12, the code coerces
group level labels to numeric via as.numeric(groups) which is fragile and warns
for non-numeric levels; instead assign column names directly using the factor
levels (or sorted/validated names if needed) and optionally verify the length
matches ncol(comparison) before setting colnames to avoid misalignment. Ensure
you remove the as.numeric() coercion and replace it with a direct assignment of
groups (or a deterministic name ordering) and add a simple length check to throw
a clear error if sizes differ.
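For readers following along, the failure mode is easy to reproduce in a standalone R session (illustrative snippet, not part of the PR):

```r
# levels() returns a character vector; for letter-coded groups the numeric
# coercion produces only NAs, and order() silently drops NAs by default.
groups <- c("A", "B", "C")
idx <- suppressWarnings(order(as.numeric(groups)))  # as.numeric() -> NA NA NA
length(idx)          # 0: order() dropped every NA
length(groups[idx])  # 0: indexing with integer(0) yields character(0)
# Assigning a zero-length vector to colnames() of a 3-column contrast
# matrix then fails with a "length of 'dimnames'" error.
```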
benchmark/metamorpheus_Process.R
Outdated
| labels <- unique(comp$Label) | ||
| result_rows <- lapply(labels, function(lbl) { | ||
| df <- comp %>% filter(Label == lbl) | ||
| sig <- df %>% filter(adj.pvalue < alpha) | ||
|
|
| tp <- sig %>% filter(ecoli) %>% nrow() | ||
| fp <- sig %>% filter(!ecoli) %>% nrow() | ||
| tot <- tp + fp | ||
| fdr <- if (tot > 0) fp / tot else NA_real_ | ||
|
|
| data.frame( | ||
| Task = task_label, | ||
| Comparison = lbl, | ||
| FDR = fdr, | ||
| stringsAsFactors = FALSE | ||
| ) | ||
| }) | ||
|
|
| results <- do.call(rbind, result_rows) | ||
| return(results) |
🛠️ Refactor suggestion
Guard against empty results from rbind
If comp is empty (e.g., no comparisons available), do.call(rbind, list()) returns NULL rather than a data.frame, which breaks downstream code that expects one. Return an empty data.frame instead to keep the pipeline resilient.
Apply this diff:
- labels <- unique(comp$Label)
- result_rows <- lapply(labels, function(lbl) {
+ labels <- unique(comp$Label)
+ if (length(labels) == 0L) {
+ return(data.frame(Task = character(), Comparison = character(), FDR = numeric(),
+ stringsAsFactors = FALSE))
+ }
+ result_rows <- lapply(labels, function(lbl) {
df <- comp %>% filter(Label == lbl)
sig <- df %>% filter(adj.pvalue < alpha)
tp <- sig %>% filter(ecoli) %>% nrow()
fp <- sig %>% filter(!ecoli) %>% nrow()
tot <- tp + fp
fdr <- if (tot > 0) fp / tot else NA_real_
data.frame(
Task = task_label,
Comparison = lbl,
FDR = fdr,
stringsAsFactors = FALSE
)
})
results <- do.call(rbind, result_rows)
return(results)
📝 Committable suggestion
| labels <- unique(comp$Label) | |
| result_rows <- lapply(labels, function(lbl) { | |
| df <- comp %>% filter(Label == lbl) | |
| sig <- df %>% filter(adj.pvalue < alpha) | |
| tp <- sig %>% filter(ecoli) %>% nrow() | |
| fp <- sig %>% filter(!ecoli) %>% nrow() | |
| tot <- tp + fp | |
| fdr <- if (tot > 0) fp / tot else NA_real_ | |
| data.frame( | |
| Task = task_label, | |
| Comparison = lbl, | |
| FDR = fdr, | |
| stringsAsFactors = FALSE | |
| ) | |
| }) | |
| results <- do.call(rbind, result_rows) | |
| return(results) | |
| labels <- unique(comp$Label) | |
| if (length(labels) == 0L) { | |
| return(data.frame(Task = character(), Comparison = character(), FDR = numeric(), | |
| stringsAsFactors = FALSE)) | |
| } | |
| result_rows <- lapply(labels, function(lbl) { | |
| df <- comp %>% filter(Label == lbl) | |
| sig <- df %>% filter(adj.pvalue < alpha) | |
| tp <- sig %>% filter(ecoli) %>% nrow() | |
| fp <- sig %>% filter(!ecoli) %>% nrow() | |
| tot <- tp + fp | |
| fdr <- if (tot > 0) fp / tot else NA_real_ | |
| data.frame( | |
| Task = task_label, | |
| Comparison = lbl, | |
| FDR = fdr, | |
| stringsAsFactors = FALSE | |
| ) | |
| }) | |
| results <- do.call(rbind, result_rows) | |
| return(results) |
🤖 Prompt for AI Agents
In benchmark/metamorpheus_Process.R around lines 27 to 46, the code calls
do.call(rbind, result_rows) which returns NULL when result_rows is empty; modify
the logic to check if result_rows has length > 0 before binding and, if empty,
return an empty data.frame with the same columns (Task, Comparison, FDR) and
types (stringsAsFactors = FALSE) so the function returns a valid empty result
instead of failing.
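One nuance worth noting when applying this guard (standalone base-R check, not from the PR): an empty rbind does not raise an error itself; it yields NULL, which then trips up anything expecting a data.frame:

```r
rows <- list()               # no comparisons produced
res <- do.call(rbind, rows)  # equivalent to rbind() with no arguments
is.null(res)                 # TRUE
nrow(res)                    # NULL, so guards like nrow(res) > 0 return logical(0)
```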
| "name": "DDA-Dowell2021-HEqe408_LFQ", | ||
| "file": "/work/VitekLab/Data/MS/Benchmarking/DDA-Dowell2021-HEqe408_LFQ/FragPipe/TOP0/MSstats.csv", | ||
| "file": "/projects/VitekLab/Data/MS/Benchmarking/DDA-Dowell2021-HEqe408_LFQ/FragPipe/TOP0/MSstats.csv", | ||
| "samples": { | ||
| "Human": { | ||
| "pattern": "_HUMAN$", | ||
| "type": "insignificant" | ||
| }, | ||
| "Ecoli": { | ||
| "pattern": "_ECOLI$", | ||
| "type": "significant" | ||
| } | ||
| } |
💡 Verification agent
🧩 Analysis chain
Confirm consumer code reads new fields (parent and data)
Older datasets use a file field; the MetaMorpheus entries introduce parent + data. Ensure the benchmarking script branches correctly when these fields are present.
Run this to verify the harness reads parent and data and constructs file paths accordingly:
🏁 Script executed:
#!/bin/bash
# Inspect benchmark scripts for usage of scriptController.json
rg -n -C3 'scriptController\.json|parent|data"|samples' --type=r benchmark
Length of output: 4202
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Searching benchmark R scripts for dataset_config usage and path construction..."
rg -n -C3 'dataset_config|filePath|file.path\(|dataset_config\$file|dataset_config\$parent|dataset_config\$data' --type=r benchmark || true
echo
echo "----- Full contents of the three benchmark scripts referenced in previous output -----"
for f in benchmark/benchmark_Metamorpheus.R benchmark/benchmark_Dowell2021-HEqe408_LFQ.R benchmark/benchmark_Puyvelde2022-HYE5600735_LFQ.R; do
echo "=== $f ==="
sed -n '1,240p' "$f" || true
echo
done
Length of output: 14136
Action: Make benchmark scripts accept both file and parent+data config shapes
Verified: benchmark/benchmark_Metamorpheus.R already constructs paths from dataset_config$parent + dataset_config$data; benchmark_Dowell2021-HEqe408_LFQ.R and benchmark_Puyvelde2022-HYE5600735_LFQ.R still assume dataset_config$file only. This will break if scriptController.json uses the new parent+data shape for non‑Metamorpheus datasets.
Files to update:
- benchmark/benchmark_Metamorpheus.R — lines ~18–21: add a fallback to dataset_config$file when parent/data are absent.
- benchmark/benchmark_Dowell2021-HEqe408_LFQ.R — lines ~14–21: resolve a data_file from either dataset_config$file or parent+data, then use it for fread and logging.
- benchmark/benchmark_Puyvelde2022-HYE5600735_LFQ.R — lines ~14–21: same change as Dowell.
Suggested minimal patches (apply the same pattern to both Dowell and Puyvelde):
Before:
cat("Processing Dataset:", dataset_config$name, "\n")
cat("Dataset File Path:", dataset_config$file, "\n")
...
fragpipe_raw <- data.table::fread(dataset_config$file)
After:
data_file <- if (!is.null(dataset_config$file)) {
dataset_config$file
} else if (!is.null(dataset_config$parent) && !is.null(dataset_config$data)) {
file.path(dataset_config$parent, dataset_config$data)
} else {
stop("Dataset path not specified in config (expected 'file' or 'parent' + 'data')")
}
cat("Processing Dataset:", dataset_config$name, "\n")
cat("Dataset File Path:", data_file, "\n")
...
fragpipe_raw <- data.table::fread(data_file)
Suggested change for Metamorpheus (keep existing parent+data behavior, add fallback to file):
Before:
filePath <- file.path(dataset_config$parent, dataset_config$data)
annotPath <- dataset_config$parent
After:
if (!is.null(dataset_config$file)) {
# If a file path is supplied, use its directory as the dataset directory
filePath <- dirname(dataset_config$file)
annotPath <- dirname(filePath)
} else {
filePath <- file.path(dataset_config$parent, dataset_config$data)
annotPath <- dataset_config$parent
}
Reasoning: This makes each script robust to either config shape (old file or new parent+data) and fails fast with a clear error if neither is present.
Please apply these small changes to the three scripts (or ensure scriptController.json keeps the original file field for datasets consumed by the FragPipe-based scripts).
🤖 Prompt for AI Agents
In benchmark/scriptController.json around lines 22–33 and the three R scripts:
benchmark/benchmark_Metamorpheus.R (~lines 18–21),
benchmark/benchmark_Dowell2021-HEqe408_LFQ.R (~lines 14–21), and
benchmark/benchmark_Puyvelde2022-HYE5600735_LFQ.R (~lines 14–21), update each
script to accept either dataset_config$file OR dataset_config$parent +
dataset_config$data: for Dowell and Puyvelde, resolve a single data_file by
using dataset_config$file if present, otherwise build it with
file.path(dataset_config$parent, dataset_config$data), stop with a clear error
if neither, then use data_file for logging and fread; for Metamorpheus, keep
parent+data behavior but add a fallback where if dataset_config$file is
provided, set filePath to dirname(dataset_config$file) and annotPath to its
parent (dirname(filePath)); ensure all logging uses the resolved path and fail
fast with a descriptive message when no valid path is provided.
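The fallback described above can also be factored into one small helper shared by all three scripts; a sketch assuming each dataset entry arrives as a plain list from jsonlite (names are illustrative, not committed code):

```r
# Hypothetical helper, not part of the PR: resolve a dataset path from
# either config shape ('file', or 'parent' + 'data').
resolve_dataset_file <- function(dataset_config) {
  if (!is.null(dataset_config$file)) {
    dataset_config$file
  } else if (!is.null(dataset_config$parent) && !is.null(dataset_config$data)) {
    file.path(dataset_config$parent, dataset_config$data)
  } else {
    stop("Dataset path not specified in config (expected 'file' or 'parent' + 'data')")
  }
}

resolve_dataset_file(list(file = "/projects/x/MSstats.csv"))  # "/projects/x/MSstats.csv"
resolve_dataset_file(list(parent = "/projects/x", data = "QuantifiedPeaks.tsv"))  # "/projects/x/QuantifiedPeaks.tsv"
```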
metamorpheus_code.R
Outdated
| groups = levels(QuantData_no_mbr$ProteinLevelData$GROUP) | ||
| colnames(comparison) <- groups[order(as.numeric(groups))] | ||
| model_no_mbr <- groupComparison(contrast.matrix=comparison, data=QuantData_no_mbr, |
🛠️ Refactor suggestion
Fix contrast column naming; current coercion to numeric is fragile and warns
Identical issue as in the helper: as.numeric(groups) is unreliable for non-numeric labels.
Apply this diff:
-groups = levels(QuantData_no_mbr$ProteinLevelData$GROUP)
-colnames(comparison) <- groups[order(as.numeric(groups))]
+groups <- levels(QuantData_no_mbr$ProteinLevelData$GROUP)
+colnames(comparison) <- groups
📝 Committable suggestion
| groups = levels(QuantData_no_mbr$ProteinLevelData$GROUP) | |
| colnames(comparison) <- groups[order(as.numeric(groups))] | |
| model_no_mbr <- groupComparison(contrast.matrix=comparison, data=QuantData_no_mbr, | |
| groups <- levels(QuantData_no_mbr$ProteinLevelData$GROUP) | |
| colnames(comparison) <- groups | |
| model_no_mbr <- groupComparison(contrast.matrix=comparison, data=QuantData_no_mbr, |
🤖 Prompt for AI Agents
In metamorpheus_code.R around lines 40 to 42, the contrast column naming uses
order(as.numeric(groups)) which coerces non-numeric group labels to NA and emits
warnings; replace that coercion with a safe ordering such as
order(as.character(groups)) (or order(as.numeric(as.character(groups))) only if
you are certain labels are numeric) and assign colnames(comparison) <-
groups[order(as.character(groups))] so non-numeric labels are handled without
warnings.
metamorpheus_code.R
Outdated
| e_group = model$ComparisonResult %>% filter(Label == "B-A") %>% filter(is.na(issue)) | ||
| ecoli = e_group %>% filter(ecoli == TRUE) | ||
| hist(ecoli$log2FC) | ||
|
|
| ecoli = e_group %>% filter(adj.pvalue < 0.05) %>% filter(ecoli == TRUE) | ||
| human = e_group %>% filter(adj.pvalue < 0.05) %>% filter(ecoli == FALSE) | ||
| FDR = nrow(human) / (nrow(ecoli) + nrow(human)) | ||
|
|
🛠️ Refactor suggestion
Avoid hard-coded “B-A” and duplicated FDR code; delegate to calculate_Metrics
Mirror the No MBR fix here; remove ad-hoc histogram and use the shared helper.
Apply this diff:
-e_group = model$ComparisonResult %>% filter(Label == "B-A") %>% filter(is.na(issue))
-ecoli = e_group %>% filter(ecoli == TRUE)
-hist(ecoli$log2FC)
-
-ecoli = e_group %>% filter(adj.pvalue < 0.05) %>% filter(ecoli == TRUE)
-human = e_group %>% filter(adj.pvalue < 0.05) %>% filter(ecoli == FALSE)
-FDR = nrow(human) / (nrow(ecoli) + nrow(human))
-
-cat("FDR MBR", FDR, "\n")
+metrics_mbr <- calculate_Metrics(QuantData, protein_mappings, task_label = "MBR", alpha = 0.05)
+print(metrics_mbr)
📝 Committable suggestion
| e_group = model$ComparisonResult %>% filter(Label == "B-A") %>% filter(is.na(issue)) | |
| ecoli = e_group %>% filter(ecoli == TRUE) | |
| hist(ecoli$log2FC) | |
| ecoli = e_group %>% filter(adj.pvalue < 0.05) %>% filter(ecoli == TRUE) | |
| human = e_group %>% filter(adj.pvalue < 0.05) %>% filter(ecoli == FALSE) | |
| FDR = nrow(human) / (nrow(ecoli) + nrow(human)) | |
| metrics_mbr <- calculate_Metrics(QuantData, protein_mappings, task_label = "MBR", alpha = 0.05) | |
| print(metrics_mbr) |
Actionable comments posted: 1
♻️ Duplicate comments (3)
benchmark/benchmark_Metamorpheus.R (3)
31-37: Good use of Organism-based filtering (addresses prior feedback)
Filtering QuantifiedProteins.tsv to organisms present in QuantifiedPeaks.tsv is correct and aligns with earlier feedback to use the Organism column.
8-10: Fix brittle paths and ensure MetamorpheusToMSstatsFormat is available
Using source("...") and fromJSON("...") relative to the CWD will break when invoked outside benchmark/. Also, MetamorpheusToMSstatsFormat is not guaranteed to be in scope unless metamorpheus_code.R is sourced. Make paths robust and fail fast if the function is missing.
Apply this diff:
-source("metamorpheus_Process.R")
-config <- fromJSON("scriptController.json", simplifyVector = FALSE)
+## Resolve paths relative to this script; source helpers robustly
+script_args <- commandArgs(trailingOnly = FALSE)
+script_path <- sub("^--file=", "", script_args[grep("^--file=", script_args)])
+base_dir <- if (length(script_path)) dirname(normalizePath(script_path)) else normalizePath(".")
+repo_root <- normalizePath(file.path(base_dir, ".."), mustWork = FALSE)
+
+source(file.path(base_dir, "metamorpheus_Process.R"))
+
+# Ensure MetamorpheusToMSstatsFormat is available (defined in metamorpheus_code.R at repo root)
+mm_code <- file.path(repo_root, "metamorpheus_code.R")
+if (file.exists(mm_code)) {
+  source(mm_code)
+}
+if (!exists("MetamorpheusToMSstatsFormat")) {
+  stop("MetamorpheusToMSstatsFormat not found; please source metamorpheus_code.R or add it to the search path.")
+}
+
+# Load controller JSON from the benchmark directory
+config <- fromJSON(file.path(base_dir, "scriptController.json"), simplifyVector = FALSE)
58-64: “Imputation On” tasks currently disable imputation; set MBimpute = TRUE
Labels promise “Imputation On” but MBimpute is FALSE in both tasks, changing the experiment semantics and metrics.
list(
  label = "Data process without Normalization and Imputation On for all features",
- result = function() dataProcess(output, featureSubset = "all", normalization = "FALSE", MBImpute = FALSE)
+ result = function() dataProcess(output, featureSubset = "all", normalization = "FALSE", MBimpute = TRUE)
),
list(
  label = "Data process without Normalization and Imputation On for top3 features",
- result = function() dataProcess(output, featureSubset = "top3", normalization = "FALSE", MBimpute = FALSE)
+ result = function() dataProcess(output, featureSubset = "top3", normalization = "FALSE", MBimpute = TRUE)
)
🧹 Nitpick comments (7)
benchmark/benchmark_Metamorpheus.R (7)
21-23: Add file existence checks and read only necessary columns to reduce I/O
Fail fast with clear messages if expected inputs are missing. Also, reading only “Protein Groups” and “Organism” trims memory and speeds up processing.
- input = data.table::fread(file.path(filePath, "QuantifiedPeaks.tsv"))
- annot = data.table::fread(file.path(annotPath, "annotation.csv"))
+ peaks_path <- file.path(filePath, "QuantifiedPeaks.tsv")
+ annot_path <- file.path(annotPath, "annotation.csv")
+ if (!file.exists(peaks_path)) stop("Missing QuantifiedPeaks.tsv at: ", peaks_path)
+ if (!file.exists(annot_path)) stop("Missing annotation.csv at: ", annot_path)
+ input <- data.table::fread(peaks_path)
+ annot <- data.table::fread(annot_path)
@@
- protein_mappings = data.table::fread(file.path(filePath, "QuantifiedProteins.tsv"))
+ proteins_path <- file.path(filePath, "QuantifiedProteins.tsv")
+ if (!file.exists(proteins_path)) stop("Missing QuantifiedProteins.tsv at: ", proteins_path)
+ protein_mappings <- data.table::fread(proteins_path, select = c("Protein Groups", "Organism"))
Also applies to: 31-31
28-30: Use fixed() in str_detect to avoid regex overhead and edge cases
Treat ";" and "DECOY" as fixed strings instead of regex to speed up filtering and avoid unintended regex behavior.
- input = input %>% filter(!str_detect(`Protein Group`, ";")) # remove multiple protein group in same cell
- input = input %>% filter(!str_detect(`Protein Group`, "DECOY")) # remove decoys
+ input <- input %>% filter(!str_detect(`Protein Group`, fixed(";"))) # remove multiple protein group in same cell
+ input <- input %>% filter(!str_detect(`Protein Group`, fixed("DECOY"))) # remove decoys
16-17: Be resilient if dataset_config$name is missing
Some controller entries may not set name. Fall back to datasetPath for logging.
- cat("Processing Dataset:", dataset_config$name, "\n")
+ ds_name <- if (!is.null(dataset_config$name)) dataset_config$name else datasetPath
+ cat("Processing Dataset:", ds_name, "\n")
40-47: Clarify task label to reflect actual settings
First task uses default MBimpute; consider making that explicit in the label to avoid confusion when comparing results.
- label = "Data process with Normalized Data",
+ label = "Data process with Normalization (default MBimpute)",
38-38: Expose MSstatsConvert filters to config for FDR sensitivity analysis
MetamorpheusToMSstatsFormat defaults removeFewMeasurements = TRUE and removeProtein_with1Feature = TRUE. To investigate empirical FDR sensitivity (as raised previously), parameterize these via scriptController.json and pass through here. I can draft the wiring if you want.
81-86: Persist results to disk for downstream aggregation
Printing is transient. Consider writing CSV/TSV labeled by dataset/MBR mode to the dataset folder or a results directory for the workflow.
final_results <- do.call(rbind, results_list)
end_time <- Sys.time()
total_time <- end_time - start_time
- print(final_results)
- print(paste("Total Execution Time:", total_time))
+ print(final_results)
+ print(paste("Total Execution Time:", total_time))
+ # Optional: persist results
+ out_file <- file.path(annotPath, paste0("benchmark_Metamorpheus_", gsub("\\s+", "_", ds_name), ".csv"))
+ data.table::fwrite(final_results, out_file)
+ message("Wrote results to: ", out_file)
91-92: Isolate dataset runs so one failure doesn’t stop the entire batch
Wrap each invocation in tryCatch to continue processing the remaining dataset if one fails.
-runBenchmarkForMetaMorpheusData("DDA-Solivais2024-Metamorpheus_MBR_LFQ", config)
-runBenchmarkForMetaMorpheusData("DDA-Solivais2024-Metamorpheus_NoMBR_LFQ", config)
+lapply(
+  c("DDA-Solivais2024-Metamorpheus_MBR_LFQ", "DDA-Solivais2024-Metamorpheus_NoMBR_LFQ"),
+  function(ds) {
+    tryCatch(
+      runBenchmarkForMetaMorpheusData(ds, config),
+      error = function(e) {
+        message("ERROR in ", ds, ": ", conditionMessage(e))
+      }
+    )
+  }
+)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
benchmark/benchmark_Metamorpheus.R (1 hunks)
benchmark/metamorpheus_Process.R (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- benchmark/metamorpheus_Process.R
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build
- GitHub Check: Benchmarking-pipeline
| num_cores <- detectCores() - 1 | ||
|
|
| summarized_results <- mclapply(data_process_tasks, function(task) { | ||
| list(label = task$label, summarized = task$result()) | ||
| }, mc.cores = num_cores) | ||
|
|
|
|
| results_list <- mclapply(summarized_results, function(res) { | ||
| calculate_Metrics(res$summarized, protein_mappings, res$label) | ||
| }, mc.cores = num_cores) | ||
|
|
💡 Verification agent
🧩 Analysis chain
Guard against mc.cores == 0 and provide a Windows-safe fallback
detectCores() - 1 can be zero; mclapply will error. Also, mclapply with mc.cores > 1 is unavailable on Windows. Add a safe core calculation and fall back to lapply when needed.
- num_cores <- detectCores() - 1
-
- summarized_results <- mclapply(data_process_tasks, function(task) {
- list(label = task$label, summarized = task$result())
- }, mc.cores = num_cores)
+ num_cores <- max(1L, as.integer(detectCores() - 1L))
+ if (.Platform$OS.type != "windows" && num_cores > 1L) {
+ summarized_results <- mclapply(
+ data_process_tasks,
+ function(task) list(label = task$label, summarized = task$result()),
+ mc.cores = num_cores
+ )
+ } else {
+ summarized_results <- lapply(
+ data_process_tasks,
+ function(task) list(label = task$label, summarized = task$result())
+ )
+ }
@@
- results_list <- mclapply(summarized_results, function(res) {
- calculate_Metrics(res$summarized, protein_mappings, res$label)
- }, mc.cores = num_cores)
+ if (.Platform$OS.type != "windows" && num_cores > 1L) {
+ results_list <- mclapply(
+ summarized_results,
+ function(res) calculate_Metrics(res$summarized, protein_mappings, res$label),
+ mc.cores = num_cores
+ )
+ } else {
+ results_list <- lapply(
+ summarized_results,
+ function(res) calculate_Metrics(res$summarized, protein_mappings, res$label)
+ )
+ }
Optional verification: please run a small dataset on a 1-core environment to confirm no regression when num_cores == 1.
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Checking for benchmark/benchmark_Metamorpheus.R..."
if [ -f benchmark/benchmark_Metamorpheus.R ]; then
echo "== File: benchmark/benchmark_Metamorpheus.R (first 220 lines) =="
sed -n '1,220p' benchmark/benchmark_Metamorpheus.R
else
echo "benchmark/benchmark_Metamorpheus.R not found"
fi
echo
echo "== Searching repo for parallel-related usages =="
rg -n --hidden -S 'detectCores|mclapply|library\\(parallel\\)|require\\(parallel\\)|\\.Platform' || true
echo
echo "== Showing matches in benchmark/ directory with context =="
rg -n --hidden -S --context 3 'detectCores|mclapply|\\.Platform' benchmark || true
Length of output: 8050
Guard against mc.cores == 0 and add a Windows-safe fallback (applies to multiple benchmark scripts)
detectCores() - 1 can be 0 (causing mclapply to error) and mclapply’s multicore mode is not available on Windows — fall back to lapply when multicore isn’t usable.
Files to update:
- benchmark/benchmark_Metamorpheus.R (lines ~69–79)
- benchmark/benchmark_Puyvelde2022-HYE5600735_LFQ.R (lines ~47–56)
- benchmark/benchmark_Dowell2021-HEqe408_LFQ.R (lines ~48–56)
Suggested change (apply the same pattern to each file):
- num_cores <- detectCores() - 1
-
- summarized_results <- mclapply(data_process_tasks, function(task) {
- list(label = task$label, summarized = task$result())
- }, mc.cores = num_cores)
+ num_cores <- max(1L, as.integer(detectCores() - 1L))
+ if (.Platform$OS.type != "windows" && num_cores > 1L) {
+ summarized_results <- mclapply(
+ data_process_tasks,
+ function(task) list(label = task$label, summarized = task$result()),
+ mc.cores = num_cores
+ )
+ } else {
+ summarized_results <- lapply(
+ data_process_tasks,
+ function(task) list(label = task$label, summarized = task$result())
+ )
+ }
@@
- results_list <- mclapply(summarized_results, function(res) {
- calculate_Metrics(res$summarized, protein_mappings, res$label)
- }, mc.cores = num_cores)
+ if (.Platform$OS.type != "windows" && num_cores > 1L) {
+ results_list <- mclapply(
+ summarized_results,
+ function(res) calculate_Metrics(res$summarized, protein_mappings, res$label),
+ mc.cores = num_cores
+ )
+ } else {
+ results_list <- lapply(
+ summarized_results,
+ function(res) calculate_Metrics(res$summarized, protein_mappings, res$label)
+ )
+ }
Optional: verify on a 1-core / Windows-like environment to ensure no regressions.
📝 Committable suggestion
| num_cores <- detectCores() - 1 | |
| summarized_results <- mclapply(data_process_tasks, function(task) { | |
| list(label = task$label, summarized = task$result()) | |
| }, mc.cores = num_cores) | |
| results_list <- mclapply(summarized_results, function(res) { | |
| calculate_Metrics(res$summarized, protein_mappings, res$label) | |
| }, mc.cores = num_cores) | |
| num_cores <- max(1L, as.integer(detectCores() - 1L)) | |
| if (.Platform$OS.type != "windows" && num_cores > 1L) { | |
| summarized_results <- mclapply( | |
| data_process_tasks, | |
| function(task) list(label = task$label, summarized = task$result()), | |
| mc.cores = num_cores | |
| ) | |
| } else { | |
| summarized_results <- lapply( | |
| data_process_tasks, | |
| function(task) list(label = task$label, summarized = task$result()) | |
| ) | |
| } | |
| if (.Platform$OS.type != "windows" && num_cores > 1L) { | |
| results_list <- mclapply( | |
| summarized_results, | |
| function(res) calculate_Metrics(res$summarized, protein_mappings, res$label), | |
| mc.cores = num_cores | |
| ) | |
| } else { | |
| results_list <- lapply( | |
| summarized_results, | |
| function(res) calculate_Metrics(res$summarized, protein_mappings, res$label) | |
| ) | |
| } |
🤖 Prompt for AI Agents
In benchmark/benchmark_Metamorpheus.R around lines 69–79, detectCores() - 1 may
be 0 and mclapply is not available on Windows; change to compute num_cores <-
max(detectCores() - 1L, 1L) and then call either lapply when running on Windows
or when num_cores == 1, otherwise call mclapply with mc.cores = num_cores; apply
the same pattern to the other two files (benchmark_Puyvelde2022-HYE5600735_LFQ.R
lines ~47–56 and benchmark_Dowell2021-HEqe408_LFQ.R lines ~48–56) so multicore
is used only when supported and falls back to lapply safely.
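If the same guard ends up in all three scripts, it may be cleaner to pull it into a shared wrapper; a sketch of the pattern (illustrative only, not committed code):

```r
library(parallel)

# Fall back to sequential lapply when forked workers are unavailable
# (Windows) or when only one core is left after reserving one.
safe_parallel_lapply <- function(x, fun) {
  num_cores <- max(1L, as.integer(detectCores() - 1L))
  if (.Platform$OS.type != "windows" && num_cores > 1L) {
    mclapply(x, fun, mc.cores = num_cores)
  } else {
    lapply(x, fun)
  }
}
```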
Actionable comments posted: 0
♻️ Duplicate comments (9)
metamorpheus_code.R (4)
1-1: Consider removing this standalone script from the PR
This file appears to be a standalone analysis script that duplicates functionality from the benchmarking framework. Since the benchmarking harness (benchmark/benchmark_Metamorpheus.R) already implements the same workflow in a more structured way, keeping this file may cause confusion and maintenance burden.
99-108: Use the shared calculate_Metrics helper for consistency
The FDR calculation is duplicated from the no-MBR section. Using the shared helper would improve maintainability.
Apply this diff:
-e_group = model$ComparisonResult %>% filter(Label == "B-A") %>% filter(is.na(issue))
-ecoli = e_group %>% filter(ecoli == TRUE)
-hist(ecoli$log2FC)
-
-ecoli = e_group %>% filter(adj.pvalue < 0.05) %>% filter(ecoli == TRUE)
-human = e_group %>% filter(adj.pvalue < 0.05) %>% filter(ecoli == FALSE)
-FDR = nrow(human) / (nrow(ecoli) + nrow(human))
-
-cat("FDR MBR", FDR, "\n")
+metrics_mbr <- calculate_Metrics(QuantData, protein_mappings, task_label = "MBR", alpha = 0.05)
+print(metrics_mbr)
40-42: Fix fragile group ordering for contrast column names
Using as.numeric(groups) on factor levels can produce NA values and warnings for non-numeric group labels. This approach is unreliable and should be replaced with a more robust ordering mechanism.
Apply this diff to fix the issue:
-groups = levels(QuantData_no_mbr$ProteinLevelData$GROUP)
-colnames(comparison) <- groups[order(as.numeric(groups))]
+groups <- levels(QuantData_no_mbr$ProteinLevelData$GROUP)
+colnames(comparison) <- groups
90-92: Fix fragile group ordering for MBR contrast column names
Same issue as in the no-MBR section - using as.numeric(groups) on factor levels is unreliable.
Apply this diff:
-groups = levels(QuantData$ProteinLevelData$GROUP)
-colnames(comparison) <- groups[order(as.numeric(groups))]
+groups <- levels(QuantData$ProteinLevelData$GROUP)
+colnames(comparison) <- groups
benchmark/benchmark_Metamorpheus.R (5)
31-33: Use the Organism column from QuantifiedPeaks.tsv if available
It's been noted that Metamorpheus now includes an "Organism" column in the QuantifiedPeaks.tsv file. Using this built-in column would be more robust than relying on a separate proteins file.
Consider checking if the Organism column exists in the input data and using it directly:
-protein_mappings = data.table::fread(file.path(filePath, "QuantifiedProteins.tsv"))
-protein_mappings = protein_mappings %>% filter(Organism %in% c("Escherichia coli (strain K12)", "Homo sapiens"))
+if ("Organism" %in% colnames(input)) {
+  # Use built-in Organism column
+  input = input %>% filter(Organism %in% c("Escherichia coli (strain K12)", "Homo sapiens"))
+  protein_mappings = input %>% distinct(`Protein Group`, Organism)
+} else {
+  # Fall back to QuantifiedProteins.tsv
+  protein_mappings = data.table::fread(file.path(filePath, "QuantifiedProteins.tsv"))
+  protein_mappings = protein_mappings %>% filter(Organism %in% c("Escherichia coli (strain K12)", "Homo sapiens"))
+}
36-36: Verify impact of removeFewMeasurements and removeProtein_with1Feature parameters
The MetamorpheusToMSstatsFormat function has two parameters (removeFewMeasurements, removeProtein_with1Feature) that are TRUE by default. These could filter out proteins and potentially explain FDR differences between MBR and no-MBR workflows.
#!/bin/bash # Check if MetamorpheusToMSstatsFormat is defined and its default parameters rg -n -A 10 "MetamorpheusToMSstatsFormat\s*<-\s*function" --type R
8-10: Make file paths robust to working directory changes

The script uses relative paths for sourcing files and loading JSON, which will fail if run from a different working directory. Additionally, `MetamorpheusToMSstatsFormat` appears to be defined in metamorpheus_code.R at the repo root but isn't sourced here.

Apply this diff to make paths robust:

```diff
-source("metamorpheus_Process.R")
-config <- fromJSON("scriptController.json", simplifyVector = FALSE)
+script_args <- commandArgs(trailingOnly = FALSE)
+script_path <- sub("^--file=", "", script_args[grep("^--file=", script_args)])
+base_dir <- if (length(script_path)) dirname(normalizePath(script_path)) else "."
+repo_root <- normalizePath(file.path(base_dir, ".."), mustWork = FALSE)
+
+source(file.path(base_dir, "metamorpheus_Process.R"))
+# Source MetamorpheusToMSstatsFormat definition
+if (file.exists(file.path(repo_root, "metamorpheus_code.R"))) {
+  source(file.path(repo_root, "metamorpheus_code.R"))
+}
+config <- fromJSON(file.path(base_dir, "scriptController.json"), simplifyVector = FALSE)
```
56-57: Fix inconsistent imputation settings in task labels

Tasks labeled "Imputation On" are actually passing `MBimpute = FALSE`, which disables imputation. This contradicts the task labels and will produce misleading benchmark results.

Apply this diff to enable imputation for the "Imputation On" tasks:

```diff
 list(
   label = "Data process without Normalization and Imputation On for all features",
-  result = function() dataProcess(output, featureSubset = "all", normalization = "FALSE", MBimpute = FALSE)
+  result = function() dataProcess(output, featureSubset = "all", normalization = "FALSE", MBimpute = TRUE)
 ),
 list(
   label = "Data process without Normalization and Imputation On for top3 features",
-  result = function() dataProcess(output, featureSubset = "top3", normalization = "FALSE", MBimpute = FALSE)
+  result = function() dataProcess(output, featureSubset = "top3", normalization = "FALSE", MBimpute = TRUE)
 )
```

Also applies to: 60-61
67-76: Add safeguards for parallel execution on limited-core or Windows systems

The code can fail in two scenarios: (1) when `detectCores() - 1` returns 0 on single-core systems, and (2) on Windows, where `mclapply` with `mc.cores > 1` isn't available.

Apply this diff to add proper safeguards:

```diff
-num_cores <- detectCores() - 1
-
-summarized_results <- mclapply(data_process_tasks, function(task) {
-  list(label = task$label, summarized = task$result())
-}, mc.cores = num_cores)
-
-results_list <- mclapply(summarized_results, function(res) {
-  calculate_Metrics(res$summarized, protein_mappings, res$label)
-}, mc.cores = num_cores)
+num_cores <- max(1L, as.integer(detectCores() - 1L))
+
+if (.Platform$OS.type != "windows" && num_cores > 1L) {
+  summarized_results <- mclapply(
+    data_process_tasks,
+    function(task) list(label = task$label, summarized = task$result()),
+    mc.cores = num_cores
+  )
+
+  results_list <- mclapply(
+    summarized_results,
+    function(res) calculate_Metrics(res$summarized, protein_mappings, res$label),
+    mc.cores = num_cores
+  )
+} else {
+  summarized_results <- lapply(
+    data_process_tasks,
+    function(task) list(label = task$label, summarized = task$result())
+  )
+
+  results_list <- lapply(
+    summarized_results,
+    function(res) calculate_Metrics(res$summarized, protein_mappings, res$label)
+  )
+}
```
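The parallel-with-serial-fallback pattern suggested above is language-agnostic; a minimal sketch in Python (thread-based for portability, not part of the PR's R code — `run_tasks` and `worker` are illustrative names):

```python
import os
from concurrent.futures import ThreadPoolExecutor

def run_tasks(tasks, worker):
    """Map worker over tasks in parallel when more than one core is
    available; otherwise fall back to a plain serial map, mirroring
    the mclapply/lapply fallback suggested in the review."""
    cores = max(1, (os.cpu_count() or 1) - 1)
    if cores > 1:
        with ThreadPoolExecutor(max_workers=cores) as pool:
            return list(pool.map(worker, tasks))
    return [worker(t) for t in tasks]

print(run_tasks([1, 2, 3], lambda x: 2 * x))  # [2, 4, 6]
```

The key design point is that both branches produce the same list shape, so downstream code never needs to know which path ran.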
🧹 Nitpick comments (1)
metamorpheus_code.R (1)
49-57: Consider using the shared calculate_Metrics helper function

The FDR calculation logic is duplicated here and in the MBR section. Since `benchmark/metamorpheus_Process.R` provides a `calculate_Metrics` function for this exact purpose, consider using it for consistency and maintainability.

Apply this diff to use the shared helper:

```diff
-e_group_no_mbr = model_no_mbr$ComparisonResult %>% filter(Label == "B-A") %>% filter(is.na(issue))
-ecoli_no_mbr = e_group_no_mbr %>% filter(ecoli == TRUE)
-hist(ecoli_no_mbr$log2FC)
-
-ecoli_no_mbr = e_group_no_mbr %>% filter(adj.pvalue < 0.05) %>% filter(ecoli == TRUE)
-human_no_mbr = e_group_no_mbr %>% filter(adj.pvalue < 0.05) %>% filter(ecoli == FALSE)
-FDR_no_mbr = nrow(human_no_mbr) / (nrow(ecoli_no_mbr) + nrow(human_no_mbr))
-
-cat("FDR no MBR", FDR_no_mbr, "\n")
+# Source the helper if not already done
+source("benchmark/metamorpheus_Process.R")
+metrics_no_mbr <- calculate_Metrics(QuantData_no_mbr, protein_mappings, task_label = "No MBR", alpha = 0.05)
+print(metrics_no_mbr)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
- benchmark/benchmark_Metamorpheus.R (1 hunks)
- benchmark/metamorpheus_Process.R (1 hunks)
- metamorpheus_code.R (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- benchmark/metamorpheus_Process.R
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build
- GitHub Check: Benchmarking-pipeline
Actionable comments posted: 0
♻️ Duplicate comments (4)
benchmark/benchmark_Metamorpheus.R (4)
40-41: Verify impact of removeFewMeasurements/removeProtein_with1Feature in MetamorpheusToMSstatsFormat

Prior feedback asked to double-check empirical FDR when these are set to FALSE. The current call relies on defaults; consider running a scenario with both flags FALSE for comparison.
Example variant to try (not necessarily for commit):
- output_alt <- MetamorpheusToMSstatsFormat(input, annot, removeFewMeasurements = FALSE, removeProtein_with1Feature = FALSE)
- Run the same tasks on output_alt and compare FDR metrics against output.
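The empirical FDR being compared here reduces to a simple ratio: in the two-organism spike-in design, significant human proteins are counted as false positives and significant E. coli proteins as true positives. A minimal sketch in Python (counts are made up for illustration, not from the PR's runs):

```python
def empirical_fdr(n_human_sig: int, n_ecoli_sig: int) -> float:
    """Empirical FDR in a two-organism spike-in benchmark: human proteins
    called significant are treated as false positives, E. coli proteins
    (the true differentially abundant spike-ins) as true positives."""
    total = n_human_sig + n_ecoli_sig
    return n_human_sig / total if total else 0.0

# Hypothetical counts passing adj.pvalue < 0.05:
print(empirical_fdr(12, 388))  # 0.03
```

Because the filtering flags change which proteins survive to testing, both the numerator and denominator can shift, which is why the side-by-side comparison is worth running.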
71-80: Guard against mc.cores == 0 and add a Windows fallback for mclapply

`detectCores() - 1` can be 0, and multicore `mclapply` isn't available on Windows. Fall back to `lapply` when needed.

```diff
-num_cores <- detectCores() - 1
-
-summarized_results <- mclapply(data_process_tasks, function(task) {
-  list(label = task$label, summarized = task$result())
-}, mc.cores = num_cores)
+num_cores <- max(1L, as.integer(detectCores() - 1L))
+if (.Platform$OS.type != "windows" && num_cores > 1L) {
+  summarized_results <- mclapply(
+    data_process_tasks,
+    function(task) list(label = task$label, summarized = task$result()),
+    mc.cores = num_cores
+  )
+} else {
+  summarized_results <- lapply(
+    data_process_tasks,
+    function(task) list(label = task$label, summarized = task$result())
+  )
+}
@@
-results_list <- mclapply(summarized_results, function(res) {
-  calculate_Metrics(res$summarized, protein_mappings, res$label)
-}, mc.cores = num_cores)
+if (.Platform$OS.type != "windows" && num_cores > 1L) {
+  results_list <- mclapply(
+    summarized_results,
+    function(res) calculate_Metrics(res$summarized, protein_mappings, res$label),
+    mc.cores = num_cores
+  )
+} else {
+  results_list <- lapply(
+    summarized_results,
+    function(res) calculate_Metrics(res$summarized, protein_mappings, res$label)
+  )
+}
```
8-10: Source metamorpheus_code.R and resolve paths relative to the script (prevents undefined function error and WD fragility)

`MetamorpheusToMSstatsFormat` is used later (Line 40) but this file doesn't source metamorpheus_code.R where it's defined. Also, the relative `source()`/`fromJSON()` calls will break if the working directory isn't benchmark/. Make paths relative to this script and source metamorpheus_code.R from the repo root.

Apply this diff:

```diff
-source("metamorpheus_Process.R")
-config <- fromJSON("scriptController.json", simplifyVector = FALSE)
+script_args <- commandArgs(trailingOnly = FALSE)
+script_path <- sub("^--file=", "", script_args[grep("^--file=", script_args)])
+base_dir <- if (length(script_path)) dirname(normalizePath(script_path)) else getwd()
+repo_root <- normalizePath(file.path(base_dir, ".."), mustWork = FALSE)
+
+source(file.path(base_dir, "metamorpheus_Process.R"))
+# Ensure MetamorpheusToMSstatsFormat is available
+if (file.exists(file.path(repo_root, "metamorpheus_code.R"))) {
+  source(file.path(repo_root, "metamorpheus_code.R"))
+}
+config <- fromJSON(file.path(base_dir, "scriptController.json"), simplifyVector = FALSE)
```
60-66: Align "Imputation On" task behavior with label (set MBimpute = TRUE)

Both "Imputation On" tasks currently disable imputation.

```diff
 list(
   label = "Data process without Normalization and Imputation On for all features",
-  result = function() dataProcess(output, featureSubset = "all", normalization = "FALSE", MBimpute = FALSE)
+  result = function() dataProcess(output, featureSubset = "all", normalization = "FALSE", MBimpute = TRUE)
 ),
 list(
   label = "Data process without Normalization and Imputation On for top3 features",
-  result = function() dataProcess(output, featureSubset = "top3", normalization = "FALSE", MBimpute = FALSE)
+  result = function() dataProcess(output, featureSubset = "top3", normalization = "FALSE", MBimpute = TRUE)
 )
```
🧹 Nitpick comments (5)
benchmark/benchmark_Metamorpheus.R (5)
25-27: Clarify the annotation path printout

You print "Annotation File Path" but output the directory path, not the file. Print the file path to avoid confusion.

```diff
-cat("Annotation File Path:", annotPath, "\n")
+cat("Annotation File:", file.path(annotPath, "annotation.csv"), "\n")
```
33-37: Guard against NA organisms when filtering protein mappings

Avoid carrying NA into the filter set.

```diff
-valid_organisms <- unique(input$Organism)
+valid_organisms <- unique(stats::na.omit(input$Organism))
```
31-39: Harden mapping: confirm the 'Protein Groups' column name and avoid a hard-coded name

QuantifiedProteins.tsv may expose either "Protein Groups" or "Protein Group". Use a guard to avoid a column-name mismatch.

```diff
-input = input %>% filter(`Protein Group` %in% protein_mappings$`Protein Groups`)
+protein_groups_col <- if ("Protein Groups" %in% names(protein_mappings)) "Protein Groups" else "Protein Group"
+input = input %>% filter(`Protein Group` %in% protein_mappings[[protein_groups_col]])
```

If you can, please also verify the column names contained in QuantifiedProteins.tsv for your datasets to confirm this guard is sufficient.
83-88: Optionally persist metrics to disk per dataset

Printing is fine, but persisting results helps downstream CI/HPC collection. For example:

```r
out_file <- file.path(filePath, paste0("benchmark_results_", gsub("\\s+", "_", dataset_config$name), ".tsv"))
data.table::fwrite(final_results, out_file, sep = "\t")
```
93-94: Protect per-dataset runs so one failure doesn't abort the other

Wrap invocations in tryCatch to continue on error.

```diff
-runBenchmarkForMetaMorpheusData("DDA-Solivais2024-Metamorpheus_MBR_LFQ", config)
-runBenchmarkForMetaMorpheusData("DDA-Solivais2024-Metamorpheus_NoMBR_LFQ", config)
+for (ds in c("DDA-Solivais2024-Metamorpheus_MBR_LFQ", "DDA-Solivais2024-Metamorpheus_NoMBR_LFQ")) {
+  tryCatch(
+    runBenchmarkForMetaMorpheusData(ds, config),
+    error = function(e) message(sprintf("Dataset %s failed: %s", ds, conditionMessage(e)))
+  )
+}
```
📜 Review details
📒 Files selected for processing (1)
- benchmark/benchmark_Metamorpheus.R (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build
- GitHub Check: Benchmarking-pipeline
🔇 Additional comments (1)
benchmark/benchmark_Metamorpheus.R (1)
31-34: Good: leveraging Organism column from QuantifiedPeaks.tsv

Using input$Organism to constrain protein mappings addresses earlier feedback and improves specificity.
Actionable comments posted: 0
♻️ Duplicate comments (4)
benchmark/benchmark_Metamorpheus.R (4)
40-41: Double-check impact of removeFewMeasurements/removeProtein_with1Feature on empirical FDR

MetamorpheusToMSstatsFormat defaults to removing sparse proteins/features (both TRUE). Given no-PIP scenarios, toggling these to FALSE may materially change recall/FDR. Please validate and document the chosen setting.
I can wire these flags through from the controller JSON (and produce side-by-side metrics) if you want.
8-10: Source helpers and config using robust paths; ensure MetamorpheusToMSstatsFormat is loadedCurrent relative paths will break when not launched from benchmark/, and MetamorpheusToMSstatsFormat isn’t defined in this file. Source files via the script’s directory and include metamorpheus_code.R from repo root.
Apply this diff:
- source("metamorpheus_Process.R") - config <- fromJSON("scriptController.json", simplifyVector = FALSE) + script_args <- commandArgs(trailingOnly = FALSE) + script_path <- sub("^--file=", "", script_args[grep("^--file=", script_args)]) + base_dir <- if (length(script_path)) dirname(normalizePath(script_path)) else getwd() + repo_root <- normalizePath(file.path(base_dir, ".."), mustWork = FALSE) + + source(file.path(base_dir, "metamorpheus_Process.R")) + # Ensure MetamorpheusToMSstatsFormat is available + if (file.exists(file.path(repo_root, "metamorpheus_code.R"))) { + source(file.path(repo_root, "metamorpheus_code.R")) + } + config <- fromJSON(file.path(base_dir, "scriptController.json"), simplifyVector = FALSE)Optional verification to confirm the function is available:
#!/bin/bash set -euo pipefail # Check the definition of MetamorpheusToMSstatsFormat rg -nP '^\s*MetamorpheusToMSstatsFormat\s*<-' metamorpheus_code.R || rg -n 'MetamorpheusToMSstatsFormat' --hidden --no-ignore
59-65: Fix "Imputation On" tasks: wrong argument name and value

Both tasks labeled "Imputation On" pass MBImpute = FALSE and use a misspelled argument (MBImpute). Use MBimpute = TRUE so behavior matches the labels.

```diff
-  label = "Data process without Normalization and Imputation On for all features",
-  result = function() dataProcess(output, featureSubset = "all", normalization = "FALSE", MBImpute = FALSE)
+  label = "Data process without Normalization and Imputation On for all features",
+  result = function() dataProcess(output, featureSubset = "all", normalization = "FALSE", MBimpute = TRUE)
@@
-  label = "Data process without Normalization and Imputation On for top3 features",
-  result = function() dataProcess(output, featureSubset = "top3", normalization = "FALSE", MBImpute = FALSE)
+  label = "Data process without Normalization and Imputation On for top3 features",
+  result = function() dataProcess(output, featureSubset = "top3", normalization = "FALSE", MBimpute = TRUE)
```
70-79: Make parallel section Windows-safe and guard against 0 cores

`detectCores() - 1` can be 0, and mclapply multicore isn't available on Windows. Fall back to lapply when needed.

```diff
-num_cores <- detectCores() - 1
-
-summarized_results <- mclapply(data_process_tasks, function(task) {
-  list(label = task$label, summarized = task$result())
-}, mc.cores = num_cores)
-
-results_list <- mclapply(summarized_results, function(res) {
-  calculate_Metrics(res$summarized, protein_mappings, res$label)
-}, mc.cores = num_cores)
+num_cores <- max(1L, as.integer(detectCores() - 1L))
+if (.Platform$OS.type != "windows" && num_cores > 1L) {
+  summarized_results <- mclapply(
+    data_process_tasks,
+    function(task) list(label = task$label, summarized = task$result()),
+    mc.cores = num_cores
+  )
+  results_list <- mclapply(
+    summarized_results,
+    function(res) calculate_Metrics(res$summarized, protein_mappings, res$label),
+    mc.cores = num_cores
+  )
+} else {
+  summarized_results <- lapply(
+    data_process_tasks,
+    function(task) list(label = task$label, summarized = task$result())
+  )
+  results_list <- lapply(
+    summarized_results,
+    function(res) calculate_Metrics(res$summarized, protein_mappings, res$label)
+  )
+}
```
🧹 Nitpick comments (3)
benchmark/benchmark_Metamorpheus.R (3)
31-34: Fail fast if expected columns are missing

Protect against schema differences to avoid opaque dplyr errors later.

```diff
 protein_mappings = data.table::fread(file.path(filePath, "QuantifiedProteins.tsv"))
-
-protein_mappings = protein_mappings %>% filter(Organism %in% c("Escherichia coli (strain K12)", "Homo sapiens"))
+# Defensive checks
+if (!"Protein Group" %in% names(input)) {
+  stop("Expected column 'Protein Group' not found in QuantifiedPeaks.tsv")
+}
+required_cols <- c("Protein Groups", "Organism")
+missing_cols <- setdiff(required_cols, names(protein_mappings))
+if (length(missing_cols)) {
+  stop(sprintf("Expected columns missing in QuantifiedProteins.tsv: %s", paste(missing_cols, collapse = ", ")))
+}
+protein_mappings = protein_mappings %>% filter(Organism %in% c("Escherichia coli (strain K12)", "Homo sapiens"))
```
37-37: Prefer semi_join over %in% for clarity and potential performance

Makes the key relationship explicit and avoids vector materialization for large tables.

```diff
-input = input %>% filter(`Protein Group` %in% protein_mappings$`Protein Groups`)
+input = input %>% semi_join(protein_mappings, by = c("Protein Group" = "Protein Groups"))
```
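For readers less familiar with semi-join semantics (keep only rows of the left table that have a match in the right table, without adding any of its columns), a minimal illustration of the same filtering idea, sketched in Python with made-up protein-group values:

```python
# Hypothetical left-table rows and right-table keys, for illustration only
rows = [{"pg": "A"}, {"pg": "B"}, {"pg": "C"}]
mapping_keys = {"A", "C"}  # stand-in for protein_mappings$`Protein Groups`

# Semi-join: keep rows whose key appears in the mapping; no columns are added
kept = [r for r in rows if r["pg"] in mapping_keys]
print([r["pg"] for r in kept])  # ['A', 'C']
```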
35-35: Reduce noisy output for large protein_mappings

Printing the entire table can flood logs. Print a compact summary instead.

```diff
-print(protein_mappings)
+cat(
+  "protein_mappings:",
+  sprintf("%d rows x %d cols", nrow(protein_mappings), ncol(protein_mappings)),
+  " | Organisms:",
+  paste(unique(protein_mappings$Organism), collapse = ", "),
+  "\n"
+)
+print(utils::head(protein_mappings, 3))
```
📜 Review details
📒 Files selected for processing (1)
- benchmark/benchmark_Metamorpheus.R (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build
- GitHub Check: Benchmarking-pipeline
User description
Motivation and Context
Please include relevant motivation and context of the problem along with a short summary of the solution.
Changes
Please provide a detailed bullet point list of your changes.
Testing
Please describe any unit tests you added or modified to verify your changes.
Checklist Before Requesting a Review
PR Type
Enhancement, Other
Description
Add Metamorpheus benchmarking R script
Create helper function file for metrics calculation
Update SLURM and workflow configs for new script
Add dataset entry in scriptController.json
Changes walkthrough 📝
- benchmark.yml: Update benchmark workflow GitHub Actions config (.github/workflows/benchmark.yml)
- config.slurm: Include Metamorpheus script in SLURM config (benchmark/config.slurm)
- scriptController.json: Extend dataset config for Metamorpheus (benchmark/scriptController.json)
- benchmark_Metamorpheus.R: Add Metamorpheus benchmarking R script (benchmark/benchmark_Metamorpheus.R)
- metamorpheus_Process.R: Add metrics calculation helper function (benchmark/metamorpheus_Process.R)
- metamorpheus_code.R: Add example Metamorpheus analysis script (metamorpheus_code.R)
Summary by CodeRabbit
New Features
Chores