Skip to content

Conversation

Copy link

Copilot AI commented Oct 21, 2025

Problem

The PRISMA2020 package was failing to correctly parse inclusion/exclusion reasons that contained commas. This is a common real-world scenario where researchers write reasons like:

  • "Study design, wrong methodology, 15"
  • "Population too young, incorrect age range, 20"
  • "Wrong outcome, not relevant to review, 8"

The expected format is "Reason text, Count" where items are separated by semicolons: "Reason1, n1; Reason2, n2; Reason3, n3".

Root Cause

The previous implementation used gsub(",.*$", "", ...) to extract the reason and gsub(".*,", "", ...) to extract the count, which splits on the first comma instead of the last comma. This caused any reason containing a comma to be truncated incorrectly:

Before (Broken):

Input:  "Study design, wrong type, 15"
Reason: "Study design"           # ❌ Truncated
Count:  " wrong type, 15"        # ❌ Includes reason text

Solution

Added a helper function split_reason_n() inside the PRISMA_data() function that uses regex to split on the last comma only:

  • Pattern: ^(.*),\s*([^,]+)$
  • Captures everything before the last comma as the reason
  • Captures everything after the last comma as the count

After (Fixed):

Input:  "Study design, wrong type, 15"
Reason: "Study design, wrong type"  # ✅ Preserved
Count:  "15"                         # ✅ Correct

Changes

  • File modified: R/PRISMA_flowdiagram.R only
  • Code impact: Added 1 helper function, updated 4 parsing locations
  • Lines changed: +63 insertions, -132 deletions (net -69 lines, simplified code)
  • Affected fields:
    • database_specific_results
    • register_specific_results
    • dbr_excluded
    • other_excluded

Backward Compatibility

No breaking changes

  • Still works with reasons that don't contain commas
  • No changes to function signatures or return types
  • No changes to data structures
  • Existing code accessing results continues to work

Testing

All tests pass including:

  • ✅ Unit tests for the helper function
  • ✅ Integration tests with PRISMA_data function
  • ✅ Real-world tests using the example CSV
  • ✅ Edge cases (empty strings, multiple commas, special characters, Unicode)
  • ✅ R syntax validation

Examples

Simple case:

"Database A, old version, 100; Database B, 50"
# Before: ["Database A", " old version, 100"], ["Database B", " 50"]
# After:  ["Database A, old version", "100"], ["Database B", "50"]

Complex case:

"Study design, RCT, no control, 30; Age too young, 15; Simple, 5"
# Before: ["Study design", " RCT, no control, 30"], ["Age too young", " 15"], ["Simple", " 5"]
# After:  ["Study design, RCT, no control", "30"], ["Age too young", "15"], ["Simple", "5"]

This fix enables users to write more natural, descriptive exclusion reasons without worrying about comma placement.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • cloud.r-project.org
    • Triggering command: /usr/lib/R/bin/exec/R -e install.packages(c('scales',~+~'stringr'),~+~repos='REDACTED',~+~lib='/tmp/Rlib') (dns block)
  • cran.rstudio.com
    • Triggering command: /usr/lib/R/bin/exec/R -e install.packages(c('scales',~+~'stringr'),~+~repos='REDACTED',~+~lib='/tmp/Rlib') (dns block)
  • esm.ubuntu.com
    • Triggering command: /usr/lib/apt/methods/https (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Original prompt

Problem

The PRISMA2020 package fails to render inclusion/exclusion reasons correctly when a reason contains a comma. Current parsing uses gsub() to split on commas (e.g. gsub(",.$", "", ...) and gsub(".,", "", ...)) which splits at the first comma and therefore breaks any reason that itself contains a comma.

Files/locations identified (from code search):

  • R/PRISMA_flowdiagram.R: multiple occurrences where reason/count pairs are parsed from semicolon-separated lists (dbr_excluded, other_excluded, database_specific_results, register_specific_results, and similar blocks). See examples around lines ~1400-1570, ~228-267, ~368-402, ~103-131, and other nearby sections.
  • inst/shiny-examples/PRISMA_flowdiagram/app.R: UI fields provide the raw text but no parsing changes needed in the Shiny app; changes will remain internal to R/PRISMA_flowdiagram.R.

Goal

Make parsing robust so reasons that contain commas are preserved. The separator between reason and count is the LAST comma in each item (items are separated by "; "). We need to split each item on the last comma only, leaving earlier commas inside the reason intact.

Proposed changes (actionable)

  1. Add a helper function split_reason_n(vec) inside R/PRISMA_flowdiagram.R (placed near the top of the function or file where parsing is performed). The helper will:

    • accept a character vector of items like "Reason with, comma, 12" or "Reason without comma, 3" or possibly an empty string.
    • for each item, match the last comma using a regex like "^(.),\s([^,]+)$" to capture everything before the last comma as reason and everything after as n (count).
    • return a data.frame(reason = , n = , stringsAsFactors = FALSE).
    • handle edge cases: empty input, missing comma (treat whole string as reason and n as empty), NA values.
  2. Replace existing parsing blocks that use gsub(...)/strsplit(...) for the following variables with usage of split_reason_n applied to the semicolon-split items:

    • dbr_excluded
    • other_excluded
    • database_specific_results
    • register_specific_results

    Replace logic like:
    reason = gsub(",.$", "", unlist(strsplit(as.character(...), split = "; ")))
    n = gsub(".
    ,", "", unlist(strsplit(as.character(...), split = "; ")))
    with:
    raw <- unlist(strsplit(as.character(...), split = "; "))
    if (length(raw) == 1 && raw == "") { df <- empty df } else { df <- split_reason_n(raw) }

  3. Preserve downstream behavior:

    • The 'n' column will be returned as a trimmed character. The rest of the code often pastes it into labels. Where numeric conversion is needed, convert with as.numeric(gsub(",", "", n)). Leave any existing grouping/formatting using scales::comma unchanged elsewhere.
    • Maintain existing handling where object may be a single numeric comma-formatted string (for the non-data.frame case). The code already checks is.data.frame(...) in places where it distinguishes the two cases; keep that logic.
  4. Add minimal unit / manual test instructions in the PR description and commit message:

    • Test using Shiny example: in the UI "Reports excluded" or "Other reports excluded" fields, set value to something like
      "Not relevant, secondary outcome, 5; Other reason, 3"
      and confirm the flow diagram renders both reasons with counts and preserves the comma inside the first reason.
  5. Commit message and PR title

    • Title: "Fix parsing of exclusion reasons containing commas in PRISMA_flowdiagram"
    • Body: Summarize the problem, the helper added, the replacements made, and test instructions.
  6. Files to modify

    • R/PRISMA_flowdiagram.R: add helper function and replace parsing blocks. Keep formatting/style consistent with the file.
  7. Tests & QA

    • Run R CMD check (if possible) or run the Shiny example (inst/shiny-examples/PRISMA_flowdiagram) and confirm rendering with reasons containing commas.

Please create a pull request that implements the above changes in the repository rayyansys/PRISMA2020. Include a clear commit message and PR description with the testing steps listed above.

Note: I did not change Shiny UI files. Only the parsing logic in R/PRISMA_flowdiagram.R needs modification. If you prefer I also update code to coerce counts to numeric where appropriate, I can include that in this PR or follow up with a separate PR.

This pull request was created as a result of the following prompt from Copilot chat.

Problem

The PRISMA2020 package fails to render inclusion/exclusion reasons correctly when a reason contains a comma. Current parsing uses gsub() to split on commas (e.g. gsub(",.$", "", ...) and gsub(".,", "", ...)) which splits at the first comma and therefore breaks any reason that itself contains a comma.

Files/locations identified (from code search):

  • R/PRISMA_flowdiagram.R: multiple occurrences where reason/count pairs are parsed from semicolon-separated lists (dbr_excluded, other_excluded, database_specific_results, register_specific_results, and similar blocks). See examples around lines ~1400-1570, ~228-267, ~368-402, ~103-131, and other nearby sections.
  • inst/shiny-examples/PRISMA_flowdiagram/app.R: UI fields provide the raw text but no parsing changes needed in the Shiny app; changes will remain internal to R/PRISMA_flowdiagram.R.

Goal

Make parsing robust so reasons that contain commas are preserved. The separator between reason and count is the LAST comma in each item (items are separated by "; "). We need to split each item on the last comma only, leaving earlier commas inside the reason intact.

Proposed changes (actionable)

  1. Add a helper function split_reason_n(vec) inside R/PRISMA_flowdiagram.R (placed near the top of the function or file where parsing is performed). The helper will:

    • accept a character vector of items like "Reason with, comma, 12" or "Reason without comma, 3" or possibly an empty string.
    • for each item, match the last comma using a regex like "^(.),\s([^,]+)$" to capture everything before the last comma as reason and everything after as n (count).
    • return a data.frame(reason = , n = , stringsAsFactors = FALSE).
    • handle edge cases: empty input, missing comma (treat whole string as reason and n as empty), NA values.
  2. Replace existing parsing blocks that use gsub(...)/strsplit(...) for the following variables with usage of split_reason_n applied to the semicolon-split items:

    • dbr_excluded
    • other_excluded
    • database_specific_results
    • register_specific_results

    Replace logic like:
    reason = gsub(",.$", "", unlist(strsplit(as.character(...), split = "; ")))
    n = gsub(".
    ,", "", unlist(strsplit(as.character(...), split = "; ")))
    with:
    raw <- unlist(strsplit(as.character(...), split = "; "))
    if (length(raw) == 1 && raw == "") { df <- empty df } else { df <- split_reason_n(raw) }

  3. Preserve downstream behavior:

    • The 'n' column will be returned as a trimmed character. The rest of the code often pastes it into labels. Where numeric conversion is needed, convert with as.numeric(gsub(",", "", n)). Leave any existing grouping/formatting using scales::comma unchanged elsewhere.
    • Maintain existing handling where object may be a single numeric comma-formatted string (for the non-data.frame case). The code already checks is.data.frame(...) in places where it distinguishes the two cases; keep that logic.
  4. Add minimal unit / manual test instructions in the PR description and commit message:

    • Test using Shiny example: in the UI "Reports excluded" or "Other reports excluded" fields, set value to something like
      "Not relevant, secondary outcome, 5; Other reason, 3"
      and confirm the flow diagram renders both reasons with counts and preserves the comma inside the first reason.
  5. Commit message and PR title

    • Title: "Fix parsing of exclusion reasons containing commas in PRISMA_flowdiagram"
    • Body: Summarize the problem, the helper added, the replacements made, and test instructions.
  6. Files to modify

    • R/PRISMA_flowdiagram.R: add helper function and replace parsing blocks. Keep formatting/style consistent with the file.
  7. Tests & QA

    • Run R CMD check (if possible) or run the Shiny example (inst/shiny-examples/PRISMA_flowdiagram) and confirm rendering with reasons containing commas.

Please create a pull request that implements the above changes in the repository rayyansys/PRISMA2020. Include a clear commit message and PR description with the testing steps listed above.

Note: I did not change Shiny UI files. Only the parsing logic in R/PRISMA_flowdiagram.R needs modification. If you prefer I also update code to coerce counts to numeric where appropriate, I can include that in this PR or follow up with a separate PR.


✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

Copilot AI changed the title [WIP] Fix parsing issue for inclusion/exclusion reasons in PRISMA2020 Fix comma parsing in exclusion reasons for PRISMA flowdiagram data Oct 21, 2025
Copilot AI requested a review from OAbubakr October 21, 2025 15:32
Copilot finished work on behalf of OAbubakr October 21, 2025 15:32
@OAbubakr OAbubakr closed this Oct 22, 2025
@hammady hammady deleted the copilot/fix-prisma-flowdiagram-parsing branch October 22, 2025 15:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants