Skip to content

Closes #217: Add microbiology related SDTM (MB, MS, BE)#218

Merged
Lina2689 merged 39 commits intopharmaverse:mainfrom
Gero1999:217-add-mb-ms-be
Mar 20, 2026
Merged

Closes #217: Add microbiology related SDTM (MB, MS, BE)#218
Lina2689 merged 39 commits intopharmaverse:mainfrom
Gero1999:217-add-mb-ms-be

Conversation

@Gero1999
Copy link
Copy Markdown
Collaborator

@Gero1999 Gero1999 commented Dec 26, 2025

Thank you for your Pull Request! We have developed this task checklist from the Development Process Guide to help with the final steps of the process. Completing the below tasks helps to ensure our reviewers can maximize their time on your code as well as making sure the admiral codebase remains robust and consistent.
Please check off each taskbox as an acknowledgment that you completed the task or check off that it is not relevant to your Pull Request. This checklist is part of the Github Action workflows and the Pull Request will not be merged into the devel branch until you have checked off each task.

Summary of the implementation

The PR introduces new synthetic microbiology datasets to the package, specifically adding the Biospecimen Events (BE), Microbiology Findings (MB), and Microbiology Susceptibility (MS) SDTM domains.

All the microbiology data utilized is in mb.R as a nested list (study_microb_data). The nested list structure follows the material process on how the data was collected in the lab (patient > visit for sample collection > aliquoting > culture > MB & MS tests). In order to standardize and simplify the generation of test results for MB and MS, helper functions have been created and can be used to generate new MB/MS test results inside study_microb_data.

In mb.R a nested loop will read study_microb_data and derive from it the corresponding SDTM variables for each domain (MB, MS, BE). This ensures that all variables across domains are correctly linked. This file is then sourced in ms.R & be.R to get the microbiology data. Each file is also responsible of ordering and labelling its own domain variables (MB - mb.R, MS - ms.R, BE - be.R), as well as of saving the object in data/.

Checklist

  • Place Closes Feature Request: Add Microbiology domains (MB, MS) #217 into the beginning of your Pull Request Title (Use Edit button in top-right if you need to update)
  • Code is formatted according to the tidyverse style guide. Run styler::style_file() to style R and Rmd files
  • Updated relevant unit tests or have written new unit tests, which should consider realistic data scenarios and edge cases, e.g. empty datasets, errors, boundary cases etc. - See Unit Test Guide
  • If you removed/replaced any function and/or function parameters, did you fully follow the deprecation guidance?
  • Update to all relevant roxygen headers and examples, including keywords and families. Refer to the categorization of functions to tag appropriate keyword/family.
  • Run devtools::document() so all .Rd files in the man folder and the NAMESPACE file in the project root are updated appropriately
  • Address any updates needed for vignettes and/or templates
  • Update NEWS.md if the changes pertain to a user-facing function (i.e. it has an @export tag) or documentation aimed at users (rather than developers)
  • Build admiral site pkgdown::build_site() and check that all affected examples are displayed correctly and that all new functions occur on the "Reference" page.
  • Address or fix all lintr warnings and errors - lintr::lint_package()
  • Run R CMD check locally and address all errors and warnings - devtools::check()
  • Link the issue in the Development Section on the right hand side.
  • Address all merge conflicts and resolve appropriately
  • Pat yourself on the back for a job well done! Much love to your accomplishment!

Gero1999 and others added 29 commits December 20, 2025 22:47
@Gero1999 Gero1999 marked this pull request as ready for review December 26, 2025 14:28
Gero1999 and others added 3 commits January 6, 2026 08:58
Co-authored-by: Fanny Gautier <157114584+Fanny-Gautier@users.noreply.github.com>
Co-authored-by: Fanny Gautier <157114584+Fanny-Gautier@users.noreply.github.com>
Copy link
Copy Markdown
Collaborator

@Fanny-Gautier Fanny-Gautier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job on your first contribution! Please find some additional comments to help finalize this PR.
Please apply labels as per the comments and IG 3.4, and use the existing STUDYID / USUBJID variables from pharmaversesdtm::dm.
Also, run the following commands in the console to fix CI/CD checks for Code Style and Spelling:

  • styler::style_file() e.g. styler::style_file("data-raw/mb.R")
  • spelling::update_wordlist()

@Gero1999 Gero1999 requested a review from Fanny-Gautier January 6, 2026 19:08
Copy link
Copy Markdown
Collaborator

@Fanny-Gautier Fanny-Gautier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments to implement, and it should be ready to merge. I’ll leave the data content review to Gordon.

USUBJID = usubjid,
BESEQ = beseq,
BEREFID = specid,
BELNKID = NA,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it expected to be always missing?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this collection BE process yes. But is my bad, because I forgot to change it later for cultured samples, where it should be culture_id (that way linking with MBLNKGRP and MSLNKID). I changed it accordingly

data-raw/mb.R Outdated
MBTESTCD = "Microbiology Test or Finding Short Name",
MBTEST = "Microbiology Test or Finding Name",
MBTSTDTL = "Measurement, Test or Examination Detail",
MBORRES = "Original Result",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please implement label as per IG

R/mb.R Outdated
#' \item{MBTESTCD}{Microbiology Test or Finding Short Name}
#' \item{MBTEST}{Microbiology Test or Finding Name}
#' \item{MBTSTDTL}{Measurement, Test or Examination Detail}
#' \item{MBORRES}{Original Result}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Label not as per IG, please update as per comment in mb.R.

… BELNKID

Co-authored-by: Fanny Gautier <157114584+Fanny-Gautier@users.noreply.github.com>
Copy link
Copy Markdown
Collaborator

@Lina2689 Lina2689 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Gero1999 Thanks for adding the microbiology-related SDTM domains. Please update the _pkgdown.yml file for the reference page grouping


# Extraction Loop: Build BE, MB, MS Domains ----
dm <- pharmaversesdtm::dm
studyid <- unique(dm$STUDYID)[1]
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey @Fanny-Gautier regarding this I hope is ok if I go back to a made-up name (e.g, "XYZ", similar to the nomenclature used in dm_vaccine). I am just a bit concerned that the name can confuse someone and make them think that this dataset comes from the CDISCPILOT01

Copy link
Copy Markdown
Collaborator

@Fanny-Gautier Fanny-Gautier Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can leave it as CDISCPILOT01, all extension packages are created with this STUDYID, except {admiralvaccine}. @arjoon-r do you know why {admiralvaccine} uses ABC as the STUDYID variable?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would guess it is because while the other domains were derived directly from those datasets (see ae_ophta), these ones were created from 0 like mine (see is_vaccine).

Perhaps it makes sense indeed to not use the same names? I would personally strongly prefer it

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Gero1999 I think it's the vaccine one that needs updating, not yours.

Background: a key idea behind pharmaversesdtm is that any subset of the datasets available can be used as a pretend "study" to create test ADaMs etc. As such we need consistency because the SDTM datasets could merged by key vars (STUDYID, USUBJID), in the process of constructing ADaMs (think for instance of creating ADMB from MB, which would require at the very least a merge of ADSL and your MB).

I wouldn't worry about the confusion, this package is quite established in industry as test data and you can document the source of each dataset in the dataset roxygen.

@Lina2689
Copy link
Copy Markdown
Collaborator

@Gero1999, Please finish this PR so that it can be merged before the planned release at the end of March.

@Gero1999 Gero1999 requested a review from Fanny-Gautier March 17, 2026 07:33
@Gero1999
Copy link
Copy Markdown
Collaborator Author

hey @Lina2689 I think from my side is all done, I may just miss:

  • A positive review on the general code from @Fanny-Gautier, who I think may be close to it. She gave me already a lot of feedback back and forth (thanks a lot again!)

  • A positive review on the dataset content. I think the original intention was to get it from @millerg23, but I suspect he is not currently active on GitHub

  • As you mentioned, I added the section to _pkgdown.yaml (see my last commit and thanks for the review!)

Let me know in any case if there is something I missed to do or that I can help with to accelerate the process!

@Lina2689
Copy link
Copy Markdown
Collaborator

hey @Lina2689 I think from my side is all done, I may just miss:

  • A positive review on the general code from @Fanny-Gautier, who I think may be close to it. She gave me already a lot of feedback back and forth (thanks a lot again!)
  • A positive review on the dataset content. I think the original intention was to get it from @millerg23, but I suspect he is not currently active on GitHub
  • As you mentioned, I added the section to _pkgdown.yaml (see my last commit and thanks for the review!)

Let me know in any case if there is something I missed to do or that I can help with to accelerate the process!

Thanks @Gero1999 for the updates. @Fanny-Gautier and @manciniedoardo, please approve if everything looks good to you.

@Lina2689 Lina2689 requested a review from manciniedoardo March 17, 2026 10:58
Copy link
Copy Markdown
Collaborator

@manciniedoardo manciniedoardo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Gero1999 looks awesome!

@Lina2689 did you want to add @Gero1999 to the list of authors in the DESCRIPTIONS? This is a big PR!

@Lina2689
Copy link
Copy Markdown
Collaborator

Thanks @Gero1999 looks awesome!

@Lina2689 did you want to add @Gero1999 to the list of authors in the DESCRIPTIONS? This is a big PR!

@manciniedoardo His name has already been added to the authors' list in one of the previous PRs.

@Gero1999
Copy link
Copy Markdown
Collaborator Author

Thanks for the reviews to everyone! @Fanny-Gautier if you approve feel also free to merge, as I may not be able due to the branch restrictions

Copy link
Copy Markdown
Collaborator

@Fanny-Gautier Fanny-Gautier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please implement as necessary, otherwise this is ok to merge after confirmation of below comments. Thank you.

_pkgdown.yml Outdated
contents:
- has_keyword("vaccine")

- title: "Microbiology Datasets"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want Microbiology at the bottom of the Reference page or do we want to order alphabetically the TAs?

Image

MSTESTCD,
MSTEST,
MSAGENT,
MSCONC,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are MSCONCand MSCONCU expected to be always missing?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I did not realize but all methods I provided are quantitative for testing (EPSILOMETER, DISK DIFFUSION, NUCLEIC ACID AMPLIFICATION TEST) and they do not have MSCONC/U specifications on the SDTMIG examples.

Perhaps it is of interest in the future to include a new function to generate other MS method that has MSCONC (e.g. MACRO BROTH DILUTION) and we can create an issue? I leave it at your criteria :)

@Gero1999 Gero1999 requested a review from Fanny-Gautier March 17, 2026 20:51
Copy link
Copy Markdown
Collaborator

@Fanny-Gautier Fanny-Gautier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Thank you!

@Gero1999
Copy link
Copy Markdown
Collaborator Author

Perfect! Feel any of you free to merge 😉

@Lina2689 Lina2689 merged commit d868fcf into pharmaverse:main Mar 20, 2026
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature Request: Add Microbiology domains (MB, MS)

4 participants