Closes #217: Add microbiology related SDTM (MB, MS, BE) by Gero1999 · Pull Request #218 · pharmaverse/pharmaversesdtm

Gero1999 · 2025-12-26T11:40:53Z

Thank you for your Pull Request! We have developed this task checklist from the Development Process Guide to help with the final steps of the process. Completing the below tasks helps to ensure our reviewers can maximize their time on your code as well as making sure the admiral codebase remains robust and consistent.
Please check off each taskbox as an acknowledgment that you completed the task or check off that it is not relevant to your Pull Request. This checklist is part of the GitHub Action workflows and the Pull Request will not be merged into the devel branch until you have checked off each task.

Summary of the implementation

The PR introduces new synthetic microbiology datasets to the package, specifically adding the Biospecimen Events (BE), Microbiology Findings (MB), and Microbiology Susceptibility (MS) SDTM domains.

All the microbiology data used is defined in mb.R as a nested list (study_microb_data). The nesting follows the physical process by which the data was collected in the lab (patient > visit for sample collection > aliquoting > culture > MB & MS tests). To standardize and simplify the generation of MB and MS test results, helper functions have been created; they can be used to add new MB/MS test results inside study_microb_data.
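As a rough illustration of the nesting described above, here is a minimal sketch in base R. The helper `new_mb_test()`, the list field names (`visits`, `aliquots`, `cultures`, `mb_tests`) and all values are assumptions made for this example; the actual structure and helpers in mb.R may differ.

```r
# Hypothetical helper: builds one MB test result as a plain list.
# The name and arguments are illustrative, not taken from mb.R.
new_mb_test <- function(testcd, test, orres, method) {
  list(MBTESTCD = testcd, MBTEST = test, MBORRES = orres, MBMETHOD = method)
}

# Hypothetical miniature of study_microb_data:
# patient > visit > aliquot > culture > MB tests
study_microb_data <- list(
  "SUBJ-001" = list(
    visits = list(
      "WEEK 2" = list(
        aliquots = list(
          A1 = list(
            cultures = list(
              C1 = list(
                mb_tests = list(
                  new_mb_test(
                    "MTB", "Mycobacterium tuberculosis",
                    "POSITIVE", "SOLID MICROBIAL CULTURE"
                  )
                )
              )
            )
          )
        )
      )
    )
  )
)

# Walk down the lab process to reach a single culture and its first MB test.
culture <- study_microb_data[["SUBJ-001"]]$visits[["WEEK 2"]]$aliquots$A1$cultures$C1
first_result <- culture$mb_tests[[1]]$MBORRES
```

Keeping result construction in one helper means every test entry carries the same set of fields, which is what makes a generic extraction loop possible.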

In mb.R, a nested loop reads study_microb_data and derives from it the corresponding SDTM variables for each domain (MB, MS, BE). This ensures that all variables are correctly linked across domains. The file is then sourced in ms.R and be.R to obtain the microbiology data. Each file is also responsible for ordering and labelling its own domain variables (MB - mb.R, MS - ms.R, BE - be.R), as well as for saving the object in data/.
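The idea of deriving several domains from one loop can be sketched as follows. This is a simplified, hypothetical version (two nesting levels instead of five, toy subject IDs and test codes), not the loop in mb.R; it only shows how a single culture identifier created inside the loop can be written to both BE and MB so the records stay linked.

```r
# Toy input (hypothetical): patient > culture > MB test codes.
toy_data <- list(
  "SUBJ-001" = list(
    C1 = list(mb_tests = c("MTB", "AFB")),
    C2 = list(mb_tests = c("MTB"))
  ),
  "SUBJ-002" = list(
    C1 = list(mb_tests = c("MTB"))
  )
)

studyid <- "CDISCPILOT01"
be_rows <- list()
mb_rows <- list()

for (usubjid in names(toy_data)) {
  # Sequence numbers restart for each subject.
  beseq <- 0
  mbseq <- 0
  for (culture_name in names(toy_data[[usubjid]])) {
    # One link value per culture, reused in every domain.
    culture_id <- paste(usubjid, culture_name, sep = "-")
    beseq <- beseq + 1
    be_rows[[length(be_rows) + 1]] <- data.frame(
      STUDYID = studyid, DOMAIN = "BE", USUBJID = usubjid,
      BESEQ = beseq, BELNKID = culture_id
    )
    for (testcd in toy_data[[usubjid]][[culture_name]]$mb_tests) {
      mbseq <- mbseq + 1
      mb_rows[[length(mb_rows) + 1]] <- data.frame(
        STUDYID = studyid, DOMAIN = "MB", USUBJID = usubjid,
        MBSEQ = mbseq, MBLNKGRP = culture_id, MBTESTCD = testcd
      )
    }
  }
}

# Stack the per-record data frames into one data frame per domain.
be <- do.call(rbind, be_rows)
mb <- do.call(rbind, mb_rows)
```

Because the link value is generated once and copied, every MBLNKGRP in `mb` is guaranteed to exist as a BELNKID in `be`.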

Checklist

Co-authored-by: Fanny Gautier <157114584+Fanny-Gautier@users.noreply.github.com>

Fanny-Gautier

Great job on your first contribution! Please find some additional comments to help finalize this PR.
Please apply labels as per the comments and IG 3.4, and use the existing STUDYID / USUBJID variables from pharmaversesdtm::dm.
Also, run the following commands in the console to fix CI/CD checks for Code Style and Spelling:

styler::style_file() e.g. styler::style_file("data-raw/mb.R")
spelling::update_wordlist()

R/mb.R

data-raw/mb.R

Fanny-Gautier

A few comments to implement, and it should be ready to merge. I’ll leave the data content review to Gordon.

Fanny-Gautier · 2026-01-19T16:04:20Z

data-raw/mb.R

+      USUBJID = usubjid,
+      BESEQ = beseq,
+      BEREFID = specid,
+      BELNKID = NA,


Is it expected to always be missing?

For this BE collection process, yes. But that is my mistake: I forgot to change it later for cultured samples, where it should be culture_id (thereby linking with MBLNKGRP and MSLNKID). I have changed it accordingly.

Fanny-Gautier · 2026-01-19T16:06:13Z

data-raw/mb.R

+    MBTESTCD = "Microbiology Test or Finding Short Name",
+    MBTEST = "Microbiology Test or Finding Name",
+    MBTSTDTL = "Measurement, Test or Examination Detail",
+    MBORRES = "Original Result",


Please implement the label as per the IG.

data-raw/mb.R

Fanny-Gautier · 2026-01-19T16:11:36Z

R/mb.R

+#'     \item{MBTESTCD}{Microbiology Test or Finding Short Name}
+#'     \item{MBTEST}{Microbiology Test or Finding Name}
+#'     \item{MBTSTDTL}{Measurement, Test or Examination Detail}
+#'     \item{MBORRES}{Original Result}


The label is not as per the IG; please update it as per the comment in mb.R.
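For reference, one common way to attach variable labels in base R is through the "label" attribute. The snippet below is only a sketch with toy data; the MBORRES label shown ("Result or Finding in Original Units") is my assumption of the SDTMIG wording and should be checked against the IG before use.

```r
# Toy MB record (illustrative values only).
mb <- data.frame(
  MBTESTCD = "MTB",
  MBTEST = "Mycobacterium tuberculosis",
  MBORRES = "POSITIVE",
  stringsAsFactors = FALSE
)

# Labels keyed by variable name. The MBORRES label is an assumption
# to be verified against the SDTMIG.
mb_labels <- c(
  MBTESTCD = "Microbiology Test or Finding Short Name",
  MBTEST = "Microbiology Test or Finding Name",
  MBORRES = "Result or Finding in Original Units"
)

# Attach each label to its column as a "label" attribute.
for (var in names(mb_labels)) {
  attr(mb[[var]], "label") <- mb_labels[[var]]
}

# Collect the labels back into a named character vector for checking.
applied_labels <- vapply(mb, function(x) attr(x, "label"), character(1))
```

Defining the labels in one named vector makes it easy to reuse the same text in the roxygen documentation under R/.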

Lina2689

@Gero1999 Thanks for adding the microbiology-related SDTM domains. Please update the _pkgdown.yml file for the reference page grouping

Gero1999 · 2026-02-10T13:54:04Z

data-raw/mb.R

+
+# Extraction Loop: Build BE, MB, MS Domains ----
+dm <- pharmaversesdtm::dm
+studyid <- unique(dm$STUDYID)[1]


Hey @Fanny-Gautier, regarding this, I hope it is OK if I go back to a made-up name (e.g., "XYZ", similar to the nomenclature used in dm_vaccine). I am just a bit concerned that the name could confuse someone and make them think that this dataset comes from CDISCPILOT01.

You can leave it as CDISCPILOT01, all extension packages are created with this STUDYID, except {admiralvaccine}. @arjoon-r do you know why {admiralvaccine} uses ABC as the STUDYID variable?

I would guess it is because, while the other domains were derived directly from those datasets (see ae_ophta), these ones were created from scratch like mine (see is_vaccine).

Perhaps it does make sense not to use the same names? Personally, I would strongly prefer that.

@Gero1999 I think it's the vaccine one that needs updating, not yours.

Background: a key idea behind pharmaversesdtm is that any subset of the available datasets can be used as a pretend "study" to create test ADaMs etc. As such, we need consistency, because the SDTM datasets could be merged by key variables (STUDYID, USUBJID) in the process of constructing ADaMs (think, for instance, of creating ADMB from MB, which would require at the very least a merge of ADSL and your MB).

I wouldn't worry about the confusion: this package is well established in industry as test data, and you can document the source of each dataset in the dataset roxygen.
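To make the merge argument concrete, here is a small base R sketch with made-up data (the subject IDs, TRT01P values and the "XYZ" study name are toy values for illustration). It shows that an inner join on STUDYID and USUBJID only finds matches when both datasets share the same STUDYID.

```r
# Toy ADSL-like and MB-like datasets (all values are made up).
adsl <- data.frame(
  STUDYID = "CDISCPILOT01",
  USUBJID = c("SUBJ-001", "SUBJ-002"),
  TRT01P = c("Treatment A", "Treatment B")
)
mb <- data.frame(
  STUDYID = "CDISCPILOT01",
  USUBJID = c("SUBJ-001", "SUBJ-001", "SUBJ-002"),
  MBSEQ = c(1, 2, 1),
  MBTESTCD = "MTB"
)

# Same STUDYID in both: every MB record picks up its ADSL variables.
admb <- merge(mb, adsl, by = c("STUDYID", "USUBJID"))

# Different STUDYID in MB: the key no longer matches, so nothing joins.
mb_other <- transform(mb, STUDYID = "XYZ")
admb_other <- merge(mb_other, adsl, by = c("STUDYID", "USUBJID"))

n_matched <- nrow(admb)
n_matched_other <- nrow(admb_other)
```

In this toy case the first merge keeps all three MB records, while the second returns an empty data frame.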

Lina2689 · 2026-03-17T06:57:23Z

@Gero1999, please finish this PR so that it can be merged before the planned release at the end of March.

Gero1999 · 2026-03-17T07:40:21Z

Hey @Lina2689, I think everything is done on my side. I may just be missing:

A positive review of the general code from @Fanny-Gautier, who I think may be close to giving it. She has already given me a lot of feedback back and forth (thanks a lot again!)
A positive review of the dataset content. I think the original intention was to get it from @millerg23, but I suspect he is not currently active on GitHub
As you mentioned, I added the section to _pkgdown.yml (see my last commit, and thanks for the review!)

In any case, let me know if there is something I missed or anything I can help with to speed up the process!

Lina2689 · 2026-03-17T09:43:59Z

Thanks @Gero1999 for the updates. @Fanny-Gautier and @manciniedoardo, please approve if everything looks good to you.

manciniedoardo

Thanks @Gero1999 looks awesome!

@Lina2689 did you want to add @Gero1999 to the list of authors in the DESCRIPTION file? This is a big PR!

Lina2689 · 2026-03-17T11:12:10Z

@manciniedoardo His name has already been added to the authors' list in one of the previous PRs.

Gero1999 · 2026-03-17T11:31:36Z

Thanks to everyone for the reviews! @Fanny-Gautier, if you approve, feel free to merge as well, as I may not be able to due to the branch restrictions.

Fanny-Gautier

Please implement as necessary; otherwise this is OK to merge after confirmation of the comments below. Thank you.

Fanny-Gautier · 2026-03-17T16:32:23Z

_pkgdown.yml

    contents:
      - has_keyword("vaccine")

+  - title: "Microbiology Datasets"


Do we want Microbiology at the bottom of the Reference page, or do we want to order the TAs alphabetically?

Fanny-Gautier · 2026-03-17T16:36:46Z

data-raw/ms.R

+    MSTESTCD,
+    MSTEST,
+    MSAGENT,
+    MSCONC,


Are MSCONC and MSCONCU expected to always be missing?

Yes. I had not realized it, but all the methods I provided (EPSILOMETER, DISK DIFFUSION, NUCLEIC ACID AMPLIFICATION TEST) are quantitative tests, and they do not have MSCONC/MSCONCU populated in the SDTMIG examples.

Perhaps it would be of interest in the future to include a new function that generates another MS method that does have MSCONC (e.g. MACRO BROTH DILUTION), and we could create an issue for it? I leave it to your judgment :)
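Purely as a sketch of what such a future helper might look like: the function name `new_ms_conc_test()`, its arguments, and the example values below are all hypothetical and have not been checked against controlled terminology or the existing helpers in mb.R.

```r
# Hypothetical helper for an MS test performed at a fixed agent
# concentration. Nothing here is taken from the package; it only shows
# how MSCONC/MSCONCU could be carried alongside the result.
new_ms_conc_test <- function(agent, conc, orres,
                             concu = "ug/mL",
                             method = "MACRO BROTH DILUTION") {
  # Guard against missing or nonsensical concentrations.
  stopifnot(is.numeric(conc), length(conc) == 1, conc > 0)
  list(
    MSAGENT = agent,
    MSCONC = conc,
    MSCONCU = concu,
    MSORRES = orres,
    MSMETHOD = method
  )
}

# Example call with placeholder values.
ms_result <- new_ms_conc_test("ISONIAZID", 0.2, "RESISTANT")
```

A result built this way could then be placed inside study_microb_data next to the other MS tests, so that the extraction loop populates MSCONC and MSCONCU for those records only.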

Fanny-Gautier

LGTM, Thank you!

Gero1999 · 2026-03-18T16:56:23Z

Perfect! Any of you, feel free to merge 😉

Gero1999 and others added 29 commits December 20, 2025 22:47

add MS domain example (ms.R)

dd96ad3

add ms.rda

596cdc0

ms: order columns and add labels

4aca7dd

fix: env issues ms.R

093edc4

ms: limit samples, treatments and change units

a0fb2ff

mb: define MB domain based on MS domain info (ms)

e719e91

mb: define labels

ec70bf8

ms & mb: refine REFID and GRPID, rm from ms NHOID (organism)

317b96d

mb: remake domain defining organisms and samples

03f890d

ms: derive MS domain from MB domain (mb.R)

23a228c

ms: derive MS domain from MB domain (mb.R)

66ab626

create a common source of truth for BE, MB and MS domains

0c428d8

use nested list and looping to create be, mb, and ms

6fd98da

refactor: standardize funs, loop & list

2addb33

fix: typo MSORRES

21e2c23

arrange metadata final preparation in 3 files: mb.R, ms.R, be.R

fcd4ce7

run files and add metadata to /data (mb, be, ms)

915fd20

specs.json: add as "microbiology" therapeutic area for MB, MS and BE

b9093d4

codeowners: add datasets mb, ms, be in "others"

3923c90

style_file: ms.R, mb.R

38d1e73

run create_sdtms_data.R: generates be, mb, ms docs

b1ca4ab

run create_sdtms_data.R: unexpected udpated files (dm_neuro, nv_neuro)

e8c1c8e

news: inform of new microbiology datasets

2872fb1

rm: unneded R file

4b21746

Merge branch 'main' into 217-add-mb-ms-be

42c4229

spelling: correct mispelling (Measuremet -> Measurement)

55a8e35

spelling: update WORDLIST

e0d65d9

fix: potential issue with styler

de745d6

styler: fix check using last pkg version

1f3eaad

Gero1999 marked this pull request as ready for review December 26, 2025 14:28

Gero1999 and others added 3 commits January 6, 2026 08:58

Apply suggestions from code review

574abeb

Co-authored-by: Fanny Gautier <157114584+Fanny-Gautier@users.noreply.github.com>

apply suggestions to mb.R & update ms data/documentation

763c93f

Co-authored-by: Fanny Gautier <157114584+Fanny-Gautier@users.noreply.github.com>

change comments style based on admiraldev style guide

04d34ad

Fanny-Gautier requested changes Jan 6, 2026

Gero1999 added 2 commits January 6, 2026 19:43

use STUDYID & USUBJIDs from pharmaversesdtm::dm

785d5ac

fix: mispelling Micoba(c)terium, add to WORDLIST & space with styler

2e34909

Gero1999 requested a review from Fanny-Gautier January 6, 2026 19:08

Merge branch 'main' into 217-add-mb-ms-be

c698f5e

Fanny-Gautier requested changes Jan 19, 2026

Fanny-Gautier requested a review from Lina2689 January 19, 2026 16:15

update mb: apply MBORRES correct label & populate cultured BE samples…

73e9e31

… BELNKID Co-authored-by: Fanny Gautier <157114584+Fanny-Gautier@users.noreply.github.com>

Lina2689 reviewed Jan 21, 2026

Add Microbiology Datasets section to pkgdown

062c3b1

Gero1999 mentioned this pull request Jan 31, 2026

Closes #146: Interactive exploration of datasets vignette in website #227

Merged

14 tasks

Gero1999 commented Feb 10, 2026

Gero1999 requested a review from Fanny-Gautier March 17, 2026 07:33

Lina2689 approved these changes Mar 17, 2026

Lina2689 requested a review from manciniedoardo March 17, 2026 10:58

manciniedoardo approved these changes Mar 17, 2026

Fanny-Gautier reviewed Mar 17, 2026

mv "Microbiology Datasets" after "Metabolism Datasets" in _pkgdown.yml

c5bb3f0

Gero1999 requested a review from Fanny-Gautier March 17, 2026 20:51

Fanny-Gautier approved these changes Mar 18, 2026

Lina2689 merged commit d868fcf into pharmaverse:main Mar 20, 2026
16 checks passed

4 participants
