Closes #217: Add microbiology related SDTM (MB, MS, BE) #218
Lina2689 merged 39 commits into pharmaverse:main from
Co-authored-by: Fanny Gautier <157114584+Fanny-Gautier@users.noreply.github.com>
Great job on your first contribution! Please find some additional comments to help finalize this PR.
Please apply labels as per the comments and IG 3.4, and use the existing STUDYID / USUBJID variables from pharmaversesdtm::dm.
Also, run the following commands in the console to fix CI/CD checks for Code Style and Spelling:
styler::style_file(), e.g. styler::style_file("data-raw/mb.R")
spelling::update_wordlist()
Fanny-Gautier
left a comment
A few comments to implement, and it should be ready to merge. I’ll leave the data content review to Gordon.
    USUBJID = usubjid,
    BESEQ = beseq,
    BEREFID = specid,
    BELNKID = NA,
Is it expected to be always missing?
For this collection BE process, yes. But that is my bad: I forgot to change it later for cultured samples, where it should be culture_id (so that it links with MBLNKGRP and MSLNKID). I have changed it accordingly.
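A minimal sketch of the linking rule described in this reply, assuming a toy input table. The column names `specid`, `event` and `culture_id`, the event terms, and the identifier values are hypothetical and only illustrate the idea; they are not taken from the PR code.

```r
# Toy biospecimen events; all values and input column names are hypothetical.
be_raw <- data.frame(
  specid     = c("SP-001", "SP-001-A1", "SP-001-A1"),
  event      = c("Collecting", "Aliquoting", "Culturing"),
  culture_id = c(NA, NA, "CULT-001"),
  stringsAsFactors = FALSE
)

be <- data.frame(
  BEREFID = be_raw$specid,
  BETERM  = be_raw$event,
  # BELNKID stays missing unless the event produced a culture, in which
  # case it carries the culture identifier used to link to MB and MS.
  BELNKID = ifelse(be_raw$event == "Culturing", be_raw$culture_id, NA_character_),
  stringsAsFactors = FALSE
)
```

With this rule, collection and aliquoting records keep a missing BELNKID, while cultured samples get a value that can be matched against MBLNKGRP and MSLNKID.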
data-raw/mb.R
Outdated
    MBTESTCD = "Microbiology Test or Finding Short Name",
    MBTEST = "Microbiology Test or Finding Name",
    MBTSTDTL = "Measurement, Test or Examination Detail",
    MBORRES = "Original Result",
Please implement the labels as per the IG.
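As a rough illustration of what applying IG labels can look like in base R, the sketch below attaches a "label" attribute to each column. The data values are placeholders, and the MBORRES label shown ("Result or Finding in Original Units") is an assumption that should be checked against SDTMIG 3.4 before use.

```r
# Placeholder MB data; the values are illustrative only.
mb <- data.frame(
  MBTESTCD = "TESTCD1",
  MBTEST   = "Test name 1",
  MBORRES  = "Result 1",
  stringsAsFactors = FALSE
)

# Labels to verify against SDTMIG 3.4 (the MBORRES label is an assumption).
mb_labels <- c(
  MBTESTCD = "Microbiology Test or Finding Short Name",
  MBTEST   = "Microbiology Test or Finding Name",
  MBORRES  = "Result or Finding in Original Units"
)

# Attach each label as a column attribute.
for (var in names(mb_labels)) {
  attr(mb[[var]], "label") <- mb_labels[[var]]
}
```

Keeping the labels in one named vector makes it easier to keep the data-raw script and the roxygen documentation in sync.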
R/mb.R
Outdated
    #' \item{MBTESTCD}{Microbiology Test or Finding Short Name}
    #' \item{MBTEST}{Microbiology Test or Finding Name}
    #' \item{MBTSTDTL}{Measurement, Test or Examination Detail}
    #' \item{MBORRES}{Original Result}
Label not as per IG, please update as per comment in mb.R.
… BELNKID Co-authored-by: Fanny Gautier <157114584+Fanny-Gautier@users.noreply.github.com>
    # Extraction Loop: Build BE, MB, MS Domains ----
    dm <- pharmaversesdtm::dm
    studyid <- unique(dm$STUDYID)[1]
Hey @Fanny-Gautier, regarding this, I hope it is OK if I go back to a made-up name (e.g. "XYZ", similar to the nomenclature used in dm_vaccine). I am just a bit concerned that the name could confuse someone and make them think that this dataset comes from CDISCPILOT01.
You can leave it as CDISCPILOT01, all extension packages are created with this STUDYID, except {admiralvaccine}. @arjoon-r do you know why {admiralvaccine} uses ABC as the STUDYID variable?
I would guess it is because, while the other domains were derived directly from those datasets (see ae_ophta), these ones were created from scratch, like mine (see is_vaccine).
Perhaps it does make sense not to use the same names? I would personally strongly prefer it.
@Gero1999 I think it's the vaccine one that needs updating, not yours.
Background: a key idea behind pharmaversesdtm is that any subset of the available datasets can be used as a pretend "study" to create test ADaMs etc. As such, we need consistency, because the SDTM datasets could be merged by key vars (STUDYID, USUBJID) in the process of constructing ADaMs (think for instance of creating ADMB from MB, which would require at the very least a merge of ADSL and your MB).
I wouldn't worry about the confusion, this package is quite established in industry as test data and you can document the source of each dataset in the dataset roxygen.
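To make the consistency argument concrete, here is a minimal sketch using toy data frames in base R. The subject identifiers, ages, test codes and the name `admb` are illustrative only; this is not the actual ADMB derivation.

```r
# Toy DM-like and MB-like datasets sharing the same key variables.
dm <- data.frame(
  STUDYID = "CDISCPILOT01",
  USUBJID = c("SUBJ-001", "SUBJ-002"),
  AGE     = c(63, 64),
  stringsAsFactors = FALSE
)
mb <- data.frame(
  STUDYID  = "CDISCPILOT01",
  USUBJID  = c("SUBJ-001", "SUBJ-001", "SUBJ-002"),
  MBSEQ    = c(1, 2, 1),
  MBTESTCD = c("TESTCD1", "TESTCD2", "TESTCD1"),
  stringsAsFactors = FALSE
)

# With consistent keys, every MB record picks up its subject-level data.
admb <- merge(mb, dm, by = c("STUDYID", "USUBJID"))

# With a made-up STUDYID, the same merge silently returns no rows.
mb_xyz <- transform(mb, STUDYID = "XYZ")
admb_xyz <- merge(mb_xyz, dm, by = c("STUDYID", "USUBJID"))
```

In the first merge all three MB records are kept; in the second, the mismatched STUDYID leaves nothing to join on.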
@Gero1999, please finish this PR so that it can be merged before the planned release at the end of March.
Hey @Lina2689, I think everything is done from my side; I may just be missing:
In any case, let me know if there is something I forgot to do or anything I can help with to speed up the process!
Thanks @Gero1999 for the updates. @Fanny-Gautier and @manciniedoardo, please approve if everything looks good to you.
@manciniedoardo His name has already been added to the authors' list in one of the previous PRs.
Thanks to everyone for the reviews! @Fanny-Gautier, if you approve, feel free to merge as well, since I may not be able to due to the branch restrictions.
Fanny-Gautier
left a comment
Please implement as necessary, otherwise this is ok to merge after confirmation of below comments. Thank you.
_pkgdown.yml
Outdated
    contents:
    - has_keyword("vaccine")

    - title: "Microbiology Datasets"
    MSTESTCD,
    MSTEST,
    MSAGENT,
    MSCONC,
Are MSCONC and MSCONCU expected to be always missing?
Yes. I did not realize it, but all the test methods I provided (EPSILOMETER, DISK DIFFUSION, NUCLEIC ACID AMPLIFICATION TEST) are quantitative, and they do not have MSCONC/MSCONCU specified in the SDTMIG examples.
Perhaps it would be of interest in the future to include a new function that generates another MS method that does have MSCONC (e.g. MACRO BROTH DILUTION), and we can create an issue for it? I leave it to your judgment :)
Perfect! Any of you, feel free to merge 😉

Summary of the implementation
The PR introduces new synthetic microbiology datasets to the package, specifically adding the Biospecimen Events (BE), Microbiology Specimen (MB), and Microbiology Susceptibility (MS) SDTM domains.
All the microbiology data used is defined in mb.R as a nested list (study_microb_data). The nested list structure follows the material process by which the data was collected in the lab (patient > visit for sample collection > aliquoting > culture > MB & MS tests). To standardize and simplify the generation of test results for MB and MS, helper functions have been created and can be used to generate new MB/MS test results inside study_microb_data.
In mb.R a nested loop reads study_microb_data and derives from it the corresponding SDTM variables for each domain (MB, MS, BE). This ensures that all variables across domains are correctly linked. This file is then sourced in ms.R and be.R to get the microbiology data. Each file is also responsible for ordering and labelling its own domain variables (MB - mb.R, MS - ms.R, BE - be.R), as well as for saving the object in data/.
Checklist
styler::style_file() to style R and Rmd files
devtools::document() so all .Rd files in the man folder and the NAMESPACE file in the project root are updated appropriately
NEWS.md if the changes pertain to a user-facing function (i.e. it has an @export tag) or documentation aimed at users (rather than developers)
pkgdown::build_site() and check that all affected examples are displayed correctly and that all new functions occur on the "Reference" page.
lintr::lint_package()
R CMD check locally and address all errors and warnings - devtools::check()
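As a rough illustration of the nested-list approach described in the summary of the implementation, the sketch below walks a small nested list and derives linked MB and MS rows. The list layout, field names (`visits`, `cultures`, `mb_tests`, `ms_tests`) and all values are hypothetical, the aliquoting level is omitted for brevity, and this is not the actual code in mb.R.

```r
# Hypothetical nested structure: patient > visit > culture > MB & MS tests.
study_microb_data <- list(
  list(
    usubjid = "SUBJ-001",
    visits = list(
      list(
        visit = "SCREENING",
        cultures = list(
          list(
            culture_id = "CULT-001",
            mb_tests = list(list(testcd = "MBTEST1", result = "RESULT A")),
            ms_tests = list(list(testcd = "MSTEST1", agent = "AGENT X", result = "RESULT B"))
          )
        )
      )
    )
  )
)

mb_rows <- list()
ms_rows <- list()

# One pass over the nested list builds both domains, so the culture
# identifier used for linking is guaranteed to match across them.
for (subject in study_microb_data) {
  for (visit in subject$visits) {
    for (culture in visit$cultures) {
      for (test in culture$mb_tests) {
        mb_rows[[length(mb_rows) + 1]] <- data.frame(
          USUBJID = subject$usubjid, VISIT = visit$visit,
          MBLNKGRP = culture$culture_id,
          MBTESTCD = test$testcd, MBORRES = test$result,
          stringsAsFactors = FALSE
        )
      }
      for (test in culture$ms_tests) {
        ms_rows[[length(ms_rows) + 1]] <- data.frame(
          USUBJID = subject$usubjid, VISIT = visit$visit,
          MSLNKID = culture$culture_id,
          MSTESTCD = test$testcd, MSAGENT = test$agent, MSORRES = test$result,
          stringsAsFactors = FALSE
        )
      }
    }
  }
}

mb <- do.call(rbind, mb_rows)
ms <- do.call(rbind, ms_rows)
```

Deriving every domain from the same traversal is one way to keep the linking variables consistent, since each identifier is written from a single source value.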