Restructured project: Moved files into case_studies/CARD, removed red… #107
base: main
Branch updated from 5ef0a1d to cfc7bf6.
IMHO this looks a lot better! I see that the check is passing, too, which makes sense considering that the package itself is unchanged. I'll have to rely on someone else to review the R code in detail, but it looks reasonable to me.
If this PR replaces #103, I'd suggest closing that one without merging to reduce the potential for confusion.
Noted. Thank you for the help🙏
(Replying to Faisal Alquaddoomi's review of Tue, 22 Oct 2024, 01:22.)
Looking better!
If this script is meant to be generalized so you can enter the bug/drug combo of interest and get the results out, I think it needs some refactoring, as I mention in the line-by-line comments. For this specific bug/drug combo, though, it looks like it works all right. If we want to make it generalizable, I suggest we merge this script and open a new issue to generalize it.
Another suggestion is to drop most of the CARD_data files from this PR. Only aro_index.tsv, shortname_antibiotics.tsv, and shortname_pathogens.tsv are used from the set, so there's no point in adding the much larger .fasta sequence files to the PR as well, unless you want to refactor much more broadly to search those files for your sequences instead of using rentrez. That would be a major rewrite and isn't what I'm suggesting here, so I'd just omit the extra CARD data files.
In case_studies/CARD/Bug-Drug Code.R:
> +# Mutate data
+aro_index <- aro_index %>%
This step yields a lot of NA values, so I'm just checking that's intended. Multi-pathogen genes like aadA or acrB don't parse successfully into the "pathogen / gene / drug" pattern, so you end up with rows like:
pathogen,gene,drug
Abau,ampC,BLA
Abau,Abaf,NA
acrB,NA,NA
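One way to make rows like these explicit is to check the first token against the pathogen abbreviation list before assigning fields by position. The base-R sketch below assumes the "pathogen_gene", "pathogen_gene_drug", and "gene_drug" formats described in the CARD README; the example short names and the two-entry pathogen list are hypothetical stand-ins for `aro_index.tsv` and `shortname_pathogens.tsv`, and `parse_short_name()` is an illustrative name, not the PR's `extract_card_info()`. It cannot disambiguate gene names that themselves contain underscores.

```r
# Hypothetical CARD-style short names (not read from aro_index.tsv)
short_names <- c("Abau_ampC_BLA", "Saur_mprF", "acrB", "vanA_VAN")

# Hypothetical subset of 4-character pathogen codes (normally from shortname_pathogens.tsv)
pathogen_codes <- c("Abau", "Saur")

# Split one short name on "_" and assign pathogen/gene/drug by position,
# using the pathogen list to decide whether the first token is a pathogen
parse_short_name <- function(x, pathogen_codes) {
  parts <- strsplit(x, "_", fixed = TRUE)[[1]]
  has_pathogen <- parts[1] %in% pathogen_codes
  pathogen <- if (has_pathogen) parts[1] else NA_character_
  rest <- if (has_pathogen) parts[-1] else parts
  gene <- if (length(rest) >= 1) rest[1] else NA_character_
  # Anything after the gene token is treated as the drug abbreviation
  drug <- if (length(rest) >= 2) paste(rest[-1], collapse = "_") else NA_character_
  data.frame(short_name = x, pathogen = pathogen, gene = gene, drug = drug,
             stringsAsFactors = FALSE)
}

parsed <- do.call(rbind, lapply(short_names, parse_short_name,
                                pathogen_codes = pathogen_codes))
print(parsed)
```

With this approach, a bare gene such as `acrB` lands in the gene column with NA pathogen and drug, instead of being read as a pathogen code.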
I'd suggest dropping the files you don't need for your R script, like nucleotide...fasta and protein..fasta.
Hey @Cateline, for future reference, you can delete multiple files in a single commit; you don't need to create one commit per deletion.
Oh, okay. Didn't know that. I was using the GitHub API to delete them.
Change combined FASTA sequences file name
Co-authored-by: Evan Pierce Brenner
Co-authored-by: Evan Pierce Brenner
Agree with the broad comments of:
Any other outstanding questions before we fix/merge this? If so, I can look at it more carefully later this week. Thanks, @Cateline!
Updated package loading to use require() for conditional installation. Renamed fasta file and removed redundant lines (35-44). Removed decompression step along with renaming of the zipped file
Thanks for the contribution, @Cateline. With feedback from @AbhirupaGhosh and others, I think this should be ready to go soon, after one or more iterations. How are you planning on extending it beyond Saur and DAP?
@Cateline, thanks for adding this README. Out of curiosity, are these descriptions already paraphrased from the original source (CARD), or yet to be?
The descriptions are from the original source (CARD) and have not been paraphrased yet.
Co-authored-by: Janani Ravi
Co-authored-by: Janani Ravi
Co-authored-by: Janani Ravi
Co-authored-by: Janani Ravi
Co-authored-by: Janani Ravi
…mycin_sequences.fasta
Changed data import function from read.delim to read_delim
- Standardize 'Protein_Accession' naming conventions
- Switch 'sapply' to 'purrr::map' functions
- Rename 'aro_index' to 'resistance_profile' for better context
- Use explicit column names instead of positional arguments where applicable
Update `extract_card_info` function to correctly categorize complex gene entries
- Improve merging process between extracted resistance profile data, antibiotics data, and pathogens data
- Add logic to handle multi-species pathogens and multi-class drugs
- Implement `fetch_fasta_sequence` to retrieve FASTA sequences from Entrez using protein accession IDs.
- Add loop to iterate over `filtered_data`
Leaving minor comments. Defer to @AbhirupaGhosh @epbrenner & FA/DM for a full review.
In case_studies/CARD/Bug-Drug Code.R:
> +  gene <- NA
+  drug <- drug_class # Default to Drug Class column
+
+  # Determine the information based on the split names and patterns
can you share an example file (snippet pre and post name cleanup)?
Hello @jananiravi, by this do you mean I should use the View() function in R to allow visual inspection of the dataset before and after processing?
No, I meant snapshots or example data stored locally (as part of the commit) to be able to run the code and check locally.
Co-authored-by: Janani Ravi
Co-authored-by: Janani Ravi
Fixed tar file extraction by adding .bz2 suffix to line 16 for proper file handling
Update file paths for antibiotics and pathogens data in Lines 89-90 for proper loading.
… options for multiclass exclusion and species restriction
This update introduces a filter_resistance_mechanisms function with customizable options for partial drug matches, exclusion of multiclass resistance, and species-specific filtering.
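The drug/pathogen filtering described in this commit can be sketched in base R as follows. This is not the PR's implementation: the column names `Drug` and `Pathogen_Full_Name` come from the quoted diff, while the function name, the toy data, and the convention that multiclass entries separate drug abbreviations with ";" are assumptions. The PR's `species_restricted` option is left out because its exact behavior isn't described here.

```r
# Keep rows whose Drug partially matches `drug` and whose pathogen name
# partially matches `bug` (both case-insensitive regex matches)
filter_resistance_sketch <- function(data, drug, bug, exclude_multiclass = FALSE) {
  keep <- grepl(drug, data$Drug, ignore.case = TRUE) &
    grepl(bug, data$Pathogen_Full_Name, ignore.case = TRUE)
  if (exclude_multiclass) {
    # ASSUMPTION: multiclass entries list several drug abbreviations separated by ";"
    keep <- keep & !grepl(";", data$Drug, fixed = TRUE)
  }
  data[keep, , drop = FALSE]
}

# Toy data (hypothetical), standing in for the cleaned resistance profile
toy <- data.frame(
  Gene = c("geneA", "geneB", "geneC", "geneD"),
  Drug = c("DAP", "DAP;VAN", "VAN", "DAP"),
  Pathogen_Full_Name = c("Staphylococcus aureus", "Staphylococcus aureus",
                         "Staphylococcus aureus", "Enterococcus faecium"),
  stringsAsFactors = FALSE
)

all_dap <- filter_resistance_sketch(toy, "DAP", "Staphylococcus aureus")
single_dap <- filter_resistance_sketch(toy, "DAP", "Staphylococcus aureus",
                                       exclude_multiclass = TRUE)
print(all_dap$Gene)     # "geneA" "geneB"
print(single_dap$Gene)  # "geneA"
```

Because the drug match is a plain substring search, a short code can also hit longer strings that happen to contain it; a word-boundary pattern such as `"\\bDAP\\b"` is one way to tighten it.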
- Compared original dataset (`aro_index.tsv`) with cleaned dataset (`resistance_profile_data.tsv`).
- Saved snippets of the pre-cleanup data (`aro_index.tsv`) and post-cleanup data (`resistance_profile_data.tsv`) for comparison.
Expanded Bug-Drug.R code to retrieve and save FASTA sequences for ESKAPE pathogens resistant to DAP (Daptomycin)
In case_studies/CARD/Bug-Drug Code.R:
> +# Loop through each Protein Accession in the filtered data to fetch sequences
+for (i in 1:nrow(filtered_data_saurdap)) {
+  # Get the Protein Accession ID
+  Protein_accession <- filtered_data_saurdap$Protein_Accession[i]
Confusing alternating use of Protein_ vs. protein_accession. 🤔
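R is case-sensitive, so `Protein_accession` and `protein_accession` are two different variables; in the quoted loop the first is assigned and the second is read. Below is a minimal sketch with one consistently named variable. `fetch_fasta_stub()`, its placeholder sequence, and the accession IDs are hypothetical stand-ins for the PR's `fetch_fasta_sequence()` and `filtered_data_saurdap$Protein_Accession`. Looping over the values directly (or using `seq_len(nrow(...))`) also avoids a `1:nrow(...)` pitfall: when the filter returns zero rows, `1:0` yields `c(1, 0)` and the loop body still runs.

```r
# Hypothetical stand-in for fetch_fasta_sequence(); returns NULL for missing IDs
fetch_fasta_stub <- function(protein_accession) {
  if (is.na(protein_accession) || protein_accession == "") return(NULL)
  paste0(">", protein_accession, "\nMSTNPKPQRK")  # placeholder sequence
}

collect_sequences <- function(accessions) {
  combined_sequences <- character()
  # Iterate over the values themselves; an empty vector means zero iterations
  for (protein_accession in accessions) {
    fasta_sequence <- fetch_fasta_stub(protein_accession)
    if (!is.null(fasta_sequence)) {
      combined_sequences <- c(combined_sequences, fasta_sequence)
    }
  }
  combined_sequences
}

seqs <- collect_sequences(c("ACC0001", "", "ACC0002"))  # hypothetical IDs
empty <- collect_sequences(character())
length(seqs)
```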
In case_studies/CARD/Bug-Drug Code.R:
> +# Define the output file for the FASTA sequences
+output_fasta_file <- "Staph_aureus_Daptomycin_sequences.fasta"
If we're using short names for species (4-char) and drugs (antibiotics, 3-char), the file name could follow the suggested change, with a suffix such as arg = antibiotic resistance genes.
Which short names are you planning to use?
cc: @AbhirupaGhosh @charmvang @awasyn @epbrenner
⬇️ Suggested change
-output_fasta_file <- "Staph_aureus_Daptomycin_sequences.fasta"
+output_fasta_file <- "Saur_dap_arg.fasta"
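If the convention ends up being a 4-character species code plus a 3-character drug code, a small helper can keep file names consistent across bug/drug combinations. The helper name and the lowercase-drug plus `arg` suffix rule are assumptions modeled on the suggested `Saur_dap_arg.fasta`, not an agreed convention.

```r
# Hypothetical helper: <Spec>_<drug>_<suffix>.<ext>, e.g. "Saur" + "DAP" -> "Saur_dap_arg.fasta"
make_output_name <- function(species_code, drug_code, suffix = "arg", ext = "fasta") {
  # Enforce the 4-char species / 3-char drug short-name lengths
  stopifnot(nchar(species_code) == 4, nchar(drug_code) == 3)
  sprintf("%s_%s_%s.%s", species_code, tolower(drug_code), suffix, ext)
}

output_fasta_file <- make_output_name("Saur", "DAP")
print(output_fasta_file)
```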
In case_studies/CARD/Bug-Drug Code.R:
> +# Extract pathogen, gene, drug, and include Protein.Accession from 'CARD Short Name'
+extract_card_info <- function(card_short_name, drug_class, `Protein Accession`, `DNA Accession`) {
Rename all column names that contain spaces and special characters so they use only _. Also avoid mixing cases.
@AbhirupaGhosh @charmvang @awasyn @epbrenner @the-mayer -- camelCase or snake_case (without caps) for column names?
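For the snake_case option, one base-R approach is a small converter applied once, right after each file is read; `janitor::clean_names()` does something similar if adding a dependency is acceptable. The function name `to_snake_case()` is hypothetical, and the example column names are taken from the code quoted in this review.

```r
# Convert column names to snake_case: split camelCase boundaries, replace runs
# of spaces/punctuation with "_", trim leading/trailing underscores, lowercase
to_snake_case <- function(x) {
  x <- gsub("([a-z0-9])([A-Z])", "\\1_\\2", x)
  x <- gsub("[^A-Za-z0-9]+", "_", x)
  x <- gsub("^_+|_+$", "", x)
  tolower(x)
}

old_names <- c("Protein Accession", "DNA Accession", "CARD Short Name",
               "Pathogen_Full_Name", "Protein.Accession")
new_names <- to_snake_case(old_names)
print(new_names)
# Typical use after reading a file: names(aro_index) <- to_snake_case(names(aro_index))
```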
In case_studies/CARD/CARD_data/shortname_antibiotics.tsv:
> @@ -0,0 +1,76 @@
+AAC Abbreviation Molecule
Can use this for short nomenclature, e.g., spp_dru_...
In case_studies/CARD/ESKAPE Pathogens Code.R:
> +# Install and Load dplyr and readr
+packages <- c("dplyr", "readr")
+
+for (pkg in packages) {
+  if (!require(pkg, character.only = TRUE)) {
+    install.packages(pkg)
+    library(pkg, character.only = TRUE)
+  } else {
+    library(pkg, character.only = TRUE)
+  }
+}
Needed?
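On the "Needed?" question: `require()` already attaches the package when it returns `TRUE`, so the `library()` call in the `else` branch repeats work. A possible simplification, sketched below with hypothetical helper names, checks installation with `requireNamespace()`, installs only what is missing, and then attaches everything once.

```r
# Return the packages from `pkgs` that are not installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Install anything missing, then attach all packages
load_packages <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) install.packages(to_install)
  for (pkg in pkgs) library(pkg, character.only = TRUE)
  invisible(pkgs)
}

# Usage in the script: load_packages(c("dplyr", "readr"))
not_found <- missing_packages(c("stats", "utils", "notARealPackage123"))
print(not_found)
```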
In case_studies/CARD/data_cleanup_comparison.R:
> +# View the pre-cleanup snippet
+View(aro_index_snippet)
+
+# View the post-cleanup snippet
+View(resistance_profile_data_snippet)
Not sure if this was used to look through the dataset (e.g., with glimpse), but I meant actual example input/output data to run and check.
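One way to commit example input/output data, as requested, is to write the first rows of each table to a small TSV stored in the repo, so the code can be run and checked without the full CARD download. The helper name, row count, toy data, and any repo path are hypothetical; the test below writes to a temporary file instead.

```r
# Write the first `n` rows of a data frame to a tab-separated file
save_snippet <- function(data, path, n = 20) {
  snippet <- utils::head(data, n)
  utils::write.table(snippet, file = path, sep = "\t", quote = FALSE,
                     row.names = FALSE)
  invisible(snippet)
}

# Toy stand-in for aro_index / resistance_profile_data
toy <- data.frame(id = 1:50, gene = paste0("gene", 1:50), stringsAsFactors = FALSE)

# In the repo this could be something like example_data/aro_index_snippet.tsv (hypothetical)
path <- tempfile(fileext = ".tsv")
save_snippet(toy, path)
round_trip <- utils::read.delim(path, stringsAsFactors = FALSE)
nrow(round_trip)
```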
Thank you for your review. I will definitely get in touch with @awasyn.
On Tue, 26 Nov 2024, 03:24 Janani Ravi wrote:
Added some quick thoughts. Also, look through recent PRs from @awasyn for cross-checks. Feel free to add a code review for that PR as well (related to CARD #111).
------------------------------
In case_studies/CARD/Bug-Drug Code.R:
> + (11 consecutive blank lines)
⬇️ Suggested change: delete all 11 blank lines.
------------------------------
In case_studies/CARD/Bug-Drug Code.R:
> + gene <- split_names[2]
+ } else if (length(split_names) == 3) {
+ # Pathogen-Gene-Drug scenario
+ pathogen <- split_names[1]
+ gene <- split_names[2]
+ drug <- split_names[3] # Assign drug from the split entry
+ }
+
+ # If both pathogen and gene are NA, classify as complex gene
+ if (is.na(pathogen) && is.na(gene)) {
+ gene <- card_short_name # Assign entire CARD Short Name as gene
+ pathogen <- "MULTI" # Default to MULTI for pathogen
+ }
+
+ # Handle Protein Accession
+ if (is.na(`Protein Accession`) || `Protein Accession` == "") {
If the columns are renamed above, there will be no column names with spaces.
------------------------------
In case_studies/CARD/Bug-Drug Code.R:
> +
+
+library(rentrez)
+library(XML)
+library(stringr)
+
+# Filter for the target drug (DAP) and pathogen (Staphylococcus aureus)
+filter_resistance_mechanisms <- function(data, drug, bug, exclude_multiclass = FALSE, species_restricted = TRUE) {
+
+ # Filter by drug using partial match to include multiclass entries containing the target drug
+ filtered_data <- data %>%
+ filter(grepl(drug, Drug, ignore.case = TRUE))
+
+ # Filter by pathogen, using partial match
+ filtered_data <- filtered_data %>%
+ filter(grepl(bug, Pathogen_Full_Name, ignore.case = TRUE))
If using snake_case, there will be no capitals like those in Pathogen_Full_Name.
------------------------------
In case_studies/CARD/Bug-Drug Code.R:
> +combined_sequences <- character()
+
+# Loop through each Protein Accession in the filtered data to fetch sequences
+for (i in 1:nrow(filtered_data_saurdap)) {
+ # Get the Protein Accession ID
+ Protein_accession <- filtered_data_saurdap$Protein_Accession[i]
+
+ cat("Fetching sequence for Protein Accession:", protein_accession, "\n") # Debugging message
+
+ # Fetch the FASTA sequence
+ fasta_sequence <- fetch_fasta_sequence(protein_accession)
+
+ # If the sequence was fetched successfully, add it to the combined_sequences vector
+ if (!is.null(fasta_sequence)) {
+ combined_sequences <- c(combined_sequences, fasta_sequence)
+ cat("Successfully fetched sequence for:", protein_accession, "\n")
Not sure if this is for multiple or single accession numbers; change accordingly?
⬇️ Suggested change
- cat("Successfully fetched sequence for:", protein_accession, "\n")
+ cat("Successfully fetched sequences for:", protein_accession, "\n")
------------------------------
In case_studies/CARD/CARD_data/CARD-Download-README.txt:
> @@ -0,0 +1,32 @@
+# CARD README
+
+## Source:
⬇️ Suggested change
-## Source:
+## Source
------------------------------
In case_studies/CARD/CARD_data/CARD-Download-README.txt:
> @@ -0,0 +1,32 @@
+# CARD README
+
+## Source:
+This dataset was downloaded from the Comprehensive Antibiotic Resistance Database (CARD) in 2024-10 at https://card.mcmaster.ca/download/0/broadstreet-v3.3.0.tar.bz2
⬇️ Suggested change
-This dataset was downloaded from the Comprehensive Antibiotic Resistance Database (CARD) in 2024-10 at https://card.mcmaster.ca/download/0/broadstreet-v3.3.0.tar.bz2
+This dataset and associated README were downloaded from the Comprehensive Antibiotic Resistance Database (CARD) (2024-10) at https://card.mcmaster.ca/download/0/broadstreet-v3.3.0.tar.bz2.
------------------------------
In case_studies/CARD/CARD_data/CARD-Download-README.txt:
> +prediction at the Comprehensive Antibiotic Resistance Database" Nucleic Acids Research,
+51, D690-D699. https://pubmed.ncbi.nlm.nih.gov/36263822/
+
+## CARD SHORT NAMES
+
+The CARD database uses standardized abbreviations, known as CARD Short Names, for AMR gene names associated with Antibiotic Resistance Ontology terms. These names are created for compatibility across data files and outputs from the Resistance Gene Identifier (RGI). Short Names for genes with 15 or fewer characters retain the original gene name, while longer names are abbreviated to uniquely represent each gene or protein. All CARD Short Names replace whitespace with underscores. For pathogen names, CARD follows the convention of capitalizing the first letter of the genus followed by the first three letters of the species in lowercase. Where applicable, CARD Short Names adopt formats such as “pathogen_gene,” “pathogen_gene_drug,” or “gene_drug.” Full lists of these abbreviations are available in the provided files:
+
+shortname_antibiotics.tsv
+shortname_pathogens.tsv"
+
+
+## FASTA
+
+The FASTA files included here contain retrieved sequences of antimicrobial resistance genes.
+
+## Data Files Downloaded
⬇️ Suggested change
-## Data Files Downloaded
+## Data files downloaded
------------------------------
In case_studies/CARD/CARD_data/CARD-Download-README.txt:
> +## Data Files Downloaded
+aro_index.tsv
⬇️ Suggested change
-aro_index.tsv
+`aro_index.tsv`
------------------------------
In case_studies/CARD/CARD_data/CARD-Download-README.txt:
> +## Data Files Downloaded
+aro_index.tsv
+This file contains an index of ARO (Antibiotic Resistance Ontology) identifiers with associated GenBank accessions. Each entry includes information used to link antibiotic resistance genes to GenBank sequences.
+shortname_antibiotics.tsv
⬇️ Suggested change
-shortname_antibiotics.tsv
+`shortname_antibiotics.tsv`
------------------------------
In case_studies/CARD/CARD_data/CARD-Download-README.txt:
> +shortname_antibiotics.tsv
+Contains standardized abbreviations for antibiotics used in CARD’s short names. These abbreviations, which follow conventions from the American Society for Microbiology (ASM) and additional custom terms, provide a uniform naming system for antibiotics referenced within CARD data.
+
+shortname_pathogens.tsv
⬇️ Suggested change
-shortname_pathogens.tsv
+`shortname_pathogens.tsv`
------------------------------
In case_studies/CARD/CARD_data/shortname_pathogens.tsv:
> @@ -0,0 +1,94 @@
+Abbreviation Pathogen
Short names for species.
------------------------------
In case_studies/CARD/ESKAPE Pathogens Code.R:
> @@ -0,0 +1,321 @@
+# config.R
+url <- "https://card.mcmaster.ca/download/0/broadstreet-v3.3.0.tar.bz2"
Check for duplicated code across config.R and bug_drug.R.
…undant files from R folder
Description
What kind of change(s) are included?
Checklist
Please ensure that all boxes are checked before indicating that this pull request is ready for review.
@jananiravi @falquaddoomi @epbrenner @AbhirupaGhosh