Restructured project: Moved files into case_studies/CARD, removed red… #107

Cateline · 2024-10-20T14:19:05Z

…undant files from R folder

Description

What kind of change(s) are included?

Feature (adds or updates new capabilities)
Bug fix (fixes an issue).
Enhancement (adds functionality).
Breaking change (these changes would cause existing functionality to not work as expected).

Checklist

Please ensure that all boxes are checked before indicating that this pull request is ready for review.

I have read and followed the CONTRIBUTING.md guidelines.
I have searched for existing content to ensure this is not a duplicate.
I have performed a self-review of these additions (including spelling, grammar, and related).
I have added comments to my code to help provide understanding.
I have added a test which covers the code changes found within this PR.
I have deleted all non-relevant text in this pull request template.
Reviewer assignment: Tag a relevant team member to review and approve the changes.
@jananiravi @falquaddoomi @epbrenner @AbhirupaGhosh


falquaddoomi

IMHO this looks a lot better! I see that the check is passing, too, which makes sense considering that the package itself is unchanged. I'll have to rely on someone else to review the R code in detail, but it looks reasonable to me.

If this PR replaces #103, I'd suggest closing that one without merging to reduce the potential for confusion.

Cateline · 2024-10-22T13:29:58Z

Noted. Thank you for the help🙏


epbrenner

Looking better!

If this script is meant to be generalized so you can enter the bug/drug combo of interest and get the results out, I think it needs some refactoring as I mention in line-by-line comments, but for this specific bug/drug combo, it looks like it works all right. If we want to expand this to be generalizable, I suggest we can merge this script and add a new issue to generalize it.

Another suggestion is actually to drop most of the CARD_data files from being included in this PR, though. Only aro_index.tsv, shortname_antibiotics.tsv, and shortname_pathogens.tsv are used from the set, so no point in adding the much larger .fasta sequence files to the PR as well, at least unless you want to refactor much more broadly to search those files for your sequences instead of using rentrez. That would be a major rewrite, and isn't what I'm suggesting here, so I'd just omit the extra CARD data files.

case_studies/CARD/Bug-Drug Code.R

epbrenner · 2024-10-22T15:46:19Z

case_studies/CARD/Bug-Drug Code.R

+
+
+# Mutate data
+aro_index <- aro_index %>%


This step yields a lot of NA values, so just clarifying that that's intended. Multi-pathogen genes like aadA or acrB don't parse into the "pathogen / gene / drug" pattern successfully, so you have things like

pathogen,gene,drug
Abau,ampC,BLA
Abau,Abaf,NA
acrB,NA,NA
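To make the NA pattern easier to reason about, here is a minimal base R sketch of one way to parse a CARD short name. It is not the code in this PR: `parse_card_short_name` is a hypothetical name, and the rule that a pathogen abbreviation is one capital letter followed by three lowercase letters is an assumption used only for illustration.

```r
# Hypothetical helper: split a CARD short name into pathogen / gene / drug.
# Assumption: pathogen abbreviations look like "Abau" (capital + 3 lowercase).
parse_card_short_name <- function(card_short_name) {
  parts <- strsplit(card_short_name, "_", fixed = TRUE)[[1]]
  is_pathogen <- grepl("^[A-Z][a-z]{3}$", parts[1])

  pathogen <- NA_character_
  gene <- NA_character_
  drug <- NA_character_

  if (length(parts) == 3 && is_pathogen) {
    # pathogen_gene_drug
    pathogen <- parts[1]
    gene <- parts[2]
    drug <- parts[3]
  } else if (length(parts) == 2 && is_pathogen) {
    # pathogen_gene
    pathogen <- parts[1]
    gene <- parts[2]
  } else if (length(parts) == 2) {
    # gene_drug
    gene <- parts[1]
    drug <- parts[2]
  } else {
    # Anything else (e.g. a multi-pathogen gene) is kept whole as the gene
    gene <- card_short_name
  }

  data.frame(short_name = card_short_name, pathogen = pathogen,
             gene = gene, drug = drug, stringsAsFactors = FALSE)
}

parsed <- do.call(rbind, lapply(c("Abau_ampC_BLA", "Abau_Abaf", "acrB"),
                                parse_card_short_name))
print(parsed)
```

With this rule a bare gene such as acrB lands in the gene column instead of the pathogen column, and the remaining NA values mark fields that the short name simply does not carry.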

case_studies/CARD/Bug-Drug Code.R

AbhirupaGhosh

I would suggest you can let go of the files like nucleotide...fasta, protein..fasta that you don't need for your R script.

case_studies/CARD/Bug-Drug Code.R

falquaddoomi · 2024-10-22T20:07:52Z

Hey @Cateline, for future reference you can delete multiple files in a single commit; you don't need to create one commit per deletion.

Cateline · 2024-10-22T20:12:06Z

> Hey @Cateline, for future reference you can delete multiple files in a single commit; you don't need to create one commit per deletion.

Oh, okay. Didn't know that. I was using the GitHub API to delete them.


jananiravi · 2024-10-22T22:32:54Z

Agree with the broad comments of:

Cleaning up for said bug-drug combo
Rm unnecessary large files
Pooling related commits
Creating a new but related issue (tagging this issue + PR) to generalize for any new bug-drug combo (while bearing in mind companion file requirements and which of these are GitHub-friendly in terms of size). Could @AbhirupaGhosh @epbrenner create this new issue based on how this is structured?
Guess this is the reason "Fixes Phase 1 of Issue #27" (#103) is closed?

Any other outstanding Qs to fix/merge this issue? If so, I can look at it more carefully later this week.

Thanks, @Cateline !


jananiravi

Thanks for the contribution, @Cateline. With feedback from @AbhirupaGhosh and others, I think this should be ready to go soon, after another iteration or so. How are you planning on extending it beyond Saur and DAP?

case_studies/CARD/CARD_data/shortname_antibiotics.tsv

jananiravi · 2024-10-24T20:09:47Z

case_studies/CARD/CARD_data/CARD-Download-README.txt

@Cateline, thanks for adding this README. Out of curiosity, are these descriptions already paraphrased from the original source (CARD), or yet to be?

The descriptions are from the original source (CARD) and have not been paraphrased yet

case_studies/CARD/CARD_data/CARD-Download-README.txt

case_studies/CARD/Bug-Drug Code.R


jananiravi

Leaving minor comments. Defer to @AbhirupaGhosh @epbrenner & FA/DM for a full review.

case_studies/CARD/Bug-Drug Code.R

jananiravi · 2024-11-01T18:05:37Z

case_studies/CARD/Bug-Drug Code.R

+  gene <- NA
+  drug <- drug_class  # Default to Drug Class column
+
+  # Determine the information based on the split names and patterns


Can you share an example file (snippet pre and post name cleanup)?

> Can you share an example file (snippet pre and post name cleanup)?

Hello @jananiravi, by this do you mean I should use the View() function in R to allow for visual inspection of the dataset before and after processing?

No, I meant snapshots or example data stored locally (as part of the commit) to be able to run the code and check locally.


… options for multiclass exclusion and species restriction This update introduces a filter_resistance_mechanisms function with customizable options for partial drug matches, exclusion of multiclass resistance, and species-specific filtering.
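A rough base R sketch of how such options could behave, written only to make the description concrete. Everything here is an assumption rather than the PR's implementation: the function name `filter_resistance_sketch`, the snake_case columns `drug` and `pathogen_full_name`, the idea that multiclass entries list several drugs separated by `;`, and the reading of `species_restricted = FALSE` as "also keep rows labelled MULTI".

```r
# Hypothetical sketch, not the PR's filter_resistance_mechanisms().
filter_resistance_sketch <- function(data, drug, bug,
                                     exclude_multiclass = FALSE,
                                     species_restricted = TRUE) {
  # Partial, case-insensitive match on the drug column
  match_drug <- grepl(drug, data$drug, ignore.case = TRUE)

  # Assumption: multiclass entries contain ";" between drug names
  if (exclude_multiclass) {
    match_drug <- match_drug & !grepl(";", data$drug, fixed = TRUE)
  }

  # Partial, case-insensitive match on the pathogen column
  match_bug <- grepl(bug, data$pathogen_full_name, ignore.case = TRUE)

  # Assumption: when not species-restricted, multi-species rows are kept too
  if (!species_restricted) {
    match_bug <- match_bug | data$pathogen_full_name == "MULTI"
  }

  data[match_drug & match_bug, , drop = FALSE]
}

# Toy data for illustration only
toy <- data.frame(
  gene = c("geneA", "geneB", "geneC", "geneD"),
  drug = c("DAP", "DAP;VAN", "VAN", "DAP"),
  pathogen_full_name = c("Staphylococcus aureus", "Staphylococcus aureus",
                         "Staphylococcus aureus", "MULTI"),
  stringsAsFactors = FALSE
)

default_hits <- filter_resistance_sketch(toy, "DAP", "Staphylococcus aureus")
single_class <- filter_resistance_sketch(toy, "DAP", "Staphylococcus aureus",
                                         exclude_multiclass = TRUE)
with_multi <- filter_resistance_sketch(toy, "DAP", "Staphylococcus aureus",
                                       species_restricted = FALSE)
```

Keeping each option as an independent logical mask makes it easy to check locally which rows a given flag adds or removes.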


jananiravi

Added some quick thoughts. Also, look through recent PRs from @awasyn for cross-checks. Feel free to add a code review for that PR as well (related to CARD #111).

jananiravi · 2024-11-26T00:04:35Z

case_studies/CARD/Bug-Drug Code.R

+  gene <- NA
+  drug <- drug_class  # Default to Drug Class column
+
+  # Determine the information based on the split names and patterns


No, I meant snapshots or example data stored locally (as part of the commit) to be able to run the code and check locally.

jananiravi · 2024-11-26T00:05:07Z

case_studies/CARD/Bug-Drug Code.R

+
+
+
+
+
+
+
+
+
+
+


Suggested change

jananiravi · 2024-11-26T00:08:05Z

case_studies/CARD/Bug-Drug Code.R

+# Loop through each Protein Accession in the filtered data to fetch sequences
+for (i in 1:nrow(filtered_data_saurdap)) {
+  # Get the Protein Accession ID
+  Protein_accession <- filtered_data_saurdap$Protein_Accession[i]


Confusing alternating use of Protein_ vs. protein_accession. 🤔
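Because R is case-sensitive, `Protein_accession` and `protein_accession` are two different objects, so the value assigned in the loop is not the one passed on afterwards. A minimal sketch with one consistent snake_case name follows. `collect_sequences` and `fetch_stub` are hypothetical names; the stub stands in for the PR's `fetch_fasta_sequence()` so the example runs without network access, and the accession values are placeholders.

```r
# Hypothetical stand-in for fetch_fasta_sequence(): returns NULL on failure.
fetch_stub <- function(protein_accession) {
  if (is.na(protein_accession) || protein_accession == "") {
    return(NULL)
  }
  paste0(">", protein_accession, "\nMKT")
}

# Collect sequences using a single, consistently named loop variable.
collect_sequences <- function(protein_accessions, fetch_fun) {
  combined_sequences <- character()
  # seq_along() also handles an empty input, unlike 1:length(x)
  for (i in seq_along(protein_accessions)) {
    protein_accession <- protein_accessions[i]
    fasta_sequence <- fetch_fun(protein_accession)
    if (!is.null(fasta_sequence)) {
      combined_sequences <- c(combined_sequences, fasta_sequence)
    }
  }
  combined_sequences
}

# Placeholder accessions for illustration only
filtered_data_saurdap <- data.frame(
  protein_accession = c("ACC_0001", "", "ACC_0002"),
  stringsAsFactors = FALSE
)
combined_sequences <- collect_sequences(filtered_data_saurdap$protein_accession,
                                        fetch_stub)
```

Passing the fetch function as an argument is a design choice for this sketch only; it lets the loop be checked locally before pointing it at Entrez.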

jananiravi · 2024-11-26T00:08:25Z

case_studies/CARD/Bug-Drug Code.R

+
+
+# Define the output file for the FASTA sequences
+output_fasta_file <- "Staph_aureus_Daptomycin_sequences.fasta"


Consider using short names for species (4-char) and drugs (antibiotics, 3-char), with arg = antibiotic resistance genes, for example.
Which short names are you planning to use?
cc: @AbhirupaGhosh @charmvang @awasyn @epbrenner

Suggested change

output_fasta_file <- "Staph_aureus_Daptomycin_sequences.fasta"

output_fasta_file <- "Saur_dap_arg.fasta"
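If the file name is meant to follow the short-name pattern for any bug-drug combo, it could be built rather than hard-coded. A small sketch, assuming a hypothetical helper `make_fasta_filename` and the 4-character species / 3-character drug convention mentioned above:

```r
# Hypothetical helper: build "<species>_<drug>_<suffix>.fasta" from short names.
make_fasta_filename <- function(species_short, drug_short, suffix = "arg") {
  # Guard the assumed lengths: 4-char species, 3-char drug abbreviation
  stopifnot(nchar(species_short) == 4, nchar(drug_short) == 3)
  paste0(species_short, "_", tolower(drug_short), "_", suffix, ".fasta")
}

output_fasta_file <- make_fasta_filename("Saur", "DAP")
print(output_fasta_file)
```

For Saur and DAP this reproduces the suggested name, `Saur_dap_arg.fasta`.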

jananiravi · 2024-11-26T00:10:56Z

case_studies/CARD/Bug-Drug Code.R

+
+
+# Extract pathogen, gene, drug, and include Protein.Accession from 'CARD Short Name'
+extract_card_info <- function(card_short_name, drug_class, `Protein Accession`, `DNA Accession`) {


Rename all colnames that contain spaces or special characters so that they use only _ as a separator. Also avoid mixing cases.

@AbhirupaGhosh @charmvang @awasyn @epbrenner @the-mayer: are we using camelCase for colnames, or snake_case (without caps)?
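For reference, a base R sketch of what the snake_case option could look like. The helper name `to_snake_case` is hypothetical, and choosing snake_case here is only an assumption for illustration since the convention is still an open question in this thread.

```r
# Hypothetical helper: normalise column names to lower snake_case.
to_snake_case <- function(column_names) {
  # Replace runs of spaces or special characters with a single underscore
  cleaned <- gsub("[^A-Za-z0-9]+", "_", column_names)
  # Drop leading or trailing underscores left behind by the replacement
  cleaned <- gsub("^_+|_+$", "", cleaned)
  tolower(cleaned)
}

# Toy one-row table with the column names quoted in the diff above
aro_index <- data.frame(`Protein Accession` = "X", `DNA Accession` = "Y",
                        `CARD Short Name` = "Z", check.names = FALSE,
                        stringsAsFactors = FALSE)
names(aro_index) <- to_snake_case(names(aro_index))
print(names(aro_index))
```

After a rename like this, function arguments such as `protein_accession` no longer need backticks.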

jananiravi · 2024-11-26T00:16:51Z

case_studies/CARD/CARD_data/shortname_antibiotics.tsv

@@ -0,0 +1,76 @@
+AAC Abbreviation	Molecule


can use this for short nomenclature, e.g., spp_dru_...

case_studies/CARD/CARD_data/shortname_pathogens.tsv

jananiravi · 2024-11-26T00:18:57Z

case_studies/CARD/ESKAPE Pathogens Code.R

+# Install and Load dplyr and readr
+packages <- c("dplyr", "readr")
+
+for (pkg in packages) {
+  if (!require(pkg, character.only = TRUE)) {
+    install.packages(pkg)
+    library(pkg, character.only = TRUE)
+  } else {
+    library(pkg, character.only = TRUE)
+  }
+}
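One observation on this hunk: `require()` already attaches the package when it is available, so the `else` branch repeats work that has been done. A shorter variant is sketched below. `load_or_install` is a hypothetical name, and the example call uses packages that ship with R so that nothing is installed when it runs.

```r
# Hypothetical helper: attach each package, installing it first only if missing.
load_or_install <- function(packages) {
  for (pkg in packages) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      install.packages(pkg)
    }
    library(pkg, character.only = TRUE)
  }
  invisible(packages)
}

# Demonstrated with base packages; swap in c("dplyr", "readr") for the script
loaded <- load_or_install(c("stats", "utils"))
```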


case_studies/CARD/ESKAPE Pathogens Code.R

jananiravi · 2024-11-26T00:22:33Z

case_studies/CARD/data_cleanup_comparison.R

+
+# View the pre-cleanup snippet
+View(aro_index_snippet)
+
+# View the post-cleanup snippet
+View(resistance_profile_data_snippet)
+


Not sure if this was used to look through the dataset -- e.g., with glimpse. But I meant actual example input/output data to run and check.
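One possible way to provide that kind of example data is to write the first few rows of each table to a small TSV that can be committed next to the script. A minimal sketch, assuming a hypothetical helper `save_snippet` and a toy table in place of the real `aro_index` data:

```r
# Hypothetical helper: save the first n rows of a table as a TSV snapshot.
save_snippet <- function(data, path, n = 10) {
  utils::write.table(utils::head(data, n), file = path, sep = "\t",
                     quote = FALSE, row.names = FALSE)
  invisible(path)
}

# Toy stand-in for the real table; values are illustrative only
aro_index_toy <- data.frame(
  card_short_name = c("Abau_ampC_BLA", "Abau_Abaf", "acrB"),
  stringsAsFactors = FALSE
)

snippet_path <- save_snippet(aro_index_toy,
                             file.path(tempdir(), "aro_index_snippet.tsv"),
                             n = 2)

# Anyone can then reload the snapshot and rerun the cleanup on it
reloaded <- utils::read.delim(snippet_path, stringsAsFactors = FALSE)
```

In the repository the path would point into the case study folder rather than `tempdir()`, and the same call could be repeated for the post-cleanup table.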

Cateline · 2024-11-26T13:41:34Z

Thank you for your review. I will definitely get in touch with @awasyn.

…

On Tue, 26 Nov 2024, 03:24 Janani Ravi wrote:

Added some quick thoughts. Also, look through recent PRs from @awasyn for cross-checks. Feel free to add a code review for that PR as well (related to CARD #111).

------------------------------

In case_studies/CARD/Bug-Drug Code.R:

+
+
+
+
+
+
+
+
+
+
+

Suggested change

-
-
-
-
-
-
-
-
-
-
-

------------------------------

In case_studies/CARD/Bug-Drug Code.R:

+    warning(paste("Error fetching FASTA sequence for protein accession:", protein_accession, ":", e$message))
+    return(NULL)
+  })
+}
+
+
+# Define the output file for the FASTA sequences
+output_fasta_file <- "Staph_aureus_Daptomycin_sequences.fasta"
+
+# Initialize an empty character vector to store the sequences
+combined_sequences <- character()
+
+# Loop through each Protein Accession in the filtered data to fetch sequences
+for (i in 1:nrow(filtered_data_saurdap)) {
+  # Get the Protein Accession ID
+  Protein_accession <- filtered_data_saurdap$Protein_Accession[i]

Confusing alternating use of Protein_ vs. protein_accession. 🤔

------------------------------

In case_studies/CARD/Bug-Drug Code.R:

+      fasta_seq <- paste(lines, collapse = "\n")
+
+      return(fasta_seq)
+    } else {
+      warning(paste("Failed to retrieve FASTA sequence for protein accession:", protein_accession))
+      return(NULL)
+    }
+  }, error = function(e) {
+    warning(paste("Error fetching FASTA sequence for protein accession:", protein_accession, ":", e$message))
+    return(NULL)
+  })
+}
+
+
+# Define the output file for the FASTA sequences
+output_fasta_file <- "Staph_aureus_Daptomycin_sequences.fasta"

If using short names for species (4-char) and drugs (antibiotics, 3-char). arg = antibiotic resistance genes, for example. Which shortnames are you planning to use?
cc: @AbhirupaGhosh @charmvang @awasyn @epbrenner

Suggested change

-output_fasta_file <- "Staph_aureus_Daptomycin_sequences.fasta"
+output_fasta_file <- "Saur_dap_arg.fasta"

------------------------------

In case_studies/CARD/Bug-Drug Code.R:

+
+# Extract the tar file
+untar("broadstreet-v3.3.0.tar.bz2", exdir = "CARD_data")
+
+
+# Map CARD Short Name
+
+# Parse the required files using readr::read_delim
+aro_index <- read_delim("CARD_data/aro_index.tsv", delim = "\t", col_names = TRUE)
+antibiotics_data <- read_delim("CARD_data/shortname_antibiotics.tsv", delim = "\t", col_names = TRUE)
+pathogens_data <- read_delim("CARD_data/shortname_pathogens.tsv", delim = "\t", col_names = TRUE)
+
+
+
+# Extract pathogen, gene, drug, and include Protein.Accession from 'CARD Short Name'
+extract_card_info <- function(card_short_name, drug_class, `Protein Accession`, `DNA Accession`) {

rename all colnames with spaces and special characters to now include only _. Also avoid multiple cases.

@AbhirupaGhosh @charmvang @awasyn @epbrenner @the-mayer -- using camelCase for colnames or snake_case (without caps)?

------------------------------

In case_studies/CARD/Bug-Drug Code.R:

+    gene <- split_names[2]
+  } else if (length(split_names) == 3) {
+    # Pathogen-Gene-Drug scenario
+    pathogen <- split_names[1]
+    gene <- split_names[2]
+    drug <- split_names[3] # Assign drug from the split entry
+  }
+
+  # If both pathogen and gene are NA, classify as complex gene
+  if (is.na(pathogen) && is.na(gene)) {
+    gene <- card_short_name # Assign entire CARD Short Name as gene
+    pathogen <- "MULTI" # Default to MULTI for pathogen
+  }
+
+  # Handle Protein Accession
+  if (is.na(`Protein Accession`) || `Protein Accession` == "") {

if renamed above, there will be no colnames with spaces

------------------------------

In case_studies/CARD/Bug-Drug Code.R:

+
+
+library(rentrez)
+library(XML)
+library(stringr)
+
+# Filter for the target drug (DAP) and pathogen (Staphylococcus aureus)
+filter_resistance_mechanisms <- function(data, drug, bug, exclude_multiclass = FALSE, species_restricted = TRUE) {
+
+  # Filter by drug using partial match to include multiclass entries containing the target drug
+  filtered_data <- data %>%
+    filter(grepl(drug, Drug, ignore.case = TRUE))
+
+  # Filter by pathogen, using partial match
+  filtered_data <- filtered_data %>%
+    filter(grepl(bug, Pathogen_Full_Name, ignore.case = TRUE))

if using snake_case, there will be no caps as in Pathogen_Full_Name.

------------------------------

In case_studies/CARD/Bug-Drug Code.R:

+combined_sequences <- character()
+
+# Loop through each Protein Accession in the filtered data to fetch sequences
+for (i in 1:nrow(filtered_data_saurdap)) {
+  # Get the Protein Accession ID
+  Protein_accession <- filtered_data_saurdap$Protein_Accession[i]
+
+  cat("Fetching sequence for Protein Accession:", protein_accession, "\n") # Debugging message
+
+  # Fetch the FASTA sequence
+  fasta_sequence <- fetch_fasta_sequence(protein_accession)
+
+  # If the sequence was fetched successfully, add it to the combined_sequences vector
+  if (!is.null(fasta_sequence)) {
+    combined_sequences <- c(combined_sequences, fasta_sequence)
+    cat("Successfully fetched sequence for:", protein_accession, "\n")

Not sure if this is for multiple or single accession numbers. change accordingly?

Suggested change

-    cat("Successfully fetched sequence for:", protein_accession, "\n")
+    cat("Successfully fetched sequences for:", protein_accession, "\n")

------------------------------

In case_studies/CARD/CARD_data/CARD-Download-README.txt:

@@ -0,0 +1,32 @@
+# CARD README
+
+## Source:

Suggested change

-## Source:
+## Source

------------------------------

In case_studies/CARD/CARD_data/CARD-Download-README.txt:

@@ -0,0 +1,32 @@
+# CARD README
+
+## Source:
+This dataset was downloaded from the Comprehensive Antibiotic Resistance Database (CARD) in 2024-10 at https://card.mcmaster.ca/download/0/broadstreet-v3.3.0.tar.bz2

Suggested change

-This dataset was downloaded from the Comprehensive Antibiotic Resistance Database (CARD) in 2024-10 at https://card.mcmaster.ca/download/0/broadstreet-v3.3.0.tar.bz2
+This dataset and associated README were downloaded from the Comprehensive Antibiotic Resistance Database (CARD) (2024-10) at https://card.mcmaster.ca/download/0/broadstreet-v3.3.0.tar.bz2.

------------------------------

In case_studies/CARD/CARD_data/CARD-Download-README.txt:

+prediction at the Comprehensive Antibiotic Resistance Database" Nucleic Acids Research,
+51, D690-D699. https://pubmed.ncbi.nlm.nih.gov/36263822/
+
+## CARD SHORT NAMES
+
+The CARD database uses standardized abbreviations, known as CARD Short Names, for AMR gene names associated with Antibiotic Resistance Ontology terms. These names are created for compatibility across data files and outputs from the Resistance Gene Identifier (RGI). Short Names for genes with 15 or fewer characters retain the original gene name, while longer names are abbreviated to uniquely represent each gene or protein. All CARD Short Names replace whitespace with underscores. For pathogen names, CARD follows the convention of capitalizing the first letter of the genus followed by the first three letters of the species in lowercase. Where applicable, CARD Short Names adopt formats such as “pathogen_gene,” “pathogen_gene_drug,” or “gene_drug.” Full lists of these abbreviations are available in the provided files:
+
+shortname_antibiotics.tsv
+shortname_pathogens.tsv"
+
+
+## FASTA
+
+The FASTA files included here contain retrieved sequences of antimicrobial resistance genes.
+
+## Data Files Downloaded

Suggested change

-## Data Files Downloaded
+## Data files downloaded

------------------------------

In case_studies/CARD/CARD_data/CARD-Download-README.txt:

+## Data Files Downloaded
+aro_index.tsv

Suggested change

-aro_index.tsv
+`aro_index.tsv`

------------------------------

In case_studies/CARD/CARD_data/CARD-Download-README.txt:

+## Data Files Downloaded
+aro_index.tsv
+This file contains an index of ARO (Antibiotic Resistance Ontology) identifiers with associated GenBank accessions. Each entry includes information used to link antibiotic resistance genes to GenBank sequences.
+shortname_antibiotics.tsv

Suggested change

-shortname_antibiotics.tsv
+`shortname_antibiotics.tsv`

------------------------------

In case_studies/CARD/CARD_data/CARD-Download-README.txt:

+shortname_antibiotics.tsv
+Contains standardized abbreviations for antibiotics used in CARD’s short names. These abbreviations, which follow conventions from the American Society for Microbiology (ASM) and additional custom terms, provide a uniform naming system for antibiotics referenced within CARD data.
+
+shortname_pathogens.tsv

Suggested change

-shortname_pathogens.tsv
+`shortname_pathogens.tsv`

------------------------------

In case_studies/CARD/CARD_data/shortname_antibiotics.tsv:

@@ -0,0 +1,76 @@
+AAC Abbreviation Molecule

can use this for short nomenclature, e.g., spp_dru_...

------------------------------

In case_studies/CARD/CARD_data/shortname_pathogens.tsv:

@@ -0,0 +1,94 @@
+Abbreviation Pathogen

short names for species.

------------------------------

In case_studies/CARD/ESKAPE Pathogens Code.R:

+# Install and Load dplyr and readr
+packages <- c("dplyr", "readr")
+
+for (pkg in packages) {
+  if (!require(pkg, character.only = TRUE)) {
+    install.packages(pkg)
+    library(pkg, character.only = TRUE)
+  } else {
+    library(pkg, character.only = TRUE)
+  }
+}

needed?

------------------------------

In case_studies/CARD/ESKAPE Pathogens Code.R:

@@ -0,0 +1,321 @@
+# config.R
+url <- "https://card.mcmaster.ca/download/0/broadstreet-v3.3.0.tar.bz2"

check for duplicated code across config.R and bug_drug.R

------------------------------

In case_studies/CARD/data_cleanup_comparison.R:

+
+# View the pre-cleanup snippet
+View(aro_index_snippet)
+
+# View the post-cleanup snippet
+View(resistance_profile_data_snippet)
+

Not sure if this was used to look through the dataset -- e.g., with glimpse. But I meant actual example input/output data to run and check.

------------------------------

In case_studies/CARD/Bug-Drug Code.R:

+# Extract pathogen, gene, drug, and include Protein.Accession from 'CARD Short Name'
+library(dplyr)
+library(purrr)
+library(stringr)
+
+# Extract pathogen, gene, drug, and include Protein.Accession from 'CARD Short Name'
+extract_card_info <- function(card_short_name, drug_class, `Protein Accession`, `DNA Accession`) {
+  # Split the CARD Short Name by underscores
+  split_names <- unlist(strsplit(card_short_name, "_"))
+
+  # Initialize variables with defaults
+  pathogen <- NA
+  gene <- NA
+  drug <- drug_class # Default to Drug Class column
+
+  # Determine the information based on the split names and patterns

No, I meant snapshots or example data stored locally (as part of the commit) to be able to run the code and check locally.

jananiravi requested review from AbhirupaGhosh, falquaddoomi and the-mayer October 20, 2024 19:39

jananiravi added the enhancement (New feature or request), outreachy (for outreachy interns), and bioinfo (Bioinformatics related) labels Oct 20, 2024

Fixes Phase 1 of Issue JRaviLab#27 and Issue JRaviLab#103

cfc7bf6

Cateline force-pushed the CARD_Analysis branch from 5ef0a1d to cfc7bf6 Compare October 21, 2024 09:02

Added link to MolEvolvR Case Study report. Fixes Phase 2 of Issue JRa…

9615773

…viLab#27

falquaddoomi reviewed Oct 21, 2024

epbrenner reviewed Oct 22, 2024

AbhirupaGhosh reviewed Oct 22, 2024

Cateline added 12 commits October 22, 2024 12:37

Delete unnecessary files

9f06bb2

Remove unnecessary CARD data files

69916b8

Remove unnecessary CARD data files

0a3572e

Remove unnecessary CARD data files

08ed58f

Remove unnecessary CARD data files

a2643f1

Remove unnecessary CARD data files

9be2e3b

Remove unnecessary CARD data files

b0dbb23

Remove unnecessary CARD data files

8ddf883

Remove unnecessary CARD data files

b0c5dfa

Remove unnecessary CARD data files

2eb20ce

Remove unnecessary CARD data files

a532154

Remove unnecessary CARD data files

7aa8917

Cateline and others added 2 commits October 22, 2024 13:15

Update case_studies/CARD/Bug-Drug Code.R

52ce540

Change combined FASTA sequences file name Co-authored-by: Evan Pierce Brenner <[email protected]>

Update case_studies/CARD/Bug-Drug Code.R

4177654

Co-authored-by: Evan Pierce Brenner <[email protected]>

Cateline added 4 commits October 23, 2024 18:12

Update Bug-Drug Code.R

444b520

Updated package loading to use require() for conditional installation. Renamed fasta file and removed redundant lines (35-44). Removed decompression step along with renaming of the zipped file

Add HTML report file to reports folder

e223f86

Delete case_studies/CARD/reports/download.htm

56addcc

Add HTML Report File

f2af6f4

jananiravi requested changes Oct 24, 2024


Cateline and others added 9 commits October 25, 2024 00:01

Update case_studies/CARD/CARD_data/CARD-Download-README.txt

f590d94

Co-authored-by: Janani Ravi <[email protected]>

Update case_studies/CARD/CARD_data/CARD-Download-README.txt

5d174be

Co-authored-by: Janani Ravi <[email protected]>

Update case_studies/CARD/CARD_data/CARD-Download-README.txt

54e7b5b

Co-authored-by: Janani Ravi <[email protected]>

Update case_studies/CARD/CARD_data/CARD-Download-README.txt

1195e1e

Co-authored-by: Janani Ravi <[email protected]>

Update case_studies/CARD/CARD_data/CARD-Download-README.txt

2d80ab5

Co-authored-by: Janani Ravi <[email protected]>

Update CARD-Download-README.txt

b709416

Rename Staph_aureus_Daptomycin_sequences5.fasta to Staph_aureus_Dapto…

eca5d37

…mycin_sequences.fasta

Update Bug-Drug Code.R

993bc09

Changed data import function from read.delim to read_delim

Update Bug-Drug Code.R

ab67c1c

- Standardize 'Protein_Accession' naming conventions - Switch 'sapply' to 'purrr::map' functions - Rename 'aro_index' to 'resistance_profile' for better context - Use explicit column names instead of positional arguments where applicable

Cateline marked this pull request as draft October 31, 2024 22:34

Cateline added 4 commits October 31, 2024 16:05

Enhance logic for determining pathogen, gene, and drug fields

13a6e8b

Update `extract_card_info` function to correctly categorize complex gene entries

Enhance data mapping logic

9a7688d

-Improve merging process between extracted resistance profile data, antibiotics data, and pathogens data -Add logic to handle multi-species pathogens and multi-class drugs

Add function to fetch and save protein FASTA sequences from Entrez

14992a3

- Implement `fetch_fasta_sequence` to retrieve FASTA sequences from Entrez using protein accession IDs. - Add loop to iterate over `filtered_data`

Update Bug-Drug Code.R

e105319

jananiravi reviewed Nov 1, 2024


Cateline and others added 8 commits November 1, 2024 12:42

Update case_studies/CARD/Bug-Drug Code.R

f6b87e7

Co-authored-by: Janani Ravi <[email protected]>

Update case_studies/CARD/Bug-Drug Code.R

bbb8c91

Co-authored-by: Janani Ravi <[email protected]>

Update Bug-Drug Code.R

8e68be7

Fixed tar file extraction by adding .bz2 suffix to line 16 for proper file handling

Update Bug-Drug Code.R

8afcba8

Update file paths for antibiotics and pathogens data in Lines 89-90 for proper loading.

Data Cleanup Comparison

aee86b7

-Compared original dataset (`aro_index.tsv`) with cleaned dataset (`resistance_profile_data.tsv`). -Saved snippets of the pre-cleanup data (`aro_index.tsv`) and post-cleanup data (`resistance_profile_data.tsv`) for comparison.

Automate Case-Studies Issue JRaviLab#27

1dc5c81

Expanded Bug-Drug.R code to retrieve and save FASTA sequences for ESKAPE pathogens resistant to DAP (Daptomycin)

Rename Bug-Drug Code.R to bug_drug.R

4ddc8e1

jananiravi reviewed Nov 26, 2024



