bakta_frequent_features_5000.txt

Relative Path: data/meta/bakta_frequent_features_5000.txt Date Added: <2026-02-24 Tue> Source: Custom Top 5000 most frequent bakta features (Product, Gene) across all 5000 batches of the AST browser genomes Produced with the find_common_features.py script
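The counting step behind this file can be sketched roughly as follows. This is only an illustration, not the contents of find_common_features.py: the function name `top_features`, the representation of a batch as a list of (Product, Gene) tuples, and the example feature names are all assumptions.

```python
from collections import Counter

def top_features(batches, n=5000):
    """Count (product, gene) pairs over all batches and return the n most frequent.

    `batches` is assumed to be an iterable of batches, each an iterable of
    (product, gene) tuples; the real script may read and count differently.
    """
    counts = Counter()
    for batch in batches:
        counts.update(batch)
    return counts.most_common(n)

# Made-up example batches, for illustration only.
example_batches = [
    [("DNA gyrase subunit A", "gyrA"), ("hypothetical protein", "")],
    [("DNA gyrase subunit A", "gyrA"), ("Beta-lactamase", "blaTEM")],
    [("DNA gyrase subunit A", "gyrA"), ("hypothetical protein", "")],
]
top_two = top_features(example_batches, n=2)
```

Here `top_two` holds the two most frequent pairs together with their counts.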

reference_genomes.tsv

Relative Path: data/meta/reference_genomes.tsv Date Added: <2025-12-04 Thu> Source: Custom TSV file of reference genomes to use for the TaxIDs present in the AST browser samples. For each TaxID, metadata for all available NCBI genomes were downloaded and a score for each assembly was computed based on the assembly level and ANI matching status. Up to three of the top-scoring assemblies were taken to be the reference genomes for that TaxID. Atypical assemblies were filtered out. See full details in reports/view_dataset.py
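A minimal sketch of the selection described above, under stated assumptions: the field names (`taxid`, `assembly_level`, `ani_match`, `atypical`), the numeric weights in `LEVEL_SCORE`, and the example records are hypothetical. The actual scoring is documented in reports/view_dataset.py.

```python
from collections import defaultdict

# Hypothetical weights: more complete assembly levels score higher.
LEVEL_SCORE = {"Complete Genome": 3, "Chromosome": 2, "Scaffold": 1, "Contig": 0}

def score_assembly(rec):
    """Score one assembly from its level, plus a bonus when the ANI check matched."""
    return LEVEL_SCORE.get(rec["assembly_level"], 0) + (1 if rec["ani_match"] else 0)

def pick_references(records, max_per_taxid=3):
    """Drop atypical assemblies, then keep up to max_per_taxid top scorers per TaxID."""
    by_taxid = defaultdict(list)
    for rec in records:
        if rec.get("atypical"):
            continue  # atypical assemblies are filtered out
        by_taxid[rec["taxid"]].append(rec)
    return {
        taxid: sorted(recs, key=score_assembly, reverse=True)[:max_per_taxid]
        for taxid, recs in by_taxid.items()
    }

# Made-up records, for illustration only.
example_records = [
    {"name": "asm_A", "taxid": 562, "assembly_level": "Complete Genome", "ani_match": True, "atypical": False},
    {"name": "asm_B", "taxid": 562, "assembly_level": "Contig", "ani_match": True, "atypical": False},
    {"name": "asm_C", "taxid": 562, "assembly_level": "Complete Genome", "ani_match": False, "atypical": True},
    {"name": "asm_D", "taxid": 562, "assembly_level": "Scaffold", "ani_match": False, "atypical": False},
    {"name": "asm_E", "taxid": 562, "assembly_level": "Chromosome", "ani_match": True, "atypical": False},
    {"name": "asm_F", "taxid": 573, "assembly_level": "Scaffold", "ani_match": True, "atypical": False},
]
references = pick_references(example_records)
```

Ties are broken by input order here because Python's sort is stable; the real script may use a different tie-break.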

ncbi_datasets_lookup.tsv

Relative Path: data/meta/ncbi_datasets_lookup.tsv Date Added: <2025-12-04 Thu> Source: Custom TSV file containing genome lookups for the TaxIDs of the AST browser samples. Produced by find_reference_genomes.py using calls to the NCBI datasets tool

ncbi_datasets_bioprojects.tsv

Relative Path: data/meta/ncbi_datasets_bioprojects.tsv Date Added: <2025-12-04 Thu> Source: Custom TSV file containing genome lookups for all bioprojects in the available AST browser samples (biosample_mapping_2025-11-21.csv). Produced by find_reference_genomes.py using calls to the NCBI datasets tool

biosample_attributes.tsv

Relative Path: data/meta/biosample_attributes.tsv Date Added: <2025-11-26 Wed> Source: Custom — scripts/format_ncbi_metadata.R Attributes for the biosample accessions in biosample_mapping_2025-11-21.csv, obtained by parsing output of efetch using get_biosample_attrs in src/utils.R. Required because some of these metadata are not provided in the AST browser or in the mapping file

biosample_mapping_2025-11-21.csv

Relative Path: data/meta/biosample_mapping_2025-11-21.csv Date Added: <2025-11-21 Fri> Source: Custom — src/map_biosample.R File mapping BioSample accessions in the AST browser to SRA runs and BioProject accessions, produced with calls to esearch and efetch.

Required because the downloadable table from AST browser doesn’t include SRA runs

ADB_all_compounds.csv

Relative Path: data/meta/ADB_all_compounds.csv Date Added: <2025-11-26 Wed> Source: AntibioticDB All compounds in the AntibioticDB database, downloaded [2025-11-26 Wed]

moradigaravand_samples.tsv

Relative Path: data/meta/moradigaravand_samples.tsv Date Added: <2025-10-24 Fri> Source: S1 Table Isolate metadata from the 2018 study "Prediction of antibiotic resistance in Escherichia coli from large-scale pan-genome data" by Moradigaravand et al. Tested concentrations using EUCAST

  • Acronyms (C.B. = clinical breakpoint)
    • penicillin: ampicillin (AMP, C.B.: 6 μg/ml)
    • cephalosporins: cefuroxime (CXM, C.B.: 8 μg/ml), cefotaxime (CTX, C.B.: 4 μg/ml), cephalothin (CET, C.B.: 20 μg/ml), ceftazidime (CTZ, C.B.: 0.25 μg/ml)
    • aminoglycosides: gentamicin (GEN, C.B.: 4 μg/ml), tobramycin (TBM, C.B.: 8 μg/ml)
    • fluoroquinolones: ciprofloxacin (CIP, C.B.: 1 μg/ml)
    • amoxicillin-clavulanate (AMC)
    • amoxicillin (AMX)
    • trimethoprim (TMP)
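
If the breakpoints listed above are used to turn a measured MIC into a resistant/susceptible label, a sketch could look like the following. The values are copied from the list; the rule "resistant when MIC is strictly above the breakpoint" and the function name `is_resistant` are assumptions, and the file itself may already store the labels.

```python
# Clinical breakpoints in μg/ml, copied from the acronym list above.
BREAKPOINTS = {
    "AMP": 6.0, "CXM": 8.0, "CTX": 4.0, "CET": 20.0,
    "CTZ": 0.25, "GEN": 4.0, "TBM": 8.0, "CIP": 1.0,
}

def is_resistant(drug, mic):
    """Return True/False for drugs with a listed breakpoint, None otherwise.

    Assumes an isolate is resistant when its MIC (μg/ml) exceeds the breakpoint.
    """
    breakpoint_value = BREAKPOINTS.get(drug)
    if breakpoint_value is None:
        return None  # e.g. AMC, AMX, TMP have no breakpoint in the list
    return mic > breakpoint_value
```

For example, `is_resistant("CIP", 2.0)` gives `True`, while `is_resistant("TMP", 2.0)` gives `None` because no breakpoint is listed for trimethoprim.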

jia_sample_mapping.tsv

Relative Path: data/meta/jia_sample_mapping.tsv Date Added: <2025-09-22 Mon> Source: Custom TSV file mapping Jia sample Assembly accessions to their BioSample accessions. Used for renaming

jia_samples.tsv

Relative Path: data/meta/jia_samples.tsv Date Added: <2025-09-22 Mon> Source: Paper supplemental data Sample metadata from Jia et al., extracted from the supplemental data Word document. Tested concentrations using EUCAST

  • Acronyms
    • imipenem (IPM)
    • meropenem (MEM)
    • gentamicin (GEN)
    • amikacin (AMK)
    • cefepime (FEP)
    • ceftriaxone (CRO)
    • ceftazidime (CAZ)
    • minocycline (MIN)
    • tigecycline (TGC)
    • colistin (CST)

BVBRC_genome_amr.csv

Relative Path: data/raw/BVBRC_genome_amr.csv Date Added: <2025-09-04 Thu> Source: BV-BRC AMR Phenotypes Complete catalog of BV-BRC’s AMR Phenotype data (downloaded from the explorer without filters) Downloaded on `Date Added`, contains 352356 entries

BVBRC_genome.csv

Relative Path: data/raw/BVBRC_genome.csv Date Added: <2025-09-04 Thu> Source: BV-BRC Genomes Complete catalog of genomes from BV-BRC (downloaded from the explorer without filters) Downloaded on `Date Added`, contains 22959 samples

asts.tsv

Relative Path: data/raw/asts.tsv Date Added: <2025-09-01 Mon> Source: AST browser data The complete catalog of samples stored in NCBI’s AST browser (downloaded without filters) Downloaded on `Date Added`, contains 466453 samples