Relative Path: data/meta/bakta_frequent_features_5000.txt
Date Added: <2026-02-24 Tue>
Source: Custom
Top 5000 most frequent bakta features (Product, Gene) across all 5000 batches of the AST browser genomes
Produced with the find_common_features.py script
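The counting step can be sketched roughly as below. This is a minimal illustration, not the contents of find_common_features.py: the function name `top_features`, the in-memory batch layout, and the choice to count every occurrence (rather than presence per genome) are all assumptions.

```python
from collections import Counter
from typing import Iterable


def top_features(batches: Iterable[Iterable[tuple[str, str]]], n: int = 5000):
    """Return the n most frequent (Product, Gene) pairs across all batches.

    Hypothetical helper: each batch is assumed to be an iterable of
    (product, gene) tuples already parsed from the Bakta annotations.
    """
    counts: Counter = Counter()
    for batch in batches:
        counts.update(batch)  # adds one count per occurrence in the batch
    return counts.most_common(n)


# Toy input with made-up features, two batches only.
batches = [
    [("DNA gyrase subunit A", "gyrA"), ("hypothetical protein", "")],
    [("DNA gyrase subunit A", "gyrA"), ("Beta-lactamase", "blaTEM")],
]
top = top_features(batches, n=2)
```

With real data the batches would be streamed from disk one at a time, so only the counter has to fit in memory.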
Relative Path: data/meta/reference_genomes.tsv
Date Added: <2025-12-04 Thu>
Source: Custom
TSV file of reference genomes to use for the TaxIDs present in the AST browser samples. For each TaxID, metadata for all available NCBI genomes was downloaded and a score for each assembly was computed based on the assembly level and ANI matching status. Up to three of the top-scoring assemblies were taken as the reference genomes for that TaxID.
Atypical assemblies were filtered out.
See full details in reports/view_dataset.py
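A rough sketch of the selection logic described above, for orientation only. The weights in `LEVEL_SCORE`, the +1 bonus for an ANI match, the record keys (`taxid`, `accession`, `assembly_level`, `ani_match`, `atypical`), and the placeholder accessions are all hypothetical; the actual scoring is documented in reports/view_dataset.py.

```python
from collections import defaultdict

# Hypothetical weights per NCBI assembly level (higher is better).
LEVEL_SCORE = {"Complete Genome": 3, "Chromosome": 2, "Scaffold": 1, "Contig": 0}


def pick_references(assemblies, max_per_taxid=3):
    """Pick up to max_per_taxid top-scoring, non-atypical assemblies per TaxID."""
    by_taxid = defaultdict(list)
    for a in assemblies:
        if a["atypical"]:
            continue  # atypical assemblies are filtered out
        # Assumed scoring: assembly-level weight plus 1 if the ANI check matched.
        score = LEVEL_SCORE.get(a["assembly_level"], 0) + (1 if a["ani_match"] else 0)
        by_taxid[a["taxid"]].append((score, a["accession"]))
    return {
        taxid: [acc for _, acc in sorted(scored, key=lambda s: -s[0])[:max_per_taxid]]
        for taxid, scored in by_taxid.items()
    }


# Toy records with placeholder accessions.
assemblies = [
    {"taxid": 562, "accession": "GCF_A", "assembly_level": "Contig", "ani_match": True, "atypical": False},
    {"taxid": 562, "accession": "GCF_B", "assembly_level": "Complete Genome", "ani_match": True, "atypical": False},
    {"taxid": 562, "accession": "GCF_C", "assembly_level": "Complete Genome", "ani_match": False, "atypical": True},
    {"taxid": 562, "accession": "GCF_D", "assembly_level": "Scaffold", "ani_match": True, "atypical": False},
    {"taxid": 562, "accession": "GCF_E", "assembly_level": "Chromosome", "ani_match": False, "atypical": False},
]
refs = pick_references(assemblies)
```

Because Python's sort is stable, assemblies with equal scores keep their input order, so any tie-breaking rule would need to be added explicitly.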
Relative Path: data/meta/ncbi_datasets_lookup.tsv
Date Added: <2025-12-04 Thu>
Source: Custom
TSV file containing genome lookups for the TaxIDs of the AST browser samples.
Produced by find_reference_genomes.py using calls to the NCBI datasets tool
Relative Path: data/meta/ncbi_datasets_bioprojects.tsv
Date Added: <2025-12-04 Thu>
Source: Custom
TSV file containing genome lookups for all bioprojects in the available AST browser samples (biosample_mapping_2025-11-21.csv).
Produced by find_reference_genomes.py using calls to the NCBI datasets tool
Relative Path: data/meta/biosample_attributes.tsv
Date Added: <2025-11-26 Wed>
Source: Custom — scripts/format_ncbi_metadata.R
Attributes for the biosample accessions in biosample_mapping_2025-11-21.csv, obtained by parsing output of efetch using get_biosample_attrs in src/utils.R. Required because some of these metadata are not provided in the AST browser or in the mapping file
Relative Path: data/meta/biosample_mapping_2025-11-21.csv
Date Added: <2025-11-21 Fri>
Source: Custom — src/map_biosample.R
File mapping BioSample accessions in AST browser to SRA runs and BioProject accessions, produced with calls to esearch and efetch.
Required because the downloadable table from AST browser doesn’t include SRA runs
Relative Path: data/meta/ADB_all_compounds.csv
Date Added: <2025-11-26 Wed>
Source: AntibioticDB
All compounds in the AntibioticDB database, downloaded [2025-11-26 Wed]
Relative Path: data/meta/moradigaravand_samples.tsv
Date Added: <2025-10-24 Fri>
Source: S1 Table
Isolate metadata from the 2018 study "Prediction of antibiotic resistance in Escherichia coli from large-scale pan-genome data" by Moradigaravand et al. Tested concentrations using EUCAST
- Acronyms (C.B. = clinical breakpoint)
- penicillins: ampicillin (AMP, C.B.: 6μg/ml)
- cephalosporins: cefuroxime (CXM, C.B.: 8μg/ml), cefotaxime (CTX, C.B.: 4μg/ml), cephalothin (CET, C.B.: 20μg/ml), ceftazidime (CTZ, C.B.: 0.25μg/ml)
- aminoglycosides: gentamicin (GEN, C.B.: 4μg/ml), tobramycin (TBM, C.B.: 8μg/ml)
- fluoroquinolones: ciprofloxacin (CIP, C.B.: 1μg/ml)
- amoxicillin-clavulanate (AMC)
- amoxicillin (AMX)
- trimethoprim (TMP)
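One way the breakpoints listed above might be applied to turn an MIC into a binary label. The breakpoint values are copied from this entry; the rule "resistant when MIC is strictly greater than the breakpoint" and the helper name `is_resistant` are assumptions, not something the metadata file itself specifies.

```python
# Clinical breakpoints in μg/ml, as listed in this entry.
BREAKPOINTS = {
    "AMP": 6, "CXM": 8, "CTX": 4, "CET": 20,
    "CTZ": 0.25, "GEN": 4, "TBM": 8, "CIP": 1,
}


def is_resistant(drug: str, mic: float):
    """Return True/False for drugs with a listed breakpoint, None otherwise.

    Assumed rule: an isolate is called resistant when MIC > breakpoint.
    """
    breakpoint_value = BREAKPOINTS.get(drug)
    if breakpoint_value is None:
        return None  # AMC, AMX, TMP have no breakpoint listed here
    return mic > breakpoint_value


cip_call = is_resistant("CIP", 2.0)   # above the 1 μg/ml breakpoint
gen_call = is_resistant("GEN", 4.0)   # equal to the breakpoint
tmp_call = is_resistant("TMP", 16.0)  # no breakpoint listed
```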
Relative Path: data/meta/jia_sample_mapping.tsv
Date Added: <2025-09-22 Mon>
Source: Custom
TSV file mapping Jia sample Assembly accessions to their BioSample accessions. Used for renaming.
Relative Path: data/meta/jia_samples.tsv
Date Added: <2025-09-22 Mon>
Source: Paper supplemental data
Sample metadata from Jia et al., extracted from the supplemental data Word document. Tested concentrations using EUCAST
- Acronyms
- imipenem (IPM)
- meropenem (MEM)
- gentamicin (GEN)
- amikacin (AMK)
- cefepime (FEP)
- ceftriaxone (CRO)
- ceftazidime (CAZ)
- minocycline (MIN)
- tigecycline (TGC)
- colistin (CST)
Relative Path: data/raw/BVBRC_genome_amr.csv
Date Added: <2025-09-04 Thu>
Source: BV-BRC AMR Phenotypes
Complete catalog of BV-BRC’s AMR Phenotype data (downloaded from the explorer without filters)
Downloaded on `Date Added`, contains 352356 entries
Relative Path: data/raw/BVBRC_genome.csv
Date Added: <2025-09-04 Thu>
Source: BV-BRC Genomes
Complete catalog of genomes from BV-BRC (downloaded from the explorer without filters)
Downloaded on `Date Added`, contains 22959 samples
Relative Path: data/raw/asts.tsv
Date Added: <2025-09-01 Mon>
Source: AST browser data
The complete catalog of samples stored in NCBI’s AST browser (downloaded without filters)
Downloaded on `Date Added`, contains 466453 samples