chore: update data catalog 2025-06-01 #551

Open: wants to merge 1 commit into base: main
4 changes: 2 additions & 2 deletions catalog/build/intermediate/genomes-from-ncbi.tsv
@@ -345,7 +345,7 @@ Berenice 5693 GCA_013358655.1 False Scaffold 40801262 923 156193 61 69 51.0 T
NRRL 25331 48490 GCA_013396185.1 False Scaffold 42547896 1222 96195 134 50 48.5 Fusarium circinatum 48490 1,131567,2759,33154,4751,451864,4890,716545,147538,716546,715989,147550,222543,5125,110618,5506,171627,48490 Ascomycota Eukaryota Fungi Ascomycota Sordariomycetes Hypocreales Nectriaceae Fusarium Fusarium circinatum 2759 4751 4890 147550 5125 110618 5506 48490 https://genome.ucsc.edu/h/GCA_013396185.1 https://hgdownload.soe.ucsc.edu/hubs/GCA/013/396/185/GCA_013396185.1/genes/GCA_013396185.1_ASM1339618v1.ncbiGene.gtf.gz
Klein Grass 34611 GCA_013436015.1 True Contig 2762431811 16339 436999 1685 59 45.5 Rhipicephalus annulatus 34611 1,131567,2759,33154,33208,6072,33213,33317,1206794,88770,6656,6843,6854,6933,6934,6935,297308,6939,426437,34630,6940,34611 Arthropoda Eukaryota Metazoa Arthropoda Arachnida Ixodida Ixodidae Rhipicephalus Rhipicephalus annulatus 2759 33208 6656 6854 6935 6939 34630 34611 https://genome.ucsc.edu/h/GCA_013436015.1 https://hgdownload.soe.ucsc.edu/hubs/GCA/013/436/015/GCA_013436015.1/genes/GCA_013436015.1_TxGen_Rann.augustus.gtf.gz
Fo5176 100902 GCA_014154955.1 False Chromosome 18.0 67983296 19 4090771 7 120 48.0 Fusarium oxysporum 5507 1,131567,2759,33154,4751,451864,4890,716545,147538,716546,715989,147550,222543,5125,110618,5506,171631,5507,100902 Ascomycota Eukaryota Fungi Ascomycota Sordariomycetes Hypocreales Nectriaceae Fusarium Fusarium oxysporum 2759 4751 4890 147550 5125 110618 5506 5507 https://genome.ucsc.edu/h/GCA_014154955.1 https://hgdownload.soe.ucsc.edu/hubs/GCA/014/154/955/GCA_014154955.1/genes/GCA_014154955.1_SMRT_HiC_Fo5176.ncbiGene.gtf.gz
RL4 483707 GCA_014183025.1 True Scaffold 34626204 169 3109663 5 150 55.5 Harringtonia lauricola 483707 1,131567,2759,33154,4751,451864,4890,716545,147538,716546,715989,147550,222544,5151,5152,2933754,483707 Ascomycota Eukaryota Fungi Ascomycota Sordariomycetes Ophiostomatales Ophiostomataceae Harringtonia Harringtonia lauricola 2759 4751 4890 147550 5151 5152 2933754 483707 https://genome.ucsc.edu/h/GCA_014183025.1 https://hgdownload.soe.ucsc.edu/hubs/GCA/014/183/025/GCA_014183025.1/genes/GCA_014183025.1_ASM1418302v1.augustus.gtf.gz
RL4 483707 GCA_014183025.1 False Scaffold 34626204 169 3109663 5 150 55.5 Harringtonia lauricola 483707 1,131567,2759,33154,4751,451864,4890,716545,147538,716546,715989,147550,222544,5151,5152,2933754,483707 Ascomycota Eukaryota Fungi Ascomycota Sordariomycetes Ophiostomatales Ophiostomataceae Harringtonia Harringtonia lauricola 2759 4751 4890 147550 5151 5152 2933754 483707 https://genome.ucsc.edu/h/GCA_014183025.1 https://hgdownload.soe.ucsc.edu/hubs/GCA/014/183/025/GCA_014183025.1/genes/GCA_014183025.1_ASM1418302v1.augustus.gtf.gz
BG2 5478 GCA_014217725.1 False Complete Genome 13.0 12696838 13 1058141 5 40 39.0 Nakaseomyces glabratus 5478 1,131567,2759,33154,4751,451864,4890,716545,147537,4891,4892,4893,374468,5478 Ascomycota Eukaryota Fungi Ascomycota Saccharomycetes Saccharomycetales Saccharomycetaceae Nakaseomyces Nakaseomyces glabratus 2759 4751 4890 4891 4892 4893 374468 5478 https://genome.ucsc.edu/h/GCA_014217725.1 https://hgdownload.soe.ucsc.edu/hubs/GCA/014/217/725/GCA_014217725.1/genes/GCA_014217725.1_ASM1421772v1.ncbiGene.gtf.gz
HN6 176275 GCA_014607475.1 False Complete Genome 12.0 37134214 12 4613348 4 200 49.0 Beauveria bassiana 176275 1,131567,2759,33154,4751,451864,4890,716545,147538,716546,715989,147550,222543,5125,474943,5581,176275 Ascomycota Eukaryota Fungi Ascomycota Sordariomycetes Hypocreales Cordycipitaceae Beauveria Beauveria bassiana 2759 4751 4890 147550 5125 474943 5581 176275 https://genome.ucsc.edu/h/GCA_014607475.1 https://hgdownload.soe.ucsc.edu/hubs/GCA/014/607/475/GCA_014607475.1/genes/GCA_014607475.1_ASM1460747v1.augustus.gtf.gz
164912 GCA_014805555.1 True Scaffold 10381894 1391 27037 106 1250 27.0 Astathelohania contejeani 164912 1,131567,2759,33154,4751,112252,6029,6036,2932329,2932330,164912 Microsporidia Eukaryota Fungi Microsporidia Astathelohaniidae Astathelohania Astathelohania contejeani 2759 4751 6029 2932329 2932330 164912 https://genome.ucsc.edu/h/GCA_014805555.1 https://hgdownload.soe.ucsc.edu/hubs/GCA/014/805/555/GCA_014805555.1/genes/GCA_014805555.1_ASM1480555v1.ncbiGene.gtf.gz
@@ -478,7 +478,7 @@ Rahman 294381 GCA_917563895.1 False Scaffold 25196438 18523 15672 368 13 30.5
5697 GCA_917563935.1 True Complete Genome 13.0 25432160 13 2439084 4 57 46.5 Trypanosoma evansi 5697 1,131567,2759,2611352,33682,5653,2704647,2704949,5654,5690,39700,5697 Kinetoplastea Eukaryota Euglenozoa Kinetoplastea Trypanosomatida Trypanosomatidae Trypanosoma Trypanosoma evansi 2759 33682 5653 2704949 5654 5690 5697 https://genome.ucsc.edu/h/GCA_917563935.1 https://hgdownload.soe.ucsc.edu/hubs/GCA/917/563/935/GCA_917563935.1/genes/GCA_917563935.1_Assembly1.augustus.gtf.gz
5855 GCA_949152365.1 False Contig 29436530 28 1999332 6 820 39.5 Plasmodium vivax 5855 1,131567,2759,2698737,33630,5794,422676,5819,1639119,5820,418103,5855 Apicomplexa Eukaryota Apicomplexa Aconoidasida Haemosporida Plasmodiidae Plasmodium Plasmodium vivax 2759 5794 422676 5819 1639119 5820 5855 https://genome.ucsc.edu/h/GCA_949152365.1 https://hgdownload.soe.ucsc.edu/hubs/GCA/949/152/365/GCA_949152365.1/genes/GCA_949152365.1_PVPAM.ncbiGene.gtf.gz
7227 GCF_000001215.4 True Chromosome 7.0 143706478 1869 25286936 3 42.0 Full annotation GCA_000001215.4 Drosophila melanogaster 7227 1,131567,2759,33154,33208,6072,33213,33317,1206794,88770,6656,197563,197562,6960,50557,85512,7496,33340,33392,7147,7203,43733,480118,480117,43738,43741,43746,7214,43845,46877,7215,32341,32346,32351,7227 Arthropoda Eukaryota Metazoa Arthropoda Insecta Diptera Drosophilidae Drosophila Drosophila melanogaster 2759 33208 6656 50557 7147 7214 7215 7227 https://genome.ucsc.edu/h/GCF_000001215.4 https://hgdownload.soe.ucsc.edu/hubs/GCF/000/001/215/GCF_000001215.4/genes/GCF_000001215.4_Release_6_plus_ISO1_MT.ncbiRefSeq.gtf.gz
9606 GCF_000001405.40 True Chromosome 24.0 3099441038 470 67794873 16 41.0 Updated annotation GCA_000001405.29 Homo sapiens 9606 1,131567,2759,33154,33208,6072,33213,33511,7711,89593,7742,7776,117570,117571,8287,1338369,32523,32524,40674,32525,9347,1437010,314146,9443,376913,314293,9526,314295,9604,207598,9605,9606 Vertebrata Eukaryota Metazoa Chordata Mammalia Primates Hominidae Homo Homo sapiens 2759 33208 7711 40674 9443 9604 9605 9606 https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38 https://hgdownload.soe.ucsc.edu/hubs/GCF/000/001/405/GCF_000001405.40/genes/GCF_000001405.40_GRCh38.p14.ncbiRefSeq.gtf.gz
9606 GCF_000001405.40 True Chromosome 24.0 3099441038 470 67794873 16 41.0 Updated annotation GCA_000001405.29 Homo sapiens 9606 1,131567,2759,33154,33208,6072,33213,33511,7711,89593,7742,7776,117570,117571,8287,1338369,32523,32524,40674,32525,9347,1437010,314146,9443,376913,314293,9526,314295,9604,207598,9605,9606 Vertebrata Eukaryota Metazoa Chordata Mammalia Primates Hominidae Homo Homo sapiens 2759 33208 7711 40674 9443 9604 9605 9606 https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38
C57BL/6J 10090 GCF_000001635.27 True Chromosome 21.0 2728206152 101 106145001 11 42.0 Updated annotation GCA_000001635.9 Mus musculus 10090 1,131567,2759,33154,33208,6072,33213,33511,7711,89593,7742,7776,117570,117571,8287,1338369,32523,32524,40674,32525,9347,1437010,314146,314147,9989,1963758,337687,10066,39107,10088,862507,10090 Vertebrata Eukaryota Metazoa Chordata Mammalia Rodentia Muridae Mus Mus musculus 2759 33208 7711 40674 9989 10066 10088 10090 mouse https://genome.ucsc.edu/cgi-bin/hgTracks?db=mm39 https://hgdownload.soe.ucsc.edu/hubs/GCF/000/001/635/GCF_000001635.27/genes/GCF_000001635.27_GRCm39.ncbiRefSeq.gtf.gz
ATCC 18224 441960 GCF_000001985.1 False Scaffold 28643865 452 3339384 4 8.8 46.5 Full annotation GCA_000001985.1 Talaromyces marneffei 37727 1,131567,2759,33154,4751,451864,4890,716545,147538,716546,147545,451871,5042,28568,5094,2752537,37727,441960 Ascomycota Eukaryota Fungi Ascomycota Eurotiomycetes Eurotiales Trichocomaceae Talaromyces Talaromyces marneffei Talaromyces marneffei ATCC 18224 2759 4751 4890 147545 5042 28568 5094 37727 441960 https://genome.ucsc.edu/h/GCF_000001985.1 https://hgdownload.soe.ucsc.edu/hubs/GCF/000/001/985/GCF_000001985.1/genes/GCF_000001985.1_JCVI-PMFA1-2.0.ncbiRefSeq.gtf.gz
Salvador I 5855 GCF_000002415.2 True Chromosome 14.0 27007701 2747 1678596 6 42.5 Full annotation GCA_000002415.2 Plasmodium vivax 5855 1,131567,2759,2698737,33630,5794,422676,5819,1639119,5820,418103,5855 Apicomplexa Eukaryota Apicomplexa Aconoidasida Haemosporida Plasmodiidae Plasmodium Plasmodium vivax 2759 5794 422676 5819 1639119 5820 5855 https://genome.ucsc.edu/h/GCF_000002415.2 https://hgdownload.soe.ucsc.edu/hubs/GCF/000/002/415/GCF_000002415.2/genes/GCF_000002415.2_ASM241v2.ncbiRefSeq.gtf.gz
34 changes: 17 additions & 17 deletions catalog/build/intermediate/outbreak-taxonomy-mapping.tsv
@@ -1,25 +1,25 @@
taxonomy_id name rank
11018 Togaviridae FAMILY
199306 Coccidioides posadasii SPECIES
12058 Picornaviridae FAMILY
11158 Paramyxoviridae FAMILY
38574 Leishmania donovani species complex SPECIES_GROUP
1773 Mycobacterium tuberculosis SPECIES
11266 Filoviridae FAMILY
5833 Plasmodium falciparum SPECIES
1980418 Phenuiviridae FAMILY
3418604 Betacoronavirus pandemicum SPECIES
498019 Candidozyma auris SPECIES
1980413 Hantaviridae FAMILY
11050 Flaviviridae FAMILY
11266 Filoviridae FAMILY
5037 Histoplasma capsulatum SPECIES
199306 Coccidioides posadasii SPECIES
1980416 Peribunyaviridae FAMILY
11018 Togaviridae FAMILY
5763 Naegleria fowleri SPECIES
5052 Aspergillus GENUS
1980415 Nairoviridae FAMILY
11320 Influenza A virus
10244 Monkeypox virus SPECIES
1980413 Hantaviridae FAMILY
5207 Cryptococcus neoformans SPECIES
5807 Cryptosporidium parvum SPECIES
11320 Influenza A virus
5052 Aspergillus GENUS
1980416 Peribunyaviridae FAMILY
11050 Flaviviridae FAMILY
1980415 Nairoviridae FAMILY
38574 Leishmania donovani species complex SPECIES_GROUP
1980418 Phenuiviridae FAMILY
11158 Paramyxoviridae FAMILY
1773 Mycobacterium tuberculosis SPECIES
12058 Picornaviridae FAMILY
5833 Plasmodium falciparum SPECIES
4827 Mucorales ORDER
498019 Candidozyma auris SPECIES
11617 Arenaviridae FAMILY
5037 Histoplasma capsulatum SPECIES
26 changes: 13 additions & 13 deletions catalog/output/qc-report.md
@@ -24,18 +24,18 @@ None
- Acanthamoeba castellanii strain Neff: 1257118, 5755
- Blumeria graminis: 1689686, 62690
- Candida tropicalis strain MYA-3404: 5482, 294747
- Cryptococcus neoformans strain H99: 235443, 5207
- Cryptosporidium parvum: 5807, 353152
- Enterovirus A: 150846, 156647
- Glossina fuscipes: 201502, 7396
- Cryptococcus neoformans strain H99: 5207, 235443
- Cryptosporidium parvum: 353152, 5807
- Enterovirus A: 156647, 150846
- Glossina fuscipes: 7396, 201502
- Neospora caninum strain Liverpool: 572307, 29176
- Norwalk virus: 122929, 1529924, 1246677, 490039, 122928, 1529909, 1529918
- Orthoflavivirus denguei: 11069, 11053, 11070
- Norwalk virus: 1246677, 122928, 122929, 1529918, 1529909, 1529924, 490039
- Orthoflavivirus denguei: 11070, 11053, 11069
- Orthomarburgvirus marburgense: 3052505, 448086
- Plasmodium falciparum: 5833, 36329
- Plasmodium vinckei: 138298, 5860, 119398, 54757, 138297
- Trypanosoma brucei: 185431, 5702
- Trypanosoma cruzi strain Dm28c: 5693, 1416333, 85057
- Plasmodium vinckei: 138297, 119398, 138298, 54757, 5860
- Trypanosoma brucei: 5702, 185431
- Trypanosoma cruzi strain Dm28c: 85057, 5693, 1416333
- Vesicular exanthema of swine virus: 35612, 146073

## Assemblies without ploidy information
@@ -44,13 +44,13 @@ None

## Outbreak descendant taxonomy IDs not found in genomes data

- 1980456
- 3052686
- 3052560
- 138949
- 463676
- 3052599
- 3052518
- 3052686
- 1980456
- 463676
- 3052560

## Taxonomy tree
