Commit e28ea59

committed
NASA 300
1 parent 9ed981f commit e28ea59

230 files changed

Lines changed: 66378 additions & 10 deletions

File tree

datasets/ilmn-dragen-1kgp.yaml

Lines changed: 8 additions & 10 deletions
Original file line number | Diff line number | Diff line change
@@ -1,46 +1,44 @@
11
Name: "1000 Genomes Phase 3 Reanalysis with DRAGEN 3.5, 3.7, 4.0, 4.2, and 4.4"
22
Description: |
3-
# Description
3+
<b> Overview </b><p>
44
5-
## Overivew
6-
7-
This dataset contains alignment files and small variant (includes single nucleotide variants (SNV) and indels), copy number variant (CNV), short tandem repeat (*i.e.*, repeat expansion; STR), structural variant (SV) and other variant call files from the [1000 Genomes Project (1KGP) Phase 3 dataset](https://www.internationalgenome.org/) (3,202 individuals, 602 trios) using Illumina DRAGEN v3.5.7b, v3.7.6, v4.0.3, v4.2.7, and v4.4.7 software.
5+
This dataset contains alignment files and small variant (includes single nucleotide variants (SNV) and indels), copy number variant (CNV), short tandem repeat (i.e., repeat expansion; STR), structural variant (SV) and other variant call files from the [1000 Genomes Project (1KGP) Phase 3 dataset](https://www.internationalgenome.org/) (3,202 individuals, 602 trios) using Illumina DRAGEN v3.5.7b, v3.7.6, v4.0.3, v4.2.7, and v4.4.7 software.
86
All DRAGEN analyses were performed in the cloud using the [Illumina Connected Analytics](https://www.illumina.com/products/by-type/informatics-products/connected-analytics.html) bioinformatics platform powered by Amazon Web Services (see ['Data solution empowering population genomics'](https://www.illumina.com/science/genomics-research/articles/data-solution-empowering-population-genomics-research.html) for more information).
97
The v3.7.6, v4.2.7, and v4.4.7 datasets include results from trio small variant, *de novo* structural variant, and *de novo* copy number variant calls on 602 trio families comprised of members from the 1KGP Phase 3 dataset.
108
Trio repeat expansion calling was included in the v3.7.6 dataset only.
119
Joint cohort analysis was also performed on the entire 1KGP sample dataset for the v3.7.6, v4.0.3, v4.2.7, and v4.4.7 re-analyses using [DRAGEN Iterative gVCF Genotyper](https://www.illumina.com/products/by-type/informatics-products/dragen-secondary-analysis/iterative-GVCF-genotyper.html) v3.8.3, v4.2.0, v4.2.7, v4.4.7, respectively (see ['Genotyping variants at population scale using DRAGEN gVCF Genotyper'](https://www.illumina.com/science/genomics-research/articles/gVCF-Genotyper.html) and ['Population Genotyping'](https://help.dragen.illumina.com/product-guide/dragen-v4.4/dragen-dna-pipeline/iterative-gvcf-genotyper)).
1210
13-
## DRAGEN Versions
11+
<b> DRAGEN Versions </b><p>
1412
15-
### v3.7
13+
##### v3.7
1614
1715
[User Guide](https://support.illumina.com/content/dam/illumina-support/documents/documentation/software_documentation/dragen-bio-it/Illumina-DRAGEN-Bio-IT-Platform-User-Guide-1000000141465-00.pdf) | [Release Notes](https://www.illumina.com/content/dam/illumina-support/documents/downloads/software/dragen/release-notes/Illumina-DRAGEN-Bio-IT-Platform-3.7-Release-Notes-1000000142362-v00.pdf)
1816
1917
Improvements and new features in the v3.7.6 individual samples analyses include *CYP2D6* variant calling (see '[Overcoming high homology to detect variation in CYP21A2 with whole-genome sequencing in DRAGEN](https://www.illumina.com/science/genomics-research/articles/CYP21A2.html)') and joint detection and use of graph-based hg19 and hg38 reference hash tables (see ['DRAGEN Wins at PrecisionFDA Truth Challenge V2 Showcase Accuracy Gains from Alt-aware Mapping and Graph Reference Genomes'](https://www.illumina.com/science/genomics-research/dragen-wins-precisionfda-challenge-showcase-accuracy-gains.html) and ['Demystifying the versions of GRCh38/hg38 reference genomes, how they are used in DRAGEN and their impact on accuracy'](https://www.illumina.com/science/genomics-research/articles/dragen-demystifying-reference-genomes.html) for details).
2018
21-
### v4.0
19+
##### v4.0
2220
2321
[User Guide](https://support-docs.illumina.com/SW/DRAGEN_v40/Content/SW/FrontPages/DRAGEN.htm) | [Release Notes](https://support.illumina.com/content/dam/illumina-support/documents/downloads/software/dragen/release-notes/200024449_01_DRAGEN-4.0-Customer-Release-Notes.pdf)
2422
2523
The DRAGEN v4.0.3 dataset features improved small variant calling accuracy due to utilization of a newly integrated [machine learning functionality](https://support-docs.illumina.com/SW/dragen_v42/Content/SW/DRAGEN/ml_for_vc.htm?Highlight=dragen-ml) with an updated graph based reference for difficult to map regions (see ['DRAGEN Sets New Standard for Data Accuracy in PrecisionFDA Benchmark Data. Optimizing Variant Calling Performance with Illumina Machine Learning and DRAGEN Graph'](https://www.illumina.com/science/genomics-research/articles/dragen-shines-again-precisionfda-truth-challenge-v2.html)); accuracy and runtime improvements in the SV caller; new targeted callers including *CYP2B6*, *GBA*, *SMN* and a Star Allele PGx caller; and an expanded catalog for use with Expansion Hunter STR caller.
2624
27-
### v4.2
25+
##### v4.2
2826
2927
[User Guide](https://support-docs.illumina.com/SW/dragen_v42/Content/SW/FrontPages/DRAGEN.htm) | [Release Notes](https://support.illumina.com/content/dam/illumina-support/documents/downloads/software/dragen/release-notes/200040845_02_DRAGEN-4.2-Customer-Release-Notes.pdf)
3028
3129
DRAGEN v4.2.7 offers significant accuracy improvements in small variant, CNV, and SV calling, includes new targeted callers (*HBA*, *LPA*, *RH*, *CYP21A2*, *SMN* silent carrier variant), and supports Star Allele calling for five additional pharmacogenes (*BCHE*, *ABCG2*, *NAT2*, *F5*, and *UGT2B17*).
3230
These are further improved by upgraded machine learning models.
3331
See [DRAGEN 4.2: Enhanced machine learning, new targeted callers, and more](https://developer.illumina.com/news-updates/dragen-4-2-enhanced-machine-learning-new-targeted-callers-and-more) for further details on these and other enhancements.
3432
35-
### v4.4
33+
##### v4.4
3634
3735
[User Guide](https://help.dragen.illumina.com/product-guide/dragen-v4.4) | [Release Notes](https://www.illumina.com/content/dam/illumina-support/documents/downloads/software/dragen/release-notes/200068065_00_DRAGEN-4_4_4-Customer-Release-Notes.pdf)
3836
3937
DRAGEN v4.4.7 boosts the speed and accuracy of all callers via the official release of an optimized pangenome graph reference ('[The quest for accuracy gains in the dark regions of the genomes: Presenting the DRAGEN multigenome mapper and pangenome reference updates in version 4.3](https://www.illumina.com/science/genomics-research/articles/second-gen-multigenome-mapping.html)').
4038
Namely, SV calling accuracy is substantially increased via the implementation of a multigenome mapper capable of exploiting the power of a pangenome reference.
4139
Runtime is further reduced by supporting AWS F2 EC2 instances ([Enabling Rapid Genomic and Multiomic Data Analysis with Illumina DRAGEN™ v4.4 on Amazon EC2 F2 Instances](https://aws.amazon.com/blogs/hpc/enabling-rapid-genomic-and-multiomic-data-analysis-with-illumina-dragen-v4-4-on-amazon-ec2-f2-instances/)).
4240
43-
## Annotation
41+
<b> Annotation </b><p>
4442
4543
Starting with the v4.0.3 reanalysis, annotation using the Illumina Connected Annotations (also known as Illumina Annotation Engine or Nirvana) was included as part of the analysis (see [Illumina Connected Annotations documentation](https://help.dragen.illumina.com/product-guide/dragen-v4.4/nirvana) for more information).
4644
For the v4.0.3, v4.2.7, and v4.4.7 datasets, annotation was performed on the merged small variant VCF generated by the DRAGEN Iterative gVCF Genotyper for the entire 1KGP cohort.

datasets/nasa-1993-an-nasa.yaml

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
1+
Name: NASA 1993_AN_NASA Project
2+
Description: |
3+
This data set contains spot elevation measurements of Arctic, Greenland, Antarctic, and Patagonia sea ice and ice surface acquired using the NASA Airborne Topographic Mapper (ATM) instrumentation.
4+
<br>
5+
6+
#### BLATM2
7+
This data set contains resampled and smoothed elevation measurements of Arctic and Antarctic sea ice, as well as Greenland, Arctic, Patagonia, and Antarctic region land ice surface acquired using the NASA Airborne Topographic Mapper (ATM) instrumentation.
8+
<br>
9+
10+
#### Data Discovery
11+
Explore this data using NASA's [Earthdata Search](https://search.earthdata.nasa.gov/), a comprehensive tool for discovering and visualizing Earth science datasets.
12+
<br>
13+
14+
#### Data Access
15+
Access requires an [Earthdata Login](https://urs.earthdata.nasa.gov/) account. [Read our guide on obtaining AWS credentials](https://data.nsidc.earthdatacloud.nasa.gov/s3credentialsREADME) to retrieve this data from AWS.
16+
<br><br>
17+
Documentation: https://nsidc.org/data/blatm1b/versions/1
18+
Contact: https://earthdata.nasa.gov/contact
19+
ManagedBy: NASA
20+
UpdateFrequency: Varies by dataset
21+
Tags:
22+
- aws-pds
23+
- elevation
24+
- ice
25+
License: '[Creative Commons BY 4.0](https://creativecommons.org/licenses/by/4.0/)'
26+
Resources:
27+
- Description: This data set contains spot elevation measurements of Arctic, Greenland, Antarctic, and Patagonia sea ice and ice surface acquired using the NASA Airborne Topographic Mapper (ATM) instrumentation.
28+
ARN: arn:aws:s3:::nsidc-cumulus-prod-protected/ICEBRIDGE-Related/BLATM1B/1
29+
Region: us-west-2
30+
Type: S3 Bucket
31+
RequesterPays: false
32+
ControlledAccess: https://data.nsidc.earthdatacloud.nasa.gov/s3credentials
33+
Name: BLATM1B v1
34+
- Description: This data set contains resampled and smoothed elevation measurements of Arctic and Antarctic sea ice, as well as Greenland, Arctic, Patagonia, and Antarctic region land ice surface acquired using the NASA Airborne Topographic Mapper (ATM) instrumentation.
35+
ARN: arn:aws:s3:::nsidc-cumulus-prod-protected/ICEBRIDGE-Related/BLATM2/1
36+
Region: us-west-2
37+
Type: S3 Bucket
38+
RequesterPays: false
39+
ControlledAccess: https://data.nsidc.earthdatacloud.nasa.gov/s3credentials
40+
Name: BLATM2 v1
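
The Resources entries above give each data location as an S3 ARN. As a minimal sketch (not part of the commit), such an ARN can be split into a bucket name and a key prefix, which is the form most S3 clients expect once temporary credentials have been obtained from the ControlledAccess endpoint. The names `S3Location` and `parse_s3_arn` are hypothetical and introduced only for illustration.

```python
from typing import NamedTuple


class S3Location(NamedTuple):
    """Bucket name and key prefix extracted from an S3 ARN."""
    bucket: str
    prefix: str


def parse_s3_arn(arn: str) -> S3Location:
    """Split an ARN of the form arn:aws:s3:::bucket/prefix into its parts."""
    marker = "arn:aws:s3:::"
    if not arn.startswith(marker):
        raise ValueError(f"not an S3 ARN: {arn!r}")
    # Everything up to the first slash is the bucket; the rest is the prefix.
    bucket, _, prefix = arn[len(marker):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name in {arn!r}")
    return S3Location(bucket, prefix)


# ARN copied from the BLATM2 v1 resource in this file.
blatm2 = parse_s3_arn(
    "arn:aws:s3:::nsidc-cumulus-prod-protected/ICEBRIDGE-Related/BLATM2/1"
)
print(blatm2.bucket, blatm2.prefix)
```

The sketch only handles string parsing; listing or downloading objects would still require an Earthdata Login and the temporary AWS credentials described under Data Access.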

datasets/nasa-1993-gr-nasa.yaml

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
1+
Name: NASA 1993_GR_NASA Project
2+
Description: |
3+
This data set contains depth sounder measurements of ice elevation, ice surface, ice bottom, and ice thickness over Greenland and Antarctica, acquired by the Multichannel Coherent Radar Depth Sounder (MCoRDS).
4+
<br>
5+
6+
#### Data Discovery
7+
Explore this data using NASA's [Earthdata Search](https://search.earthdata.nasa.gov/), a comprehensive tool for discovering and visualizing Earth science datasets.
8+
<br>
9+
10+
#### Data Access
11+
Access requires an [Earthdata Login](https://urs.earthdata.nasa.gov/) account. [Read our guide on obtaining AWS credentials](https://data.nsidc.earthdatacloud.nasa.gov/s3credentialsREADME) to retrieve this data from AWS.
12+
<br><br>
13+
Documentation: https://nsidc.org/data/brmcr2/versions/1
14+
Contact: https://earthdata.nasa.gov/contact
15+
ManagedBy: NASA
16+
UpdateFrequency: Varies by dataset
17+
Tags:
18+
- aws-pds
19+
- elevation
20+
- ice
21+
- radar
22+
License: '[Creative Commons BY 4.0](https://creativecommons.org/licenses/by/4.0/)'
23+
Resources:
24+
- Description: This data set contains depth sounder measurements of ice elevation, ice surface, ice bottom, and ice thickness over Greenland and Antarctica, acquired by the Multichannel Coherent Radar Depth Sounder (MCoRDS).
25+
ARN: arn:aws:s3:::nsidc-cumulus-prod-protected/ICEBRIDGE-Related/BRMCR2/1
26+
Region: us-west-2
27+
Type: S3 Bucket
28+
RequesterPays: false
29+
ControlledAccess: https://data.nsidc.earthdatacloud.nasa.gov/s3credentials
30+
Name: BRMCR2 v1

datasets/nasa-2007-gr-nasa.yaml

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
1+
Name: NASA 2007_GR_NASA Project
2+
Description: |
3+
This data set contains surface elevation data over Greenland measured by the NASA Land, Vegetation, and Ice Sensor (LVIS), an airborne lidar scanning laser altimeter.
4+
<br>
5+
6+
#### Data Discovery
7+
Explore this data using NASA's [Earthdata Search](https://search.earthdata.nasa.gov/), a comprehensive tool for discovering and visualizing Earth science datasets.
8+
<br>
9+
10+
#### Data Access
11+
Access requires an [Earthdata Login](https://urs.earthdata.nasa.gov/) account. [Read our guide on obtaining AWS credentials](https://data.nsidc.earthdatacloud.nasa.gov/s3credentialsREADME) to retrieve this data from AWS.
12+
<br><br>
13+
Documentation: https://nsidc.org/data/blvis2/versions/1
14+
Contact: https://earthdata.nasa.gov/contact
15+
ManagedBy: NASA
16+
UpdateFrequency: Varies by dataset
17+
Tags:
18+
- aws-pds
19+
- elevation
20+
- ice
21+
- lidar
22+
License: '[Creative Commons BY 4.0](https://creativecommons.org/licenses/by/4.0/)'
23+
Resources:
24+
- Description: This data set contains surface elevation data over Greenland measured by the NASA Land, Vegetation, and Ice Sensor (LVIS), an airborne lidar scanning laser altimeter.
25+
ARN: arn:aws:s3:::nsidc-cumulus-prod-protected/LVIS/BLVIS2/1
26+
Region: us-west-2
27+
Type: S3 Bucket
28+
RequesterPays: false
29+
ControlledAccess: https://data.nsidc.earthdatacloud.nasa.gov/s3credentials
30+
Name: BLVIS2 v1
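
The three NASA entries added in this diff share the same top-level layout. The following is a hedged sketch of a consistency check over an already parsed entry. The expected key lists are inferred from the files shown here, not taken from the registry's published schema, and `missing_keys` is a hypothetical helper.

```python
# Keys inferred from the NASA entries in this diff (an assumption, not the official schema).
EXPECTED_TOP_LEVEL = (
    "Name", "Description", "Documentation", "Contact",
    "ManagedBy", "UpdateFrequency", "Tags", "License", "Resources",
)
EXPECTED_RESOURCE = ("Description", "ARN", "Region", "Type")


def missing_keys(entry: dict) -> list[str]:
    """Return the names of expected keys that are absent from a parsed entry."""
    missing = [key for key in EXPECTED_TOP_LEVEL if key not in entry]
    for index, resource in enumerate(entry.get("Resources", [])):
        missing += [
            f"Resources[{index}].{key}"
            for key in EXPECTED_RESOURCE
            if key not in resource
        ]
    return missing


# Values copied from datasets/nasa-2007-gr-nasa.yaml, as they might look after YAML parsing.
description = (
    "This data set contains surface elevation data over Greenland measured by "
    "the NASA Land, Vegetation, and Ice Sensor (LVIS), an airborne lidar "
    "scanning laser altimeter."
)
entry = {
    "Name": "NASA 2007_GR_NASA Project",
    "Description": description,
    "Documentation": "https://nsidc.org/data/blvis2/versions/1",
    "Contact": "https://earthdata.nasa.gov/contact",
    "ManagedBy": "NASA",
    "UpdateFrequency": "Varies by dataset",
    "Tags": ["aws-pds", "elevation", "ice", "lidar"],
    "License": "[Creative Commons BY 4.0](https://creativecommons.org/licenses/by/4.0/)",
    "Resources": [
        {
            "Description": description,
            "ARN": "arn:aws:s3:::nsidc-cumulus-prod-protected/LVIS/BLVIS2/1",
            "Region": "us-west-2",
            "Type": "S3 Bucket",
            "RequesterPays": False,
            "ControlledAccess": "https://data.nsidc.earthdatacloud.nasa.gov/s3credentials",
            "Name": "BLVIS2 v1",
        }
    ],
}

print(missing_keys(entry))
```

A check like this could run over every file touched by a large commit to catch an entry that omits a field the others carry.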
