Commit d14bb43

Merge branch 'CDCgov:main' into main
2 parents a631b6c + 2256df4 commit d14bb43

File tree

21 files changed

+73394
-377
lines changed

.github/CODEOWNERS

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,9 +1,9 @@
1-
/src/* @sbidari @dylanhmorris @O957
21
/.github/** @sbidari @dylanhmorris @O957
32
/hub-config/* @sbidari @dylanhmorris @O957
43
/auxiliary_data/weekly-model-submissions/ @sbidari @dylanhmorris @O957
54
/model-output/CovidHub-baseline/ @sbidari @dylanhmorris @O957
65
/model-output/CovidHub-ensemble/ @sbidari @dylanhmorris @O957
6+
/model-metadata/ @sbidari @dylanhmorris @O957
77
README.md @sbidari @dylanhmorris @O957
88
.pre-commit-config.yaml @sbidari @dylanhmorris @O957
99
.gitignore @sbidari @dylanhmorris @O957

.github/workflows/dispatch-ensemble-addition.yaml

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -48,3 +48,4 @@ jobs:
4848
token: ${{ steps.get_token_gov.outputs.token }}
4949
repository: CDCgov/cfa-forecast-hub-reports
5050
event-type: covid-ensemble-added
51+
client-payload: '{"disease":"covid"}'

README.md

Lines changed: 186 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -97,7 +97,193 @@ We have made changes from previous versions of the [COVID-19 Forecast Hub](https
9797
Both Hubs will require quantile-based forecasts of epiweekly incident hospital admissions reported into NHSN, with the same -1:3 week horizon span. Both will accept these forecasts via GitHub pull requests of files formatted according to the standard [hubverse schema](https://hubverse.io/en/latest/user-guide/model-output.html#model-output). The Hubs also plan to share a forecast deadline of 11pm USA/Eastern time on Wednesdays.
9898

9999

100+
101+
102+
## Accessing COVID-19 Data On The Cloud
103+
104+
To ensure greater access to the data created by and submitted to this hub, real-time copies of files in the following directories are hosted on the Hubverse's Amazon Web Services (AWS) infrastructure, in a public S3 bucket: `covid19-forecast-hub`.
105+
106+
- `auxiliary-data`
107+
- `hub-config`
108+
- `model-metadata`
109+
- `model-output`
110+
- `target-data`
111+
112+
GitHub remains the primary interface for operating the COVID-19 Forecast Hub and collecting forecasts from modelers. However, the mirrors of hub files on S3 are the most convenient way to access hub data without using `git`/GitHub or cloning the entire hub to your local machine.
113+
114+
The sections below provide examples for accessing hub data on the cloud, depending on your goals and
115+
preferred tools. The options include:
116+
117+
| Access Method | Description |
118+
| -------------------------- | ------------------------------------------------------------------------------------- |
119+
| hubData (R) | Hubverse R client and R code for accessing hub data. |
120+
| hub-data (Python)          | Python package for working with hubverse data.                                        |
121+
| AWS command line interface | Download data and use hubData, PyArrow, or another tool for fast local access.        |
122+
123+
In general, accessing the data directly from S3 (instead of downloading it first) is more convenient. However, if performance is critical (for example, you're building an interactive visualization), or if you need to work offline, we recommend downloading the data first.
124+
125+
<details markdown=1>
126+
127+
<summary>hubData (R)</summary>
128+
129+
[hubData](https://hubverse-org.github.io/hubData), the Hubverse R client, can create an interactive session for accessing, filtering, and transforming hub model output data stored in S3.
130+
131+
hubData is a good choice if you:
132+
133+
- already use R for data analysis
134+
- want to interactively explore hub data from the cloud without downloading it
135+
- want to save a subset of the hub's data (*e.g.*, forecasts for a specific date or target) to your local machine
136+
- want to save hub data in a different file format (*e.g.*, `.parquet` to `.csv`)
137+
138+
### Installing hubData
139+
140+
To install `hubData` and its dependencies (including the `dplyr` and `arrow` packages), follow the [instructions in the hubData documentation](https://hubverse-org.github.io/hubData/#installation).
141+
142+
### Using hubData
143+
144+
hubData's [`connect_hub()` function](https://hubverse-org.github.io/hubData/reference/connect_hub.html) returns an [Arrow multi-file dataset](https://arrow.apache.org/docs/r/reference/Dataset.html) that represents a hub's model output data. The dataset can be filtered and transformed using dplyr and then materialized into a local data frame using the [`collect_hub()` function](https://hubverse-org.github.io/hubData/reference/collect_hub.html).
145+
146+
#### Accessing Model Output Data
147+
148+
Use hubData to connect to a hub on S3 and retrieve all model-output files into a local dataframe. Depending on the size of the hub, this operation may take a few minutes:
149+
150+
```r
library(dplyr)
library(hubData)

# Connect to the hub's public S3 bucket
bucket_name <- "covid19-forecast-hub"
hub_bucket <- s3_bucket(bucket_name)
hub_con <- hubData::connect_hub(hub_bucket, file_format = "parquet", skip_checks = TRUE)

# Materialize all model output into a local data frame
model_output <- hub_con %>%
  hubData::collect_hub()
```
160+
161+
Use hubData to connect to a hub on S3 and filter model output data before "collecting" it into a local dataframe:
162+
163+
```r
library(dplyr)
library(hubData)

bucket_name <- "covid19-forecast-hub"
hub_bucket <- s3_bucket(bucket_name)
hub_con <- hubData::connect_hub(hub_bucket, file_format = "parquet", skip_checks = TRUE)

# Filter the Arrow dataset before collecting so only matching rows are loaded
hub_con %>%
  dplyr::filter(target == "wk inc covid hosp", location == "25", output_type == "quantile") %>%
  hubData::collect_hub() %>%
  dplyr::select(reference_date, model_id, target_end_date, location, output_type_id, value)
```
175+
176+
- [Full hubData documentation](https://hubverse-org.github.io/hubData/)
177+
178+
</details>
179+
180+
<details markdown=1>
181+
182+
<summary>hub-data (Python)</summary>
183+
184+
The Hubverse team is developing a Python client that provides initial tools for accessing Hubverse data. The repository is located at <https://github.com/hubverse-org/hub-data>.
185+
186+
187+
### Installing hub-data
188+
189+
Use `pip` to install `hub-data` (the PyPI package is <https://pypi.org/project/hubdata>):
190+
191+
```sh
pip install hubdata
```
194+
195+
### Using hub-data
196+
197+
Please see the [hub-data package documentation](https://hubverse-org.github.io/hub-data) for examples of how to use the CLI, and the `hubdata.connect_hub()` and `hubdata.create_hub_schema()` functions.
198+
199+
</details>
200+
201+
202+
<details markdown=1>
203+
204+
<summary>AWS CLI</summary>
205+
206+
AWS provides a terminal-based command line interface (CLI) for exploring and downloading S3 files.
207+
208+
This option is ideal if you:
209+
210+
- plan to work with hub data offline but don't want to use git or GitHub
211+
- want to download a subset of the data (instead of the entire hub)
212+
- are using the data for an application that requires local storage or fast response times
213+
214+
### Installing AWS CLI
215+
216+
- Install the AWS CLI using the [instructions here](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html)
217+
- You can skip the instructions for setting up security credentials, since Hubverse data is public
218+
219+
### Using AWS CLI
220+
221+
When using the AWS CLI, the `--no-sign-request` option is required, since it tells AWS to bypass a credential check
222+
(*i.e.*, `--no-sign-request` allows anonymous access to public S3 data).
223+
224+
> [!NOTE]
225+
>
226+
> Files in the bucket's `raw` directory should not be used for analysis (they're for internal use only).
227+
228+
List all directories in the hub's S3 bucket:
229+
230+
```sh
aws s3 ls covid19-forecast-hub --no-sign-request
```
233+
234+
List all files in the hub's bucket:
235+
236+
```sh
aws s3 ls covid19-forecast-hub --recursive --no-sign-request
```
239+
240+
Download the entire contents of `target-data` to your current working directory:
241+
242+
```sh
aws s3 cp s3://covid19-forecast-hub/target-data/ . --recursive --no-sign-request
```
245+
246+
Download the model-output files for a specific model (e.g., the hub baseline):
247+
248+
```sh
aws s3 cp s3://covid19-forecast-hub/model-output/CovidHub-baseline/ . --recursive --no-sign-request
```
251+
252+
- [Full documentation for `aws s3 ls`](https://docs.aws.amazon.com/cli/latest/reference/s3/ls.html)
253+
- [Full documentation for `aws s3 cp`](https://docs.aws.amazon.com/cli/latest/reference/s3/cp.html)
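
Where installing the AWS CLI is not practical, a single public object can usually be fetched over plain HTTPS instead. The sketch below is a minimal illustration rather than an official hub interface: it assumes the bucket answers on the standard S3 virtual-hosted-style endpoint (`https://<bucket>.s3.amazonaws.com/<key>`), and the helper names `public_s3_url` and `fetch_bytes` are hypothetical.

```python
from urllib.parse import quote
from urllib.request import urlopen

BUCKET = "covid19-forecast-hub"


def public_s3_url(key: str, bucket: str = BUCKET) -> str:
    """Build a virtual-hosted-style HTTPS URL for an object in a public bucket.

    Assumption: the bucket is reachable at <bucket>.s3.amazonaws.com.
    """
    # Percent-encode the key but keep "/" so the directory structure survives.
    encoded_key = quote(key.lstrip("/"), safe="/")
    return f"https://{bucket}.s3.amazonaws.com/{encoded_key}"


def fetch_bytes(key: str, bucket: str = BUCKET, timeout: float = 30.0) -> bytes:
    """Download one object anonymously (needs network access; no credentials)."""
    with urlopen(public_s3_url(key, bucket), timeout=timeout) as response:
        return response.read()
```

For example, `fetch_bytes("hub-config/admin.json")` would request the mirrored copy of the hub's admin configuration. For bulk downloads, the recursive `aws s3 cp` commands above remain the simpler choice.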
254+
255+
</details>
256+
257+
## Using Hub Data In Downstream Products
258+
259+
If you are building a product (e.g., a dashboard, analysis pipeline, or evaluation) downstream of `covid19-forecast-hub` that uses data from this hub, please follow the guidance in this section.
260+
261+
### Prefer Hubverse Tooling Over Direct File Paths
262+
263+
We recommend accessing hub data through official [hubverse](https://hubverse.io) tooling rather than by hard-coding paths into this repository's file tree. The hubverse R and Python packages (e.g., [`hubData`](https://hubverse-org.github.io/hubData/) and [`hub-data`](https://github.com/hubverse-org/hub-data)) provide interfaces to the COVIDHub model output, target data, and model metadata, which all follow the [hubverse schema](https://hubverse.io/en/latest/user-guide/model-output.html#model-output).
264+
265+
### Hubverse Schema Version
266+
The specific version of the Hubverse schema currently used by the Hub is specified in the Hub's [`admin.json`](hub-config/admin.json) file. We will notify users in advance of planned schema version updates.
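
A downstream pipeline can read that version programmatically instead of hard-coding it. The sketch below assumes, following the usual hubverse convention, that `admin.json` records the version inside a `schema_version` URL. The sample JSON and its version number are placeholders for illustration, not this hub's actual configuration, and `schema_version_from_admin` is a hypothetical helper.

```python
import json
import re

# Illustrative stand-in for hub-config/admin.json; the version is a placeholder.
SAMPLE_ADMIN_JSON = """
{
  "schema_version": "https://raw.githubusercontent.com/hubverse-org/schemas/main/v5.0.0/admin-schema.json",
  "name": "Example Hub"
}
"""


def schema_version_from_admin(admin_json_text: str) -> str:
    """Return the version segment (such as 'v5.0.0') of the schema_version URL."""
    config = json.loads(admin_json_text)
    url = config["schema_version"]  # assumed field name
    # Look for a path segment of the form /vMAJOR.MINOR[.PATCH...]/
    match = re.search(r"/(v\d+(?:\.\d+)+)/", url)
    if match is None:
        raise ValueError(f"no version segment found in {url!r}")
    return match.group(1)
```

Comparing the returned string against the version a pipeline was last tested with gives an early warning when the hub moves to a new schema.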
267+
268+
### File Structure And Guarantees
269+
270+
> [!WARNING]
271+
>
272+
> The layout of this repository is **not a stable public API**. Directories, file names, and schemas outside the hubverse-managed paths may change at any time, possibly without formal notice.
273+
274+
Specifically:
275+
276+
- Hubverse-managed directories (`model-output/`, `model-metadata/`, `target-data/`, `hub-config/`) follow the [hubverse schema](https://hubverse.io/en/latest/user-guide/model-output.html#model-output). Changes here are guided by hubverse conventions; we will communicate planned changes in advance.
277+
- `auxiliary-data/` is a catch-all for supporting files (e.g., location tables, raw NSSP snapshots, weekly submission summaries). Files within have no formal schema and no guarantee of consistency over time (e.g., they may be renamed, restructured, or removed). Please do not rely on specific filenames or columns in `auxiliary-data/`.
278+
279+
If you need a file only available through `auxiliary-data/` for a downstream product, please [open an issue](https://github.com/CDCgov/covid19-forecast-hub/issues) with your use case so we can consider making its presence more stable.
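
If a downstream product has to read an `auxiliary-data/` file in the meantime, one defensive pattern is to check for the columns it needs and fail loudly when they are missing, rather than silently producing wrong results after a restructure. The following is a minimal sketch under that assumption. The column names and sample rows are hypothetical and do not describe any actual file in this hub.

```python
import csv
import io


def read_required_columns(csv_text: str, required: list[str]) -> list[dict[str, str]]:
    """Parse CSV text and return only the required columns.

    Raises KeyError if any required column is absent from the header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [column for column in required if column not in header]
    if missing:
        raise KeyError(f"missing expected columns {missing}; found {header}")
    return [{column: row[column] for column in required} for row in reader]


# Hypothetical file contents, for illustration only.
SAMPLE_CSV = "abbreviation,location,extra\nAA,01,x\nBB,02,y\n"
rows = read_required_columns(SAMPLE_CSV, ["abbreviation", "location"])
```

Checking the header up front keeps the failure close to its cause, which makes a renamed or removed column straightforward to diagnose.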
280+
281+
### Following Changes
282+
283+
If you maintain a downstream product and want to be notified of planned changes to hub data or structure, please email [covidhub@cdc.gov](mailto:covidhub@cdc.gov) to be added to our announcement list.
284+
100285
## Acknowledgments
286+
101287
This repository follows the guidelines and standards outlined by the [hubverse](https://hubdocs.readthedocs.io/en/latest/), which provides a set of data formats and open source tools for modeling hubs.
102288

103289
<details markdown=1>
-2.22 KB
Binary file not shown.
Lines changed: 20 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,20 @@
1+
model_id,designated_model,target
2+
CADPH-CovidCAT_Ensemble,TRUE,wk inc covid hosp
3+
CEPH-Rtrend_covid,TRUE,wk inc covid hosp
4+
CFA-EpiAutoGP,TRUE,wk inc covid hosp
5+
CFA_Pyrenew-Pyrenew_HE_COVID,TRUE,wk inc covid hosp
6+
CFA_Pyrenew-Pyrenew_H_COVID,FALSE,wk inc covid hosp
7+
CMU-TimeSeries,TRUE,wk inc covid hosp
8+
CovidHub-baseline,FALSE,wk inc covid hosp
9+
Google_SAI-Ensemble,TRUE,wk inc covid hosp
10+
NEU_ISI-AdaptiveEnsemble,TRUE,wk inc covid hosp
11+
OHT_JHU-nbxd,TRUE,wk inc covid hosp
12+
UGA_flucast-INFLAenza,TRUE,wk inc covid hosp
13+
UM-DeepOutbreak,TRUE,wk inc covid hosp
14+
UMass-ar6_pooled,TRUE,wk inc covid hosp
15+
UMass-gbqr,TRUE,wk inc covid hosp
16+
CFA_Pyrenew-Pyrenew_E_COVID,FALSE,wk inc covid prop ed visits
17+
CFA_Pyrenew-Pyrenew_HE_COVID,TRUE,wk inc covid prop ed visits
18+
CMU-TimeSeries,TRUE,wk inc covid prop ed visits
19+
CovidHub-baseline,FALSE,wk inc covid prop ed visits
20+
UGA_flucast-INFLAenza,TRUE,wk inc covid prop ed visits
