
Submitting shape-based ensemble#23

Merged
LucieContamin merged 8 commits into midas-network:main from scc-usc:main
Jun 24, 2025

Conversation

@scc-usc
Contributor

@scc-usc scc-usc commented Jun 5, 2025

Hopefully, we can bypass some checks, as this is an ensemble of all the submitted models, with quantiles only (no trajectories).

@LucieContamin
Contributor

Good afternoon,

Thank you for the submission. I will run the validation locally and bypass the automatic validation.

One of the issues here is the metadata file. In the new format, the metadata file should be in the model-metadata folder, in YAML format. I cannot merge the PR without it.

Also, the file and folder names (DTWS_Ensemble) do not match the expected pattern [team_abbr]-[model_abbr]. The file should be named 2025-04-27-DTWS-Ensemble, for example.

Please find the result of the local validation below. To pass the validation, it is also necessary to remove the stochastic_run and run_grouping columns (they are used for samples only).

  • a red cross is an error and needs to be fixed
  • a red ! is a warning: either fix the issue or confirm that the warning is expected and should be ignored

Please let me know if you have any issues or questions.
Best,
Lucie

Run validation on files: 2025-04-27-DTWS-Ensemble.gz.parquet

✅: [valid_round_id_col]: round_id_col name is valid.

✅: [unique_round_id]: round_id column "origin_date" contains a single, unique round ID value.

✅: [match_round_id]: All round_id_col "origin_date" values match submission round_id from file name.

✅: [colnames]: Column names are consistent with expected round task IDs and std column names.

❗: [col_types]: Column data types do not match hub schema.
output_type_id should be "double" not "character".
❌: [valid_vals]: tbl contains invalid values/value combinations. For example: Column location contains invalid
value "us".

✅: [rows_unique]: All combinations of task ID column/output_type/output_type_id values are unique.

✅: [req_vals]: Task ID/output type/output_type_id combinations all present.

✅: [value_col_valid]: Values in column value all valid with respect to modeling task config.

❗: [value_col_non_desc]: Quantile or cdf value values do not all increase when ordered by output_type_id.
See error_tbl attribute for details.
ℹ: [spl_compound_taskid_set]: No v3 samples found in model output data to check. Skipping check_tbl_spl_compound_taskid_set check.
ℹ: [spl_compound_tid]: No v3 samples found in model output data to check. Skipping check_tbl_spl_compound_tid check.
ℹ: [spl_non_compound_tid]: No v3 samples found in model output data to check. Skipping check_tbl_spl_non_compound_tid check.
ℹ: [spl_n]: No v3 samples found in model output data to check. Skipping check_tbl_spl_n check.
✅: [na_value]: value does not contain NA value.

❗: [flat_projection]: Some projections have a unique value for the whole projection period.
Please verify, for example: 2025-04-27, E-2025-04-01, inc hosp, 02, 0-130, quantile, 0.01; 2025-04-27, B-2025-04-01, inc hosp, 02, 0-130, quantile, 0.05; 2025-04-27, D-2025-04-01, inc hosp, 02, 0-130, quantile, 0.025; 2025-04-27, D-2025-04-01, inc hosp, 02, 0-130, quantile, 0.05; 2025-04-27, A-2025-04-01, inc hosp, 31, 0-64, quantile, 0.025
✅: [cumul_proj]: The cumulative values are not decreasing.
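
The clean-up requested above can be sketched in plain Python. This is only an illustration and not the hub's own validation code: it drops the sample-only columns, casts output_type_id to a number (the [col_types] warning), and looks for groups whose values decrease as the quantile level increases (the [value_col_non_desc] warning). The function names, the row-as-dict layout, and the example values are assumptions made for this sketch.

```python
# Columns that the review says are only meaningful for sample output.
SAMPLE_ONLY_COLUMNS = {"stochastic_run", "run_grouping"}


def clean_quantile_row(row):
    """Drop sample-only columns and cast output_type_id to float."""
    cleaned = {k: v for k, v in row.items() if k not in SAMPLE_ONLY_COLUMNS}
    cleaned["output_type_id"] = float(cleaned["output_type_id"])
    return cleaned


def decreasing_quantile_groups(rows, group_cols):
    """Return the group keys whose values decrease as the quantile level rises."""
    groups = {}
    for row in rows:
        key = tuple(row[c] for c in group_cols)
        groups.setdefault(key, []).append((row["output_type_id"], row["value"]))
    bad = []
    for key, pairs in groups.items():
        # Order by quantile level, then compare each value with the next one.
        values = [value for _, value in sorted(pairs, key=lambda p: p[0])]
        if any(nxt < cur for cur, nxt in zip(values, values[1:])):
            bad.append(key)
    return bad


# Hypothetical rows: the 0.5 quantile is lower than the 0.01 quantile.
raw_rows = [
    {"location": "02", "horizon": 1, "output_type_id": "0.01", "value": 12.0,
     "stochastic_run": None, "run_grouping": None},
    {"location": "02", "horizon": 1, "output_type_id": "0.5", "value": 10.0,
     "stochastic_run": None, "run_grouping": None},
]
rows = [clean_quantile_row(r) for r in raw_rows]
print(decreasing_quantile_groups(rows, ["location", "horizon"]))
```

In a real submission the grouping columns would be all task ID columns (origin_date, scenario_id, target, horizon, location, age_group), not just the two used here.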

@LucieContamin LucieContamin self-requested a review June 10, 2025 19:28
@github-actions

Model Output

--- Ensemble_DTW/2025-04-27-Ensemble_DTWS.gz.parquet ---

✅: [file_exists]: File exists at path 'model-output/Ensemble_DTW/2025-04-27-Ensemble_DTWS.gz.parquet'.

❌: [file_name]: File name "2025-04-27-Ensemble_DTWS.gz.parquet" must be valid.
Could not correctly parse submission metadata.


Run validation on files: 2025-04-27-Ensemble_DTWS.gz.parquet

✅: [col_names]: Column names is consistent with expected round task IDs and std column names.
'' should be present in the file.

@LucieContamin
Contributor

LucieContamin commented Jun 13, 2025

Good morning,

Thanks for the update. The validation is not working properly because of 2 main issues:

  • a metadata file needs to be submitted too (in the model-metadata folder).
  • the folder / file name does not match the required format: [round_id]-[team_abbr]-[model_abbr] for the files or [team_abbr]-[model_abbr] for the folders. Even if the file is an ensemble, I will need the associated team information.
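
As a rough illustration of the naming rule in the second point, the sketch below checks a folder and file name against [round_id]-[team_abbr]-[model_abbr]. The regular expression is an assumption (a date-shaped round_id, and abbreviations limited to letters, digits and underscores); the hub's actual validation may apply different rules, and the function name is hypothetical.

```python
import re

# Assumed pattern: YYYY-MM-DD round ID, then team and model abbreviations
# made of letters, digits or underscores, then a parquet extension.
FILE_PATTERN = re.compile(
    r"(?P<round_id>\d{4}-\d{2}-\d{2})"
    r"-(?P<team_abbr>[A-Za-z0-9_]+)"
    r"-(?P<model_abbr>[A-Za-z0-9_]+)"
    r"\.(?:gz\.)?parquet"
)


def parse_submission_path(folder, file_name):
    """Return the parsed name parts, or None if folder or file name is invalid."""
    match = FILE_PATTERN.fullmatch(file_name)
    if match is None:
        return None
    # The folder must be named after the model ID: [team_abbr]-[model_abbr].
    model_id = f"{match['team_abbr']}-{match['model_abbr']}"
    if folder != model_id:
        return None
    return match.groupdict()


# The name rejected earlier in this PR has no separate team abbreviation.
print(parse_submission_path("Ensemble_DTW", "2025-04-27-Ensemble_DTWS.gz.parquet"))
# The name used in the later commits has both parts.
print(parse_submission_path("LEMMA-EnsembleDTWS", "2025-04-27-LEMMA-EnsembleDTWS.gz.parquet"))
```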

In the meantime, I updated the validation to ignore "Ensemble" files and ran the validation locally; please find the results below. For the next update, feel free to update your fork repository to get the latest version of the validation.

Please let me know if you have any issues or questions,
Lucie

✅: [valid_round_id_col]: round_id_col name is valid.

✅: [unique_round_id]: round_id column "origin_date" contains a single, unique round ID value.

✅: [match_round_id]: All round_id_col "origin_date" values match submission round_id from file name.

✅: [colnames]: Column names are consistent with expected round task IDs and std column names.

✅: [col_types]: Column data types match hub schema.

✅: [valid_vals]: tbl contains valid values/value combinations.

✅: [rows_unique]: All combinations of task ID column/output_type/output_type_id values are unique.

✅: [req_vals]: Task ID/output type/output_type_id combinations all present.

✅: [value_col_valid]: Values in column value all valid with respect to modeling task config.

✅: [value_col_non_desc]: Quantile or cdf value values increase when ordered by output_type_id.

ℹ: [spl_compound_taskid_set]: No v3 samples found in model output data to check. Skipping check_tbl_spl_compound_taskid_set check.
ℹ: [spl_compound_tid]: No v3 samples found in model output data to check. Skipping check_tbl_spl_compound_tid check.
ℹ: [spl_non_compound_tid]: No v3 samples found in model output data to check. Skipping check_tbl_spl_non_compound_tid check.
ℹ: [spl_n]: No v3 samples found in model output data to check. Skipping check_tbl_spl_n check.
✅: [na_value]: value does not contain NA value.

❗: [flat_projection]: Some projections have a unique value for the whole projection period.
Please verify, for example: 2025-04-27, D-2025-04-01, inc hosp, 50, 0-130, quantile, 0.01; 2025-04-27, E-2025-04-01, inc hosp, 15, 0-130, quantile, 0.01; 2025-04-27, E-2025-04-01, inc hosp, 15, 0-130, quantile, 0.025; 2025-04-27, E-2025-04-01, inc hosp, 15, 0-130, quantile, 0.05; 2025-04-27, A-2025-04-01, inc hosp, 02, 0-130, quantile, 0.01
✅: [cumul_proj]: The cumulative values are not decreasing.
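
The [flat_projection] warning flags series that take a single value over the whole projection period. A minimal sketch of that idea follows, assuming each row is a dict with horizon and value columns and that every other column identifies the series; the hub's real check may be implemented differently, and the example rows are hypothetical.

```python
from collections import defaultdict


def flat_projections(rows, horizon_col="horizon", value_col="value"):
    """Return the series (as dicts) whose value is identical at every horizon."""
    values = defaultdict(set)
    counts = defaultdict(int)
    for row in rows:
        # Everything except horizon and value identifies one projected series.
        key = tuple(sorted(
            (k, v) for k, v in row.items() if k not in (horizon_col, value_col)
        ))
        values[key].add(row[value_col])
        counts[key] += 1
    # A series is flat if it spans several horizons but has one distinct value.
    return [dict(key) for key in values if len(values[key]) == 1 and counts[key] > 1]


# Hypothetical rows: location "15" is flat at 0.0, location "02" is not.
rows = [
    {"location": "15", "output_type_id": 0.01, "horizon": h, "value": 0.0}
    for h in range(3)
] + [
    {"location": "02", "output_type_id": 0.01, "horizon": h, "value": float(h)}
    for h in range(3)
]
print(flat_projections(rows))
```

A flat low quantile (for example 0.01 staying at zero) can be legitimate, which is presumably why the check is reported as a warning and not as an error.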

@github-actions

Model Metadata

--- LEMMA-EnsembleDTWS.yaml ---

✅: [metadata_schema_exists]: File exists at path 'hub-config/model-metadata-schema.json'.

✅: [metadata_file_exists]: File exists at path 'model-metadata/LEMMA-EnsembleDTWS.yaml'.

✅: [metadata_file_ext]: Metadata file extension is "yml" or "yaml".

✅: [metadata_file_location]: Metadata file directory name matches "model-metadata".

✅: [metadata_matches_schema]: Metadata file contents are consistent with schema specifications.

✅: [metadata_file_name]: Metadata file name matches the model_id specified within the metadata file.

@github-actions

Model Output

--- LEMMA-EnsembleDTWS/2025-04-27-LEMMA-EnsembleDTWS.gz.parquet ---

✅: [file_exists]: File exists at path 'model-output/LEMMA-EnsembleDTWS/2025-04-27-LEMMA-EnsembleDTWS.gz.parquet'.

✅: [file_name]: File name "2025-04-27-LEMMA-EnsembleDTWS.gz.parquet" is valid.

✅: [file_location]: File directory name matches model_id
metadata in file name.

✅: [round_id_valid]: round_id is valid.

✅: [file_format]: File is accepted hub format.

✅: [file_n]: Number of accepted model output files per round met.

✅: [metadata_exists]: Metadata file exists at path 'model-metadata/LEMMA-EnsembleDTWS.yaml'.


Run validation on files: 2025-04-27-LEMMA-EnsembleDTWS.gz.parquet

✅: [col_names]: Column names is consistent with expected round task IDs and std column names.
'' should be present in the file.

Contributor

@LucieContamin LucieContamin left a comment

Run validation on files: 2025-04-27-LEMMA-EnsembleDTWS.gz.parquet

✅: [valid_round_id_col]: round_id_col name is valid.

✅: [unique_round_id]: round_id column "origin_date" contains a single, unique round ID value.

✅: [match_round_id]: All round_id_col "origin_date" values match submission round_id from file name.

✅: [colnames]: Column names are consistent with expected round task IDs and std column names.

✅: [col_types]: Column data types match hub schema.

✅: [valid_vals]: tbl contains valid values/value combinations.

✅: [rows_unique]: All combinations of task ID column/output_type/output_type_id values are unique.

✅: [req_vals]: Task ID/output type/output_type_id combinations all present.

✅: [value_col_valid]: Values in column value all valid with respect to modeling task config.

✅: [value_col_non_desc]: Quantile or cdf value values increase when ordered by output_type_id.

ℹ: [spl_compound_taskid_set]: No v3 samples found in model output data to check. Skipping check_tbl_spl_compound_taskid_set check.
ℹ: [spl_compound_tid]: No v3 samples found in model output data to check. Skipping check_tbl_spl_compound_tid check.
ℹ: [spl_non_compound_tid]: No v3 samples found in model output data to check. Skipping check_tbl_spl_non_compound_tid check.
ℹ: [spl_n]: No v3 samples found in model output data to check. Skipping check_tbl_spl_n check.
✅: [na_value]: value does not contain NA value.

❗: [flat_projection]: Some projections have a unique value for the whole projection period.
Please verify, for example: 2025-04-27, D-2025-04-01, inc hosp, 50, 0-130, quantile, 0.01; 2025-04-27, E-2025-04-01, inc hosp, 15, 0-130, quantile, 0.01; 2025-04-27, E-2025-04-01, inc hosp, 15, 0-130, quantile, 0.025; 2025-04-27, E-2025-04-01, inc hosp, 15, 0-130, quantile, 0.05; 2025-04-27, A-2025-04-01, inc hosp, 02, 0-130, quantile, 0.01
✅: [cumul_proj]: The cumulative values are not decreasing.

@LucieContamin
Contributor

Thanks for the update! I will merge the PR soon.
Please let me know if you have any issues or questions.

@LucieContamin
Contributor

Apologies for the second message, but I just wanted to verify that I understand the submission file properly: the file contains the ensemble for the inc hosp target only, is that correct?

@scc-usc
Contributor Author

scc-usc commented Jun 16, 2025

Yes, only inc hosp.

@LucieContamin
Contributor

Sorry for the late notice: after looking into the data, it seems that there is an error that I missed. I apologize for the confusion. The last horizons appear to be missing; would it be possible to fix this, please?

Please let me know if you have any issues or questions,
Lucie

Here is an update:
✅: [valid_round_id_col]: round_id_col name is valid.

✅: [unique_round_id]: round_id column "origin_date" contains a single, unique round ID value.

✅: [match_round_id]: All round_id_col "origin_date" values match submission round_id from file name.

✅: [colnames]: Column names are consistent with expected round task IDs and std column names.

✅: [col_types]: Column data types match hub schema.

✅: [valid_vals]: tbl contains valid values/value combinations.

✅: [rows_unique]: All combinations of task ID column/output_type/output_type_id values are unique.

❌: [req_vals]: Required task ID/output type/output_type_id combinations missing.
Please verify:

origin_date: 2025-04-27
scenario_id: c("A-2025-04-01", "B-2025-04-01", "C-2025-04-01", "D-2025-04-01", "E-2025-04-01")
target: inc hosp
horizon: c("51", "52")
age_group: c("0-130", "65-130", "0-64")
output_type: quantile
output_type_id: c("0.01", "0.025", "0.05", "0.1", "0.15", "0.2", "0.25", "0.3", "0.35", "0.4", "0.45", "0.5", "0.55", "0.6", "0.65", "0.7", "0.75", "0.8", "0.85", "0.9", "0.95", "0.975", "0.99")

✅: [value_col_valid]: Values in column value all valid with respect to modeling task config.

✅: [value_col_non_desc]: Quantile or cdf value values increase when ordered by output_type_id.

ℹ: [spl_compound_taskid_set]: No v3 samples found in model output data to check. Skipping check_tbl_spl_compound_taskid_set check.
ℹ: [spl_compound_tid]: No v3 samples found in model output data to check. Skipping check_tbl_spl_compound_tid check.
ℹ: [spl_non_compound_tid]: No v3 samples found in model output data to check. Skipping check_tbl_spl_non_compound_tid check.
ℹ: [spl_n]: No v3 samples found in model output data to check. Skipping check_tbl_spl_n check.
✅: [na_value]: value does not contain NA value.

❗: [flat_projection]: Some projections have a unique value for the whole projection period.
Please verify, for example: 2025-04-27, D-2025-04-01, inc hosp, 50, 0-130, quantile, 0.01; 2025-04-27, E-2025-04-01, inc hosp, 15, 0-130, quantile, 0.01; 2025-04-27, E-2025-04-01, inc hosp, 15, 0-130, quantile, 0.025; 2025-04-27, E-2025-04-01, inc hosp, 15, 0-130, quantile, 0.05; 2025-04-27, A-2025-04-01, inc hosp, 02, 0-130, quantile, 0.01
✅: [cumul_proj]: The cumulative values are not decreasing.
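
The [req_vals] failure above lists task ID combinations that are required but absent (here, horizons 51 and 52). One way to find such gaps before submitting is to build the full grid of required values and subtract what the file contains. The sketch below assumes rows are dicts and that the required values are supplied by the caller; in practice they would come from the hub's task configuration, and the reduced example values are hypothetical.

```python
from itertools import product


def missing_combinations(rows, required):
    """List required value combinations that no row in `rows` provides.

    rows:     iterable of dicts, one per submitted row
    required: dict mapping column name -> list of required values
    """
    columns = list(required)
    # Compare as strings so that horizon 51 and "51" count as the same value.
    present = {tuple(str(row[c]) for c in columns) for row in rows}
    expected = product(*([str(v) for v in required[c]] for c in columns))
    return [dict(zip(columns, combo)) for combo in expected if combo not in present]


# Hypothetical, heavily reduced example: one scenario, one quantile level.
required = {
    "scenario_id": ["A-2025-04-01"],
    "horizon": [50, 51, 52],
    "output_type_id": [0.5],
}
rows = [
    {"scenario_id": "A-2025-04-01", "horizon": 50, "output_type_id": 0.5, "value": 12.0},
]
for combo in missing_combinations(rows, required):
    print(combo)
```

For the real file, the required grid would also include every location, age_group, and quantile level reported in the message above.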

@github-actions

Model Output

--- LEMMA-EnsembleDTWS/2025-04-27-LEMMA-EnsembleDTWS.gz.parquet ---

✅: [file_exists]: File exists at path 'model-output/LEMMA-EnsembleDTWS/2025-04-27-LEMMA-EnsembleDTWS.gz.parquet'.

✅: [file_name]: File name "2025-04-27-LEMMA-EnsembleDTWS.gz.parquet" is valid.

✅: [file_location]: File directory name matches model_id
metadata in file name.

✅: [round_id_valid]: round_id is valid.

✅: [file_format]: File is accepted hub format.

✅: [file_n]: Number of accepted model output files per round met.

✅: [metadata_exists]: Metadata file exists at path 'model-metadata/LEMMA-EnsembleDTWS.yaml'.


Run validation on files: 2025-04-27-LEMMA-EnsembleDTWS.gz.parquet

✅: [col_names]: Column names is consistent with expected round task IDs and std column names.
'' should be present in the file.

@LucieContamin
Contributor

Here is the validation output:
✅: [valid_round_id_col]: round_id_col name is valid.

✅: [unique_round_id]: round_id column "origin_date" contains a single, unique round ID value.

✅: [match_round_id]: All round_id_col "origin_date" values match submission round_id from file name.

✅: [colnames]: Column names are consistent with expected round task IDs and std column names.

✅: [col_types]: Column data types match hub schema.

✅: [valid_vals]: tbl contains valid values/value combinations.

✅: [rows_unique]: All combinations of task ID column/output_type/output_type_id values are unique.

✅: [req_vals]: Task ID/output type/output_type_id combinations all present.

✅: [value_col_valid]: Values in column value all valid with respect to modeling task config.

✅: [value_col_non_desc]: Quantile or cdf value values increase when ordered by output_type_id.

ℹ: [spl_compound_taskid_set]: No v3 samples found in model output data to check. Skipping check_tbl_spl_compound_taskid_set check.
ℹ: [spl_compound_tid]: No v3 samples found in model output data to check. Skipping check_tbl_spl_compound_tid check.
ℹ: [spl_non_compound_tid]: No v3 samples found in model output data to check. Skipping check_tbl_spl_non_compound_tid check.
ℹ: [spl_n]: No v3 samples found in model output data to check. Skipping check_tbl_spl_n check.
✅: [na_value]: value does not contain NA value.

❗: [flat_projection]: Some projections have a unique value for the whole projection period.
Please verify, for example: 2025-04-27, D-2025-04-01, inc hosp, 50, 0-130, quantile, 0.01; 2025-04-27, E-2025-04-01, inc hosp, 15, 0-130, quantile, 0.01; 2025-04-27, E-2025-04-01, inc hosp, 15, 0-130, quantile, 0.025; 2025-04-27, E-2025-04-01, inc hosp, 15, 0-130, quantile, 0.05; 2025-04-27, A-2025-04-01, inc hosp, 02, 0-130, quantile, 0.01
✅: [cumul_proj]: The cumulative values are not decreasing.

@LucieContamin LucieContamin merged commit 8611f5c into midas-network:main Jun 24, 2025
1 check passed