Submitting shape-based ensemble #23
|
Good afternoon, Thank you for the submission. I will run the validation locally and bypass the automatic validation. One of the issues here is the metadata file. In the new format, the metadata file should be in the model-metadata folder, in YAML format. I cannot merge the PR without it. Also, the files and folder names Please find below the result of the local validation. To pass the validation, it's also necessary to remove the
Please let me know if you have any issues or questions,
Run validation on files: 2025-04-27-DTWS-Ensemble.gz.parquet
✅: [valid_round_id_col]:
✅: [unique_round_id]:
✅: [match_round_id]: All
✅: [colnames]: Column names are consistent with expected round task IDs and std column names.
❗: [col_types]: Column data types do not match hub schema.
✅: [rows_unique]: All combinations of task ID column/
✅: [req_vals]: Task ID/output type/output_type_id combinations all present.
✅: [value_col_valid]: Values in column
❗: [value_col_non_desc]: Quantile or cdf
❗: [flat_projection]: Some projections have a unique value for the whole projection period. |
Model Output
--- Ensemble_DTW/2025-04-27-Ensemble_DTWS.gz.parquet ---
✅: [file_exists]: File exists at path 'model-output/Ensemble_DTW/2025-04-27-Ensemble_DTWS.gz.parquet'.
❌: [file_name]: File name "2025-04-27-Ensemble_DTWS.gz.parquet" must be valid.
Run validation on files: 2025-04-27-Ensemble_DTWS.gz.parquet
✅: [col_names]: Column names is consistent with expected round task IDs and std column names. |
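For reference, the failing [file_name] check above can be approximated with a small sketch. It assumes the naming convention is `<round_id>-<team>-<model>.gz.parquet`, stored in a folder named `<team>-<model>`, with team and model abbreviations limited to letters, digits, and underscores. That convention is inferred from the file that later passes validation in this thread; the function name `check_submission_path` is hypothetical and is not part of the hub validation code.

```python
import re
from pathlib import PurePosixPath

# Assumed pattern: <round_id>-<team>-<model>.gz.parquet
# (team and model may contain letters, digits and "_", but no "-").
FILE_NAME_RE = re.compile(
    r"^(?P<round_id>\d{4}-\d{2}-\d{2})-"
    r"(?P<team>[A-Za-z0-9_]+)-(?P<model>[A-Za-z0-9_]+)"
    r"\.gz\.parquet$"
)


def check_submission_path(path: str) -> list[str]:
    """Return a list of problems found in a model-output path (empty if none)."""
    p = PurePosixPath(path)
    match = FILE_NAME_RE.match(p.name)
    if match is None:
        return [
            f'file name "{p.name}" does not match '
            "<round_id>-<team>-<model>.gz.parquet"
        ]
    problems = []
    model_id = f"{match['team']}-{match['model']}"
    # The folder holding the file is assumed to be named after the model ID.
    if p.parent.name != model_id:
        problems.append(
            f'folder "{p.parent.name}" does not match model ID "{model_id}"'
        )
    return problems


if __name__ == "__main__":
    for candidate in (
        "model-output/Ensemble_DTW/2025-04-27-Ensemble_DTWS.gz.parquet",
        "model-output/LEMMA-EnsembleDTWS/2025-04-27-LEMMA-EnsembleDTWS.gz.parquet",
    ):
        print(candidate, "->", check_submission_path(candidate) or "ok")
```

Under these assumptions, the first path is rejected because "Ensemble_DTWS" has no hyphen separating a team from a model abbreviation, while the second path passes.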
|
Good morning, Thanks for the update. The validation is not working properly because of 2 main issues:
In the meantime, I updated the validation to ignore "Ensemble" files and ran the validation locally; please find the results below. For the next update, feel free to update your fork repository to get the latest version of the validation. Please let me know if you have any issues or questions,
✅: [valid_round_id_col]:
✅: [unique_round_id]:
✅: [match_round_id]: All
✅: [colnames]: Column names are consistent with expected round task IDs and std column names.
✅: [col_types]: Column data types match hub schema.
✅: [valid_vals]:
✅: [rows_unique]: All combinations of task ID column/
✅: [req_vals]: Task ID/output type/output_type_id combinations all present.
✅: [value_col_valid]: Values in column
✅: [value_col_non_desc]: Quantile or cdf
ℹ: [spl_compound_taskid_set]: No v3 samples found in model output data to check. Skipping
❗: [flat_projection]: Some projections have a unique value for the whole projection period. |
Model Metadata
--- LEMMA-EnsembleDTWS.yaml ---
✅: [metadata_schema_exists]: File exists at path 'hub-config/model-metadata-schema.json'.
✅: [metadata_file_exists]: File exists at path 'model-metadata/LEMMA-EnsembleDTWS.yaml'.
✅: [metadata_file_ext]: Metadata file extension is "yml" or "yaml".
✅: [metadata_file_location]: Metadata file directory name matches "model-metadata".
✅: [metadata_matches_schema]: Metadata file contents are consistent with schema specifications.
✅: [metadata_file_name]: Metadata file name matches the |
Model Output
--- LEMMA-EnsembleDTWS/2025-04-27-LEMMA-EnsembleDTWS.gz.parquet ---
✅: [file_exists]: File exists at path 'model-output/LEMMA-EnsembleDTWS/2025-04-27-LEMMA-EnsembleDTWS.gz.parquet'.
✅: [file_name]: File name "2025-04-27-LEMMA-EnsembleDTWS.gz.parquet" is valid.
✅: [file_location]: File directory name matches
✅: [round_id_valid]:
✅: [file_format]: File is accepted hub format.
✅: [file_n]: Number of accepted model output files per round met.
✅: [metadata_exists]: Metadata file exists at path 'model-metadata/LEMMA-EnsembleDTWS.yaml'.
Run validation on files: 2025-04-27-LEMMA-EnsembleDTWS.gz.parquet
✅: [col_names]: Column names is consistent with expected round task IDs and std column names. |
LucieContamin
left a comment
Run validation on files: 2025-04-27-LEMMA-EnsembleDTWS.gz.parquet
✅: [valid_round_id_col]: round_id_col name is valid.
✅: [unique_round_id]: round_id column "origin_date" contains a single, unique round ID value.
✅: [match_round_id]: All round_id_col "origin_date" values match submission round_id from file name.
✅: [colnames]: Column names are consistent with expected round task IDs and std column names.
✅: [col_types]: Column data types match hub schema.
✅: [valid_vals]: tbl contains valid values/value combinations.
✅: [rows_unique]: All combinations of task ID column/output_type/output_type_id values are unique.
✅: [req_vals]: Task ID/output type/output_type_id combinations all present.
✅: [value_col_valid]: Values in column value all valid with respect to modeling task config.
✅: [value_col_non_desc]: Quantile or cdf value values increase when ordered by output_type_id.
ℹ: [spl_compound_taskid_set]: No v3 samples found in model output data to check. Skipping check_tbl_spl_compound_taskid_set check.
ℹ: [spl_compound_tid]: No v3 samples found in model output data to check. Skipping check_tbl_spl_compound_tid check.
ℹ: [spl_non_compound_tid]: No v3 samples found in model output data to check. Skipping check_tbl_spl_non_compound_tid check.
ℹ: [spl_n]: No v3 samples found in model output data to check. Skipping check_tbl_spl_n check.
✅: [na_value]: value does not contain NA value.
❗: [flat_projection]: Some projections have a unique value for the whole projection period.
Please verify, for example: 2025-04-27, D-2025-04-01, inc hosp, 50, 0-130, quantile, 0.01; 2025-04-27, E-2025-04-01, inc hosp, 15, 0-130, quantile, 0.01; 2025-04-27, E-2025-04-01, inc hosp, 15, 0-130, quantile, 0.025; 2025-04-27, E-2025-04-01, inc hosp, 15, 0-130, quantile, 0.05; 2025-04-27, A-2025-04-01, inc hosp, 02, 0-130, quantile, 0.01
✅: [cumul_proj]: The cumulative values are not decreasing.
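To make the [flat_projection] warning above easier to reproduce, here is a minimal sketch of what such a check might do: flag every projection whose value is identical at all horizons. The column names (`origin_date`, `scenario_id`, `target`, `location`, `age_group`, `output_type`, `output_type_id`, `horizon`, `value`) are assumptions inferred from the order of the fields in the example listed with the warning, and `find_flat_projections` is a hypothetical helper, not the hub's actual implementation.

```python
from collections import defaultdict

# Assumed task ID / output columns that identify one projection over time.
GROUP_COLS = (
    "origin_date", "scenario_id", "target", "location",
    "age_group", "output_type", "output_type_id",
)


def find_flat_projections(rows):
    """Return the group keys whose value is identical at every horizon.

    `rows` is an iterable of dicts, one per row of the submission table.
    Groups with a single horizon are ignored, since they cannot vary.
    """
    values = defaultdict(set)
    horizons = defaultdict(set)
    for row in rows:
        key = tuple(row[col] for col in GROUP_COLS)
        values[key].add(row["value"])
        horizons[key].add(row["horizon"])
    return sorted(
        key for key in values
        if len(values[key]) == 1 and len(horizons[key]) > 1
    )
```

A flat low quantile (for example, a 0.01 quantile that stays at the same value throughout the projection period) can be legitimate for an ensemble, which is why the check is reported as a warning to verify rather than as an error.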
|
thanks for the update! I will merge the PR soon. |
|
Apologies for the second message, but I just wanted to verify that I understand the submission file properly: the file contains ensemble for |
|
Yes, only inc hosp. |
|
Sorry for the late notice. After looking into the data, it seems there is an error that I missed; I apologize for the confusion. It seems that the last horizons are missing. Would it be possible to fix this, please? Please let me know if you have any issues or questions. Here is an update:
✅: [unique_round_id]:
✅: [match_round_id]: All
✅: [colnames]: Column names are consistent with expected round task IDs and std column names.
✅: [col_types]: Column data types match hub schema.
✅: [valid_vals]:
✅: [rows_unique]: All combinations of task ID column/
❌: [req_vals]: Required task ID/output type/output_type_id combinations missing.
origin_date: 2025-04-27
✅: [value_col_valid]: Values in column
✅: [value_col_non_desc]: Quantile or cdf
ℹ: [spl_compound_taskid_set]: No v3 samples found in model output data to check. Skipping
❗: [flat_projection]: Some projections have a unique value for the whole projection period. |
Model Output
--- LEMMA-EnsembleDTWS/2025-04-27-LEMMA-EnsembleDTWS.gz.parquet ---
✅: [file_exists]: File exists at path 'model-output/LEMMA-EnsembleDTWS/2025-04-27-LEMMA-EnsembleDTWS.gz.parquet'.
✅: [file_name]: File name "2025-04-27-LEMMA-EnsembleDTWS.gz.parquet" is valid.
✅: [file_location]: File directory name matches
✅: [round_id_valid]:
✅: [file_format]: File is accepted hub format.
✅: [file_n]: Number of accepted model output files per round met.
✅: [metadata_exists]: Metadata file exists at path 'model-metadata/LEMMA-EnsembleDTWS.yaml'.
Run validation on files: 2025-04-27-LEMMA-EnsembleDTWS.gz.parquet
✅: [col_names]: Column names is consistent with expected round task IDs and std column names. |
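A quick way to locate the missing horizons before resubmitting is sketched below. It assumes the table has a `horizon` column plus `scenario_id` and `location` columns, and that the caller supplies the expected horizons; the real required range comes from the hub's task configuration and is not stated in this thread. The helper name `missing_horizons` is hypothetical.

```python
def missing_horizons(rows, expected_horizons,
                     group_cols=("scenario_id", "location")):
    """Map each group to the sorted list of expected horizons it lacks.

    `rows` is an iterable of dicts, one per row of the submission table.
    Groups that contain every expected horizon are left out of the result.
    """
    expected = set(expected_horizons)
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in group_cols)
        seen.setdefault(key, set()).add(row["horizon"])
    return {
        key: sorted(expected - found)
        for key, found in seen.items()
        if expected - found
    }
```

An empty result means that every group present in the file covers all the horizons passed in; it does not detect groups that are absent from the file entirely.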
|
Here is the validation output:
✅: [unique_round_id]:
✅: [match_round_id]: All
✅: [colnames]: Column names are consistent with expected round task IDs and std column names.
✅: [col_types]: Column data types match hub schema.
✅: [valid_vals]:
✅: [rows_unique]: All combinations of task ID column/
✅: [req_vals]: Task ID/output type/output_type_id combinations all present.
✅: [value_col_valid]: Values in column
✅: [value_col_non_desc]: Quantile or cdf
ℹ: [spl_compound_taskid_set]: No v3 samples found in model output data to check. Skipping
❗: [flat_projection]: Some projections have a unique value for the whole projection period. |
Hopefully, we can bypass some checks, as this is an ensemble of all the submitted models, with quantiles only (no trajectories).