Description
This has been bothering me a lot recently. When I re-run the pipeline, neither the version of the registered Dataset nor its last-modified timestamp changes. However, when I check the contents, they have changed and reflect the latest data. Surprisingly, the contents of the previous versions have changed too: they no longer hold the old data but are identical to the latest version. I have checked several older versions, and all of them have the same contents as the latest one.
I have another data pipeline in which several datasets are registered, and there the dataset version changes as expected. Comparing the two pipelines, I found a possible reason. In the second pipeline, the data source paths change on every run and there is no data processing step. In the first pipeline (the one described above), there is a processing step, and the processed data are temporarily stored in the Azure default blob storage, always under the same path and file name. It seems Azure ML only checks the source path: if the path is the same, no new version is created, regardless of whether the content or the timestamp has changed. I am not sure whether this is by design or a bug, but it is clearly inconvenient. The version does not change even though the content has changed, and I cannot inspect historical versions because they all show the same data.
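If the behaviour really does depend only on the source path, one possible workaround is to write each run's processed output to a unique path, so that every registration points at different data. The following is only a minimal sketch under that assumption, not a confirmed fix. The helper names `content_fingerprint` and `versioned_target_path` are hypothetical, as are the folder layout and the `processed` base directory. The azureml calls are left as comments because they need a live workspace and should be checked against the installed SDK version.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import PurePosixPath


def content_fingerprint(data: bytes, length: int = 12) -> str:
    """Return a short SHA-256 hex digest of the processed data (hypothetical helper)."""
    return hashlib.sha256(data).hexdigest()[:length]


def versioned_target_path(base_dir: str, file_name: str, data: bytes,
                          run_time: datetime) -> str:
    """Build a blob path that differs whenever the run time or the content differs."""
    # Normalise to UTC so the folder name does not depend on the local time zone.
    stamp = run_time.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    folder = f"{stamp}-{content_fingerprint(data)}"
    return str(PurePosixPath(base_dir) / folder / file_name)


if __name__ == "__main__":
    processed = b"a,b\n1,2\n"  # stand-in for the pipeline's processed output
    target = versioned_target_path("processed", "data.csv", processed,
                                   datetime.now(timezone.utc))
    print(target)

    # Assumed follow-up with azureml-core (not executed here):
    #   datastore = ws.get_default_datastore()
    #   datastore.upload(src_dir="outputs", target_path=str(PurePosixPath(target).parent))
    #   ds = Dataset.Tabular.from_delimited_files(path=(datastore, target))
    #   ds.register(workspace=ws, name="my_dataset", create_new_version=True)
```

With a layout like this, each registered version would reference its own folder, so older versions should keep pointing at the data they were registered with instead of at a file that is overwritten on every run.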
azureml sdk version: 1.12.0