Hello,
First of all, thank you for releasing this beautiful large dataset with phase and step annotations. It is a great resource and benchmark for surgical video understanding and I cannot wait to see researchers use this more in the future.
I am a bit confused about the suggested data splits though.
The text in the publication seems to imply that a simple hold-out (80:20:40) split was used.
In contrast, this repository refers to 5-fold cross validation.
However, when I checked the pickle files at `labels/<center>/labels_by70_splits/labels`, it looked as if repeated random subsampling was used to create five different 40:10:20 splits per center. In fact, the five test sets are not disjoint and they do not cover the complete dataset (both of which would be expected for k-fold cross-validation).
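For reference, the kind of check I mean can be sketched roughly as follows. This is only an illustration: the function name `check_kfold` and the toy video IDs are made up, and I am not reproducing the actual layout of the pickle files here.

```python
from itertools import combinations


def check_kfold(test_sets, all_ids):
    """Return (disjoint, covers) for a list of test sets.

    disjoint: no video ID appears in more than one test set.
    covers:   the union of all test sets equals the full set of IDs.
    Both should be True for a proper k-fold cross-validation.
    """
    test_sets = [set(t) for t in test_sets]
    disjoint = all(a.isdisjoint(b) for a, b in combinations(test_sets, 2))
    covers = set().union(*test_sets) == set(all_ids)
    return disjoint, covers


# Toy data (hypothetical IDs, not taken from the dataset).
all_ids = list(range(10))
kfold_tests = [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
resampled_tests = [[0, 1], [1, 2], [4, 5], [5, 6], [0, 9]]

print(check_kfold(kfold_tests, all_ids))
print(check_kfold(resampled_tests, all_ids))
```

With a true 5-fold partition both flags are true; with repeated random subsampling, as in the second toy example, the test sets overlap and leave some videos out.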
Given that the provided scripts to reproduce the experiments also focus on the first data split (`id_split: 0`), I was wondering whether it would be sufficient to simply evaluate in a hold-out fashion (train:val:test) on split 0.
What would you suggest?
If I repeat the experiments on the four remaining splits, how should I summarize the results, given that they would be computed on non-identical but possibly overlapping test sets?
Kind regards,
Isabel