In my experiment, I evaluated the same model on both HM3D-val v1 and HM3D-val v2. The results on v2 were significantly worse than those on v1. Looking more closely, I found a possible cause: on v2, whenever the model predicts that it should stop, the distance to the goal is consistently between 0.1 m and 0.2 m. Could this discrepancy come from differences in dataset characteristics, differences in annotation protocol, or model bias when applied to the updated dataset version? I would appreciate any help in investigating this issue.
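To make the observation easier to compare across the two dataset versions, the final distances can be bucketed against the success radius. The following is only a minimal sketch, assuming the per-episode distance-to-goal values at stop time have already been exported as a plain list. The function name `summarize_stop_distances` and the 0.1 m default threshold are hypothetical and should be replaced with the value from the actual task config.

```python
from typing import Dict, Sequence


def summarize_stop_distances(
    distances_m: Sequence[float],
    success_threshold_m: float = 0.1,  # assumed value; check the task config
    near_miss_upper_m: float = 0.2,    # upper edge of the band reported above
) -> Dict[str, float]:
    """Bucket distance-to-goal values recorded when the agent called stop."""
    if not distances_m:
        raise ValueError("distances_m must not be empty")
    n = len(distances_m)
    # Episodes that stopped inside the assumed success radius.
    within = sum(d <= success_threshold_m for d in distances_m)
    # Episodes that stopped just outside it (the 0.1 m to 0.2 m band).
    near_miss = sum(
        success_threshold_m < d <= near_miss_upper_m for d in distances_m
    )
    return {
        "within_threshold": within / n,
        "near_miss": near_miss / n,
        "farther": (n - within - near_miss) / n,
    }


if __name__ == "__main__":
    # Illustrative values only, not measured results.
    print(summarize_stop_distances([0.05, 0.15, 0.18, 0.5]))
```

Running this once on the v1 episodes and once on the v2 episodes would show whether the v2 stops cluster in the near-miss band, which would point toward a goal-position or threshold difference rather than a general drop in navigation quality.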