Thank you for the excellent work!
Could you please release the evaluation script used to compute the results in Table 1? Specifically:
- L2 (m) 1s, 2s, 3s, ave
- Failure rate (%) with a 10 m threshold
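
To make the question concrete, here is a minimal sketch of one possible reading of these metrics. It is not the paper's script. The function names, the 2 Hz waypoint rate, the "error at the single waypoint at time t" definition of L2, and the per-sample failure rule are all assumptions on my side, so please correct whichever ones differ from your setup.

```python
import math


def l2_errors(pred, gt):
    """Per-waypoint Euclidean distance (m) between predicted and ground-truth (x, y)."""
    return [math.hypot(px - gx, py - gy) for (px, py), (gx, gy) in zip(pred, gt)]


def evaluate(samples, hz=2, horizons=(1, 2, 3), threshold=10.0):
    """Compute mean L2 per horizon and the failure rate (%).

    samples: list of (pred, gt) pairs, each trajectory a list of (x, y) waypoints.

    Assumed conventions (hypothetical, not confirmed by the paper):
    - waypoints are sampled at `hz` Hz, the first one at 1/hz seconds;
    - L2 at horizon t is the error at the single waypoint at time t;
    - a sample fails if any waypoint error up to the longest horizon
      exceeds `threshold` metres.
    """
    sums = {t: 0.0 for t in horizons}
    failures = 0
    for pred, gt in samples:
        errs = l2_errors(pred, gt)
        for t in horizons:
            sums[t] += errs[t * hz - 1]  # index of the waypoint at t seconds
        if max(errs[: max(horizons) * hz]) > threshold:
            failures += 1
    n = len(samples)
    l2 = {t: sums[t] / n for t in horizons}
    l2["avg"] = sum(l2[t] for t in horizons) / len(horizons)
    return l2, 100.0 * failures / n
```

Some nuScenes planning evaluations instead average the error over all waypoints up to each horizon, and failures can be counted per scene rather than per sample, so it would help to know which conventions Table 1 follows.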
Validation results in my environment
When I evaluated Qwen2.0-VL-7B on the nuScenes validation split with the 10 m threshold, I observed only one failure out of 150 scenes (scene 0636).
I would like to understand where the difference between my results and the paper’s Table 1 might be coming from.
Thank you very much in advance.