Question about WebQSP Test Dataset Size
Hi @LHRLAB,
I'm studying your KBQA-o1 implementation and noticed that the WebQSP test dataset is limited to only 200 samples in run_explore.py. I have a few questions about this design choice:
Current Implementation
In run_explore.py line 456:
if not ex:
    kbqa_data = kbqa_data[:200]  # Test mode limited to 200 samples
else:
    random.shuffle(kbqa_data)  # Explore mode uses all data
Questions
- Performance impact: How do results on this 200-sample subset compare to results on the full WebQSP test set (1,639 questions in the standard split)? Have you observed any performance differences between the two?
- Reproducibility: For fair comparison with other KBQA methods that use the full test set, should researchers:
  - Use the same 200-sample subset for comparison?
  - Or modify the code to use the full test set?
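If modifying the code is the recommended route, here is a rough sketch of what I have in mind: make the cap optional instead of hard-coded. This is only my assumption about how it could look, not what the repo does. `select_eval_data` and `max_test_samples` are hypothetical names I made up; `ex` and `kbqa_data` are taken from the snippet above.

```python
import random


def select_eval_data(kbqa_data, ex, max_test_samples=None):
    """Hypothetical helper (not part of the repo): choose which samples to run on.

    ex=True  -> explore mode: shuffle and use all data, as in the original branch.
    ex=False -> test mode: keep the original order; cap only if a limit is given.
    """
    data = list(kbqa_data)  # work on a copy so the caller's list is untouched
    if ex:
        random.shuffle(data)
        return data
    if max_test_samples is None:
        return data  # full test set
    return data[:max_test_samples]  # e.g. 200 reproduces the current behaviour
```

With this, `select_eval_data(kbqa_data, ex=False, max_test_samples=200)` would match the current test-mode behaviour, and leaving `max_test_samples` unset would evaluate on the whole test set.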