I'm using the QA dataset from your open-source project (MiniRAG) to evaluate my own RAG question-answering framework. However, I noticed that some of the QA pairs in the dataset do not have reference answers.
I was wondering: when calculating accuracy, should I exclude the questions without reference answers and evaluate only those that have them? Or should I include all questions and manually verify whether the model's responses are correct?
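To make the first option concrete, here is a minimal sketch of what I mean by "exclude and evaluate only the answerable pairs". The field names (`"answer"`, `"prediction"`) and the exact-match judge are my own assumptions for illustration, not the actual MiniRAG schema or metric:

```python
def accuracy_on_answerable(qa_pairs, judge):
    """Accuracy over only the pairs whose reference answer is present.

    qa_pairs: list of dicts with "answer" (may be None/empty) and "prediction".
    judge: callable(prediction, reference) -> bool.
    Returns (accuracy, number_of_pairs_scored).
    """
    # Keep only pairs that actually have a reference answer.
    scored = [p for p in qa_pairs if p.get("answer")]
    if not scored:
        return 0.0, 0
    correct = sum(judge(p["prediction"], p["answer"]) for p in scored)
    return correct / len(scored), len(scored)

# Hypothetical example with a naive exact-match judge (a real setup
# might use token-level F1 or an LLM-as-judge instead).
pairs = [
    {"question": "q1", "answer": "Paris", "prediction": "Paris"},
    {"question": "q2", "answer": None, "prediction": "42"},   # no reference: skipped
    {"question": "q3", "answer": "blue", "prediction": "green"},
]
acc, n = accuracy_on_answerable(
    pairs, lambda pred, ref: pred.strip().lower() == ref.strip().lower()
)
# acc == 0.5 over n == 2 answerable questions
```

The second option would instead keep `q2` in the denominator and rely on a human check in place of `judge`.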
I'd appreciate your advice on how to handle this. Thank you in advance!