Hi,
First of all, thanks for your work. We are currently trying to reproduce the baseline results from the ChartQAPro paper, but we cannot match some of the reported results. We believe this is because key decoding parameters are omitted from both the paper and the provided implementation code. For example, here are our results for Microsoft-Phi-3.5-Vision-4B with Direct Prompting, using the provided prompts and evaluation script (mean of 3 trials):
- Factoid: 14.82% (your result is 17.48%)
- Conversational: 21.91% (your result is 28.54%)
- Hypothetical: 31.10% (your result is 37.27%)
- Fact Checking: 37.24% (your result is 41.99%)
- Multi Choice: 28.01% (your result is 30.37%)
As shown above, our results are lower than the reported ones in every category, by 2.36 to 6.63 percentage points.
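For reference, the per-category gaps can be recomputed from the numbers above with a short Python snippet. The two dictionaries restate the list (our means vs. the paper's values), and `gaps` is just our own helper, not part of the ChartQAPro code.

```python
# Per-category accuracy (%): our mean over 3 trials vs. the values reported in the paper.
ours = {
    "Factoid": 14.82,
    "Conversational": 21.91,
    "Hypothetical": 31.10,
    "Fact Checking": 37.24,
    "Multi Choice": 28.01,
}
reported = {
    "Factoid": 17.48,
    "Conversational": 28.54,
    "Hypothetical": 37.27,
    "Fact Checking": 41.99,
    "Multi Choice": 30.37,
}


def gaps(ours: dict[str, float], reported: dict[str, float]) -> dict[str, float]:
    """Return reported minus ours, in percentage points, rounded to 2 decimals."""
    return {k: round(reported[k] - ours[k], 2) for k in reported}


if __name__ == "__main__":
    for category, gap in gaps(ours, reported).items():
        print(f"{category}: {gap:.2f} pp below the reported value")
```

The smallest gap is on Multi Choice and the largest on Conversational.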
Could you please release the full decoding configuration, including (but not limited to) the values for:
- max_tokens
- top_p
- frequency_penalty
- presence_penalty
- temperature
Please also describe the experimental protocol: are the reported results the best of n runs, the mean of n runs, or a single run? With access to these inference settings, we should be able to reproduce your results much more closely.
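To make the request concrete, here is a minimal Python sketch of the information we are asking for. It assumes nothing about how the ChartQAPro code is organized: `DecodingConfig`, `ExperimentSettings`, `Protocol`, and `aggregate` are hypothetical names, not part of the repository. Every decoding field is required, so no parameter can silently fall back to a library default, and `aggregate` shows why the protocol matters: for the same set of runs, best-of-n is always at least as high as mean-of-n.

```python
import json
from dataclasses import asdict, dataclass
from statistics import mean
from typing import Literal, Sequence

# Hypothetical names throughout; none of these come from the ChartQAPro code.
Protocol = Literal["best_of_n", "mean_of_n", "single_run"]


@dataclass(frozen=True)
class DecodingConfig:
    """Decoding parameters listed in this issue. Every field is required."""

    max_tokens: int
    top_p: float
    frequency_penalty: float
    presence_penalty: float
    temperature: float

    def __post_init__(self) -> None:
        # Basic sanity checks only; valid penalty ranges differ between APIs,
        # so the penalties are not range-checked here.
        if self.max_tokens <= 0:
            raise ValueError("max_tokens must be positive")
        if not 0.0 < self.top_p <= 1.0:
            raise ValueError("top_p must be in (0, 1]")
        if self.temperature < 0.0:
            raise ValueError("temperature must be non-negative")


@dataclass(frozen=True)
class ExperimentSettings:
    """Decoding config plus how per-run scores were turned into one number."""

    decoding: DecodingConfig
    protocol: Protocol
    n_runs: int

    def __post_init__(self) -> None:
        if self.n_runs < 1:
            raise ValueError("n_runs must be at least 1")

    def to_json(self) -> str:
        # asdict() recurses into the nested DecodingConfig; sorted keys make
        # two groups' settings easy to diff line by line.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


def aggregate(scores: Sequence[float], protocol: Protocol) -> float:
    """Collapse per-run accuracies into the single number that gets reported."""
    if not scores:
        raise ValueError("need at least one run")
    if protocol == "best_of_n":
        return max(scores)
    if protocol == "mean_of_n":
        return mean(scores)
    if protocol == "single_run":
        # Assumption: a single run is represented by the first score.
        return scores[0]
    raise ValueError(f"unknown protocol: {protocol}")
```

We do not know the actual values used for the paper, so the sketch deliberately ships no defaults; filling in and sharing an `ExperimentSettings` record (or its JSON) is exactly what would let us match the setup.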
Thanks!
HaFred