Critical reproducibility issue: Missing decoder parameters in ChartQAPro baseline results

Hi,
First of all, thanks for your work. We are currently trying to reproduce the baseline results from the ChartQAPro paper, but we cannot match some of the posted results. We believe this is because **key decoder parameters are omitted from both the paper and the provided implementation code**. For instance, here are our results (using the provided prompts and evaluation script) for Microsoft-Phi-3.5-Vision-4B with Direct Prompting (mean of 3 trials):
- Factoid: 14.82% (your result is 17.48%)
- Conversational: 21.91% (your result is 28.54%)
- Hypothetical: 31.10% (your result is 37.27%)
- Fact Checking: 37.24% (your result is 41.99%)
- Multi Choice: 28.01% (your result is 30.37%)

As you can see above, our numbers fall 2.36 to 6.63 percentage points below your posted results in every category.
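For reference, here is a minimal sketch of how the per-category gaps can be computed. The numbers are copied from the list above; the variable names are ours and purely illustrative.

```python
# Scores in percent, copied from the list above.
ours = {
    "Factoid": 14.82,
    "Conversational": 21.91,
    "Hypothetical": 31.10,
    "Fact Checking": 37.24,
    "Multi Choice": 28.01,
}
reported = {
    "Factoid": 17.48,
    "Conversational": 28.54,
    "Hypothetical": 37.27,
    "Fact Checking": 41.99,
    "Multi Choice": 30.37,
}

# Gap in percentage points (reported minus ours), rounded to two decimals.
gaps = {name: round(reported[name] - ours[name], 2) for name in ours}

for name, gap in gaps.items():
    print(f"{name}: {gap:.2f} pp below the reported score")

largest = max(gaps, key=gaps.get)
smallest = min(gaps, key=gaps.get)
print(f"largest gap: {largest}, smallest gap: {smallest}")
```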

Could you please release the full decoding config, including values for (but not limited to):
- max_tokens
- top_p
- frequency_penalty
- presence_penalty
- temperature
Could you also share the experiment settings (are the posted results the best of n runs, the mean of n runs, or a single run?) so that the reported results are reproducible? With access to the inference settings, we should be able to produce closer results.
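To make the request concrete, something like the following would be enough for us. This is only a sketch of the kind of record we have in mind: the class name, the `seed`, `num_trials`, and `aggregation` fields, and every default value are placeholders we made up, not the settings used in the paper.

```python
import json
from dataclasses import dataclass, asdict
from statistics import mean


@dataclass
class DecodingConfig:
    # All defaults below are hypothetical placeholders, NOT the paper's values.
    max_tokens: int = 256
    top_p: float = 1.0
    frequency_penalty: float = 0.0
    presence_penalty: float = 0.0
    temperature: float = 0.0
    seed: int = 0                # assumed field: random seed per run
    num_trials: int = 1          # assumed field: number of runs per model
    aggregation: str = "single"  # assumed field: "single", "mean", or "best"

    def to_json(self) -> str:
        """Serialize the config so it can be published next to the results."""
        return json.dumps(asdict(self), indent=2)


def aggregate(scores: list[float], mode: str) -> float:
    """Combine per-trial scores according to the reporting protocol."""
    if mode == "single":
        return scores[0]
    if mode == "mean":
        return mean(scores)
    if mode == "best":
        return max(scores)
    raise ValueError(f"unknown aggregation mode: {mode}")


if __name__ == "__main__":
    cfg = DecodingConfig(num_trials=3, aggregation="mean")
    print(cfg.to_json())
```

A JSON dump like this, published alongside each results table, would let us rerun with identical settings and tell whether the remaining gap comes from decoding or from something else.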

Thanks!
