Description
Hello,
I am using Cambrian's official evaluation code to evaluate the model on the MME and MathVista benchmarks, but I cannot reproduce the performance reported by Cambrian. In particular, the MathVista accuracy comes out at only 0.02. I would appreciate guidance on what might be going wrong.
Details:
Model: models--nyu-visionx--cambrian-8b
Output from Evaluation Code:
MME Results:
model,time,total_score,accuracy,Perception,Cognition,code_reasoning,artwork,celebrity,numerical_calculation,text_translation,count,color,commonsense_reasoning,position,OCR,landmark,scene,existence,posters
cambrian-8b,2024-11-11 19:57:06,1699.8325330132052,78.85425442291492,1380.1896758703479,319.6428571428571,"{'acc_score': 50.0, 'acc_plus_score': 20.0, 'score': 70.0, 'size': 40.0}","{'acc_score': 71.0, 'acc_plus_score': 42.0, 'score': 112.99999999999999, 'size': 400.0}","{'acc_score': 80.58823529411765, 'acc_plus_score': 61.76470588235294, 'score': 142.35294117647058, 'size': 340.0}","{'acc_score': 50.0, 'acc_plus_score': 0.0, 'score': 50.0, 'size': 40.0}","{'acc_score': 62.5, 'acc_plus_score': 30.0, 'score': 92.5, 'size': 40.0}","{'acc_score': 65.0, 'acc_plus_score': 30.0, 'score': 95.0, 'size': 60.0}","{'acc_score': 88.33333333333333, 'acc_plus_score': 76.66666666666667, 'score': 165.0, 'size': 60.0}","{'acc_score': 68.57142857142857, 'acc_plus_score': 38.57142857142858, 'score': 107.14285714285714, 'size': 140.0}","{'acc_score': 78.33333333333333, 'acc_plus_score': 60.0, 'score': 138.33333333333334, 'size': 60.0}","{'acc_score': 55.00000000000001, 'acc_plus_score': 10.0, 'score': 65.0, 'size': 40.0}","{'acc_score': 82.5, 'acc_plus_score': 65.5, 'score': 148.0, 'size': 400.0}","{'acc_score': 88.0, 'acc_plus_score': 77.0, 'score': 165.0, 'size': 400.0}","{'acc_score': 96.66666666666667, 'acc_plus_score': 93.33333333333333, 'score': 190.0, 'size': 60.0}","{'acc_score': 85.71428571428571, 'acc_plus_score': 72.78911564625851, 'score': 158.50340136054422, 'size': 294.0}"
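To rule out an aggregation problem on my side, here is a minimal sketch that recomputes the MME totals from the per-task `score` values in the row above. The Perception/Cognition task split is my assumption (the usual MME grouping of four cognition tasks and ten perception tasks), not something I confirmed in the evaluation code.

```python
import math

# Per-task 'score' values copied from the MME output row above.
scores = {
    "code_reasoning": 70.0,
    "artwork": 112.99999999999999,
    "celebrity": 142.35294117647058,
    "numerical_calculation": 50.0,
    "text_translation": 92.5,
    "count": 95.0,
    "color": 165.0,
    "commonsense_reasoning": 107.14285714285714,
    "position": 138.33333333333334,
    "OCR": 65.0,
    "landmark": 148.0,
    "scene": 165.0,
    "existence": 190.0,
    "posters": 158.50340136054422,
}

# Assumption: these four tasks form Cognition; the other ten form Perception.
COGNITION_TASKS = {
    "code_reasoning",
    "numerical_calculation",
    "text_translation",
    "commonsense_reasoning",
}

cognition = sum(v for k, v in scores.items() if k in COGNITION_TASKS)
perception = sum(v for k, v in scores.items() if k not in COGNITION_TASKS)
total = perception + cognition

# Compare against the aggregate columns reported in the same row.
print(math.isclose(perception, 1380.1896758703479, rel_tol=1e-9))
print(math.isclose(cognition, 319.6428571428571, rel_tol=1e-9))
print(math.isclose(total, 1699.8325330132052, rel_tol=1e-9))
```

Under that assumed split the per-task scores add up to the reported Perception, Cognition, and total_score columns, so the aggregation itself looks internally consistent.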
MathVista:
{'model': 'cambrian-8b', 'time': '2024-11-11 20:27:43', 'accuracy': 0.02, 'total_count': 1000, 'math-targeted-vqa': {'accurcay': 0.018518518518518517, 'total': 540}, 'general-vqa': {'accurcay': 0.021739130434782608, 'total': 460}}
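For scale, the sketch below converts the MathVista accuracies above into absolute counts. The dictionary is copied from the output (including the `accurcay` key exactly as printed); the rounding step assumes each accuracy is simply correct answers divided by the category total.

```python
import math

# Copied from the MathVista output above; 'accurcay' is the key as printed.
result = {
    "accuracy": 0.02,
    "total_count": 1000,
    "math-targeted-vqa": {"accurcay": 0.018518518518518517, "total": 540},
    "general-vqa": {"accurcay": 0.021739130434782608, "total": 460},
}

def correct_count(category: dict) -> int:
    """Recover the number of correct answers, assuming accuracy = correct / total."""
    return round(category["accurcay"] * category["total"])

math_correct = correct_count(result["math-targeted-vqa"])
general_correct = correct_count(result["general-vqa"])
overall = (math_correct + general_correct) / result["total_count"]

print(math_correct, general_correct)
print(math.isclose(overall, result["accuracy"]))
```

That works out to 10 correct answers in each category, 20 out of 1000 overall. My guess, and it is only a guess, is that a number this low reflects something systematic in my setup rather than a small reproduction gap.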
Request:
Could someone please review the output above and point out any misconfiguration or error that could explain these results? Any guidance on reproducing Cambrian's reported metrics would be greatly appreciated.