Discrepancy in Evaluation Results Using Cambrian's Official Code

Hello,

I am using Cambrian's official evaluation code to evaluate cambrian-8b on the MME and MathVista benchmarks, but I cannot reproduce the performance reported by Cambrian. The gap is largest on MathVista, where the reported accuracy is only 0.02. I would appreciate guidance on what might be going wrong.

Details:

Model: models--nyu-visionx--cambrian-8b

Output from Evaluation Code:

MME Results: 
model,time,total_score,accuracy,Perception,Cognition,code_reasoning,artwork,celebrity,numerical_calculation,text_translation,count,color,commonsense_reasoning,position,OCR,landmark,scene,existence,posters
cambrian-8b,2024-11-11 19:57:06,1699.8325330132052,78.85425442291492,1380.1896758703479,319.6428571428571,"{'acc_score': 50.0, 'acc_plus_score': 20.0, 'score': 70.0, 'size': 40.0}","{'acc_score': 71.0, 'acc_plus_score': 42.0, 'score': 112.99999999999999, 'size': 400.0}","{'acc_score': 80.58823529411765, 'acc_plus_score': 61.76470588235294, 'score': 142.35294117647058, 'size': 340.0}","{'acc_score': 50.0, 'acc_plus_score': 0.0, 'score': 50.0, 'size': 40.0}","{'acc_score': 62.5, 'acc_plus_score': 30.0, 'score': 92.5, 'size': 40.0}","{'acc_score': 65.0, 'acc_plus_score': 30.0, 'score': 95.0, 'size': 60.0}","{'acc_score': 88.33333333333333, 'acc_plus_score': 76.66666666666667, 'score': 165.0, 'size': 60.0}","{'acc_score': 68.57142857142857, 'acc_plus_score': 38.57142857142858, 'score': 107.14285714285714, 'size': 140.0}","{'acc_score': 78.33333333333333, 'acc_plus_score': 60.0, 'score': 138.33333333333334, 'size': 60.0}","{'acc_score': 55.00000000000001, 'acc_plus_score': 10.0, 'score': 65.0, 'size': 40.0}","{'acc_score': 82.5, 'acc_plus_score': 65.5, 'score': 148.0, 'size': 400.0}","{'acc_score': 88.0, 'acc_plus_score': 77.0, 'score': 165.0, 'size': 400.0}","{'acc_score': 96.66666666666667, 'acc_plus_score': 93.33333333333333, 'score': 190.0, 'size': 60.0}","{'acc_score': 85.71428571428571, 'acc_plus_score': 72.78911564625851, 'score': 158.50340136054422, 'size': 294.0}"
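To rule out an aggregation problem on my side, I checked that the per-task scores in the row above add up to the reported Perception, Cognition, and total_score values. This is only a sketch: the task grouping below is my assumption (it follows the usual MME perception/cognition split), and the numbers are copied by hand from the row rather than read from the evaluation code's output file.

```python
# Per-task 'score' values copied by hand from the MME row above.
# Assumption: the grouping follows the usual MME perception/cognition split.
perception_tasks = {
    "artwork": 112.99999999999999,
    "celebrity": 142.35294117647058,
    "count": 95.0,
    "color": 165.0,
    "position": 138.33333333333334,
    "OCR": 65.0,
    "landmark": 148.0,
    "scene": 165.0,
    "existence": 190.0,
    "posters": 158.50340136054422,
}
cognition_tasks = {
    "code_reasoning": 70.0,
    "numerical_calculation": 50.0,
    "text_translation": 92.5,
    "commonsense_reasoning": 107.14285714285714,
}

# Sum each group, then combine them into the overall score.
perception = sum(perception_tasks.values())
cognition = sum(cognition_tasks.values())
total = perception + cognition

print(f"Perception: {perception:.4f}")
print(f"Cognition:  {cognition:.4f}")
print(f"Total:      {total:.4f}")
```

The sums agree with the reported values, so the MME summary appears internally consistent and the gap seems to come from the per-task accuracies themselves.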

MathVista:
{'model': 'cambrian-8b', 'time': '2024-11-11 20:27:43', 'accuracy': 0.02, 'total_count': 1000, 'math-targeted-vqa': {'accurcay': 0.018518518518518517, 'total': 540}, 'general-vqa': {'accurcay': 0.021739130434782608, 'total': 460}}
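Converting the MathVista accuracies back into counts makes the problem easier to see. The sketch below uses the dictionary exactly as printed above (including the `accurcay` key as emitted by the evaluation code); the variable names are my own.

```python
# Result dictionary copied verbatim from the MathVista output above.
result = {
    'model': 'cambrian-8b',
    'time': '2024-11-11 20:27:43',
    'accuracy': 0.02,
    'total_count': 1000,
    'math-targeted-vqa': {'accurcay': 0.018518518518518517, 'total': 540},
    'general-vqa': {'accurcay': 0.021739130434782608, 'total': 460},
}

# Recover the number of answers scored as correct in each category.
math_split = result['math-targeted-vqa']
general_split = result['general-vqa']
math_correct = round(math_split['accurcay'] * math_split['total'])
general_correct = round(general_split['accurcay'] * general_split['total'])

# Recompute the overall accuracy from the recovered counts.
overall = (math_correct + general_correct) / result['total_count']

print(f"math-targeted-vqa correct: {math_correct} / {math_split['total']}")
print(f"general-vqa correct:       {general_correct} / {general_split['total']}")
print(f"overall accuracy:          {overall}")
```

Only about 20 of the 1000 answers are scored as correct. That makes me suspect a problem with answer extraction or prompt formatting rather than with the model itself, though I have not confirmed this.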

Request:

Could someone please review the output and provide insights into any potential misconfigurations or errors that might be affecting the results? Any guidance on how to align the performance with Cambrian's reported metrics would be greatly appreciated.
