Metadata correction for 2024.findings-acl.673 #7597

@nicolaioestergaard

Description

JSON data block

{
  "anthology_id": "2024.findings-acl.673",
  "abstract": "The rapid advancement of Large Language Models (LLMs) in the realm of mathematical reasoning necessitates comprehensive evaluations to gauge progress and inspire future directions. Existing assessments predominantly focus on problem-solving from the examinee perspective, overlooking a dual perspective of examiner regarding error identification and correction. From the examiner perspective, we define four evaluation tasks for error identification and correction along with a new dataset with annotated error types and steps. We also design diverse prompts to thoroughly evaluate eleven representative LLMs. Our principal findings indicate that GPT-4 outperforms all models, while open-source model LLaMA-2-7B demonstrates comparable abilities to closed-source models GPT-3.5 and Gemini Pro. Notably, calculation error proves the most challenging error type. Moreover, prompting LLMs with the error types can improve the average correction accuracy by 47.9%. These results reveal potential directions for developing the mathematical reasoning abilities of LLMs. Our code and dataset is available on <url>https://github.com/LittleCirc1e/EIC</url>."
}

Metadata

    Labels

    approved: Used to note team approval of metadata requests
    correction: For corrections submitted to the anthology
    metadata: Correction to paper metadata
