
Provide a metric that uses Math-Verify #1731

Open

Description

@gsganden

🚀 Feature Request

Provide a metric that uses Math-Verify to parse and compare mathematical expressions with more flexibility than InContextLearningGenerationExactMatchAccuracy.

Motivation

The Hugging Face post https://huggingface.co/blog/math_verify_leaderboard reports that overly simple answer-matching methods for evaluating LLM math performance can give very misleading results, a problem that Math-Verify addresses.

[Optional] Implementation

Create a MathVerifyAccuracy class that inherits from InContextLearningMetric, in llmfoundry/eval/metrics/nlp.py or perhaps a new llmfoundry/eval/metrics/math.py. The implementation of that class is relatively straightforward, and I would be happy to carry it out if desired.
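A minimal sketch of what I have in mind, using Math-Verify's `parse`/`verify` functions and assuming an `update(batch, outputs, labels)` signature and torchmetrics-style state like `InContextLearningGenerationExactMatchAccuracy`; the exact names and signatures would of course need to match the actual llmfoundry interfaces:

```python
import torch
from math_verify import parse, verify

from llmfoundry.eval.metrics.nlp import InContextLearningMetric


class MathVerifyAccuracy(InContextLearningMetric):
    """Accuracy metric that compares answers with Math-Verify.

    Generated and reference answers are parsed into mathematical
    expressions and compared semantically, so e.g. "1/2" and "0.5"
    can count as a match.
    """

    def __init__(self, dist_sync_on_step: bool = False):
        super().__init__(dist_sync_on_step=dist_sync_on_step)
        self.add_state('correct', default=torch.tensor(0.0), dist_reduce_fx='sum')
        self.add_state('total', default=torch.tensor(0.0), dist_reduce_fx='sum')

    def update(self, batch: dict, outputs: list[str], labels: list[list[str]]):
        # Assumes `outputs` holds generated strings and `labels` holds lists of
        # acceptable reference answers, as in the exact-match metric.
        for output, references in zip(outputs, labels):
            parsed_output = parse(output)
            # Count the sample correct if the generation matches any reference.
            if any(verify(parse(ref), parsed_output) for ref in references):
                self.correct += 1
            self.total += 1

    def compute(self):
        assert isinstance(self.correct, torch.Tensor)
        assert isinstance(self.total, torch.Tensor)
        return self.correct / self.total
```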

Additional context
