🚀 Feature Request
Provide a metric that uses Math-Verify to parse and compare mathematical expressions with more flexibility than `InContextLearningGenerationExactMatchAccuracy`.
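For illustration, here is the kind of comparison Math-Verify enables via its `parse` and `verify` functions. The expected result reflects my understanding of the library's default behavior (numeric/symbolic equivalence checking) and is not verified output:

```python
from math_verify import parse, verify

# Exact string match would score this pair as wrong; Math-Verify
# compares the parsed mathematical values instead.
gold = parse("$\\frac{1}{2}$")
answer = parse("0.5")
print(verify(gold, answer))  # expected: True (the gold answer comes first)
```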
Motivation
The Hugging Face blog post https://huggingface.co/blog/math_verify_leaderboard reports that overly simple methods for evaluating LLM math performance can give very misleading results, a problem that Math-Verify addresses.
[Optional] Implementation
Create a `MathVerifyAccuracy` class that inherits from `InContextLearningMetric`, in `llmfoundry/eval/metrics/nlp.py` or perhaps a new `llmfoundry/eval/metrics/math.py`. The implementation is relatively straightforward (see the sketch below), and I would be happy to carry it out if desired.
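As a rough illustration, here is a minimal sketch of what such a class might look like. It assumes that `InContextLearningMetric` follows the standard torchmetrics `add_state`/`update`/`compute` pattern and that `update` receives decoded output strings plus a list of acceptable reference answers per sample, mirroring `InContextLearningGenerationExactMatchAccuracy`; the exact signature would need to be matched against the current llm-foundry code.

```python
import torch
from math_verify import parse, verify

from llmfoundry.eval.metrics.nlp import InContextLearningMetric


class MathVerifyAccuracy(InContextLearningMetric):
    """Counts a generation as correct if Math-Verify judges it
    mathematically equivalent to any of the reference answers.

    Sketch only: the update() signature is assumed to mirror
    InContextLearningGenerationExactMatchAccuracy and may need adjusting.
    """

    def __init__(self, dist_sync_on_step: bool = False):
        super().__init__(dist_sync_on_step=dist_sync_on_step)
        self.add_state('correct', default=torch.tensor(0.0), dist_reduce_fx='sum')
        self.add_state('total', default=torch.tensor(0.0), dist_reduce_fx='sum')

    def update(self, batch: dict, outputs: list[str], labels: list[list[str]]):
        for output, references in zip(outputs, labels):
            # parse() extracts and normalizes the final answer from raw text.
            answer = parse(output)
            # verify() takes the gold answer first; accept a match against
            # any of the acceptable references.
            if any(verify(parse(ref), answer) for ref in references):
                self.correct += 1
            self.total += 1

    def compute(self):
        return self.correct / self.total
```

Whether Math-Verify's extraction should be configurable (e.g., LaTeX answers vs. plain expressions) is a design question that could be settled during review.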