Possibility to benchmark with MedCalc-Bench Verified instead of v1.0? #18

@nikhilk7153

Hi,

I was wondering whether future iterations could be benchmarked with MedCalc-Bench Verified instead of v1.0. We've corrected almost 1/3 of the labels; these had either an incorrect computation or an incorrect extraction of relevant entities, both of which affect the ground truth. We also made other improvements, such as finding notes that may be better suited to certain calculators and using o4-mini to rewrite notes so that they read more like the writing style of the notes from PMC. The dataset is available here: https://huggingface.co/datasets/nsk7153/MedCalc-Bench-Verified.

I know I posted this on the HELM GitHub issues too, but I just want everyone to use the most accurate version to date since the changes can affect the ranking of stronger models.

Thanks for taking the time to read my request!
