feat: LLM-as-a-judge for REPLACE evaluation #98

@alexahaushalter

Description

Priority Level

Medium (Nice to have)

Is your feature request related to a problem?

When using Replace, it is hard to tell whether all the PII was found and replaced well.

Describe the solution you'd like

The existing evaluation for Rewrite works well, and we would like to offer the option to extend it to Replace in some capacity.

The ideal evaluation is a human review, but in the absence of that, could an LLM optionally review the output to help answer: "Did it actually (1) find all the PII and (2) replace it in a contextually relevant way?"

This could also be very helpful when running Anonymizer on a language one is unfamiliar with, to get a sense of whether Anonymizer performs well on that language. Ultimately, it could also help provide more benchmark information.
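
To make the request more concrete, here is a rough sketch of what an optional LLM review of Replace output might look like. Everything in it is hypothetical: `judge_replace`, `ReplaceVerdict`, the prompt wording, and the JSON keys are illustrative names, not part of Anonymizer's current API. The `call_llm` argument stands in for whatever model client the project would actually use, and the sketch assumes the model returns valid JSON.

```python
import json
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical judge prompt; the wording and JSON keys are assumptions.
JUDGE_PROMPT = """You are reviewing the output of a PII replacement tool.

Original text:
{original}

Text after replacement:
{replaced}

Answer with JSON only, using exactly these keys:
  "missed_pii": list of PII strings from the original that still appear in the replaced text
  "poor_replacements": list of replacement strings that do not fit their context
"""


@dataclass
class ReplaceVerdict:
    """Hypothetical result type answering the two questions in this issue."""
    missed_pii: List[str]
    poor_replacements: List[str]

    @property
    def found_all_pii(self) -> bool:
        # Question (1): did it find all the PII?
        return not self.missed_pii

    @property
    def contextually_relevant(self) -> bool:
        # Question (2): were the replacements contextually relevant?
        return not self.poor_replacements


def judge_replace(
    original: str,
    replaced: str,
    call_llm: Callable[[str], str],
) -> ReplaceVerdict:
    """Ask an LLM (injected as `call_llm`) to review one Replace result."""
    prompt = JUDGE_PROMPT.format(original=original, replaced=replaced)
    data = json.loads(call_llm(prompt))  # assumes the model returns valid JSON
    return ReplaceVerdict(
        missed_pii=list(data.get("missed_pii", [])),
        poor_replacements=list(data.get("poor_replacements", [])),
    )


# Usage with a stubbed model, so the sketch runs without any LLM client.
def fake_llm(prompt: str) -> str:
    return '{"missed_pii": ["555-0100"], "poor_replacements": []}'


verdict = judge_replace(
    "Call Ana at 555-0100.",
    "Call Maria at 555-0100.",
    fake_llm,
)
```

Passing the model call in as a function keeps the review optional and model-agnostic, and lets the same check run over a sample in an unfamiliar language to gauge how Replace performs there.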

Describe alternatives you've considered

No response

Additional context

No response
