Fixes to GraniteGuardian metric, safety evals cleanups #1690

bnayahu · 2025-03-18T06:53:58Z

Granite Guardian metric to properly handle data_classification_policy
Interim fix to make 'prediction' available to the metric.
Further cleanup to safety evals.

… template. Signed-off-by: Jonathan Bnayahu <[email protected]>

Signed-off-by: Jonathan Bnayahu <[email protected]>

martinscooper · 2025-03-19T16:56:54Z

@bnayahu @elronbandel note that the following added lines make it impossible to set an input field named 'prediction', as it will be overwritten:

# TODO replace with logic inside verify_granite_guardian_config and process_input_fields
task_data["prediction"] = prediction

In addition, we should discuss how the fix should be implemented. Using the prediction only makes sense, conceptually, when the risk being evaluated concerns a generated response. That covers all the assistant risks and some of the RAG test cases.

What prompted this change was the need to mix LLMAsJudge metrics with Guardian metrics. LLMAsJudge needs the predictions to be set, but the GG metric doesn't. So we could adapt either LLMAsJudge or GG.
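
To make the overwrite concern concrete, here is a minimal sketch using plain dictionaries rather than the actual metric code. Both helper functions are hypothetical names introduced for illustration. The first mirrors the unconditional assignment quoted above. The second shows one possible alternative that keeps a user-supplied 'prediction' field; it is not something the PR implements.

```python
def attach_prediction_unconditionally(task_data: dict, prediction: str) -> dict:
    """Mirror the interim fix: always write the model output under 'prediction'."""
    updated = dict(task_data)  # copy so the caller's dict is left untouched
    updated["prediction"] = prediction
    return updated


def attach_prediction_if_absent(task_data: dict, prediction: str) -> dict:
    """Hypothetical alternative: only fill 'prediction' when no such field exists."""
    updated = dict(task_data)
    updated.setdefault("prediction", prediction)
    return updated


# An input that already carries a field named 'prediction'.
task_data = {"question": "What is the capital of France?", "prediction": "user-supplied value"}

overwritten = attach_prediction_unconditionally(task_data, "model output")
preserved = attach_prediction_if_absent(task_data, "model output")
```

With the unconditional version the user's field is replaced by the model output. The `setdefault` variant keeps it, at the cost of the metric no longer seeing the generated prediction in that case.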

bnayahu · 2025-03-19T17:38:09Z

@martinscooper That's true. Perhaps a more delicate way would be to add a flag field to the class (e.g., use_generated_prediction) that could be set in the specification of the metric in the card, then be checked in verify_granite_guardian_config and process_input_fields to either only use task_data or use the prediction field for the assistant field?
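
A rough sketch of what such a flag might look like, assuming a simplified stand-in class rather than the real GraniteGuardian metric. The names use_generated_prediction, verify_granite_guardian_config and process_input_fields come from this thread. The class name, the assistant_message_field attribute, the method signatures and the returned dictionary are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class GuardianMetricSketch:
    """Hypothetical stand-in; not the unitxt GraniteGuardian class."""

    # Assumed name of the task_data field holding the assistant message.
    assistant_message_field: str = "assistant_message"
    # Flag proposed in this thread; it would be set in the metric spec in the card.
    use_generated_prediction: bool = False

    def verify_granite_guardian_config(
        self, task_data: Dict[str, Any], prediction: Optional[str]
    ) -> None:
        # Check that the source chosen by the flag is actually available.
        if self.use_generated_prediction:
            if prediction is None:
                raise ValueError("use_generated_prediction is set but no prediction was given")
        elif self.assistant_message_field not in task_data:
            raise ValueError(f"task_data is missing '{self.assistant_message_field}'")

    def process_input_fields(
        self, task_data: Dict[str, Any], prediction: Optional[str] = None
    ) -> Dict[str, Any]:
        self.verify_granite_guardian_config(task_data, prediction)
        # Read the assistant text from the prediction or from task_data,
        # without writing anything back into task_data.
        if self.use_generated_prediction:
            assistant = prediction
        else:
            assistant = task_data[self.assistant_message_field]
        return {"assistant": assistant}


task_data = {"assistant_message": "from the dataset", "prediction": "user field"}

from_prediction = GuardianMetricSketch(use_generated_prediction=True).process_input_fields(
    task_data, "generated response"
)
from_task_data = GuardianMetricSketch().process_input_fields(task_data, "generated response")
```

Because nothing is assigned to task_data["prediction"] in this sketch, an input field with that name stays intact whichever way the flag is set.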

bnayahu added 6 commits March 17, 2025 10:32

Fixed an issue with GraniteGuardian metric, and switched to a generic…

2dceae1

… template. Signed-off-by: Jonathan Bnayahu <[email protected]>

Removal of redundant steps

d306372

Signed-off-by: Jonathan Bnayahu <[email protected]>

Fix missing predictions and classification policy in evaluated dataset

2403a50

Signed-off-by: Jonathan Bnayahu <[email protected]>

Safer data_classification_policy handling

bbc76ab

Signed-off-by: Jonathan Bnayahu <[email protected]>

Interim solution to make the prediction available to the metric

6273100

Signed-off-by: Jonathan Bnayahu <[email protected]>

Merge branch 'main' into jb/gg-hack

b0825bb

bnayahu requested review from elronbandel and martinscooper March 18, 2025 06:55

elronbandel approved these changes Mar 19, 2025

elronbandel added 2 commits March 19, 2025 11:05

Merge branch 'main' into jb/gg-hack

f846b9b

Merge branch 'main' into jb/gg-hack

9ae61c8

elronbandel merged commit bcf5b4a into main Mar 19, 2025
12 checks passed

elronbandel deleted the jb/gg-hack branch March 19, 2025 09:09
