Fixes to GraniteGuardian metric, safety evals cleanups #1690

Merged: 8 commits, Mar 19, 2025

Collaborator

@bnayahu bnayahu commented Mar 18, 2025

  • Granite Guardian metric to properly handle data_classification_policy
  • Interim fix to make 'prediction' available to the metric.
  • Further cleanup to safety evals.

@elronbandel elronbandel merged commit bcf5b4a into main Mar 19, 2025
12 checks passed
@elronbandel elronbandel deleted the jb/gg-hack branch March 19, 2025 09:09
@martinscooper
Collaborator

martinscooper commented Mar 19, 2025

@bnayahu @elronbandel note that the following added lines make it impossible to set an input field named 'prediction', as it will be overwritten:

# TODO replace with logic inside verify_granite_guardian_config and process_input_fields
task_data["prediction"] = prediction
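To illustrate the concern, here is a minimal sketch using a plain dict. The field values are made up, and `inject_prediction` is a hypothetical stand-in for the added lines, not the actual unitxt code:

```python
def inject_prediction(task_data: dict, prediction: str) -> dict:
    """Hypothetical stand-in for the added lines: always sets 'prediction'."""
    task_data = dict(task_data)  # copy so the caller's dict is left untouched
    task_data["prediction"] = prediction  # unconditional assignment
    return task_data


# An input field that a card author deliberately named 'prediction'.
original = {"question": "What is the capital of France?", "prediction": "user-supplied value"}
result = inject_prediction(original, "model output")

# The user-supplied value no longer reaches the metric.
print(result["prediction"])
```

Any input field that happens to share the name 'prediction' is silently replaced by the generated output.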

In addition, we should discuss how the fix should be implemented. Conceptually, using the predictions only makes sense when the risk being evaluated concerns a generated response. That covers all the assistant risks and some of the RAG test cases.

What prompted this change was the need to mix LLMAsJudge metrics with Guardian metrics. LLMAsJudge needs the predictions to be set, but the GG metric doesn't. So we could adapt either LLMAsJudge or GG.

@bnayahu
Collaborator Author

bnayahu commented Mar 19, 2025

@martinscooper That's true. Perhaps a gentler approach would be to add a flag field to the class (e.g., use_generated_prediction) that can be set in the metric's specification in the card. verify_granite_guardian_config and process_input_fields would then check it to decide whether to use only task_data or to use the prediction for the assistant field.
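A rough sketch of how such a flag might behave. The class name, the `assistant_message_field` default, and the method signature are illustrative assumptions, not the actual unitxt implementation; only `use_generated_prediction` and `process_input_fields` are names taken from the comment above:

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class GuardianMetricSketch:
    """Hypothetical, simplified stand-in for the Granite Guardian metric class."""

    assistant_message_field: str = "assistant_message"  # assumed field name
    use_generated_prediction: bool = False  # the proposed opt-in flag

    def process_input_fields(
        self, task_data: Dict[str, Any], prediction: str
    ) -> Dict[str, Any]:
        fields = dict(task_data)  # never mutate the caller's task_data
        if self.use_generated_prediction:
            # Only when the flag is set does the generated output
            # fill the assistant field.
            fields[self.assistant_message_field] = prediction
        # Otherwise task_data is used as-is, so a user-defined
        # 'prediction' input field is left alone.
        return fields


with_flag = GuardianMetricSketch(use_generated_prediction=True)
without_flag = GuardianMetricSketch()

task_data = {"user_message": "hi", "prediction": "user-supplied value"}
out_with = with_flag.process_input_fields(task_data, "model output")
out_without = without_flag.process_input_fields(task_data, "model output")
```

Under these assumptions, a card that sets the flag gets the generated response in the assistant field, while a card that leaves it unset sees only its own task_data. In both cases an input field named 'prediction' keeps its original value.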

3 participants