Wprazuch/sdk reshape by wprazuch · Pull Request #923 · NVIDIA-NeMo/Evaluator

wprazuch · 2026-04-22T11:49:26Z

No description provided.

Mechanical rename to align with NEL's existing scoring/ module naming. NEL distinguishes 'metrics' (aggregation layer: confidence intervals, pass@k, paired tests) from 'scoring' (per-sample scoring primitives). SDK's old 'metrics/' contained scoring primitives, so this matches NEL convention.

- src/nemo_evaluator/sdk/metrics/ -> src/nemo_evaluator/sdk/scoring/
- tests/test_sdk/metrics/ -> tests/test_sdk/scoring/
- Imports: nemo_evaluator.sdk.metrics.* -> nemo_evaluator.sdk.scoring.*

No behavioral changes. All 148 tests still pass.

Signed-off-by: Wojciech Prazuch <wprazuch@nvidia.com>

Adds NELScorerMixin in nemo_evaluator.sdk.scoring.base that provides __call__(ScorerInput) -> dict. This bridges SDK's (item, sample) metric API to NEL's (ScorerInput) scorer protocol so SDK metrics plug into NEL's scoring pipeline without an external adapter layer.

All 10 concrete runtime metric classes now inherit the mixin:

- BLEUMetric, ROUGEMetric, F1Metric, ExactMatchMetric
- NumberCheckMetric, StringCheckMetric, ToolCallingMetric
- LLMJudgeMetric
- _RemoteMetricBase (and its subclasses RemoteMetric, NemoAgentToolkitRemoteMetric)

Mapping performed by the mixin:

    item = {'reference': scorer_input.target, **scorer_input.metadata}
    sample = {'output_text': scorer_input.response, 'response': scorer_input.response}
    score = self.metric(item, sample)  # SDK's native call
    return {'score': float(score), 'metric_type': self.type}

Users configure SDK metrics with templates that reference these item/sample keys (e.g. reference='{{ reference }}'), then can use the metric anywhere NEL accepts a scorer.

Tests: 3 new tests in tests/test_sdk/scoring/test_nel_scorer_mixin.py. All 151 SDK tests pass.

Signed-off-by: Wojciech Prazuch <wprazuch@nvidia.com>
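For readers who want to see the bridge in isolation, here is a minimal, self-contained sketch of the mapping described in the commit message. It is not the code from this PR: the `ScorerInput` dataclass below is a hypothetical stand-in that carries only the three fields the message mentions (`response`, `target`, `metadata`), and `ToyExactMatch` is an illustrative metric, not the SDK's `ExactMatchMetric`.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ScorerInput:
    """Hypothetical stand-in for NEL's ScorerInput (assumed fields only)."""
    response: str
    target: str
    metadata: dict[str, Any] = field(default_factory=dict)


class NELScorerMixin:
    """Sketch of the bridge: adapts metric(item, sample) to __call__(ScorerInput)."""

    def __call__(self, scorer_input: ScorerInput) -> dict[str, Any]:
        # The target becomes item['reference']; metadata keys are merged on top.
        item = {"reference": scorer_input.target, **scorer_input.metadata}
        # The response is exposed under both keys named in the commit message.
        sample = {
            "output_text": scorer_input.response,
            "response": scorer_input.response,
        }
        score = self.metric(item, sample)  # the metric's native (item, sample) call
        return {"score": float(score), "metric_type": self.type}


class ToyExactMatch(NELScorerMixin):
    """Illustrative metric; the name and the type string are assumptions."""
    type = "exact-match"

    def metric(self, item: dict[str, Any], sample: dict[str, Any]) -> bool:
        return sample["output_text"].strip() == str(item["reference"]).strip()


scorer = ToyExactMatch()
result = scorer(ScorerInput(response="Paris", target="Paris", metadata={"id": 7}))
print(result)
```

One consequence of the mapping as written in the commit message: because `**scorer_input.metadata` is unpacked after the `'reference'` key, a metadata entry named `reference` would override the target. Whether the real mixin guards against that is not stated in the PR.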

copy-pr-bot · 2026-04-22T11:49:29Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-04-22T11:49:31Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: e0522aa7-5c2a-4c12-8b80-0b09581498aa

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


wprazuch added 2 commits April 22, 2026 13:30

github-actions Bot added the tests label Apr 22, 2026

wprazuch wants to merge 2 commits into wprazuch/sdk-onboarding-approach1 from wprazuch/sdk-reshape
