Copies the public `nemo_evaluator_sdk` tree from NVIDIA-NeMo/Platform into NEL as `src/nemo_evaluator/sdk/`, minus the `execution/` orchestrator submodule (NEL owns the evaluation loop — see the AGENTS.md design principle).

What's included:

- `metrics/` — BLEU, ROUGE, F1, ExactMatch, ToolCalling, LLMJudge, Remote, RAGAS (11 variants), plus aggregation, base types, and template rendering
- `values/` — metric config types, result types, score types, secrets
- `resilience/` — scheduler with retry/rate-limiting, policies, classifier
- `datasets/` — multi-format loader (JSON/JSONL/CSV/Parquet/Feather/Arrow)
- `inference.py`, `agent_inference.py` — OpenAI-compatible async client with preprocessing/postprocessing hooks
- `templates.py` — Jinja2 prompt rendering
- `structured_output.py` — structured output mode detection/validation
- `enums.py`, `constants.py` — type definitions

Changes from upstream:

- Import rewrites: `nemo_evaluator_sdk.*` → `nemo_evaluator.sdk.*`
- `__init__.py`: drops the `execution.evaluator.Evaluator` export
- `metrics/remote.py`: inlines the `_run_sync` helper (previously in `execution.metric_execution`) to avoid pulling in the full orchestrator

Dependencies: added via a new `[sdk]` optional extra (jsonschema, jsonpath-ng, pyarrow, pandas, openai, sacrebleu, rouge_score). The Python floor is bumped from 3.10 to 3.11 to match the SDK constraint.

Tests: ported 10 test files under `tests/test_sdk/`, covering inference, agent_inference, llm_judge, exact_match, remote, and values. Skipped tests that depend on the execution orchestrator (bleu/rouge/f1/number_check/string_check/tool_calling/api) and `test_params.py` (depends on an nmp-internal module). All 148 ported tests pass.

Verified: `pip install -e .[sdk]` succeeds, imports resolve, tests pass.

This is Approach 1 (verbatim copy). A follow-up PR will distribute the SDK into NEL's existing `scoring/`, `metrics/`, `adapters/`, `engine/`, `environments/`, and `config/` modules per the RFC Phase 1.5 design.

Signed-off-by: Wojciech Prazuch <wprazuch@nvidia.com>
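For reviewers unfamiliar with the `datasets/` module, extension-based dispatch is the usual shape of a multi-format loader like the one described above. This is a minimal sketch, not the SDK's actual implementation — the function name `load_records` and the dispatch details are illustrative assumptions:

```python
import json
from pathlib import Path

def load_records(path: Path) -> list[dict]:
    """Hypothetical sketch: load a dataset file into a list of dict records,
    dispatching on the file extension (JSON/JSONL/CSV/Parquet/Feather/Arrow)."""
    suffix = path.suffix.lower()
    if suffix == ".json":
        data = json.loads(path.read_text())
        return data if isinstance(data, list) else [data]
    if suffix == ".jsonl":
        return [json.loads(line) for line in path.read_text().splitlines() if line.strip()]
    if suffix in {".csv", ".parquet", ".feather", ".arrow"}:
        import pandas as pd  # tabular formats go through pandas/pyarrow
        readers = {".csv": pd.read_csv, ".parquet": pd.read_parquet,
                   ".feather": pd.read_feather, ".arrow": pd.read_feather}
        return readers[suffix](path).to_dict(orient="records")
    raise ValueError(f"Unsupported dataset format: {suffix}")
```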
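The import rewrite listed under "Changes from upstream" can be sketched as a one-off script. The helper names below are hypothetical; this just shows the `nemo_evaluator_sdk.*` → `nemo_evaluator.sdk.*` mapping as a word-boundary substitution:

```python
import re
from pathlib import Path

# Word boundaries keep the rewrite from touching identifiers that merely
# contain "nemo_evaluator_sdk" as a substring.
IMPORT_RE = re.compile(r"\bnemo_evaluator_sdk\b")

def rewrite_imports(source: str) -> str:
    """Rewrite old-namespace imports to the new nemo_evaluator.sdk namespace."""
    return IMPORT_RE.sub("nemo_evaluator.sdk", source)

def rewrite_tree(root: Path) -> None:
    """Apply the rewrite to every .py file under root, in place."""
    for path in root.rglob("*.py"):
        path.write_text(rewrite_imports(path.read_text()))
```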
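The inlined `_run_sync` helper in `metrics/remote.py` presumably drives a coroutine to completion from synchronous code. A minimal stand-in, assuming the common asyncio pattern (the actual upstream implementation may differ):

```python
import asyncio
from typing import Any, Coroutine, TypeVar

T = TypeVar("T")

def _run_sync(coro: Coroutine[Any, Any, T]) -> T:
    """Hypothetical sketch: run an async coroutine from synchronous code
    without importing the execution orchestrator."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread: safe to create one.
        return asyncio.run(coro)
    raise RuntimeError("_run_sync cannot be called from a running event loop")
```

Inlining a ~15-line helper like this is a reasonable trade against depending on the whole `execution/` package.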
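For reference, the new `[sdk]` extra would look roughly like this in `pyproject.toml` — the table layout follows the standard PEP 621 format, and version pins are omitted because the PR does not state them:

```toml
[project]
requires-python = ">=3.11"  # floor bumped from 3.10 to match the SDK constraint

[project.optional-dependencies]
sdk = [
    "jsonschema",
    "jsonpath-ng",
    "pyarrow",
    "pandas",
    "openai",
    "sacrebleu",
    "rouge_score",
]
```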