feat: Add EvalHub integration #60

ruivieira wants to merge 1 commit into trustyai-explainability:main from
Conversation
- Updated Containerfile to use UBI Python image for better compatibility and to avoid Docker Hub rate limits.
- Modified pip install command to include 'evalhub' extra dependencies.
- Added EvalHub adapter scripts and classes, enabling the package to serve as a standalone module for RAGAS evaluation jobs.
- Updated README to document the new EvalHub adapter functionality.
- Introduced new files for embeddings and LLM wrappers compatible with OpenAI for the EvalHub integration.
Reviewer's Guide

Adds an EvalHub-specific RAGAS adapter so the same image can run both the Llama Stack RAGAS provider (KFP) and EvalHub jobs, including new OpenAI-compatible LLM/embeddings wrappers, a CLI entrypoint, and UBI-based container adjustments, plus README/docs and extras wiring.

Sequence diagram for EvalHub RAGAS benchmark job execution

sequenceDiagram
actor EvalHubUser
participant EvalHubController
participant Kubernetes
participant RagasEvalHubAdapter as RagasEvalHubAdapter(main)
participant DefaultCallbacks
participant RagasCore as RAGAS_evaluate
participant EvalHubOpenAILLM
participant EvalHubOpenAIEmbeddings
participant OpenAIModelAPI as OpenAI_Compat_LLM
participant OpenAIEmbedAPI as OpenAI_Compat_Embeddings
EvalHubUser->>EvalHubController: Configure benchmark and model
EvalHubController->>Kubernetes: Create Job with image and entrypoint ragas-evalhub-adapter
Kubernetes->>RagasEvalHubAdapter: Start container and run main()
RagasEvalHubAdapter->>RagasEvalHubAdapter: Load JobSpec from /meta/job.json
RagasEvalHubAdapter->>DefaultCallbacks: Initialize with job_id, callback_url, oci_auth
RagasEvalHubAdapter->>DefaultCallbacks: report_status(INITIALIZING)
RagasEvalHubAdapter->>RagasEvalHubAdapter: _validate_config(JobSpec)
RagasEvalHubAdapter->>RagasEvalHubAdapter: _resolve_data_path(JobSpec)
RagasEvalHubAdapter->>RagasEvalHubAdapter: _load_dataset(Path)
RagasEvalHubAdapter->>RagasEvalHubAdapter: _apply_column_map / _limit_records
RagasEvalHubAdapter->>DefaultCallbacks: report_status(LOADING_DATA)
RagasEvalHubAdapter->>RagasCore: ragas_evaluate(dataset, metrics, llm, embeddings, run_config)
activate RagasCore
RagasCore->>EvalHubOpenAILLM: generate_text(prompt)
EvalHubOpenAILLM->>OpenAIModelAPI: POST /v1/completions
OpenAIModelAPI-->>EvalHubOpenAILLM: completion text
EvalHubOpenAILLM-->>RagasCore: LLMResult
RagasCore->>EvalHubOpenAIEmbeddings: embed_query / embed_documents
EvalHubOpenAIEmbeddings->>OpenAIEmbedAPI: POST /v1/embeddings
OpenAIEmbedAPI-->>EvalHubOpenAIEmbeddings: embeddings
EvalHubOpenAIEmbeddings-->>RagasCore: vectors
RagasCore-->>RagasEvalHubAdapter: ragas_result
deactivate RagasCore
RagasEvalHubAdapter->>DefaultCallbacks: report_status(POST_PROCESSING)
RagasEvalHubAdapter->>RagasEvalHubAdapter: Aggregate metrics, build EvaluationResult list
alt OCI export configured
RagasEvalHubAdapter->>DefaultCallbacks: create_oci_artifact(OCIArtifactSpec)
DefaultCallbacks-->>RagasEvalHubAdapter: OCI artifact reference
end
RagasEvalHubAdapter-->>EvalHubController: JobResults (overall_score, metrics)
RagasEvalHubAdapter->>DefaultCallbacks: report_results(JobResults)
RagasEvalHubAdapter->>Kubernetes: Exit code 0
Kubernetes-->>EvalHubController: Job completed
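The "Start container and run main()" / "Load JobSpec from /meta/job.json" steps above can be sketched roughly as follows. Note that `load_job_spec`, `run`, and the required-field check are hypothetical stand-ins for illustration, not the adapter's actual entrypoint code:

```python
import json
from pathlib import Path


def load_job_spec(path: Path) -> dict:
    """Read the EvalHub job specification mounted into the container."""
    with path.open() as f:
        return json.load(f)


def run(meta_path: Path = Path("/meta/job.json")) -> int:
    """Minimal entrypoint flow: load the spec and validate it. The real
    adapter would then initialize callbacks with the job id and callback
    URL, report INITIALIZING, and hand off to the RAGAS evaluation."""
    spec = load_job_spec(meta_path)
    missing = {"id", "benchmark_id", "model"} - spec.keys()
    if missing:
        raise ValueError(f"job spec missing fields: {sorted(missing)}")
    return 0  # maps to the container's exit code 0 on success
```

A failed validation surfaces as a non-zero exit, which Kubernetes reports back to the EvalHub controller as a failed Job.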
Class diagram for EvalHub adapter, LLM, and embeddings integration

classDiagram
class FrameworkAdapter {
<<external>>
+JobSpec job_spec
+run_benchmark_job(config, callbacks) JobResults
}
class JobSpec {
<<external>>
+str id
+str benchmark_id
+int benchmark_index
+ModelSpec model
+dict benchmark_config
+ExportsConfig exports
+str provider_id
+str callback_url
+int num_examples
}
class JobCallbacks {
<<external>>
+report_status(update)
+create_oci_artifact(spec) OCIArtifactRef
+report_results(results)
}
class RagasEvalHubAdapter {
+run_benchmark_job(config, callbacks) JobResults
-_validate_config(config) void
-_resolve_data_path(config) Path
-_load_dataset(path) list~dict~
-_apply_column_map(records, column_map) list~dict~
-_limit_records(records, num_examples) list~dict~
}
class EvaluationDataset {
<<external>>
+from_list(records) EvaluationDataset
}
class EvaluationResult {
<<external>>
+str metric_name
+float metric_value
+str metric_type
+int num_samples
+dict metadata
}
class RunConfig {
<<external>>
+int max_workers
}
class BaseRagasLLM {
<<external>>
+BaseRagasLLM(run_config, multiple_completion_supported)
+generate_text(prompt, n, temperature, stop, callbacks) LLMResult
+agenerate_text(prompt, n, temperature, stop, callbacks) LLMResult
}
class EvalHubOpenAILLM {
-str _base_url
-str _model_id
-int _max_tokens
-float _temperature
+EvalHubOpenAILLM(base_url, model_id, max_tokens, temperature, run_config)
+generate_text(prompt, n, temperature, stop, callbacks) LLMResult
+agenerate_text(prompt, n, temperature, stop, callbacks) LLMResult
+get_temperature(n) float
-_client() Any
}
class BaseRagasEmbeddings {
<<external>>
+set_run_config(run_config) void
+embed_query(text) list~float~
+embed_documents(texts) list~list~float~~
}
class EvalHubOpenAIEmbeddings {
-str _base_url
-str _model_id
+EvalHubOpenAIEmbeddings(base_url, model_id, run_config)
+embed_query(text) list~float~
+embed_documents(texts) list~list~float~~
+aembed_query(text) list~float~
+aembed_documents(texts) list~list~float~~
-_client() Any
-_validate_embedding(embedding) list~float~
}
class METRIC_MAPPING {
<<module>>
+dict~str, Metric~
}
class OpenAI {
<<external>>
+completions
+embeddings
}
FrameworkAdapter <|-- RagasEvalHubAdapter
JobSpec --> ModelSpec : uses
RagasEvalHubAdapter --> JobSpec : consumes
RagasEvalHubAdapter --> JobCallbacks : uses
RagasEvalHubAdapter --> EvaluationDataset : builds
RagasEvalHubAdapter --> EvaluationResult : aggregates
RagasEvalHubAdapter --> RunConfig : configures
RagasEvalHubAdapter --> EvalHubOpenAILLM : constructs
RagasEvalHubAdapter --> EvalHubOpenAIEmbeddings : constructs
RagasEvalHubAdapter --> METRIC_MAPPING : selects metrics
BaseRagasLLM <|-- EvalHubOpenAILLM
EvalHubOpenAILLM --> OpenAI : calls
BaseRagasEmbeddings <|-- EvalHubOpenAIEmbeddings
EvalHubOpenAIEmbeddings --> OpenAI : calls
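The private `_apply_column_map` and `_limit_records` helpers in the diagram are not shown on this page; under the assumed semantics (rename columns per record, then truncate to the requested sample count) they might look like this sketch:

```python
from typing import Any, Optional


def apply_column_map(
    records: list[dict[str, Any]], column_map: dict[str, str]
) -> list[dict[str, Any]]:
    """Rename dataset columns on every record, e.g. {"question": "user_input"}."""
    return [
        {column_map.get(key, key): value for key, value in record.items()}
        for record in records
    ]


def limit_records(
    records: list[dict[str, Any]], num_examples: Optional[int]
) -> list[dict[str, Any]]:
    """Truncate the dataset when the job spec requests a sample limit."""
    if num_examples is None or num_examples <= 0:
        return records
    return records[:num_examples]
```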
Hey - I've found 2 issues, and left some high level feedback:
- The `metric_names` handling in `RagasEvalHubAdapter.run_benchmark_job` silently drops unknown metric names; consider logging a warning or raising if requested metrics are not found in `METRIC_MAPPING` so misconfigurations are easier to detect.
- The `_get_api_key` helper is duplicated in both `evalhub.llm` and `evalhub.embeddings`; consider moving it into a shared utility module to avoid divergence in future changes.
- The data directories `/test_data` and `/data` (and `DEFAULT_DATASET_FILENAME`) are currently hardcoded in the adapter; consider allowing these to be overridden via environment variables or benchmark config to make the adapter more flexible across environments.
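The shared utility the second bullet asks for could be as small as the sketch below. The environment variable names and the `"dummy"` fallback (a common convention for OpenAI-compatible endpoints that don't enforce authentication) are assumptions, not the provider's actual behavior:

```python
import os


def get_api_key(
    env_vars: tuple[str, ...] = ("OPENAI_API_KEY", "EVALHUB_API_KEY")
) -> str:
    """Return the first non-empty API key found in the environment, falling
    back to a placeholder for endpoints that do not check authentication."""
    for name in env_vars:
        value = os.environ.get(name)
        if value:
            return value
    return "dummy"
```

Both `evalhub.llm` and `evalhub.embeddings` could then import this one function instead of carrying their own copies.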
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `metric_names` handling in `RagasEvalHubAdapter.run_benchmark_job` silently drops unknown metric names; consider logging a warning or raising if requested metrics are not found in `METRIC_MAPPING` so misconfigurations are easier to detect.
- The `_get_api_key` helper is duplicated in both `evalhub.llm` and `evalhub.embeddings`; consider moving this into a shared utility module to avoid divergence in future changes.
- The data directories `/test_data` and `/data` (and `DEFAULT_DATASET_FILENAME`) are currently hardcoded in the adapter; consider allowing these to be overridden via environment variables or benchmark config to make the adapter more flexible across environments.
## Individual Comments
### Comment 1
<location path="src/llama_stack_provider_ragas/evalhub/adapter.py" line_range="149" />
<code_context>
+ column_map = bc.get("column_map")
+ if isinstance(column_map, dict):
+ records = _apply_column_map(records, column_map)
+ records = _limit_records(records, config.num_examples)
+ if not records:
+ raise ValueError(f"No records in dataset at {data_path} (or after limit)")
</code_context>
<issue_to_address>
**suggestion (performance):** Limit is applied after loading full dataset; for large JSONL this can be unnecessarily expensive.
For `.jsonl`, consider passing `config.num_examples` into `_load_dataset` and stopping iteration once that many records are read, instead of loading all rows first. You can special-case this for `.jsonl` while keeping the current behavior for `.json`, where partial reads are trickier.
Suggested implementation:
```python
data_path = _resolve_data_path(config)
# For large JSONL files, avoid loading the full dataset when a limit is set.
# We pass the limit through to the loader so it can stop iterating early.
num_examples = getattr(config, "num_examples", None)
if str(data_path).endswith(".jsonl") and num_examples:
records = _load_dataset(data_path, limit=num_examples)
else:
records = _load_dataset(data_path)
column_map = bc.get("column_map")
if isinstance(column_map, dict):
records = _apply_column_map(records, column_map)
# Keep in-memory limiting as a safety net, including for non-JSONL formats.
records = _limit_records(records, num_examples)
```
To fully implement the optimization, you will also need to update the `_load_dataset` implementation to accept and use the new `limit` parameter for `.jsonl`:
1. Update the `_load_dataset` function signature to accept an optional `limit`:
- From: `def _load_dataset(path: Union[str, Path]) -> List[Dict[str, Any]]:`
- To: `def _load_dataset(path: Union[str, Path], limit: Optional[int] = None) -> List[Dict[str, Any]]:`
2. Inside `_load_dataset`, special-case `.jsonl`:
- When `path` ends with `.jsonl` and `limit` is not `None`, iterate over the file line by line, `json.loads` each line, append to `records`, and `break` once `len(records) >= limit`.
- For `.json` or other formats, keep the current behavior of loading the entire file; ignore `limit` in those cases.
3. Ensure all other call sites of `_load_dataset` in the codebase are updated (or left as-is) so that they either:
- Continue calling `_load_dataset(path)` with no `limit`, or
- Explicitly pass `limit=` if they want streaming/early stop behavior for `.jsonl`.
</issue_to_address>
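The numbered steps above can be condensed into a sketch like the following (hypothetical `load_dataset`; the real `_load_dataset` signature and error handling may differ):

```python
import json
from pathlib import Path
from typing import Any, Optional


def load_dataset(path: Path, limit: Optional[int] = None) -> list[dict[str, Any]]:
    """Load records from a dataset file. For .jsonl, stop reading once
    `limit` records have been parsed; .json files are always loaded whole,
    since a partial read of a JSON array is trickier."""
    if path.suffix == ".jsonl":
        records: list[dict[str, Any]] = []
        with path.open() as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                records.append(json.loads(line))
                if limit is not None and len(records) >= limit:
                    break  # early stop: the rest of the file is never read
        return records
    data = json.loads(path.read_text())
    return data if isinstance(data, list) else [data]
```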
### Comment 2
<location path="src/llama_stack_provider_ragas/evalhub/adapter.py" line_range="160-164" />
<code_context>
+ list(records[0].keys()) if records else [],
+ )
+
+ metric_names = bc.get("metrics") or bc.get("scoring_functions") or list(METRIC_MAPPING.keys())
+ metrics = [METRIC_MAPPING[name] for name in metric_names if name in METRIC_MAPPING]
+ if not metrics:
+ metrics = list(METRIC_MAPPING.values())
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Silently dropping unknown metric names can make configuration issues hard to diagnose.
Right now, any metric name not in `METRIC_MAPPING` is silently skipped and the run proceeds with the remaining metrics (or all metrics if none match). Consider either logging a warning listing the unknown names or raising a configuration error when they’re present, so misconfigured `metrics` / `scoring_functions` are easier to detect.
```suggestion
metric_names = bc.get("metrics") or bc.get("scoring_functions") or list(METRIC_MAPPING.keys())
unknown_metric_names = [name for name in metric_names if name not in METRIC_MAPPING]
if unknown_metric_names:
logger.warning(
"Unknown metric names in configuration: %s. These will be ignored. Known metrics: %s",
unknown_metric_names,
list(METRIC_MAPPING.keys()),
)
metrics = [METRIC_MAPPING[name] for name in metric_names if name in METRIC_MAPPING]
if not metrics:
metrics = list(METRIC_MAPPING.values())
if metric_names:
logger.warning(
"No valid metric names found in configuration (requested: %s). "
"Falling back to default RAGAS metrics.",
metric_names,
)
logger.info("Using default RAGAS metrics")
```
</issue_to_address>
    provider_id=adapter.job_spec.provider_id,
    sidecar_url=adapter.job_spec.callback_url,
    oci_auth_config_path=Path(oci_auth) if oci_auth else None,
    oci_insecure=os.environ.get("OCI_REGISTRY_INSECURE", "false").lower() == "true",
The default env variable name in eval-hub-sdk's `AdaptorSettings` is `OCI_INSECURE`, not `OCI_REGISTRY_INSECURE`:
    # OCI registry configuration
    oci_auth_config_path: Path | None = Field(
        default=None, validation_alias="OCI_AUTH_CONFIG_PATH"
    )
    oci_insecure: bool = Field(default=False, validation_alias="OCI_INSECURE")
@ruivieira thank you for this! i think you already hinted at this earlier, but it seems this pr is purely additive and parallel in the sense that it's creating a new adapter that only requires ragas and nothing from this provider. like you said we should move to eval-hub contrib. happy to post a pr! let me know.
trustyai-explainability/trustyai-service-operator#664 depends on this PR
Summary by Sourcery
Add an EvalHub-compatible RAGAS adapter so the project’s container image can be used both as a Llama Stack RAGAS provider and as an EvalHub RAGAS evaluation job entrypoint.
New Features:
- A `ragas-evalhub-adapter` console entrypoint that runs RAGAS evaluations as an EvalHub framework adapter using EvalHub job specs.

Enhancements:
- An `evalhub` extra alongside the `remote` dependencies.

Documentation: