[WIP] RHAIENG-1227: RAGAS Eval Provider #1028
Conversation
Walkthrough

Adds new AsciiDoc content documenting Ragas evaluation: an assembly page and three modules covering configuring a remote provider for production, setting up an inline provider for development, and running a notebook-driven evaluation workflow. Also inserts the new assembly into the monitoring docs.
Sequence Diagram(s)

sequenceDiagram
autonumber
participant Dev as Developer / Operator
participant OC as OpenShift CLI/API
participant LS as Llama Stack Operator
participant KS as KServe
participant S3 as S3 Storage
participant KFP as Kubeflow Pipelines
rect rgb(240,248,255)
Dev->>OC: create secrets (S3, KFP token, model info)
Dev->>OC: apply ConfigMap (embedding model, endpoints, creds)
Dev->>LS: apply LlamaStackDistribution (remote inference + RAGAS)
end
rect rgb(255,250,240)
LS->>KS: request model deployment
KS->>S3: pull model / store artifacts
KS-->>LS: model endpoint ready
LS-->>Dev: expose route / pod ready
end
rect rgb(240,255,240)
Dev->>KFP: verify Pipelines token & endpoint
Dev->>LS: trigger test evaluation (Python/KFP client)
LS->>KFP: submit evaluation run
KFP-->>Dev: run status / results
end
sequenceDiagram
autonumber
participant Client as Python Client
participant API as Llama Stack API
rect rgb(248,249,255)
Client->>API: create dataset (inputs, responses, contexts)
API-->>Client: dataset_id
Client->>API: register benchmark (RAGAS scoring_functions)
API-->>Client: benchmark_id
Client->>API: run evaluation job (benchmark_config, model info)
API-->>Client: job_id
end
rect rgb(245,255,250)
Client->>API: poll job status
alt job complete
API-->>Client: results (per-metric scores)
else job failed
API-->>Client: error
end
end
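The client-side flow in this second diagram can be sketched in Python. The `llama_stack_client` calls below appear only as comments because their exact method names and signatures vary by version (they follow the snippets under review); the polling helper is plain Python and runnable as-is.

```python
import time

# Sketch of the client-side evaluation flow above: register a dataset,
# register a benchmark, run an eval job, then poll until the job finishes.
# Only the polling helper is concrete; the client calls are assumptions.

def poll_until_done(get_status, interval=1, max_wait=60):
    """Poll get_status() until it returns 'completed' or 'failed', or time out."""
    elapsed = 0
    while elapsed <= max_wait:
        status = get_status()
        if status in ("completed", "failed"):
            return status
        time.sleep(interval)
        elapsed += interval
    return "timeout"

# Hypothetical usage against a live server (names/URLs are placeholders):
#   client = LlamaStackClient(base_url="https://<llama-stack-route-url>")
#   client.datasets.register(dataset_id="my-rag-evaluation", ...)
#   job = client.eval.run_eval(benchmark_id="ragas-benchmark", ...)
#   final = poll_until_done(lambda: client.eval.job_status(job_id=job.job_id).status)

if __name__ == "__main__":
    states = iter(["in_progress", "in_progress", "completed"])
    print(poll_until_done(lambda: next(states)))  # prints "completed"
```

The injected `get_status` callable keeps the loop testable without a live server, which is also why the review suggestions below favor a bounded timeout over an unbounded `while True`.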
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks and finishing touches: ❌ Failed checks (1 inconclusive), ✅ Passed checks (2 passed)
Actionable comments posted: 4
🧹 Nitpick comments (8)
modules/setting-up-ragas-inline-provider.adoc (3)
180-197: Add error handling to Python API example.

The client initialization and provider listing (lines 187–193) lack error handling. If the Llama Stack route is incorrect or the service is unavailable, users will encounter an unhelpful error.
Apply this diff to add basic error handling:

```diff
 from llama_stack_client import LlamaStackClient

 # Get the Llama Stack route URL
 # oc get route llama-stack-ragas-inline -o jsonpath='{.spec.host}'
-client = LlamaStackClient(base_url="https://__<llama-stack-route-url>__")
+try:
+    client = LlamaStackClient(base_url="https://__<llama-stack-route-url>__")

-# List available providers
-providers = client.providers.list()
-print("Available eval providers:")
-for provider in providers.eval:
-    print(f" - {provider.provider_id} ({provider.provider_type})")
+    # List available providers
+    providers = client.providers.list()
+    print("Available eval providers:")
+    for provider in providers.eval:
+        print(f" - {provider.provider_id} ({provider.provider_type})")

-# Expected output should include:
-# - trustyai_ragas_inline (inline::trustyai_ragas)
+    # Expected output should include:
+    # - trustyai_ragas_inline (inline::trustyai_ragas)
+except Exception as e:
+    print(f"Error connecting to Llama Stack: {e}")
+    print("Verify that the Llama Stack route URL is correct and the service is running.")
```
61-61: Emphasize production TLS requirement in security context.

Line 61 sets `VLLM_TLS_VERIFY="false"` with a comment referencing production use (line 73). While the note is present, the security implication warrants clearer emphasis for users who might copy-paste without reading carefully.

Consider restructuring the NOTE block to frontload the security concern:

```diff
 $ export VLLM_API_TOKEN="__<token_identifier>__"
 $ oc create secret generic llama-stack-inference-model-secret \
     --from-literal INFERENCE_MODEL="$INFERENCE_MODEL" \
     --from-literal VLLM_URL="$VLLM_URL" \
     --from-literal VLLM_TLS_VERIFY="$VLLM_TLS_VERIFY" \
     --from-literal VLLM_API_TOKEN="$VLLM_API_TOKEN"
 ----
+
+[WARNING]
+====
+TLS verification is disabled for demonstration. In production, set `VLLM_TLS_VERIFY` to `true` to enable TLS certificate verification and protect against man-in-the-middle attacks.
+====
+
-[NOTE]
-====
-Replace `__<token_identifier>__` with your actual model API token. In production environments, set `VLLM_TLS_VERIFY` to `true` to enable TLS certificate verification.
-====
+[NOTE]
+====
+Replace `__<token_identifier>__` with your actual model API token.
+====
```

Also applies to: 73-73
144-151: Clarify pod readiness criteria and add timeout guidance.

The procedure instructs users to wait for pod status to show `Running` (line 151), but does not explain what "ready" means or specify a reasonable timeout. A hanging pod might silently fail.

Enhance the wait step with a more explicit condition check:

```diff
 . Wait for the deployment to complete:
 +
 [source,subs="+quotes"]
 ----
-$ oc get pods -w
+$ oc wait --for=condition=ready pod -l app=llama-stack-ragas-inline --timeout=300s
 ----
 +
-Wait until the `llama-stack-ragas-inline` pod status shows `Running`.
+Wait until the pod status shows `Ready 1/1`. If the timeout is exceeded, check pod logs for errors using:
+
+[source,subs="+quotes"]
+----
+$ oc logs -l app=llama-stack-ragas-inline --tail=50
+----
```

modules/evaluating-rag-system-quality-with-ragas.adoc (3)
77-77: Use consistent placeholder format.

Line 77 uses `<your-llama-stack-url>` while other documentation files use the `__<placeholder>__` format. This inconsistency can confuse users about which parts require substitution.

Apply this diff to align with the project's placeholder convention:

```diff
 # Initialize the client
-client = LlamaStackClient(base_url="<your-llama-stack-url>")
+client = LlamaStackClient(base_url="https://__<llama-stack-route-url>__")
```
80-84: Add error handling and validation to dataset registration.

The dataset registration (lines 80–84) lacks error handling. If the dataset registration fails (e.g., invalid format, network error), users receive no guidance.

Apply this diff to add basic validation and error handling:

```diff
 # Register the dataset
-client.datasets.register(
-    dataset_id="my-rag-evaluation",
-    dataset_def=dataset,
-    provider_id="huggingface",  # or your configured provider
-)
+try:
+    client.datasets.register(
+        dataset_id="my-rag-evaluation",
+        dataset_def=dataset,
+        provider_id="huggingface",  # or your configured provider
+    )
+    print("✓ Dataset registered successfully")
+except Exception as e:
+    print(f"✗ Failed to register dataset: {e}")
+    raise
```
140-151: Add exponential backoff guidance and timeout limit to polling loop.

The job status polling (lines 140–151) uses a fixed 5-second interval with no maximum retry count or timeout. A frozen job could cause the script to run indefinitely.

Apply this diff to add timeout and clearer failure messaging:

```diff
 import time
+import sys

 # Poll for completion
+max_wait_seconds = 3600  # 1 hour timeout
+elapsed = 0
 while True:
     job_status = client.eval.job_status(job_id=response.job_id)
     if job_status.status == "completed":
         print("Evaluation completed successfully")
         break
     elif job_status.status == "failed":
         print(f"Evaluation failed: {job_status.error}")
-        break
+        sys.exit(1)
     else:
         print(f"Status: {job_status.status}, Progress: {job_status.progress}%")
+        elapsed += 5
+        if elapsed > max_wait_seconds:
+            print(f"✗ Evaluation timeout after {max_wait_seconds} seconds")
+            sys.exit(1)
     time.sleep(5)
```

modules/configuring-ragas-remote-provider-for-production.adoc (2)
126-140: Add fallback guidance if Llama Stack route is not automatically created.

The procedure assumes the external route is automatically created during LSD deployment (line 133), but provides no troubleshooting steps if the route does not exist.

Add a note clarifying what happens if the route is missing:

```diff
 . Create an external route for the Llama Stack server:
 +
 [NOTE]
 ====
 The KUBEFLOW_LLAMA_STACK_URL must be an external route that is accessible from the Kubeflow Pipeline pods.
 ====
 +
 The route is automatically created when you deploy the LlamaStackDistribution. You can get the route URL with the following command:
+
+If the route is not automatically created, verify that the LlamaStackDistribution CR was deployed successfully:
+
+[source,subs="+quotes"]
+----
+$ oc get llamastackdistribution llama-stack-ragas-remote -o jsonpath='{.status.conditions[*].message}'
+----
 +
 [source,subs="+quotes"]
 ----
 $ export LLAMA_STACK_URL=$(oc get route llama-stack-ragas-remote -o jsonpath='{.spec.host}')
 $ echo "Llama Stack URL: https://$LLAMA_STACK_URL"
+
+If the command returns an error, the LSD deployment may still be in progress. Wait a few moments and retry.
 ----
```
253-299: Add error handling and retry logic to test evaluation.

The test evaluation (lines 269–299) and provider listing (lines 253–267) lack error handling. Connection failures or invalid credentials will surface only as cryptic exceptions.

Apply this diff to add error handling and clearer diagnostic output:

```diff
 from llama_stack_client import LlamaStackClient

-client = LlamaStackClient(base_url="https://__<llama-stack-route-url>__")
+try:
+    client = LlamaStackClient(base_url="https://__<llama-stack-route-url>__")
+except Exception as e:
+    print(f"✗ Failed to connect to Llama Stack: {e}")
+    print("Verify that the Llama Stack URL is correct and the service is running.")
+    raise

 # List available eval providers
-providers = client.providers.list()
-print("Available eval providers:")
-for provider in providers.eval:
-    print(f" - {provider.provider_id} ({provider.provider_type})")
+try:
+    providers = client.providers.list()
+    print("✓ Available eval providers:")
+    for provider in providers.eval:
+        print(f" - {provider.provider_id} ({provider.provider_type})")
+except Exception as e:
+    print(f"✗ Failed to list providers: {e}")
+    raise

 # Expected output should include:
 # - trustyai_ragas_remote (remote::trustyai_ragas)
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- modules/configuring-ragas-remote-provider-for-production.adoc (1 hunks)
- modules/evaluating-rag-system-quality-with-ragas.adoc (1 hunks)
- modules/setting-up-ragas-inline-provider.adoc (1 hunks)
- monitoring-data-science-models.adoc (1 hunks)
🔇 Additional comments (1)
monitoring-data-science-models.adoc (1)
32-32: LGTM — include reference correctly positioned.

The addition integrates RAG evaluation documentation into the monitoring flow following LLM evaluation, with proper syntax and formatting consistency.
Force-pushed ecf7994 to 26aa719
Force-pushed d04705f to 36d6af2
Actionable comments posted: 3
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- modules/configuring-ragas-remote-provider-for-production.adoc (1 hunks)
🔇 Additional comments (2)
modules/configuring-ragas-remote-provider-for-production.adoc (2)
142-145: Verify the DSPA route configuration against your deployed instances.

This review suggests updating the route discovery method, but the specific patterns referenced cannot be fully verified against official OpenShift AI documentation. One documented example uses the route name `ds-pipeline-pipelines-definition`, which differs from both the current code and the suggested alternatives.

Test the proposed changes against your actual DSPA instances to confirm which route discovery method works correctly in your environment:

- Verify the current label selector returns the expected KFP route
- Test the suggested route name pattern with your DSPA instance name
- Confirm the alternative label selector works reliably

Once validated, update the documentation with the confirmed approach.
412-414: Verify/update Next steps links — modules/configuring-ragas-remote-provider-for-production.adoc (lines 412–414)
- The Next steps entry links to {odhdocshome}/working-with-rag/#evaluating-rag-system-quality-with-ragas_rag; modules/evaluating-rag-system-quality-with-ragas.adoc exists, but modules/integrating-ragas-into-cicd-pipelines.adoc and modules/monitoring-ragas-evaluation-pipelines.adoc are not present in modules/.
- Before merge: either create the two missing files, update the links to point to existing documentation (or the correct working-with-rag anchor), or replace the references with a "coming soon" placeholder.
[role='_abstract']
You can configure the RAGAS remote provider to run evaluations as distributed jobs using {productname-short} pipelines. The remote provider enables production-scale evaluations by running RAGAS in a separate Kubeflow Pipelines environment, providing resource isolation, improved scalability, and integration with CI/CD workflows.

.Prerequisites
Should we mention installing the RHOAI operator, deploying DSCI, and DSC with LlamaStack component enabled?
@adolfo-ab That's a good idea. i will add those. In the DSC, what is the exact path of the spec.component that needs to be enabled? Is it the spec.components.llamastackoperator or something else?
I've added something at line 12, and I'll update it with more correct info when I hear back on this.
Answered in L12
Actionable comments posted: 2
♻️ Duplicate comments (2)
modules/configuring-ragas-remote-provider-for-production.adoc (2)

142-145: Replace unsupported label selector with DSPA instance-name route pattern (duplicate from past review).

Line 142 uses the label selector `app=ds-pipeline-dspa` to retrieve the Kubeflow Pipelines endpoint, but this label is not a published/supported selector in the Data Science Pipelines Operator. Use the documented route naming pattern instead, `ds-pipeline-ui-<DSPA_INSTANCE_NAME>`, or query the actual route by component label.

Apply this diff to use the documented route naming pattern:

```diff
-$ export KFP_ENDPOINT=$(oc get route -n <project_name> -l app=ds-pipeline-dspa -o jsonpath='{.items[0].spec.host}')
+$ export KFP_ENDPOINT=$(oc get route ds-pipeline-ui-__<dspa_instance_name>__ -n <project_name> -o jsonpath='{.spec.host}')
```

Alternatively, if the DSPA instance name is not known, provide guidance to list available routes:

```diff
+# To find your DSPA instance name and route:
+$ oc get route -n <project_name> | grep ds-pipeline
+# Then use: oc get route ds-pipeline-ui-<your_instance_name> -n <project_name> -o jsonpath='{.spec.host}'
```
128-129: Add security advisory for disabled TLS verification (duplicate from past review, with enhancement).

Lines 128–129 set `VLLM_TLS_VERIFY="false"` with only an inline comment. While the comment acknowledges this is not production-safe, no structured security warning block is provided. Add a prominent `[IMPORTANT]` admonition to clearly warn against this practice in production.

Apply this diff to add security guidance immediately after line 128:

```diff
 $ export VLLM_TLS_VERIFY="false" # Use "true" in production
 $ export VLLM_API_TOKEN="<token_identifier>"
+
+[IMPORTANT]
+====
+*Never disable TLS verification in production environments.* Setting `VLLM_TLS_VERIFY=false`
+exposes your system to man-in-the-middle (MITM) attacks. This setting is only appropriate
+for development and testing with self-signed certificates.
+
+For production deployments:
+
+* Set `VLLM_TLS_VERIFY=true`
+* Use a valid CA-signed certificate for your inference service
+* If using self-signed certificates in production, add your internal CA to the system trust store instead of disabling verification
+* Rotate `VLLM_API_TOKEN` regularly and store it securely in a secret
+====
```

Also applies to: 217-223
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- assemblies/evaluating-rag-systems.adoc (1 hunks)
- modules/configuring-ragas-remote-provider-for-production.adoc (1 hunks)
- modules/evaluating-rag-system-quality-with-ragas.adoc (1 hunks)
- modules/setting-up-ragas-inline-provider.adoc (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- assemblies/evaluating-rag-systems.adoc
🚧 Files skipped from review as they are similar to previous changes (2)
- modules/setting-up-ragas-inline-provider.adoc
- modules/evaluating-rag-system-quality-with-ragas.adoc
dmaniloff left a comment
Thank you @skrthomas !! Left some comments w/ some suggestions.
* You have access to a workbench or notebook environment where you can run Python code.

.Procedure
Same comment as above: this is just duplicating the contents of https://github.com/trustyai-explainability/llama-stack-provider-ragas/blob/main/demos/basic_demo.ipynb, which is going to be the more up-to-date version, so we should just link to that.
Okay @dmaniloff @adolfo-ab I re-did this procedure to focus more on the steps a user would need to take to run the Jupyter notebook instead of trying to re-create the notebook in the docs.
I think it's overkill now. IMO explaining how to create a workbench and clone the upstream repo is a bit out of scope for these docs and we're basically redirecting the user somewhere else (the upstream repo) to learn how to use the provider. Personally I think it would be better to put less emphasis on creating the workbench and show the code snippets here, briefly explaining what we're doing on each step (registering the dataset, registering the benchmark, running an evaluation, checking the results). But I leave the final decision up to you guys @dmaniloff @skrthomas
I wasn't sure what the best approach was. When I talked with the AI docs team about it, the example use case I got was this https://opendatahub.io/docs/working-with-data-in-s3-compatible-object-store/, and it seemed like they went through the steps for cloning the repo and creating the workbench. My initial inclination was to do as you say, replicate the code snippets and explain them in procedural steps, which is what I originally had (although I understand there were some inaccuracies in how the Python ultimately got copied over). In OCP docs, this is the approach we would usually take (replication rather than linking) because we don't ever link to upstream repos in OCP. In RHOAI, however, it appears that the guidance is different. The content strategist for our team advised the following when I was asking the team about which direction was best for this:
The notebooks are upstream but they have been worked on extensively by engineering teams. Also, by cloning the notebooks, users can run them in their local environment or on runtime in a cluster. That is something that they can't do in the docs.
Actionable comments posted: 1
🧹 Nitpick comments (1)
assemblies/evaluating-rag-systems.adoc (1)

5-5: AsciiDoc ID should use lowercase convention.

The anchor ID uses a capital "R" in `Ragas`, but AsciiDoc convention typically uses lowercase for IDs. Consider updating to `evaluating-rag-systems-with-ragas_{context}` for consistency with standard documentation patterns.

```diff
-[id="evaluating-rag-systems-with-Ragas_{context}"]
+[id="evaluating-rag-systems-with-ragas_{context}"]
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- assemblies/evaluating-rag-systems.adoc (1 hunks)
- modules/evaluating-rag-system-quality-with-ragas.adoc (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- modules/evaluating-rag-system-quality-with-ragas.adoc
🔇 Additional comments (1)
assemblies/evaluating-rag-systems.adoc (1)
1-86: Content structure and messaging align well with prior review feedback.

The documentation correctly:
- Uses "Ragas" capitalization (not "RAGAS") as per upstream docs convention
- Distinguishes between the Ragas product and the Ragas provider pattern
- Includes external links to authoritative documentation
- Removes high-level prerequisites in favor of procedure-specific requirements
- Clearly separates inline (development) vs. remote (production) deployment modes with appropriate trade-offs
The content is well-structured, the use cases are comprehensive, and the metrics explanations are clear. The integration of this assembly into the monitoring documentation should help engineers understand RAG evaluation workflows.
Actionable comments posted: 2
🧹 Nitpick comments (1)
modules/configuring-ragas-remote-provider-for-production.adoc (1)
126-134: Consider expanding TLS security guidance with an IMPORTANT admonition block.

The inline comment on line 126 acknowledges the production concern, but past review feedback indicates an explicit IMPORTANT block should be added with comprehensive security guidance (e.g., MITM risks, proper certificate handling, credential rotation).
Consider adding the following IMPORTANT block after line 134 to enhance security guidance:

```diff
+
+[IMPORTANT]
+====
+Never disable TLS verification in production environments. Setting `VLLM_TLS_VERIFY=false`
+exposes your system to man-in-the-middle (MITM) attacks. This setting is only appropriate
+for development and testing with self-signed certificates. For production, use a valid
+certificate and set `VLLM_TLS_VERIFY=true`. Ensure credentials like `VLLM_API_TOKEN` and
+S3 keys are kept secure and rotated regularly.
+====
```
Actionable comments posted: 1
♻️ Duplicate comments (2)
modules/configuring-ragas-remote-provider-for-production.adoc (2)
315-315: Fix pod name inconsistency in deployment verification step.

Line 315 references the wrong pod name. The YAML at line 186 defines the pod as `llama-stack-pod`, but the instruction tells users to wait for `llama-stack-ragas-remote`, causing verification to fail.

```diff
-Wait until the `llama-stack-ragas-remote` pod status shows `Running`.
+Wait until the `llama-stack-pod` pod status shows `Running`.
```
126-127: Enhance TLS verification security guidance with explicit warnings.

Lines 126–127 set `VLLM_TLS_VERIFY="false"` but provide insufficient security context. Add a dedicated admonition block immediately after line 127 with production-environment warnings and safe alternatives.

Apply this diff to add comprehensive security guidance:

```diff
 $ export VLLM_API_TOKEN="<token_identifier>"

+[IMPORTANT]
+====
+Never disable TLS verification in production environments. Setting `VLLM_TLS_VERIFY=false`
+exposes your system to man-in-the-middle (MITM) attacks and is a critical security risk.
+
+For production deployments:
+
+* Set `VLLM_TLS_VERIFY=true` and use a valid CA-signed certificate, or
+* Add your internal CA to the system trust store in the container image, or
+* Use a dedicated development/testing namespace with explicit CA trust rather than disabling verification.
+
+Ensure `VLLM_API_TOKEN` and other credentials are also secured appropriately (rotated regularly, accessed only by authorized service accounts).
+====
+
 $ oc create secret generic llama-stack-inference-model-secret \
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- modules/configuring-ragas-remote-provider-for-production.adoc (1 hunks)
🔇 Additional comments (1)
modules/configuring-ragas-remote-provider-for-production.adoc (1)

318-320: Close unclosed `ifdef::upstream[]` block to fix AsciiDoc syntax error.

The file ends with an open `ifdef::upstream[]` block (line 318) without a closing `endif::[]` directive. This breaks AsciiDoc rendering and prevents the documentation from being built.

Add `endif::[]` after the link:

```diff
 * link:{odhdocshome}/working-with-rag/#evaluating-rag-system-quality-with-ragas_rag[Evaluating RAG system quality with Ragas metrics]
+endif::[]
```

Also verify that the "Next steps" section is complete. If additional steps or links are intended for cloud-service or self-managed environments, add corresponding `ifdef` blocks and content before the closing `endif::[]`.

Likely an incorrect or invalid review comment.
Force-pushed 3790df2 to c62de3c
Force-pushed c62de3c to caa9b7f
  name: llama-stack-ragas-inline
  namespace: <project_name>
spec:
  imageName: quay.io/trustyai/llama-stack-ragas:latest
@dmaniloff did you ever hear back about the image name we can use in the product docs? cc @adolfo-ab
quay.io/rhoai/odh-trustyai-ragas-lls-provider-dsp-rhel9:rhoai-3.0
See my comment above about the spec
Sorry, I think I got image questions mixed up. I was meaning to ask about the KUBEFLOW_BASE_IMAGE in the Ragas remote provider ConfigMap.
ifndef::upstream[]
For more information, see link:{rhoaidocshome}evaluating-rag-systems/setting-up-Ragas-inline-provider[Setting up the Ragas inline provider for development].
endif::[]
* You have access to a workbench or notebook environment where you can run Python code.
The Llama Stack server can take requests over HTTP. The Python client (which is what we use in the demo notebook to verify setup) is one of many clients that can talk to Llama Stack.

Therefore, this is not strictly a requirement. I would phrase this part along the following lines:

- "To test your setup, read the demo notebook in (link to the 0.4.2 version of the Ragas provider), which outlines the basic steps via the Python client."
- "Alternatively, you can also submit an eval by directly using the HTTP methods of the Llama Stack API (link to the Llama Stack 0.3.0 docs)."
- "If you do wish to follow the example provided in the notebook, you can do so from the Jupyter environment via the following steps" (then list your steps below; also mention that the Llama Stack pod will have to be accessible from the Jupyter env in the cluster, which may not be the case by default).
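For reference, the direct-HTTP option mentioned above might look like the following sketch. The payload shape and endpoint path here are assumptions for illustration, not the documented Llama Stack routes; check the Llama Stack API reference for your version before using them.

```python
import json

# Hypothetical sketch of submitting an eval over raw HTTP instead of the
# Python client. The JSON body shape and the endpoint path are assumptions;
# consult the Llama Stack API docs for the real routes and fields.

def build_eval_request(benchmark_id, model, sampling_params=None):
    """Assemble a JSON body for a hypothetical eval-run request."""
    return {
        "benchmark_id": benchmark_id,
        "benchmark_config": {
            "eval_candidate": {
                "type": "model",
                "model": model,
                "sampling_params": sampling_params or {"temperature": 0.0},
            }
        },
    }

body = build_eval_request("ragas-benchmark", "my-inference-model")
print(json.dumps(body, indent=2))

# Submitting would look roughly like (endpoint path is an assumption):
#   import requests
#   resp = requests.post(f"https://{route}/v1/eval/run", json=body, timeout=30)
#   job_id = resp.json()["job_id"]
```

Because the body is plain JSON, any HTTP client (curl, a CI job, a pipeline step) can submit the same request, which is the point of not making the Python client a hard prerequisite.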
+1
@dmaniloff thanks for this info. I will incorporate this into the intro! Just wanting to check: when you say link to the 3.0 Llama Stack API in the 3.0 docs, do you mean the 3.0 version of this: https://docs.redhat.com/en/documentation/red_hat_openshift_ai_self-managed/2.25/html/working_with_llama_stack/overview-of-llama-stack_rag#the_llamastackdistribution_custom_resource_api_providers
Or something else?
cc @kelbrown20
.Prerequisites
* You have cluster administrator privileges for your {openshift-platform} cluster.
* You have installed the {productname-short} Operator.
* You have a DataScienceCluster custom resource in your environment; in the `spec.components` section the `llamastack` component is set to `enabled/true`.
Suggested change:
- * You have a DataScienceCluster custom resource in your environment; in the `spec.components` section the `llamastack` component is set to `enabled/true`.
+ * You have a DataScienceCluster custom resource in your environment; in the `spec.components` section the `llamastackoperator` component should be enabled:
+
+  llamastackoperator:
+    managementState: Managed
.Prerequisites
* You have logged in to {productname-long}.
* You have created a data science project.
* You have created a data science pipeline.
Suggested change:
- * You have created a data science pipeline.
+ * You have created a data science pipeline server.
* You have created a data science project.
* You have created a data science pipeline.
* You have created a secret for your AWS credentials in your project namespace.
* You have deployed a Llama Stack distribution with the Ragas evaluation provider enabled, either Inline or Remote provider.
Suggested change:
- * You have deployed a Llama Stack distribution with the Ragas evaluation provider enabled, either Inline or Remote provider.
+ * You have deployed a Llama Stack distribution with the Ragas evaluation provider enabled (either Inline or Remote).
ifndef::upstream[]
For more information, see link:{rhoaidocshome}evaluating-rag-systems/setting-up-Ragas-inline-provider[Setting up the Ragas inline provider for development].
endif::[]
* You have access to a workbench or notebook environment where you can run Python code.
+1
* You have access to a workbench or notebook environment where you can run Python code.

.Procedure
I think it's overkill now. IMO explaining how to create a workbench and clone the upstream repo is a bit out of scope for these docs and we're basically redirecting the user somewhere else (the upstream repo) to learn how to use the provider. Personally I think it would be better to put less emphasis on creating the workbench and show the code snippets here, briefly explaining what we're doing on each step (registering the dataset, registering the benchmark, running an evaluation, checking the results). But I leave the final decision up to you guys @dmaniloff @skrthomas
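The four steps that comment lists (registering the dataset, registering the benchmark, running an evaluation, checking the results) could be sketched roughly as below. Every identifier here — the base URL, dataset and benchmark IDs, model name, and the `ragas::answer_relevancy` scoring-function ID — is an illustrative assumption, and exact `llama-stack-client` method signatures vary by version, so treat this as a sketch rather than a definitive snippet:

```python
def build_benchmark_config(model: str, temperature: float = 0.0) -> dict:
    """Assemble an illustrative benchmark_config payload for an eval run."""
    return {
        "eval_candidate": {
            "type": "model",
            "model": model,
            "sampling_params": {"temperature": temperature, "max_tokens": 512},
        }
    }


def run_ragas_eval(base_url: str = "http://localhost:8321") -> None:
    # Imported inside the function so the pure helper above stays usable
    # even when llama-stack-client is not installed.
    from llama_stack_client import LlamaStackClient

    client = LlamaStackClient(base_url=base_url)

    # 1. Register a small evaluation dataset (hypothetical rows).
    client.datasets.register(
        dataset_id="ragas-demo-dataset",
        purpose="eval/question-answer",
        source={"type": "rows", "rows": [
            {"input_query": "What is RAGAS?",
             "generated_answer": "A RAG evaluation framework.",
             "retrieved_contexts": ["RAGAS evaluates RAG pipelines."]},
        ]},
    )

    # 2. Register a benchmark pointing at the dataset and a Ragas metric.
    client.benchmarks.register(
        benchmark_id="ragas-demo-benchmark",
        dataset_id="ragas-demo-dataset",
        scoring_functions=["ragas::answer_relevancy"],
    )

    # 3. Run the evaluation against a candidate model.
    job = client.eval.run_eval(
        benchmark_id="ragas-demo-benchmark",
        benchmark_config=build_benchmark_config("llama-3-1-8b-instruct"),
    )

    # 4. Check the job status and inspect results.
    status = client.eval.jobs.status(
        job_id=job.job_id, benchmark_id="ragas-demo-benchmark"
    )
    print(status)
```

Showing this inline, with one sentence per step, would let the docs stand alone while still linking out to the upstream notebook for the full walkthrough.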
metadata:
  name: llama-stack-ragas-inline
  namespace: <project_name>
spec:
Unless a lot of things have changed recently with LlamaStackDistribution, this spec is not correct, it should be something like this:
spec:
  replicas: 1
  server:
    containerSpec:
      env:
      - name: VLLM_URL
        value: <model_url>
      - name: VLLM_API_TOKEN
        value: <model_api_token (if necessary)>
      - name: INFERENCE_MODEL
        value: <model_name>
      - name: MILVUS_DB_PATH
        value: ~/.llama/milvus.db
      - name: VLLM_TLS_VERIFY
        value: "false"
      - name: FMS_ORCHESTRATOR_URL
        value: http://localhost:123
      - name: EMBEDDING_MODEL
        value: granite-embedding-125m
AFAIK the only "extra" field that we need for Ragas inline is the EMBEDDING_MODEL.
cc: @dmaniloff @Bobbins228 @Artemon-line
Okay gotcha. I updated to exactly this in the docs. @dmaniloff @Bobbins228 can you confirm that the above YAML is sufficient for the Inline LlamaStackDistribution example? I think there was a lot of incorrect spec inferred from Claude code because in the resources, only EMBEDDING_MODEL is specifically mentioned for inline. But yeah, I can see that this is really off.
I thought EMBEDDING_MODEL is needed for both inline and remote though https://trustyai.org/llama-stack-provider-ragas/llama-stack-provider-ragas/0.1/remote-provider.html
Correct, EMBEDDING_MODEL is needed for both.
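To make that shared requirement concrete, a minimal fragment that both the inline and the remote `LlamaStackDistribution` specs would carry might look like this (the embedding model name is illustrative):

```yaml
# Required for both the inline and the remote Ragas provider.
env:
- name: EMBEDDING_MODEL
  value: granite-embedding-125m  # illustrative model name
```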
  name: llama-stack-ragas-inline
  namespace: <project_name>
spec:
  imageName: quay.io/trustyai/llama-stack-ragas:latest
See my comment above about the spec
Force-pushed from 081f671 to 6b8cd49 (compare).
Actionable comments posted: 2
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- modules/configuring-ragas-remote-provider-for-production.adoc (1 hunks)
- modules/evaluating-rag-system-quality-with-ragas.adoc (1 hunks)
- modules/setting-up-ragas-inline-provider.adoc (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- modules/configuring-ragas-remote-provider-for-production.adoc
- modules/setting-up-ragas-inline-provider.adoc
🔇 Additional comments (2)
modules/evaluating-rag-system-quality-with-ragas.adoc (2)

17-29: Prerequisites section aligns with earlier feedback. The prerequisites section properly incorporates the wording suggestions from prior review feedback ("created a data science pipeline server" and "Ragas evaluation provider enabled (Inline or Remote)") and includes appropriate conditional documentation links.

31-66: Procedure structure and workflow are sound. The step-by-step procedure effectively guides users through the evaluation workflow: workbench setup, repository cloning, notebook execution, pipeline monitoring, and results inspection. The approach of linking to the upstream notebook rather than duplicating content aligns with team discussions and the RHOAI documentation guidance on leveraging upstream repositories.
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- assemblies/evaluating-rag-systems.adoc (1 hunks)
- modules/configuring-ragas-remote-provider-for-production.adoc (1 hunks)
- modules/evaluating-rag-system-quality-with-ragas.adoc (1 hunks)
- modules/setting-up-ragas-inline-provider.adoc (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- modules/setting-up-ragas-inline-provider.adoc
- modules/configuring-ragas-remote-provider-for-production.adoc
🔇 Additional comments (4)
assemblies/evaluating-rag-systems.adoc (2)

5-5: Verify ID declaration capitalization for consistency. Line 5 uses uppercase "Ragas" in the ID, but the included files (lines 79–83) use lowercase `ragas` in filenames. For consistency with typical AsciiDoc conventions and to ensure cross-references work reliably on case-sensitive systems, the ID should use lowercase `ragas`: `[id="evaluating-rag-systems-with-ragas_{context}"]`. Confirm this matches the capitalization pattern used in the target module files (modules/setting-up-ragas-inline-provider.adoc, modules/configuring-ragas-remote-provider-for-production.adoc, and modules/evaluating-rag-system-quality-with-ragas.adoc).

79-83: Include statements are correct. The filenames use the consistent lowercase `ragas` convention. The past issue flagged on line 81 appears to be resolved.

modules/evaluating-rag-system-quality-with-ragas.adoc (2)

3-3: Standardize ID declaration capitalization to lowercase. Line 3 uses uppercase "Ragas" in the ID: `evaluating-rag-system-quality-with-Ragas_{context}`. To match the lowercase convention used in related files and filenames, change this to `evaluating-rag-system-quality-with-ragas_{context}`. Confirm this aligns with how the ID is referenced in the parent assembly or other cross-reference locations.

40-40: Image reference is valid. The image file `images/open.png` exists in the repository and is correctly referenced on line 40. No changes needed.
* You have created a secret for your AWS credentials in your project namespace.
* You have deployed a Llama Stack distribution with the Ragas evaluation provider enabled (Inline or Remote).
ifdef::upstream[]
For more information, see link:{odhdocshome}/evaluating-rag-systems/configuring-Ragas-remote-provider-for-production[Configuring the Ragas remote provider for production].
Fix capitalization in link target IDs to match actual declarations.
Lines 24 and 27 reference link targets with uppercase "Ragas" (configuring-Ragas-remote-provider-for-production and setting-up-Ragas-inline-provider), but the actual ID declarations in the target files use lowercase ragas. This will cause links to fail on case-sensitive filesystems.
Update line 24 from configuring-Ragas-remote-provider-for-production to configuring-ragas-remote-provider-for-production and line 27 from setting-up-Ragas-inline-provider to setting-up-ragas-inline-provider.
ifdef::upstream[]
- For more information, see link:{odhdocshome}/evaluating-rag-systems/configuring-Ragas-remote-provider-for-production[Configuring the Ragas remote provider for production].
+ For more information, see link:{odhdocshome}/evaluating-rag-systems/configuring-ragas-remote-provider-for-production[Configuring the Ragas remote provider for production].
endif::[]
ifndef::upstream[]
- For more information, see link:{rhoaidocshome}evaluating-rag-systems/setting-up-Ragas-inline-provider[Setting up the Ragas inline provider for development].
+ For more information, see link:{rhoaidocshome}evaluating-rag-systems/setting-up-ragas-inline-provider[Setting up the Ragas inline provider for development].
endif::[]

Also applies to: 27-27
🤖 Prompt for AI Agents
In modules/evaluating-rag-system-quality-with-ragas.adoc around lines 24 and 27,
the link target IDs use uppercase "Ragas" which don't match the actual lowercase
declarations; update line 24's target from
configuring-Ragas-remote-provider-for-production to
configuring-ragas-remote-provider-for-production and line 27's target from
setting-up-Ragas-inline-provider to setting-up-ragas-inline-provider so the link
IDs match exactly (case-sensitive).
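A quick check like the one below could catch such case mismatches before they break cross-references. The sample file stands in for the real modules/*.adoc content, and the regex is a rough heuristic, not a complete AsciiDoc parser:

```shell
# Sketch: flag uppercase "Ragas" inside link targets, which breaks
# case-sensitive anchor resolution. Replace the temp file with a real
# `grep -rnE ... modules/` in practice.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
For more information, see link:{odhdocshome}/evaluating-rag-systems/configuring-Ragas-remote-provider-for-production[Configuring the Ragas remote provider for production].
EOF

# Match a link target path segment containing "-Ragas-" before the label.
if grep -qE 'link:[^[]*-Ragas-' "$tmpfile"; then
  echo "uppercase Ragas found in link target"
fi
rm -f "$tmpfile"
```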