Llama stack assembly + 1 of 2 scenarios for lmeval #926

BSynRedhat · 2025-09-05T17:00:58Z

Llama stack assembly + 1 of 2 scenarios for lmeval

Summary by CodeRabbit

Documentation
- Added a new "Using Llama Stack with TrustyAI" assembly with three tutorials (LM-Eval external eval, custom evaluations, Guardrails Orchestrator).
- Added detailed guides for the Llama Stack external evaluation provider and for running custom evaluations with LMEval.
- Linked the new assembly from the monitoring data science models page.
- Adjusted Guardrails Orchestrator prerequisites (non-upstream condition and minimum version lowered to 2.20+).
- Removed an outdated include from the configuration assembly.

coderabbitai · 2025-09-05T17:01:05Z

Walkthrough

Removed a Llama‑Stack include from the Guardrails Orchestrator assembly, flipped a conditional and lowered the non‑upstream minimum version in the Guardrails‑with‑Llama‑Stack module, added multiple Llama Stack + TrustyAI modules/assembly (LM‑Eval provider and custom eval guides), and included the new assembly in monitoring docs.

Changes

| Cohort / File(s) | Summary of Changes |
| --- | --- |
| Guardrails Orchestrator assembly `assemblies/configuring-the-guardrails-orchestrator-service.adoc` | Removed the include of the Llama Stack module (`modules/using-guardrails-orchestrator-with-llama-stack.adoc`). |
| Guardrails Orchestrator module adjustments `modules/using-guardrails-orchestrator-with-llama-stack.adoc` | Changed non-upstream conditional from `ifdef::upstream[]` to `ifndef::upstream[]` and lowered non-upstream minimum version from `2.29+` to `2.20+`. |
| TrustyAI + Llama Stack assembly & modules `modules/using-llama-stack-with-trustyai.adoc`, `modules/using-llama-stack-external-eval-provider-with-lm-evaluation-harness-in-TrustyAI.adoc`, `modules/running-custom-evaluations-with-LMEval-and-llama-stack.adoc` | Added new assembly `using-llama-stack-with-trustyai.adoc` and two detailed modules: an LM‑Eval external eval provider procedure and a custom-evaluations guide for LMEval + Llama Stack (prereqs, server setup, provider configs, Python client workflows, job polling, results). |
| Monitoring assembly update `monitoring-data-science-models.adoc` | Inserted include `assemblies/using-llama-stack-with-trustyai.adoc[leveloffset=+1]` and added a comment line. |
| New Guardrails Orchestrator assembly `assemblies/guardrails-orchestrator-service.adoc` | Added new assembly describing Guardrails Orchestrator service, context/conditional blocks, and references to related modules. |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant Client as Python client
  participant Llama as Llama Stack Server
  participant Provider as TrustyAI LM‑Eval Provider
  participant LM as LM‑Eval Harness
  participant Model as Deployed Model

  User->>Client: configure env & start
  Client->>Llama: list / register benchmark
  Client->>Llama: submit eval job (benchmark, model, params)
  activate Llama
  Llama->>Provider: schedule evaluation task
  activate Provider
  Provider->>LM: run benchmark (requests -> Model)
  activate LM
  LM->>Model: query with sampling params
  Model-->>LM: responses
  LM-->>Provider: scores & artifacts
  deactivate LM
  Provider-->>Llama: job results / status
  deactivate Provider
  Llama-->>Client: job status (polled until complete/failed)
  deactivate Llama
  Client-->>User: present final scores/results
  note over Client,Llama: async polling loop until completion or failure

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fbe7a57 and 2b5c7f1.

📒 Files selected for processing (1)

assemblies/guardrails-orchestrator-service.adoc (1 hunks)

✅ Files skipped from review due to trivial changes (1)

assemblies/guardrails-orchestrator-service.adoc


coderabbitai

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

modules/using-guardrails-orchestrator-with-llama-stack.adoc (2)
139-151: Step order/duplication: the shield cannot be registered before the port-forward is open.

Step 5 says to use `/v1/shields`, but the port-forward is only opened in the next step. The “Use the /v1/shields…” step is also duplicated. Reorder so that the port-forward is opened first and the shield is registered afterwards, and drop the earlier duplicate sentence.

Apply:
- [start=5]
-. Once the Llama Stack server is running, use the `/v1/shields` endpoint to dynamically register a shield. For example, register a shield that uses regex patterns to detect personally identifiable information (PII).
-
-. Open a port-forward to access it locally:
+[start=5]
+. Open a port-forward to access the Llama Stack server locally:
@@
- . Use the `/v1/shields` endpoint to dynamically register a shield. For example, register a shield that uses regex patterns to detect personally identifiable information (PII): 
+ . Use the `/v1/shields` endpoint to dynamically register a shield. For example, register a shield that uses regex patterns to detect personally identifiable information (PII): 
Also applies to: 149-171

110-111: Namespace placeholder is inconsistent with earlier variable.

You create $PROJECT_NAME earlier but the CR uses <PROJECT_NAMESPACE>. Use the same variable to avoid confusion.
-  namespace: <PROJECT_NAMESPACE>
+  namespace: ${PROJECT_NAME}

🧹 Nitpick comments (5)

modules/using-llama-stack-with-trustyai.adoc (1)
5-13: Fix branding/capitalization and wording consistency.

Use “Llama Stack” and “TrustyAI” consistently; avoid mixed “llama stack/Llama-stack”.
-= Using llama stack with Trusty AI
+= Using Llama Stack with TrustyAI
@@
-The following sections describe how to work with llama stack and provide example use cases:
+The following sections describe how to work with Llama Stack and provide example use cases:
@@
-* Running custom evaluations with LMEval Llama Stack external eval provider
-* Use the trustyai-fms Guardrails Orchestrator with Llama-stack
+* Running custom evaluations with the LMEval Llama Stack external eval provider
+* Use the trustyai-fms Guardrails Orchestrator with Llama Stack
modules/using-llama-stack-external-eval-provider-with-lm-evaluation-harness-in-TrustyAI.adoc (4)
32-38: Misleading phrasing: virtualenv is not “in your DataScienceCluster”.

Virtualenv runs on the workstation/notebook, not in the CR. Reword.
-. Configure a Python virtual environment for this tutorial in your `DataScienceCluster`:
+. Configure a Python virtual environment for this tutorial on your workstation or in a Jupyter workbench terminal:
60-66: CLI invocation may be incorrect; confirm the command form.

Docs alternate between “llama-stack run …” and “llama stack run …”. Please verify the correct CLI and update for consistency. If running interactively, mention using a separate terminal or backgrounding the process.
-llama stack run run.yaml --image-type venv
+# In a separate terminal:
+llama-stack run run.yaml --image-type venv
54-59: Stability: pin raw GitHub downloads to a commit SHA.

Main branch can change and break instructions. Pin to a specific commit for reproducibility.
-curl --create-dirs --output providers.d/remote/eval/trustyai_lmeval.yaml https://raw.githubusercontent.com/trustyai-explainability/llama-stack-provider-lmeval/refs/heads/main/providers.d/remote/eval/trustyai_lmeval.yaml
+curl --create-dirs --output providers.d/remote/eval/trustyai_lmeval.yaml https://raw.githubusercontent.com/trustyai-explainability/llama-stack-provider-lmeval/<commit-sha>/providers.d/remote/eval/trustyai_lmeval.yaml
@@
-curl --create-dirs --output run.yaml https://raw.githubusercontent.com/trustyai-explainability/llama-stack-provider-lmeval/refs/heads/main/run.yaml
+curl --create-dirs --output run.yaml https://raw.githubusercontent.com/trustyai-explainability/llama-stack-provider-lmeval/<commit-sha>/run.yaml
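
The pinning suggested above can be sketched in Python. This is a minimal illustration, not part of the module under review: the helper name `pinned_raw_url` is hypothetical, and the SHA in the usage line is a placeholder rather than a real commit. It rejects branch names so that only a full commit SHA can be used.

```python
import re

RAW_BASE = "https://raw.githubusercontent.com"
REPO = "trustyai-explainability/llama-stack-provider-lmeval"


def pinned_raw_url(commit_sha: str, path: str) -> str:
    """Build a raw.githubusercontent.com URL pinned to a full commit SHA."""
    # Require a full 40-character hex SHA so a branch name cannot slip in.
    if not re.fullmatch(r"[0-9a-f]{40}", commit_sha):
        raise ValueError(f"not a full commit SHA: {commit_sha!r}")
    return f"{RAW_BASE}/{REPO}/{commit_sha}/{path.lstrip('/')}"


# Placeholder SHA for illustration only; substitute the commit you verified.
print(pinned_raw_url("0" * 40, "run.yaml"))
```

Passing `"main"` raises `ValueError`, which is the point of the check: a moving branch reference cannot be used by accident.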
45-52: Command robustness: ensure route resolves to /v1/completions.

Filtering with `grep predictor` may select the wrong Knative service in namespaces that host more than one model. Consider making the selection explicit or documenting how to choose the correct service.
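
One way to make the endpoint explicit is to build `VLLM_URL` from a host that the reader has chosen deliberately, rather than from the first matching line. The sketch below assumes the host for one specific service has already been looked up; `build_vllm_url` and the example host are hypothetical and do not come from the module under review.

```python
from urllib.parse import urlsplit, urlunsplit


def build_vllm_url(route: str, endpoint: str = "/v1/completions") -> str:
    """Return a URL for `route` that ends in `endpoint` exactly once."""
    # Accept bare hosts such as "model.example.com" by assuming HTTPS.
    if "://" not in route:
        route = "https://" + route
    parts = urlsplit(route)
    path = parts.path.rstrip("/")
    # Append the endpoint only if the route does not already end with it.
    if not path.endswith(endpoint):
        path = path + endpoint
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


# Hypothetical host name, used only to show the resulting shape of the URL.
print(build_vllm_url("my-model-predictor.example.com"))
```

The function is idempotent, so a route that already ends in `/v1/completions` is returned unchanged apart from a trailing slash.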

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 2cc4941 and 67e814f.

📒 Files selected for processing (5)

assemblies/configuring-the-guardrails-orchestrator-service.adoc (0 hunks)
modules/using-guardrails-orchestrator-with-llama-stack.adoc (1 hunks)
modules/using-llama-stack-external-eval-provider-with-lm-evaluation-harness-in-TrustyAI.adoc (1 hunks)
modules/using-llama-stack-with-trustyai.adoc (1 hunks)
monitoring-data-science-models.adoc (1 hunks)

💤 Files with no reviewable changes (1)

assemblies/configuring-the-guardrails-orchestrator-service.adoc

🔇 Additional comments (2)

modules/using-guardrails-orchestrator-with-llama-stack.adoc (2)

26-28: Prereq gating flipped; confirm version policy for non-upstream.

Non-upstream changed to ifndef::upstream[] and min version to 2.20. Please confirm this aligns with product support; otherwise readers on <2.29 may follow unsupported steps.

32-36: Operator version pins—confirm they’re still correct.

Service Mesh 2.6.7-0, Serverless 1.35.1, Authorino 1.2.1 may drift. Please validate against current supported versions or remove micro-pins.

modules/using-llama-stack-external-eval-provider-with-lm-evaluation-harness-in-TrustyAI.adoc

coderabbitai · 2025-09-05T17:07:27Z

modules/using-llama-stack-external-eval-provider-with-lm-evaluation-harness-in-TrustyAI.adoc

+* You have cluster administrator privileges for your {productname-short} cluster.
+
+* You have downloaded and installed the {productname-short}  command-line interface (CLI). For more information, see link:https://docs.redhat.com/en/documentation/openshift_container_platform/{ocp-latest-version}/html/cli_tools/openshift-cli-oc[Installing the OpenShift CLI^].
+
+* You have a large language model (LLM) for chat generation or text classification, or both, deployed in your namespace.
+
+* You have installed TrustyAI Operator in your {OpenShift} cluster.
+
+* You have set KServe to Raw Deployment mode in your cluster.


🛠️ Refactor suggestion

Attribute usage: CLI and platform names are inconsistent.

The CLI is for {openshift-platform} (oc), not {productname-short}. Also keep attribute names consistent with other modules.

-* You have cluster administrator privileges for your {productname-short} cluster.
+* You have cluster administrator privileges for your {openshift-platform} cluster.
@@
-* You have downloaded and installed the {productname-short} command-line interface (CLI). For more information, see link:https://docs.redhat.com/en/documentation/openshift_container_platform/{ocp-latest-version}/html/cli_tools/openshift-cli-oc[Installing the OpenShift CLI^].
+* You have downloaded and installed the {openshift-platform} command-line interface (CLI). For more information, see link:https://docs.redhat.com/en/documentation/openshift_container_platform/{ocp-latest-version}/html/cli_tools/openshift-cli-oc[Installing the OpenShift CLI^].
@@
-* You have installed TrustyAI Operator in your {OpenShift} cluster.
+* You have installed the TrustyAI Operator in your {openshift-platform} cluster.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

* You have cluster administrator privileges for your {productname-short} cluster.

* You have downloaded and installed the {productname-short} command-line interface (CLI). For more information, see link:https://docs.redhat.com/en/documentation/openshift_container_platform/{ocp-latest-version}/html/cli_tools/openshift-cli-oc[Installing the OpenShift CLI^].

* You have a large language model (LLM) for chat generation or text classification, or both, deployed in your namespace.

* You have installed TrustyAI Operator in your {OpenShift} cluster.

* You have set KServe to Raw Deployment mode in your cluster.

* You have cluster administrator privileges for your {openshift-platform} cluster.

* You have downloaded and installed the {openshift-platform} command-line interface (CLI). For more information, see link:https://docs.redhat.com/en/documentation/openshift_container_platform/{ocp-latest-version}/html/cli_tools/openshift-cli-oc[Installing the OpenShift CLI^].

* You have a large language model (LLM) for chat generation or text classification, or both, deployed in your namespace.

* You have installed the TrustyAI Operator in your {openshift-platform} cluster.

* You have set KServe to Raw Deployment mode in your cluster.

🤖 Prompt for AI Agents

In modules/using-llama-stack-external-eval-provider-with-lm-evaluation-harness-in-TrustyAI.adoc around lines 19 to 27, the second bullet incorrectly uses the {productname-short} attribute for the OpenShift CLI; change that instance to {openshift-platform} (oc) so the CLI reference is accurate and matches other modules, and review the surrounding bullets to ensure attribute names are consistent across the file (replace any other {productname-short} uses that refer to the platform/CLI with {openshift-platform}).

coderabbitai · 2025-09-05T17:07:27Z

modules/using-llama-stack-external-eval-provider-with-lm-evaluation-harness-in-TrustyAI.adoc

+. Install the link:https://pypi.org/project/llama-stack/[Llama Stack provider] from the Python Package Index (PyPI):
+
+[source,bash]
+----
+pip install llama-stack-provider-lmeval
+----
+. Configure the Llama Stack server. Set the variables to configure the runtime endpoint and namespace. The VLLM_URL value should be the `v1/completions` endpoint of your model route and the TRUSTYAI_LM_EVAL_NAMESPACE should be the namespace where your model is deployed. For example: 


🛠️ Refactor suggestion

Missing required packages for server and client.

You install only the provider. The server CLI and client library aren’t installed, causing later steps to fail.

-. Install the link:https://pypi.org/project/llama-stack/[Llama Stack provider] from the Python Package Index (PyPI):
+. Install the required packages from PyPI:
@@
 ----
-pip install llama-stack-provider-lmeval
+pip install \
+  llama-stack \
+  llama-stack-client \
+  llama-stack-provider-lmeval
 ----

🤖 Prompt for AI Agents

In modules/using-llama-stack-external-eval-provider-with-lm-evaluation-harness-in-TrustyAI.adoc around lines 39 to 45, the instructions only install the Llama Stack provider but omit required server and client packages; update the installation step to also install the llama-stack server CLI and client library by adding their package names to the pip install command (or separate pip install lines) and mention that both server and client must be installed before configuring VLLM_URL and TRUSTYAI_LM_EVAL_NAMESPACE so subsequent steps don't fail.
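
A quick pre-flight check can confirm that all three packages are present before the later steps run. The following is a minimal sketch using only the standard library; the function name `missing_distributions` is hypothetical, and the distribution names are the ones listed in the suggestion above.

```python
from importlib import metadata

REQUIRED = ("llama-stack", "llama-stack-client", "llama-stack-provider-lmeval")


def missing_distributions(names=REQUIRED):
    """Return the distribution names that are not installed in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_distributions()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

Run it inside the same virtual environment that will host the server, because the check only sees packages installed for the active interpreter.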

coderabbitai · 2025-09-05T17:07:27Z

modules/using-llama-stack-external-eval-provider-with-lm-evaluation-harness-in-TrustyAI.adoc

+Print a list of the current available benchmarks:
+
+[source,python]
+----
+benchmarks = client.benchmarks.list()
+
+pprint.print(f"Available benchmarks: {benchmarks}")
+----


⚠️ Potential issue

Bug: pprint.print does not exist.

Use pprint.pprint(...) or built-in print(...).

-pprint.print(f"Available benchmarks: {benchmarks}")
+pprint.pprint(f"Available benchmarks: {benchmarks}")

📝 Committable suggestion

Suggested change

Print a list of the current available benchmarks:

+

[source,python]

----

benchmarks = client.benchmarks.list()

pprint.print(f"Available benchmarks: {benchmarks}")

----

benchmarks = client.benchmarks.list()

pprint.pprint(f"Available benchmarks: {benchmarks}")

🤖 Prompt for AI Agents

In modules/using-llama-stack-external-eval-provider-with-lm-evaluation-harness-in-TrustyAI.adoc around lines 90 to 97, the snippet calls pprint.print(...) which does not exist; replace it with either pprint.pprint(benchmarks) or simply print(benchmarks), and if choosing pprint.pprint ensure the module is imported (import pprint) or reference via from pprint import pprint so the call resolves correctly.
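
The difference between passing the object and passing an f-string is easy to see with stand-in data. In this sketch the benchmark entries are hypothetical placeholders, not real output from `client.benchmarks.list()`; it uses `pprint.pformat` so that both renderings can be compared as strings.

```python
import pprint

# Stand-in data: hypothetical entries, not real output from client.benchmarks.list().
benchmarks = [
    {"identifier": "example::benchmark-a", "provider_id": "example_provider"},
    {"identifier": "example::benchmark-b", "provider_id": "example_provider"},
]

# Passing the object lets pprint lay the list out across several lines.
structured = pprint.pformat(benchmarks, width=60)

# Passing an f-string hands pprint a single str, so it is emitted as quoted
# string literal(s) rather than as a nested structure.
flattened = pprint.pformat(f"Available benchmarks: {benchmarks}", width=60)

print("Available benchmarks:")
print(structured)
```

Printing the label separately and pretty-printing the object keeps the structure readable, which is the behavior the prompt above recommends.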

modules/using-llama-stack-with-trustyai.adoc

coderabbitai · 2025-09-05T17:07:27Z

monitoring-data-science-models.adoc

+include::assemblies/using-llama-stack-with-trustyai.adoc[leveloffset=+1]
+
+// currently bias-monitoring is only in ODH


💡 Verification agent

🧩 Analysis chain

Verify include target exists and path is correct (assemblies vs modules).

The include points to assemblies/using-llama-stack-with-trustyai.adoc, but this PR shows modules/using-llama-stack-with-trustyai.adoc. If the assemblies/ file doesn’t exist, the build will fail. Update the path or add the assembly file accordingly.

Run to confirm presence and pick the correct path:

🏁 Script executed:

#!/bin/bash
# List any matching files so we can choose the right include path.
fd -a 'using-llama-stack-with-trustyai.adoc' -t f -H

Length of output: 116

Fix include path to modules/using-llama-stack-with-trustyai.adoc
assemblies/using-llama-stack-with-trustyai.adoc doesn’t exist; update the include in monitoring-data-science-models.adoc (line 34) to:

include::modules/using-llama-stack-with-trustyai.adoc[leveloffset=+1]

🤖 Prompt for AI Agents

In monitoring-data-science-models.adoc around lines 34 to 36 the include path points to a non-existent assemblies/using-llama-stack-with-trustyai.adoc; change the include to reference the correct file under modules by replacing that include with modules/using-llama-stack-with-trustyai.adoc[leveloffset=+1] so the document pulls the correct module file.
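
The same check can be automated for any AsciiDoc file. The sketch below is an illustration only: `broken_includes` is a hypothetical helper, it assumes include targets resolve relative to the including file, and it skips targets that contain attribute references. The demo recreates the situation described above in a temporary directory.

```python
import re
import tempfile
from pathlib import Path

INCLUDE_RE = re.compile(r"^include::([^\[]+)\[", re.MULTILINE)


def broken_includes(adoc_file: Path) -> list[str]:
    """Return include targets in `adoc_file` that do not exist on disk."""
    text = adoc_file.read_text(encoding="utf-8")
    base = adoc_file.parent  # assumption: targets resolve relative to this file
    missing = []
    for target in INCLUDE_RE.findall(text):
        # Skip targets that use attribute references such as {snippets-dir};
        # resolving those needs the real Asciidoctor attribute set.
        if "{" in target:
            continue
        if not (base / target).is_file():
            missing.append(target)
    return missing


# Demo: the file exists under modules/ but one include points at assemblies/.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "modules").mkdir()
    (root / "modules" / "using-llama-stack-with-trustyai.adoc").write_text("= Title\n")
    doc = root / "monitoring-data-science-models.adoc"
    doc.write_text(
        "include::assemblies/using-llama-stack-with-trustyai.adoc[leveloffset=+1]\n"
        "include::modules/using-llama-stack-with-trustyai.adoc[leveloffset=+1]\n"
    )
    result = broken_includes(doc)

print(result)  # ['assemblies/using-llama-stack-with-trustyai.adoc']
```

A check like this could run in CI before the documentation build, so a wrong `assemblies/` versus `modules/` path is caught before Asciidoctor fails.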

coderabbitai

Actionable comments posted: 6

♻️ Duplicate comments (1)

modules/using-llama-stack-with-trustyai.adoc (1)

16-17: Resolved: LM‑Eval module include added

The include for the LM‑Eval provider is now present. Looks good.

🧹 Nitpick comments (7)

modules/using-llama-stack-with-trustyai.adoc (1)
5-5: Normalize product naming and casing

Use consistent capitalization: “Llama Stack”, “TrustyAI”, and “lm-evaluation-harness”. Also fix the Guardrails bullet text.
- = Using llama stack with Trusty AI
+ = Using Llama Stack with TrustyAI
@@
-The following sections describe how to work with llama stack and provide example use cases:
+The following sections describe how to work with Llama Stack and provide example use cases:
@@
-* Using the Llama Stack external eval provider with lm-evaluation-harness in Trusty AI
-* Running custom evaluations with LMEval Llama Stack external eval provider
-* Use the trustyai-fms Guardrails Orchestrator with Llama-stack
+* Using the Llama Stack external eval provider with lm-evaluation-harness in TrustyAI
+* Running custom evaluations with the Llama Stack external eval provider (LM‑Eval)
+* Using the TrustyAI FMS Guardrails Orchestrator with Llama Stack
Also applies to: 9-14
modules/running-custom-evaluations-with-LMEval-and-llama-stack.adoc (6)
5-5: Title: fix spacing and naming

Remove trailing space and prefer “LM‑Eval”.
-= Running custom evaluations with LMEval and Llama Stack 
+= Running custom evaluations with LM‑Eval and Llama Stack
22-22: Typo: double space

Single-space “{productname-short} command-line interface”.
-* You have downloaded and installed the {productname-short}  command-line interface (CLI). For more information, see link:https://docs.redhat.com/en/documentation/openshift_container_platform/{ocp-latest-version}/html/cli_tools/openshift-cli-oc[Installing the OpenShift CLI^].
+* You have downloaded and installed the {productname-short} command-line interface (CLI). For more information, see link:https://docs.redhat.com/en/documentation/openshift_container_platform/{ocp-latest-version}/html/cli_tools/openshift-cli-oc[Installing the OpenShift CLI^].
83-84: AsciiDoc block attribute syntax

Use “[source,bash]” (no space) for consistency.
-[source, bash]
+[source,bash]
119-134: Optional: start smaller for demo runs

Consider a lower num_examples (e.g., 100) to reduce runtime/cost in walkthroughs.
-        "num_examples": 1000,
+        "num_examples": 100,
75-81: Optional: add cleanup note for the temporary pod

Suggest adding a short cleanup step after copying the dataset.
 ----
 oc cp example-dk-bench-input-bmo.jsonl dataset-storage-pod:/data/upload_files/example-dk-bench-input-bmo.jsonl -n <MODEL_NAMESPACE>
 ----
+. After the copy completes, you can delete the temporary resources:
++
+[source,bash]
+----
+oc delete pod dataset-storage-pod -n <MODEL_NAMESPACE>
+# Keep the PVC if you plan to reuse the dataset; otherwise:
+# oc delete pvc my-pvc -n <MODEL_NAMESPACE>
+----
83-112: Path clarity between PVC mount points

The example copies to “/data/upload_files/…” in the helper pod and reads from “/opt/app-root/src/hf_home/…” in the eval job. Add a one-line note that the same PVC is mounted at different paths across pods, and DK_BENCH_DATASET_PATH must match the provider’s mount path.
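
The relationship between the two paths can be made concrete with a small path translation. This is a sketch under stated assumptions: `translate_pvc_path` is a hypothetical helper, and the two mount points are guesses based on the paths quoted above, so check the actual `volumeMounts` of each pod before relying on them.

```python
from pathlib import PurePosixPath


def translate_pvc_path(path: str, source_mount: str, target_mount: str) -> str:
    """Map a file under one pod's PVC mount to the same file under another pod's mount."""
    # relative_to raises ValueError if `path` is not inside `source_mount`.
    relative = PurePosixPath(path).relative_to(PurePosixPath(source_mount))
    return str(PurePosixPath(target_mount) / relative)


# Hypothetical mount points: verify them against the real pod specs.
helper_mount = "/data/upload_files"
provider_mount = "/opt/app-root/src/hf_home"

dataset_path = translate_pvc_path(
    "/data/upload_files/example-dk-bench-input-bmo.jsonl",
    helper_mount,
    provider_mount,
)
print(dataset_path)  # /opt/app-root/src/hf_home/example-dk-bench-input-bmo.jsonl
```

The translated value is what `DK_BENCH_DATASET_PATH` would need to contain, because the evaluation job sees the file only through the provider's mount path.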

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 67e814f and fbe7a57.

📒 Files selected for processing (3)

modules/running-custom-evaluations-with-LMEval-and-llama-stack.adoc (1 hunks)
modules/using-llama-stack-external-eval-provider-with-lm-evaluation-harness-in-TrustyAI.adoc (1 hunks)
modules/using-llama-stack-with-trustyai.adoc (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

modules/using-llama-stack-external-eval-provider-with-lm-evaluation-harness-in-TrustyAI.adoc

coderabbitai · 2025-09-05T17:53:32Z

modules/running-custom-evaluations-with-LMEval-and-llama-stack.adoc

+.Procedure
+
+Upload your custom dataset to your OpenShift cluster using PersistentVolumeClaim (PVC) and a temporary pod.  Create a PVC named `my-pvc` to store your dataset. Run the following command in your CLI, replacing <MODEL_NAMESPACE> with the namespace of your language model: 
+	
+[source,bash]
+----
+oc apply -n <MODEL_NAMESPACE> -f - << EOF
+apiVersion: v1
+kind: PersistentVolumeClaim


🛠️ Refactor suggestion

Step list: first item missing “.” so the procedure won’t enumerate

Prefix the first step with “.” to render a numbered list.

-.Procedure
+.Procedure
@@
-Upload your custom dataset to your OpenShift cluster using PersistentVolumeClaim (PVC) and a temporary pod. Create a PVC named `my-pvc` to store your dataset. Run the following command in your CLI, replacing <MODEL_NAMESPACE> with the namespace of your language model:
+. Upload your custom dataset to your OpenShift cluster using PersistentVolumeClaim (PVC) and a temporary pod. Create a PVC named `my-pvc` to store your dataset. Run the following command in your CLI, replacing <MODEL_NAMESPACE> with the namespace of your language model:

📝 Committable suggestion

Suggested change

.Procedure

Upload your custom dataset to your OpenShift cluster using PersistentVolumeClaim (PVC) and a temporary pod. Create a PVC named `my-pvc` to store your dataset. Run the following command in your CLI, replacing <MODEL_NAMESPACE> with the namespace of your language model:

+

[source,bash]

----

oc apply -n <MODEL_NAMESPACE> -f - << EOF

apiVersion: v1

kind: PersistentVolumeClaim

.Procedure

. Upload your custom dataset to your OpenShift cluster using PersistentVolumeClaim (PVC) and a temporary pod. Create a PVC named `my-pvc` to store your dataset. Run the following command in your CLI, replacing <MODEL_NAMESPACE> with the namespace of your language model:

[source,bash]

----

oc apply -n <MODEL_NAMESPACE> -f - << EOF

apiVersion: v1

kind: PersistentVolumeClaim

🤖 Prompt for AI Agents

In modules/running-custom-evaluations-with-LMEval-and-llama-stack.adoc around lines 33 to 41, the Procedure step list is missing a leading "." on the first item so the steps won't render as a numbered list; add a "." prefix to the first step line before "Upload your custom dataset..." (i.e., make the first list item start with ".") so Asciidoctor recognizes and enumerates the procedure correctly.

coderabbitai · 2025-09-05T17:53:33Z

modules/running-custom-evaluations-with-LMEval-and-llama-stack.adoc

+[source,python]
+----
+def get_job_status(job_id, benchmark_id):
+    return client.eval.jobs.status(job_id=job_id, benchmark_id=benchmark_id)
+
+while True:
+    job = get_job_status(job_id=job.job_id, benchmark_id="trustyai_lmeval::dk_bench")
+    print(job)
+


💡 Verification agent

🧩 Analysis chain

Verify benchmark_id consistency across the repo

Ensure no remaining “dk_bench” references.

🏁 Script executed:

#!/usr/bin/env bash
rg -nP "trustyai_lmeval::dk[_-]bench" -C2

Length of output: 2132

Use the dash‐version for the final benchmark_id reference
In modules/running-custom-evaluations-with-LMEval-and-llama-stack.adoc line 147, update:

-    job = get_job_status(job_id=job.job_id, benchmark_id="trustyai_lmeval::dk_bench")
+    job = get_job_status(job_id=job.job_id, benchmark_id="trustyai_lmeval::dk-bench")

📝 Committable suggestion

Suggested change

[source,python]
----
def get_job_status(job_id, benchmark_id):
    return client.eval.jobs.status(job_id=job_id, benchmark_id=benchmark_id)

while True:
    job = get_job_status(job_id=job.job_id, benchmark_id="trustyai_lmeval::dk_bench")
    print(job)

[source,python]
----
def get_job_status(job_id, benchmark_id):
    return client.eval.jobs.status(job_id=job_id, benchmark_id=benchmark_id)

while True:
    job = get_job_status(job_id=job.job_id, benchmark_id="trustyai_lmeval::dk-bench")
    print(job)

🤖 Prompt for AI Agents

In modules/running-custom-evaluations-with-LMEval-and-llama-stack.adoc around lines 141 to 149, the final benchmark_id reference uses an underscore in the benchmark name; change it from "trustyai_lmeval::dk_bench" to "trustyai_lmeval::dk-bench" so that it matches the dash-style identifier used in the rest of the module.

coderabbitai · 2025-09-05T17:53:33Z

modules/running-custom-evaluations-with-LMEval-and-llama-stack.adoc

+[source,python]
+----
+def get_job_status(job_id, benchmark_id):
+    return client.eval.jobs.status(job_id=job_id, benchmark_id=benchmark_id)
+
+while True:
+    job = get_job_status(job_id=job.job_id, benchmark_id="trustyai_lmeval::dk_bench")
+    print(job)
+
+    if job.status in ['failed', 'completed']:
+        print(f"Job ended with status: {job.status}")
+        break
+
+    time.sleep(20)
+
+----


⚠️ Potential issue

Fix benchmark_id dash/underscore mismatch and missing import

The polling example uses “dk_bench” (underscore) while the rest uses “dk-bench” (dash). Also add import for time.

-[source,python]
+[source,python]
 ----
+import time
 def get_job_status(job_id, benchmark_id):
     return client.eval.jobs.status(job_id=job_id, benchmark_id=benchmark_id)

 while True:
-    job = get_job_status(job_id=job.job_id, benchmark_id="trustyai_lmeval::dk_bench")
+    job = get_job_status(job_id=job.job_id, benchmark_id="trustyai_lmeval::dk-bench")
     print(job)

     if job.status in ['failed', 'completed']:
         print(f"Job ended with status: {job.status}")
         break

     time.sleep(20)
 ----

📝 Committable suggestion

Suggested change

[source,python]
----
def get_job_status(job_id, benchmark_id):
    return client.eval.jobs.status(job_id=job_id, benchmark_id=benchmark_id)

while True:
    job = get_job_status(job_id=job.job_id, benchmark_id="trustyai_lmeval::dk_bench")
    print(job)

    if job.status in ['failed', 'completed']:
        print(f"Job ended with status: {job.status}")
        break

    time.sleep(20)
----

[source,python]
----
import time

def get_job_status(job_id, benchmark_id):
    return client.eval.jobs.status(job_id=job_id, benchmark_id=benchmark_id)

while True:
    job = get_job_status(job_id=job.job_id, benchmark_id="trustyai_lmeval::dk-bench")
    print(job)

    if job.status in ['failed', 'completed']:
        print(f"Job ended with status: {job.status}")
        break

    time.sleep(20)
----

🤖 Prompt for AI Agents

In modules/running-custom-evaluations-with-LMEval-and-llama-stack.adoc around lines 141-156, the polling example uses an inconsistent benchmark_id ("dk_bench" with underscore) and is missing the time import; change the benchmark_id to "dk-bench" to match the rest of the document and add an import for time at the top of the Python snippet so time.sleep(20) works as written.
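
Beyond the two fixes above, a polling loop with no upper bound will spin forever if a job never reaches a terminal state. The sketch below shows one possible shape with a timeout and a single benchmark ID constant. It is an illustration, not the module's code: `wait_for_job` and `fake_status` are hypothetical, the stub stands in for `client.eval.jobs.status(...)`, and the non-terminal status names `scheduled` and `in_progress` are assumed for the demo.

```python
import time
from types import SimpleNamespace

# Defining the ID once avoids dash/underscore drift between snippets.
BENCHMARK_ID = "trustyai_lmeval::dk-bench"
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(get_status, interval=20.0, timeout=3600.0, sleep=time.sleep):
    """Poll `get_status()` until the job is terminal or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        job = get_status()
        if job.status in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {job.status!r} after {timeout} seconds")
        sleep(interval)


# Stub for the real status call: two assumed non-terminal states, then completion.
_states = iter(["scheduled", "in_progress", "completed"])


def fake_status():
    return SimpleNamespace(job_id="job-0", status=next(_states))


final = wait_for_job(fake_status, interval=0.0, sleep=lambda _: None)
print(f"Job ended with status: {final.status}")
```

With a real client, the callable would wrap the call from the module, for example `lambda: client.eval.jobs.status(job_id=job.job_id, benchmark_id=BENCHMARK_ID)`.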

coderabbitai · 2025-09-05T17:53:33Z

modules/running-custom-evaluations-with-LMEval-and-llama-stack.adoc

+[source,python]
+----
+pprint.pprint(client.eval.jobs.retrieve(job_id=job.job_id, benchmark_id="trustyai_lmeval::dk-bench").scores)
+----


🛠️ Refactor suggestion

Results snippet missing pprint import

Add the import so the example runs as-is.

-[source,python]
+[source,python]
 ----
+import pprint
 pprint.pprint(client.eval.jobs.retrieve(job_id=job.job_id, benchmark_id="trustyai_lmeval::dk-bench").scores)
 ----

📝 Committable suggestion

Suggested change

[source,python]

----

pprint.pprint(client.eval.jobs.retrieve(job_id=job.job_id, benchmark_id="trustyai_lmeval::dk-bench").scores)

----

[source,python]

----

import pprint

pprint.pprint(client.eval.jobs.retrieve(job_id=job.job_id, benchmark_id="trustyai_lmeval::dk-bench").scores)

----

🤖 Prompt for AI Agents

In modules/running-custom-evaluations-with-LMEval-and-llama-stack.adoc around lines 159 to 162, the example uses pprint.pprint but does not import pprint; add a top-of-example import line "import pprint" (or "from pprint import pprint" and adjust usage accordingly) so the snippet runs as-is.

coderabbitai · 2025-09-05T17:53:33Z

modules/using-llama-stack-with-trustyai.adoc

@@ -0,0 +1,22 @@
+:_module-type: ASSEMBLY


💡 Verification agent

🧩 Analysis chain

Verify assembly location vs. type

File sits under modules/ but declares ASSEMBLY. Confirm that upstream assemblies include this path correctly (or relocate to assemblies/).

🏁 Script executed:

#!/usr/bin/env bash
# Find all occurrences of using-llama-stack-with-trustyai.adoc and who includes them
fd -a 'using-llama-stack-with-trustyai.adoc'
rg -n 'include::.*using-llama-stack-with-trustyai\.adoc' -C2

Length of output: 574

Sync file location with includes
Move modules/using-llama-stack-with-trustyai.adoc into the assemblies/ directory (or update the include in monitoring-data-science-models.adoc:34 to reference modules/using-llama-stack-with-trustyai.adoc).

🤖 Prompt for AI Agents

In modules/using-llama-stack-with-trustyai.adoc around line 1, the file location doesn't match the include path used from monitoring-data-science-models.adoc:34; either move this file into the assemblies/ directory so the existing include resolves, or update the include statement in monitoring-data-science-models.adoc (line 34) to reference modules/using-llama-stack-with-trustyai.adoc; ensure the include path is correct relative to the including file and update any build/CI references if necessary.
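A check like this can be run before a build. The sketch below scans `.adoc` files for `include::` directives and reports targets that do not exist. The `find_broken_includes` helper is hypothetical, and it assumes each include path resolves relative to the directory of the file that contains it. It also skips any path that uses attribute substitution.

```python
import re
from pathlib import Path

# Matches uncommented include directives at the start of a line, capturing the target path.
INCLUDE_RE = re.compile(r"^include::([^\[\s]+)\[", re.MULTILINE)


def find_broken_includes(root: Path) -> list[tuple[str, str]]:
    """Return (including_file, target) pairs whose target does not exist on disk."""
    broken = []
    for adoc in sorted(root.rglob("*.adoc")):
        text = adoc.read_text(encoding="utf-8")
        for target in INCLUDE_RE.findall(text):
            if "{" in target:  # attribute-based paths need a real build to resolve
                continue
            # Assumption: resolve relative to the including file's directory.
            if not (adoc.parent / target).exists():
                broken.append((str(adoc.relative_to(root)), target))
    return broken
```

Running `find_broken_includes(Path("."))` from the repository root would flag an include of `assemblies/using-llama-stack-with-trustyai.adoc` while the file still sits under `modules/`.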

coderabbitai · 2025-09-05T17:53:33Z

modules/using-llama-stack-with-trustyai.adoc

+include::../modules/using-llama-stack-external-eval-provider-with-lm-evaluation-harness-in-TrustyAI.adoc[leveloffset=+1]
+include::../modules/using-guardrails-orchestrator-with-llama-stack.adoc[leveloffset=+1]


🛠️ Refactor suggestion

Include the custom-evaluations module to match the listed use cases

Add the “Running custom evaluations …” module so the assembly renders all referenced sections.

 include::../modules/using-llama-stack-external-eval-provider-with-lm-evaluation-harness-in-TrustyAI.adoc[leveloffset=+1]
+include::../modules/running-custom-evaluations-with-LMEval-and-llama-stack.adoc[leveloffset=+1]
 include::../modules/using-guardrails-orchestrator-with-llama-stack.adoc[leveloffset=+1]

Suggested change

-include::../modules/using-llama-stack-external-eval-provider-with-lm-evaluation-harness-in-TrustyAI.adoc[leveloffset=+1]
-include::../modules/using-guardrails-orchestrator-with-llama-stack.adoc[leveloffset=+1]
+include::../modules/using-llama-stack-external-eval-provider-with-lm-evaluation-harness-in-TrustyAI.adoc[leveloffset=+1]
+include::../modules/running-custom-evaluations-with-LMEval-and-llama-stack.adoc[leveloffset=+1]
+include::../modules/using-guardrails-orchestrator-with-llama-stack.adoc[leveloffset=+1]

🤖 Prompt for AI Agents

In modules/using-llama-stack-with-trustyai.adoc around lines 16 to 17, the assembly is missing the "Running custom evaluations …" module include referenced elsewhere; add an include directive for the custom-evaluations module (matching the other include style and leveloffset) so the assembled document renders the referenced section.

Llama stack assembly + 1 of 2 scenarios for lmeval

67e814f

coderabbitai bot reviewed Sep 5, 2025

2of 2 lmeval Llama Stack scenarios updated

fbe7a57

BSynRedhat marked this pull request as draft September 5, 2025 17:48

coderabbitai bot reviewed Sep 5, 2025

BSynRedhat marked this pull request as ready for review September 7, 2025 21:18

Updating to resolve merge issue

2b5c7f1

BSynRedhat closed this Oct 8, 2025

BSynRedhat reopened this Oct 8, 2025

		include::assemblies/using-llama-stack-with-trustyai.adoc[leveloffset=+1]

		// currently bias-monitoring is only in ODH

		include::../modules/using-llama-stack-external-eval-provider-with-lm-evaluation-harness-in-TrustyAI.adoc[leveloffset=+1]
		include::../modules/using-guardrails-orchestrator-with-llama-stack.adoc[leveloffset=+1]
