Changes from 2 commits
@@ -37,7 +37,6 @@ include::modules/configuring-the-opentelemetry-exporter.adoc[leveloffset=+1]
include::modules/using-hugging-face-models-with-guardrails-orchestrator.adoc[leveloffset=+1]
include::modules/configuring-the-guardrails-detector-hugging-face-serving-runtime.adoc[leveloffset=+1]
include::modules/using-a-hugging-face-prompt-injection-detector-with-the-guardrails-orchestrator.adoc[leveloffset=+1]
include::modules/using-guardrails-orchestrator-with-llama-stack.adoc[leveloffset=+1]



163 changes: 163 additions & 0 deletions modules/running-custom-evaluations-with-LMEval-and-llama-stack.adoc
@@ -0,0 +1,163 @@
:_module-type: PROCEDURE

ifdef::context[:parent-context: {context}]
[id="running-custom-evaluations-with-LMEval-and-llama-stack_{context}"]
= Running custom evaluations with LMEval and Llama Stack
[role='_abstract']

This example demonstrates how to use the link:https://github.com/trustyai-explainability/llama-stack-provider-lmeval[LMEval Llama Stack external eval provider] to evaluate a language model with a custom dataset. Creating a custom task is useful for evaluating specific model knowledge and behavior.
The process involves three steps: uploading the task dataset to your {productname-short} cluster, registering it as a custom benchmark dataset with Llama Stack, and running a benchmark evaluation job on a language model.

.Prerequisites

ifdef::upstream[]
* You have installed {productname-long}, version 2.29 or later.
endif::[]
ifndef::upstream[]
* You have installed {productname-long}, version 2.20 or later.
endif::[]

* You have cluster administrator privileges for your {productname-short} cluster.
* You have downloaded and installed the {productname-short} command-line interface (CLI). For more information, see link:https://docs.redhat.com/en/documentation/openshift_container_platform/{ocp-latest-version}/html/cli_tools/openshift-cli-oc[Installing the OpenShift CLI^].
* You have a large language model (LLM) for chat generation or text classification, or both, deployed in your namespace.
* You have installed TrustyAI Operator in your {productname-short} cluster.
* You have set KServe to Raw Deployment mode in your cluster.
* You have a language model deployed on vLLM Serving Runtime in your {productname-short} cluster.
.Procedure

Upload your custom dataset to your OpenShift cluster using PersistentVolumeClaim (PVC) and a temporary pod. Create a PVC named `my-pvc` to store your dataset. Run the following command in your CLI, replacing <MODEL_NAMESPACE> with the namespace of your language model:
+
[source,bash]
----
oc apply -n <MODEL_NAMESPACE> -f - << EOF
apiVersion: v1
kind: PersistentVolumeClaim
Comment on lines +33 to +41

🛠️ Refactor suggestion

Step list: first item missing “.” so the procedure won’t enumerate

Prefix the first step with “.” to render a numbered list.

-.Procedure
+.Procedure
@@
-Upload your custom dataset to your OpenShift cluster using PersistentVolumeClaim (PVC) and a temporary pod.  Create a PVC named `my-pvc` to store your dataset. Run the following command in your CLI, replacing <MODEL_NAMESPACE> with the namespace of your language model: 
+. Upload your custom dataset to your OpenShift cluster using PersistentVolumeClaim (PVC) and a temporary pod. Create a PVC named `my-pvc` to store your dataset. Run the following command in your CLI, replacing <MODEL_NAMESPACE> with the namespace of your language model:
🤖 Prompt for AI Agents
In modules/running-custom-evaluations-with-LMEval-and-llama-stack.adoc around
lines 33 to 41, the Procedure step list is missing a leading "." on the first
item so the steps won't render as a numbered list; add a "." prefix to the first
step line before "Upload your custom dataset..." (i.e., make the first list item
start with ".") so Asciidoctor recognizes and enumerates the procedure
correctly.
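
A check like this could be automated. The following is a minimal sketch, not part of this PR: the helper name `procedure_starts_with_list` and the sample strings are hypothetical, and it only looks at the first non-blank line after the `.Procedure` title.

```python
def procedure_starts_with_list(adoc_text: str) -> bool:
    """Return True if the first non-blank line after '.Procedure' is an ordered-list item."""
    lines = adoc_text.splitlines()
    for index, line in enumerate(lines):
        if line.strip() == ".Procedure":
            for following in lines[index + 1:]:
                if following.strip():
                    # A top-level ordered-list item begins with a single dot and a space.
                    return following.startswith(". ")
            return False
    # No '.Procedure' block title was found at all.
    return False


# Hypothetical samples: the first mirrors the fixed module, the second the original.
fixed = ".Procedure\n\n. Upload your custom dataset to your cluster.\n"
broken = ".Procedure\n\nUpload your custom dataset to your cluster.\n"
print(procedure_starts_with_list(fixed), procedure_starts_with_list(broken))
```

The check is deliberately narrow; it does not validate later steps or nested lists.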

metadata:
name: my-pvc
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 5Gi
EOF
----
. Create a pod object named `dataset-storage-pod` to download the task dataset into the PVC. This pod is used to copy your dataset from your local machine to the {productname-short} cluster:
+
[source,bash]
----
oc apply -n <MODEL_NAMESPACE> -f - << EOF
apiVersion: v1
kind: Pod
metadata:
name: dataset-storage-pod
spec:
containers:
- name: dataset-container
image: 'quay.io/prometheus/busybox:latest'
command: ["/bin/sh", "-c", "sleep 3600"]
volumeMounts:
- mountPath: "/data/upload_files"
name: dataset-storage
volumes:
- name: dataset-storage
persistentVolumeClaim:
claimName: my-pvc
EOF
----
. Copy your locally stored task dataset to the pod to place it within the PVC. In this example, the dataset is named `example-dk-bench-input-bmo.jsonl` and is copied to `dataset-storage-pod` under the path `/data/upload_files/`. Replace <MODEL_NAMESPACE> with the namespace where the language model that you want to evaluate is deployed:
+
[source,bash]
----
oc cp example-dk-bench-input-bmo.jsonl dataset-storage-pod:/data/upload_files/example-dk-bench-input-bmo.jsonl -n <MODEL_NAMESPACE>
----
. After the custom dataset is uploaded to the PVC, register it as a benchmark for evaluations. At a minimum, provide the following metadata: the TrustyAI LM-Eval Tasks GitHub web address, your branch, the commit hash, and the path of the custom task. Ensure that you replace `DK_BENCH_DATASET_PATH` and any other metadata fields to match your specific configuration:
+
[source,python]
----
client.benchmarks.register(
benchmark_id="trustyai_lmeval::dk-bench",
dataset_id="trustyai_lmeval::dk-bench",
scoring_functions=["string"],
provider_benchmark_id="string",
provider_id="trustyai_lmeval",
metadata={
"custom_task": {
"git": {
"url": "https://github.com/trustyai-explainability/lm-eval-tasks.git",
"branch": "main",
"commit": "8220e2d73c187471acbe71659c98bccecfe77958",
"path": "tasks/",
}
},
"env": {
# Path of the dataset inside the PVC
"DK_BENCH_DATASET_PATH": "/opt/app-root/src/hf_home/example-dk-bench-input-bmo.jsonl",
"JUDGE_MODEL_URL": "http://phi-3-predictor:8080/v1/chat/completions",
# For simplicity, we use the same model as the one being evaluated
"JUDGE_MODEL_NAME": "phi-3",
"JUDGE_API_KEY": "",
},
"tokenized_requests": False,
"tokenizer": "google/flan-t5-small",
"input": {"storage": {"pvc": "my-pvc"}}
},
)

----
. Run a benchmark evaluation on your model:
+
[source,python]
----
job = client.eval.run_eval(
benchmark_id="trustyai_lmeval::dk-bench",
benchmark_config={
"eval_candidate": {
"type": "model",
"model": "phi-3",
"provider_id": "trustyai_lmeval",
"sampling_params": {
"temperature": 0.7,
"top_p": 0.9,
"max_tokens": 256
},
},
"num_examples": 1000,
},
)

print(f"Starting job '{job.job_id}'")

----
. Monitor the status of the evaluation job. The job runs asynchronously, so you can check its status periodically:
+
[source,python]
----
def get_job_status(job_id, benchmark_id):
return client.eval.jobs.status(job_id=job_id, benchmark_id=benchmark_id)

while True:
job = get_job_status(job_id=job.job_id, benchmark_id="trustyai_lmeval::dk_bench")
print(job)

Comment on lines +141 to +149

💡 Verification agent

🧩 Analysis chain

Verify benchmark_id consistency across the repo

Ensure no remaining “dk_bench” references.


🏁 Script executed:

#!/usr/bin/env bash
rg -nP "trustyai_lmeval::dk[_-]bench" -C2

Length of output: 2132


Use the dash‐version for the final benchmark_id reference
In modules/running-custom-evaluations-with-LMEval-and-llama-stack.adoc line 147, update:

- job = get_job_status(job_id=job.job_id, benchmark_id="trustyai_lmeval::dk_bench")
+ job = get_job_status(job_id=job.job_id, benchmark_id="trustyai_lmeval::dk-bench")
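
This kind of drift can also be caught before review with a small script that collects every `trustyai_lmeval::` identifier in a module and groups spellings that differ only in dash versus underscore. The following is a minimal sketch, not part of this PR; the function name `find_inconsistent_ids` and the sample text are hypothetical.

```python
import re
from collections import defaultdict

# Matches identifiers such as trustyai_lmeval::dk-bench or trustyai_lmeval::arc_easy.
ID_PATTERN = re.compile(r"trustyai_lmeval::[A-Za-z0-9_-]+")


def find_inconsistent_ids(text: str) -> dict[str, set[str]]:
    """Group benchmark IDs whose suffixes differ only by '-' versus '_'."""
    groups: dict[str, set[str]] = defaultdict(set)
    for match in ID_PATTERN.findall(text):
        prefix, suffix = match.split("::", 1)
        # Normalize the suffix so dk-bench and dk_bench share one key.
        key = f"{prefix}::{suffix.replace('-', '_')}"
        groups[key].add(match)
    # Keep only the keys that were spelled more than one way.
    return {key: spellings for key, spellings in groups.items() if len(spellings) > 1}


# Hypothetical excerpt that mirrors the mismatch flagged in this comment.
sample = '''
benchmark_id="trustyai_lmeval::dk-bench",
job = get_job_status(job_id=job.job_id, benchmark_id="trustyai_lmeval::dk_bench")
benchmark_id="trustyai_lmeval::arc_easy",
'''
conflicts = find_inconsistent_ids(sample)
for key, spellings in conflicts.items():
    print(f"{key}: {sorted(spellings)}")
```
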
🤖 Prompt for AI Agents
In modules/running-custom-evaluations-with-LMEval-and-llama-stack.adoc around
lines 141 to 149, the final benchmark_id uses an underscore in "dk_bench"; change
it from "trustyai_lmeval::dk_bench" to "trustyai_lmeval::dk-bench" so that it
matches the identifier registered earlier in the module.

if job.status in ['failed', 'completed']:
print(f"Job ended with status: {job.status}")
break

time.sleep(20)

----
Comment on lines +141 to +156

⚠️ Potential issue

Fix benchmark_id dash/underscore mismatch and missing import

The polling example uses “dk_bench” (underscore) while the rest uses “dk-bench” (dash). Also add import for time.

-[source,python]
+[source,python]
 ----
+import time
 def get_job_status(job_id, benchmark_id):
     return client.eval.jobs.status(job_id=job_id, benchmark_id=benchmark_id)
 
 while True:
-    job = get_job_status(job_id=job.job_id, benchmark_id="trustyai_lmeval::dk_bench")
+    job = get_job_status(job_id=job.job_id, benchmark_id="trustyai_lmeval::dk-bench")
     print(job)
 
     if job.status in ['failed', 'completed']:
         print(f"Job ended with status: {job.status}")
         break
 
     time.sleep(20)
 ----
🤖 Prompt for AI Agents
In modules/running-custom-evaluations-with-LMEval-and-llama-stack.adoc around
lines 141-156, the polling example uses an inconsistent benchmark_id ("dk_bench"
with underscore) and is missing the time import; change the benchmark_id to
"dk-bench" to match the rest of the document and add an import for time at the
top of the Python snippet so time.sleep(20) works as written.
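
Beyond the two fixes above, the polling loop could define the benchmark ID once and stop after a timeout instead of looping forever. The following is a minimal sketch under stated assumptions: it relies only on the `client.eval.jobs.status(job_id=..., benchmark_id=...)` call already used in the module, while `wait_for_job`, the timeout values, the `FakeJobs` stand-in, and the non-terminal status names `scheduled` and `in_progress` are hypothetical.

```python
import time
from types import SimpleNamespace

# Defined once so that every call uses the same spelling of the identifier.
BENCHMARK_ID = "trustyai_lmeval::dk-bench"
TERMINAL_STATES = {"failed", "completed"}


def wait_for_job(client, job_id, benchmark_id=BENCHMARK_ID,
                 poll_seconds=20, timeout_seconds=3600, sleep=time.sleep):
    """Poll client.eval.jobs.status until the job ends or the timeout elapses."""
    waited = 0
    while True:
        job = client.eval.jobs.status(job_id=job_id, benchmark_id=benchmark_id)
        if job.status in TERMINAL_STATES:
            return job.status
        if waited >= timeout_seconds:
            raise TimeoutError(f"Job {job_id} still '{job.status}' after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds


class FakeJobs:
    """Hypothetical stand-in for the client's jobs API, used only to exercise the loop."""

    def __init__(self, statuses):
        self._statuses = iter(statuses)

    def status(self, job_id, benchmark_id):
        return SimpleNamespace(job_id=job_id, status=next(self._statuses))


fake_client = SimpleNamespace(
    eval=SimpleNamespace(jobs=FakeJobs(["scheduled", "in_progress", "completed"]))
)
# The sleep function is injected so the demo does not actually wait.
final_status = wait_for_job(fake_client, job_id="job-1", sleep=lambda seconds: None)
print(final_status)
```

Passing `sleep` as a parameter keeps the helper testable without changing its behavior against a real server.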

. Get the job results:
+
[source,python]
----
pprint.pprint(client.eval.jobs.retrieve(job_id=job.job_id, benchmark_id="trustyai_lmeval::dk-bench").scores)
----
Comment on lines +159 to +162

🛠️ Refactor suggestion

Results snippet missing pprint import

Add the import so the example runs as-is.

-[source,python]
+[source,python]
 ----
+import pprint
 pprint.pprint(client.eval.jobs.retrieve(job_id=job.job_id, benchmark_id="trustyai_lmeval::dk-bench").scores)
 ----
🤖 Prompt for AI Agents
In modules/running-custom-evaluations-with-LMEval-and-llama-stack.adoc around
lines 159 to 162, the example uses pprint.pprint but does not import pprint; add
a top-of-example import line "import pprint" (or "from pprint import pprint" and
adjust usage accordingly) so the snippet runs as-is.

@@ -23,7 +23,7 @@ This example demonstrates how to use the built-in link:https://github.com/trusty
ifdef::upstream[]
* You have installed {productname-long}, version 2.29 or later.
endif::[]
-ifdef::upstream[]
+ifndef::upstream[]
* You have installed {productname-long}, version 2.20 or later.
endif::[]

@@ -0,0 +1,167 @@
:_module-type: PROCEDURE

ifdef::context[:parent-context: {context}]
[id="using-llama-stack-external-eval-provider-with-lm-evaluation-harness-in-TrustyAI_{context}"]
= Using Llama Stack external eval provider with lm-evaluation-harness in TrustyAI
[role='_abstract']

This example demonstrates how to evaluate a language model in {productname-long} using the LMEval Llama Stack external eval provider in a Python workbench. To do this, configure a Llama Stack server to use the LMEval Eval provider, register a benchmark dataset, and run a benchmark evaluation job on a language model.

.Prerequisites

ifdef::upstream[]
* You have installed {productname-long}, version 2.29 or later.
endif::[]
ifndef::upstream[]
* You have installed {productname-long}, version 2.20 or later.
endif::[]

* You have cluster administrator privileges for your {productname-short} cluster.

* You have downloaded and installed the {productname-short} command-line interface (CLI). For more information, see link:https://docs.redhat.com/en/documentation/openshift_container_platform/{ocp-latest-version}/html/cli_tools/openshift-cli-oc[Installing the OpenShift CLI^].

* You have a large language model (LLM) for chat generation or text classification, or both, deployed in your namespace.

* You have installed TrustyAI Operator in your {productname-short} cluster.

* You have set KServe to Raw Deployment mode in your cluster.


.Procedure

. Configure a Python virtual environment for this tutorial in your `DataScienceCluster`:
+
[source,bash]
----
python3 -m venv .venv
source .venv/bin/activate
----
. Install the link:https://pypi.org/project/llama-stack/[Llama Stack provider] from the Python Package Index (PyPI):
+
[source,bash]
----
pip install llama-stack-provider-lmeval
----
. Configure the Llama Stack server. Set the variables to configure the runtime endpoint and namespace. The VLLM_URL value should be the `v1/completions` endpoint of your model route and the TRUSTYAI_LM_EVAL_NAMESPACE should be the namespace where your model is deployed. For example:
Comment on lines +39 to +45

🛠️ Refactor suggestion

Missing required packages for server and client.

You install only the provider. The server CLI and client library aren’t installed, causing later steps to fail.

-. Install the link:https://pypi.org/project/llama-stack/[Llama Stack provider] from the Python Package Index (PyPI):
+. Install the required packages from PyPI:
@@
---- 
-pip install llama-stack-provider-lmeval
+pip install \
+  llama-stack \
+  llama-stack-client \
+  llama-stack-provider-lmeval
---- 
🤖 Prompt for AI Agents
In
modules/using-llama-stack-external-eval-provider-with-lm-evaluation-harness-in-TrustyAI.adoc
around lines 39 to 45, the instructions only install the Llama Stack provider
but omit required server and client packages; update the installation step to
also install the llama-stack server CLI and client library by adding their
package names to the pip install command (or separate pip install lines) and
mention that both server and client must be installed before configuring
VLLM_URL and TRUSTYAI_LM_EVAL_NAMESPACE so subsequent steps don't fail.
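
A reader can confirm that all three distributions are present before starting the server. The following is a minimal sketch using the standard-library `importlib.metadata`; the distribution names come from the suggestion above, and the helper name `missing_distributions` is hypothetical.

```python
from importlib import metadata

# Distribution names taken from the review suggestion above.
REQUIRED = ["llama-stack", "llama-stack-client", "llama-stack-provider-lmeval"]


def missing_distributions(names):
    """Return the distribution names that are not installed in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


missing = missing_distributions(REQUIRED)
if missing:
    print("Install before continuing: pip install " + " ".join(missing))
else:
    print("All required packages are installed.")
```
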

+
[source,bash]
----
export VLLM_URL=https://$(oc get $(oc get ksvc -o name | grep predictor) --template='{{.status.url}}')/v1/completions
export TRUSTYAI_LM_EVAL_NAMESPACE=$(oc project | cut -d '"' -f2)
----
. Download the `providers.d` provider configuration directory and the `run.yaml` execution file:
+
[source, bash]
----
curl --create-dirs --output providers.d/remote/eval/trustyai_lmeval.yaml https://raw.githubusercontent.com/trustyai-explainability/llama-stack-provider-lmeval/refs/heads/main/providers.d/remote/eval/trustyai_lmeval.yaml

curl --create-dirs --output run.yaml https://raw.githubusercontent.com/trustyai-explainability/llama-stack-provider-lmeval/refs/heads/main/run.yaml
----
. Start the Llama Stack server in a virtual environment, which uses port `8321` by default:
+
[source,bash]
----
llama stack run run.yaml --image-type venv
----
. Create a Python script in a Jupyter workbench and import the following libraries and modules to interact with the server and run an evaluation:
+
[source,python]
----
import os
import subprocess

import logging

import time
import pprint
----
. Start the Llama Stack Python client to interact with the running Llama Stack server:
+
[source,python]
----
BASE_URL = "http://localhost:8321"

def create_http_client():
from llama_stack_client import LlamaStackClient
return LlamaStackClient(base_url=BASE_URL)

client = create_http_client()
----
. Print a list of the current available benchmarks:
+
[source,python]
----
benchmarks = client.benchmarks.list()

print(f"Available benchmarks: {benchmarks}")
----
. LMEval provides access to over 100 preconfigured evaluation datasets. Register the ARC-Easy benchmark, a dataset of grade-school level, multiple-choice science questions:
+
[source,python]
----
client.benchmarks.register(
benchmark_id="trustyai_lmeval::arc_easy",
dataset_id="trustyai_lmeval::arc_easy",
scoring_functions=["string"],
provider_benchmark_id="string",
provider_id="trustyai_lmeval",
metadata={
"tokenizer": "google/flan-t5-small",
"tokenized_requests": False,
}
)
----
. Verify that the benchmark has been registered successfully:
+
[source,python]
----
benchmarks = client.benchmarks.list()
print(f"Available benchmarks: {benchmarks}")
----
. Run a benchmark evaluation job on your deployed model using the following input. Replace phi-3 with the name of your deployed model:
+
[source,python]
----
job = client.eval.run_eval(
benchmark_id="trustyai_lmeval::arc_easy",
benchmark_config={
"eval_candidate": {
"type": "model",
"model": "phi-3",
"provider_id": "trustyai_lmeval",
"sampling_params": {
"temperature": 0.7,
"top_p": 0.9,
"max_tokens": 256
},
},
"num_examples": 1000,
},
)

print(f"Starting job '{job.job_id}'")
----
. Monitor the status of the evaluation job using the following code. The job runs asynchronously, so you can check its status periodically:
+
[source,python]
----
def get_job_status(job_id, benchmark_id):
return client.eval.jobs.status(job_id=job_id, benchmark_id=benchmark_id)

while True:
job = get_job_status(job_id=job.job_id, benchmark_id="trustyai_lmeval::arc_easy")
print(job)

if job.status in ['failed', 'completed']:
print(f"Job ended with status: {job.status}")
break

time.sleep(20)
----
. Retrieve the evaluation job results once the job status reports back as `completed`:
+
[source,python]
----
pprint.pprint(client.eval.jobs.retrieve(job_id=job.job_id, benchmark_id="trustyai_lmeval::arc_easy").scores)
----


22 changes: 22 additions & 0 deletions modules/using-llama-stack-with-trustyai.adoc
@@ -0,0 +1,22 @@
:_module-type: ASSEMBLY

💡 Verification agent

🧩 Analysis chain

Verify assembly location vs. type

File sits under modules/ but declares ASSEMBLY. Confirm that upstream assemblies include this path correctly (or relocate to assemblies/).


🏁 Script executed:

#!/usr/bin/env bash
# Find all occurrences of using-llama-stack-with-trustyai.adoc and who includes them
fd -a 'using-llama-stack-with-trustyai.adoc'
rg -n 'include::.*using-llama-stack-with-trustyai\.adoc' -C2

Length of output: 574


Sync file location with includes
Move modules/using-llama-stack-with-trustyai.adoc into the assemblies/ directory (or update the include in monitoring-data-science-models.adoc:34 to reference modules/using-llama-stack-with-trustyai.adoc).

🤖 Prompt for AI Agents
In modules/using-llama-stack-with-trustyai.adoc around line 1, the file location
doesn't match the include path used from monitoring-data-science-models.adoc:34;
either move this file into the assemblies/ directory so the existing include
resolves, or update the include statement in monitoring-data-science-models.adoc
(line 34) to reference modules/using-llama-stack-with-trustyai.adoc; ensure the
include path is correct relative to the including file and update any build/CI
references if necessary.


ifdef::context[:parent-context: {context}]
[id="modules/using-llama-stack-with-trustyai_{context}"]
= Using Llama Stack with TrustyAI

This section contains tutorials for working with Llama Stack in TrustyAI. These tutorials demonstrate how to use various Llama Stack components and providers to evaluate and work with language models.

The following sections describe how to work with Llama Stack and provide example use cases:

* Using Llama Stack external eval provider with lm-evaluation-harness in TrustyAI
* Running custom evaluations with LMEval and Llama Stack
* Using the trustyai-fms Guardrails Orchestrator with Llama Stack

include::../modules/using-llama-stack-external-eval-provider-with-lm-evaluation-harness-in-TrustyAI.adoc[leveloffset=+1]
include::../modules/using-guardrails-orchestrator-with-llama-stack.adoc[leveloffset=+1]
Comment on lines +16 to +17

🛠️ Refactor suggestion

Include the custom-evaluations module to match the listed use cases

Add the “Running custom evaluations …” module so the assembly renders all referenced sections.

 include::../modules/using-llama-stack-external-eval-provider-with-lm-evaluation-harness-in-TrustyAI.adoc[leveloffset=+1]
+include::../modules/running-custom-evaluations-with-LMEval-and-llama-stack.adoc[leveloffset=+1]
 include::../modules/using-guardrails-orchestrator-with-llama-stack.adoc[leveloffset=+1]
🤖 Prompt for AI Agents
In modules/using-llama-stack-with-trustyai.adoc around lines 16 to 17, the
assembly is missing the "Running custom evaluations …" module include referenced
elsewhere; add an include directive for the custom-evaluations module (matching
the other include style and leveloffset) so the assembled document renders the
referenced section.




ifdef::parent-context[:context: {parent-context}]
ifndef::parent-context[:!context:]
3 changes: 3 additions & 0 deletions monitoring-data-science-models.adoc
@@ -31,4 +31,7 @@ include::assemblies/evaluating-large-language-models.adoc[leveloffset=+1]

include::assemblies/configuring-the-guardrails-orchestrator-service.adoc[leveloffset=+1]

include::assemblies/using-llama-stack-with-trustyai.adoc[leveloffset=+1]

// currently bias-monitoring is only in ODH
Comment on lines +34 to +36

💡 Verification agent

🧩 Analysis chain

Verify include target exists and path is correct (assemblies vs modules).

The include points to assemblies/using-llama-stack-with-trustyai.adoc, but this PR shows modules/using-llama-stack-with-trustyai.adoc. If the assemblies/ file doesn’t exist, the build will fail. Update the path or add the assembly file accordingly.

Run to confirm presence and pick the correct path:


🏁 Script executed:

#!/bin/bash
# List any matching files so we can choose the right include path.
fd -a 'using-llama-stack-with-trustyai.adoc' -t f -H

Length of output: 116


Fix include path to modules/using-llama-stack-with-trustyai.adoc
assemblies/using-llama-stack-with-trustyai.adoc doesn’t exist; update the include in monitoring-data-science-models.adoc (line 34) to:

include::modules/using-llama-stack-with-trustyai.adoc[leveloffset=+1]
🤖 Prompt for AI Agents
In monitoring-data-science-models.adoc around lines 34 to 36 the include path
points to a non-existent assemblies/using-llama-stack-with-trustyai.adoc; change
the include to reference the correct file under modules by replacing that
include with modules/using-llama-stack-with-trustyai.adoc[leveloffset=+1] so the
document pulls the correct module file.
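
Broken include paths like this one can be detected locally before the build runs. The following is a minimal sketch, not part of this PR: the helper name `missing_includes` is hypothetical, it assumes include targets resolve relative to the including file's directory, and it ignores attribute substitution and conditional directives. The demo builds a throwaway directory that mirrors the layout discussed here.

```python
import re
import tempfile
from pathlib import Path

# Captures the target of an include directive, e.g. include::modules/foo.adoc[leveloffset=+1]
INCLUDE_PATTERN = re.compile(r"^include::([^\[\s]+)\[", re.MULTILINE)


def missing_includes(adoc_file: Path) -> list[str]:
    """Return include targets in adoc_file that do not exist relative to its directory."""
    text = adoc_file.read_text(encoding="utf-8")
    base = adoc_file.parent
    return [target for target in INCLUDE_PATTERN.findall(text)
            if not (base / target).is_file()]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Only the modules/ copy of the file exists, as in this PR.
    (root / "modules").mkdir()
    (root / "modules" / "using-llama-stack-with-trustyai.adoc").write_text(
        "= Stub\n", encoding="utf-8"
    )
    top = root / "monitoring-data-science-models.adoc"
    top.write_text(
        "include::assemblies/using-llama-stack-with-trustyai.adoc[leveloffset=+1]\n"
        "include::modules/using-llama-stack-with-trustyai.adoc[leveloffset=+1]\n",
        encoding="utf-8",
    )
    unresolved = missing_includes(top)
    print(unresolved)
```
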

include::assemblies/bias-monitoring-tutorial.adoc[leveloffset=+1]