RHOAIENG-37346: Refactor Guardrails for Safety JTBD #1036
base: main
Conversation
Walkthrough
Documentation restructuring reorganizes Guardrails safety content by removing an orchestrator-specific assembly, updating module titles and identifiers for clarity, creating new assemblies and a top-level document that group safety features by use case (PII detection, prompt injection, content moderation), and updating include directives accordingly.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Actionable comments posted: 3
🧹 Nitpick comments (1)
modules/using-a-hugging-face-prompt-injection-detector-with-the-guardrails-orchestrator.adoc (1)
4-5: Update module ID to reflect the new title for consistency.
The title has been updated to emphasize the safety outcome ("Preventing Prompt Injection"), but the ID on line 4 still reflects the old phrasing ("using-a-hugging-face-prompt-injection-detector-with-guardrails-orchestrator"). For consistency and maintainability, consider updating the ID to reflect the new title.

```diff
-[id="using-a-hugging-face-prompt-injection-detector-with-guardrails-orchestrator_{context}"]
+[id="preventing-prompt-injection-using-hugging-face-detector_{context}"]
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- assemblies/configuring-the-guardrails-orchestrator-service.adoc (1 hunks)
- modules/enforcing-configured-safety-pipelines-for-llm-inference-using-guardrails-gateway.adoc.adoc (1 hunks)
- modules/filtering-flagged-content-by-sending-requests-to-the-regex-detector.adoc (1 hunks)
- modules/using-a-hugging-face-prompt-injection-detector-with-the-guardrails-orchestrator.adoc (1 hunks)
🔇 Additional comments (3)
modules/filtering-flagged-content-by-sending-requests-to-the-regex-detector.adoc (1)
4-5: Verify that the main assembly references this updated module ID.
The title and ID have been updated to emphasize the safety outcome ("Filtering flagged content") rather than the operation. Ensure the main assembly file (configuring-the-guardrails-orchestrator-service.adoc) includes this module using the updated ID.
modules/enforcing-configured-safety-pipelines-for-llm-inference-using-guardrails-gateway.adoc.adoc (1)
6-11: Enhanced abstract and prerequisites strengthen production-readiness messaging.
The expanded abstract clearly articulates the use of preset guardrail pipelines for consistent safety policies in production, and the new Prerequisites section appropriately establishes that the guardrails gateway image must be configured beforehand. These changes align well with the PR's focus on safety-outcome messaging.
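Illustratively, the pattern this comment describes looks roughly like the following in AsciiDoc (a sketch; the module's actual wording is not shown in this review):

```asciidoc
[role="_abstract"]
Use preset guardrail pipelines to apply consistent safety policies to LLM
inference in production environments.

.Prerequisites
* You have configured the guardrails gateway image for your deployment.
```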
assemblies/configuring-the-guardrails-orchestrator-service.adoc (1)
4-11: Update assembly ID and verify title reflects safety-outcome JTBD messaging.
The ID and title have been successfully updated from an operational focus ("Configuring") to a safety-outcome focus ("Enable AI safety"). This aligns well with the PR objectives. Ensure external cross-references and inbound links are updated accordingly, as this is a high-traffic assembly page.
Force-pushed from 98b8888 to 75dbb24.
Actionable comments posted: 1
♻️ Duplicate comments (1)
assemblies/configuring-the-guardrails-orchestrator-service.adoc (1)
49-56: Fix critical include file reference with duplicate .adoc extension.
Line 55 must reference a file with one of the recognized AsciiDoc extensions (.asciidoc, .adoc, .ad, .asc, or .txt). The current reference includes a duplicate extension:

```
include::modules/enforcing-configured-safety-pipelines-for-llm-inference-using-guardrails-gateway.adoc.adoc[leveloffset=+1]
```

This will cause the include directive to fail during documentation builds, as the file path is malformed. Additionally, line 54 uses `leveloffset=+2` for the regex detector while lines 52–53 use `leveloffset=+1`, creating an inconsistent heading hierarchy within the same section.
Apply this diff to fix the file reference and standardize leveloffsets:

```diff
 include::modules/guardrails-orchestrator-hap-scenario.adoc[leveloffset=+1]
 include::modules/using-a-hugging-face-prompt-injection-detector-with-the-guardrails-orchestrator.adoc[leveloffset=+1]
-include::modules/filtering-flagged-content-by-sending-requests-to-the-regex-detector.adoc[leveloffset=+2]
-include::modules/enforcing-configured-safety-pipelines-for-llm-inference-using-guardrails-gateway.adoc.adoc[leveloffset=+1]
+include::modules/filtering-flagged-content-by-sending-requests-to-the-regex-detector.adoc[leveloffset=+1]
+include::modules/enforcing-configured-safety-pipelines-for-llm-inference-using-guardrails-gateway.adoc[leveloffset=+1]
```
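As background on why the mixed offsets matter: `leveloffset` shifts every heading in the included file, so sibling includes with different offsets render at different depths. A minimal sketch with hypothetical module names:

```asciidoc
// Each module begins with a document-level title, e.g. "= Module title".

// leveloffset=+1 renders the module title as a level-1 section (==), a peer topic:
include::modules/example-peer-module.adoc[leveloffset=+1]

// leveloffset=+2 renders it as a level-2 section (===), nested one level deeper:
include::modules/example-nested-module.adoc[leveloffset=+2]
```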
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- assemblies/configuring-the-guardrails-orchestrator-service.adoc (1 hunks)
- modules/enforcing-configured-safety-pipelines-for-llm-inference-using-guardrails-gateway.adoc.adoc (1 hunks)
- modules/filtering-flagged-content-by-sending-requests-to-the-regex-detector.adoc (1 hunks)
- modules/using-a-hugging-face-prompt-injection-detector-with-the-guardrails-orchestrator.adoc (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- modules/filtering-flagged-content-by-sending-requests-to-the-regex-detector.adoc
🚧 Files skipped from review as they are similar to previous changes (1)
- modules/enforcing-configured-safety-pipelines-for-llm-inference-using-guardrails-gateway.adoc.adoc
🔇 Additional comments (4)
modules/using-a-hugging-face-prompt-injection-detector-with-the-guardrails-orchestrator.adoc (1)
4-5: Heading change improves the preventive posture of the module.
The updated heading shifts from a neutral/descriptive tone ("Using...") to an action-oriented, safety-focused tone ("Preventing..."), which better aligns with the PR objectives to emphasize AI safety practices. The change is clear and effective for users seeking guidance on prompt injection prevention.
Note: The module ID at line 4 (`using-a-hugging-face-prompt-injection-detector-with-guardrails-orchestrator_{context}`) was not updated to reflect the new heading. If backward compatibility with existing links is not required, consider updating the ID to match the new heading (e.g., `preventing-prompt-injection-with-hugging-face-detector_{context}`). Otherwise, this is a non-issue.
assemblies/configuring-the-guardrails-orchestrator-service.adoc (3)
4-5: Assembly title and ID updated to emphasize safety-first approach.
The assembly-level title and ID changes align well with the PR objectives. The new title "Enabling AI safety with Guardrails" and ID `enable-ai-safety-with-guardrails_{context}` better communicate the purpose and focus of the documentation section.
11-35: Content restructuring improves clarity on Guardrails components.
The reorganization from a concise detector listing to descriptive component descriptions (Deploy, Configure & Use, Monitor, Enable OpenTelemetry) provides users with a clear roadmap of what they can accomplish. This is a solid documentation improvement that sets expectations upfront.
40-47: Verify intentionality of leveloffset hierarchy for detector/gateway modules.
Lines 44–46 use `leveloffset=+2`, creating a sub-subsection level under "Deploying and Configuring Guardrails components," while lines 40–43 use `leveloffset=+1`. This creates a heading hierarchy where built-in detectors, Hugging Face models, and gateway configuration are visually nested deeper than the Orchestrator deployment itself.
If this nesting is intentional (to group detector/gateway topics as child topics of deployment), the current structure is correct. However, if these modules should be peer topics at the same level as the Orchestrator, use `leveloffset=+1` consistently.
Please confirm whether the `leveloffset=+2` on lines 44–46 is intentional to create a sub-topic hierarchy, or should be changed to `leveloffset=+1` for consistency with sibling modules.
```diff
 ifdef::context[:parent-context: {context}]
 [id="using-a-hugging-face-prompt-injection-detector-with-guardrails-orchestrator_{context}"]
-= Using a Hugging Face Prompt Injection detector with the Guardrails Orchestrator
+= Preventing Prompt Injection by using a Hugging Face Prompt Injection detector
```
If this title is OK, I'll go ahead and update the id=, but that creates a kind of pain of needing to update it in multiple places, so I want to be sure the titles are okay from stakeholder POV before I make deep changes.
"Mitigating" is usually a safer word. "Preventing" is stronger and may not be 100% guranteed.
| [id="configuring-the-guardrails-orchestrator-service_{context}"] | ||
| = Configuring the Guardrails Orchestrator service | ||
| [id="enable-ai-safety-with-guardrails_{context}"] | ||
| = Enabling AI safety with Guardrails | ||
|
|
||
| The TrustyAI Guardrails Orchestrator service is a tool to invoke detections on text generation inputs and outputs, as well as standalone detections. |
Do we want to leave this mention of "TrustyAI Guardrails" here? Or remove the "TrustyAI" bit?
So in the spirit of decoupling function from tool: whenever we have to refer to something concrete, we can reference the tool that does the function.
In this case, the phrasing looks okay here, but it will be important to always communicate that "tool x makes function y happen"; in this case, maybe a blurb on how the TrustyAI Orchestrator service helps with enabling AI safety with guardrails.
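A rough sketch of such a blurb (illustrative wording only, drawing on the orchestrator description quoted in this PR):

```asciidoc
[role="_abstract"]
Guardrails help you enforce AI safety policies on generative model inputs and
outputs. The TrustyAI Guardrails Orchestrator service is the tool that makes
this happen: it orchestrates network requests between the user, the generative
model, and detector servers so that configured detections run on each request.
```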
Force-pushed from 24ecbd0 to c694fa9.
Actionable comments posted: 1
♻️ Duplicate comments (1)
modules/enforcing-configured-safety-pipelines-for-llm-inference-using-guardrails-gateway.adoc (1)
4-4: Unresolved: Update ID to follow AsciiDoc lowercase convention.
The ID still contains mixed-case "LLM", which deviates from AsciiDoc ID conventions (all lowercase with hyphens). This was flagged in a previous review and needs to be corrected for consistency across the documentation.
Apply this diff:
-[id="enforcing-configured-safety-pipelines-for-LLM-inference-using-guardrails-gateway_{context}"] +[id="enforcing-configured-safety-pipelines-for-llm-inference-using-guardrails-gateway_{context}"]
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- assemblies/configuring-the-guardrails-orchestrator-service.adoc (1 hunks)
- modules/enforcing-configured-safety-pipelines-for-llm-inference-using-guardrails-gateway.adoc (1 hunks)
- modules/filtering-flagged-content-by-sending-requests-to-the-regex-detector.adoc (1 hunks)
- modules/using-a-hugging-face-prompt-injection-detector-with-the-guardrails-orchestrator.adoc (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- modules/filtering-flagged-content-by-sending-requests-to-the-regex-detector.adoc
- modules/using-a-hugging-face-prompt-injection-detector-with-the-guardrails-orchestrator.adoc
🔇 Additional comments (3)
assemblies/configuring-the-guardrails-orchestrator-service.adoc (3)
7-7: Unresolved: Address user question about "TrustyAI" branding.
A previous reviewer asked whether to retain or remove the "TrustyAI" reference here. Please clarify the documentation's branding approach: should this mention "TrustyAI Guardrails" or just "Guardrails Orchestrator"?
44-44: Verify leveloffset hierarchy for sub-modules.
Lines 44 and 54 use `leveloffset=+2` while most sibling modules in the same sections use `leveloffset=+1`. This creates deeper nesting for those modules. Confirm this is intentional (i.e., the built-in detector and regex detector are genuinely sub-topics) or standardize to `+1` for consistency.
If these should be at the same level as their siblings, apply this diff:

```diff
-include::modules/configuring-the-built-in-detector-and-guardrails-gateway.adoc[leveloffset=+2]
+include::modules/configuring-the-built-in-detector-and-guardrails-gateway.adoc[leveloffset=+1]

-include::modules/filtering-flagged-content-by-sending-requests-to-the-regex-detector.adoc[leveloffset=+2]
+include::modules/filtering-flagged-content-by-sending-requests-to-the-regex-detector.adoc[leveloffset=+1]
```

Also applies to: 54-54
55-55: Fixed: File extension and include reference corrected.
The include statement now correctly references the module file with a single `.adoc` extension. The previous double extension issue has been resolved. ✅
Force-pushed from c694fa9 to c464d0a.
Force-pushed from c464d0a to 5212a44.
@zanetworker @RobGeada PTAL at this PR for the Guardrails refactoring and let me know your thoughts/comments/feedback.

Do we have Llama Stack with guardrails here somewhere? Llama Stack will need to interlink between the Llama Stack docs and our safety docs.
| [id="configuring-the-guardrails-orchestrator-service_{context}"] | ||
| = Configuring the Guardrails Orchestrator service | ||
| [id="enable-ai-safety-with-guardrails_{context}"] | ||
| = Enabling AI safety with Guardrails |
I'd stop at "Enabling AI Safety"; then "Guardrails" can be hover text, metadata, or a subtitle, as discussed.
Potentially Guardrails would be one subtopic in the future (not the only thing to do in AI safety, amongst others like safety validation, safety hub, etc).
I think for the sake of our file architecture, since we have the topmost book as Enabling AI Safety, i.e. the tile you see on the main docs page, we have to keep this name/id distinct, so I think we have to include "with guardrails" so that we have some unique IDs.
I'd then vote for removing (with guardrails) so that when they click on Enable AI safety, they can find examples of guardrails. Just trying to not pin us down to "just" guardrails because there will be more. If other folks have strong opinions, we can discuss how we plan on moving/restructuring for the future.
Ok perfect, that is how we have it then. I think this works.
```diff
 ifdef::context[:parent-context: {context}]
 [id="using-a-hugging-face-prompt-injection-detector-with-guardrails-orchestrator_{context}"]
-= Using a Hugging Face Prompt Injection detector with the Guardrails Orchestrator
+= Preventing Prompt Injection by using a Hugging Face Prompt Injection detector
```
Probably for future reference, but can we decouple "deployment/configuration of the system" from "using the system"? They are usually different personas.
Should we have a section for Guardrail use-cases, where this would be one application of guardrails?
Guardrails Use-Cases
PII & Sensitive Data Detection
- Protecting User Privacy by Detecting Email Addresses (email)
- Preventing Credit Card Leakage with Luhn-Validated Detection (credit-card)
- Safeguarding Social Security Numbers in User Inputs (us-social-security-number)
- Detecting Phone Numbers to Prevent Contact Information Leakage (us-phone-number)
- Identifying IPv4 Addresses for Network Security Compliance (ipv4)
- Detecting IPv6 Addresses in Technical Documentation (ipv6)
- Validating UK Postal Codes for Geographic Data Privacy (uk-post-code)
- Creating Custom PII Detectors with Regex Patterns ($CUSTOM_REGEX)
Prompt Security
- Preventing Prompt Injection by using a Hugging Face Prompt Injection detector
- Preventing Prompt Injection Attacks with DeBERTa Classifier (protectai/deberta-v3-base-prompt-injection-v2)
- Detecting Jailbreak Attempts with Granite Guardian (ibm-granite/granite-guardian-*)
Content Safety & Moderation
- Filtering Toxic Content with Granite Guardian HAP (ibm-granite/granite-guardian-hap-38m)
- Detecting Hate, Abuse, and Profanity in User Messages (HAP models)
- Identifying Harmful Content Across Multiple Risk Categories (Granite Guardian models)
- Blocking Social Bias in LLM Responses (Granite Guardian models)
- Preventing Unethical Behavior Suggestions (Granite Guardian models)
- Filtering Sexual Content from Conversations (Granite Guardian models)
- Detecting Violence and Harmful Instructions (Granite Guardian models)
- Identifying Profanity in User-Generated Content (Granite Guardian models)
...
https://github.com/trustyai-explainability/guardrails-detectors/tree/main/detectors
@zanetworker I am in favor of breaking these up into two sections, and it wouldn't be difficult to do in this PR by adding a new assembly and siphoning off the "Use case" inclusions you've highlighted. In this iteration of the refactor, I tried to group the config and use cases: whereas previously the use cases were somewhat meshed with the config pieces, I grouped them separately into Configs, then Use cases. "Monitoring user inputs with the Guardrails Orchestrator service" is interesting because it is a use case for filtering hateful and profane language but also has a ConfigMap piece; still, it is mostly a scenario piece. I will add another commit shortly that reorganizes the use cases into a separate assembly and retitles "Monitoring user inputs with the Guardrails Orchestrator service" to a variation of your suggested title, "Detecting Hateful and profane language".
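For illustration, the separate use-case assembly could look roughly like this (a sketch assembled from module file names that appear later in this PR; the actual file contents may differ):

```asciidoc
// assemblies/using-guardrails-for-ai-safety.adoc (sketch)
ifdef::context[:parent-context: {context}]
[id="using-guardrails-for-ai-safety_{context}"]
= Using Guardrails for AI safety

// Use-case modules grouped by safety outcome:
include::modules/detecting-pii-by-using-guardrails-with-llama-stack.adoc[leveloffset=+1]
include::modules/mitigating-prompt-injection-by-using-a-hugging-face-prompt-injection-detector.adoc[leveloffset=+1]
include::modules/detecting-hateful-and-profane-language.adoc[leveloffset=+1]

ifdef::parent-context[:context: {parent-context}]
ifndef::parent-context[:!context:]
```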
Regarding the question about Llama Stack in Guardrails: I think it accidentally got deleted in the re-org 😬 so thanks for asking about it. I re-added it.
The Guardrails with TrustyAI content is in both the Guardrails section and a standalone section, Using TrustyAI with Llama Stack. Since the procedure to set up Llama Stack with TrustyAI is in both places, I've reorganized its appearance in the Guardrails section as a Detecting PII use-case scenario, hoping to catch people who intend to do the setup from the TrustyAI with Llama Stack section.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
assemblies/using-llama-stack-with-trustyai.adoc (1)
15-15: Bullet point at line 15 misaligns with the updated module title.
Line 15 still uses the generic "Using the trustyai-fms Guardrails Orchestrator with Llama Stack" wording, but line 20 now includes the PII-focused module detecting-pii-by-using-guardrails-with-llama-stack.adoc. Update the bullet point to accurately reflect that the example focuses on PII detection.
Apply this diff to align the bullet point:

```diff
-* Using the trustyai-fms Guardrails Orchestrator with Llama Stack
+* Detecting personally identifiable information (PII) by using Guardrails with Llama Stack
```

Also applies to: 20-20
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- assemblies/configuring-the-guardrails-orchestrator-service.adoc (0 hunks)
- assemblies/enabling-ai-safety.adoc (1 hunks)
- assemblies/using-guardrails-for-ai-safety.adoc (1 hunks)
- assemblies/using-llama-stack-with-trustyai.adoc (1 hunks)
- modules/detecting-hateful-and-profane-language.adoc (1 hunks)
- modules/detecting-pii-by-using-guardrails-with-llama-stack.adoc (2 hunks)
- modules/enforcing-configured-safety-pipelines-for-llm-inference-using-guardrails-gateway.adoc (1 hunks)
- modules/filtering-flagged-content-by-sending-requests-to-the-regex-detector.adoc (1 hunks)
- modules/using-a-hugging-face-prompt-injection-detector-with-the-guardrails-orchestrator.adoc (1 hunks)
- monitoring-data-science-models.adoc (1 hunks)
💤 Files with no reviewable changes (1)
- assemblies/configuring-the-guardrails-orchestrator-service.adoc
✅ Files skipped from review due to trivial changes (1)
- assemblies/enabling-ai-safety.adoc
🚧 Files skipped from review as they are similar to previous changes (3)
- modules/using-a-hugging-face-prompt-injection-detector-with-the-guardrails-orchestrator.adoc
- modules/enforcing-configured-safety-pipelines-for-llm-inference-using-guardrails-gateway.adoc
- modules/filtering-flagged-content-by-sending-requests-to-the-regex-detector.adoc
🔇 Additional comments (6)
monitoring-data-science-models.adoc (1)
32-34: Clean structural reorganization of the monitoring index.
The change cleanly replaces the outdated single Guardrails Orchestrator assembly with two focused assemblies for AI safety enablement and guardrails usage. The blank-line separator improves readability. Ensure that both referenced assemblies (enabling-ai-safety.adoc and using-guardrails-for-ai-safety.adoc) are present in the repository and that their IDs match the expected context conventions.
modules/detecting-pii-by-using-guardrails-with-llama-stack.adoc (2)
4-5: Well-focused module naming and ID updates.
The renaming from generic "using Guardrails Orchestrator with Llama Stack" to specific "Detecting PII by using Guardrails with Llama Stack" improves discoverability and clarifies the example's scope.
14-14: Description clarification enhances context.
The reword clearly positions the Guardrails Orchestrator as a Llama Stack safety component for PII detection. The narrative flow is improved.
modules/detecting-hateful-and-profane-language.adoc (1)
4-5: Focused and clear module reframing.
The HAP module ID and title are now use-case-specific and more discoverable. The description reword emphasizing HAP detection aligns well with the new name. This follows the positive refactoring pattern established in the PII module.
Also applies to: 8-8
assemblies/using-guardrails-for-ai-safety.adoc (2)
6-17: Well-organized assembly structure with logical subsections.
The three-subsection organization (Detecting PII, Securing Prompts, Moderating Content) provides clear categorization of guardrails use cases. All includes use consistent `leveloffset=+1` formatting. The introductory context at line 6 is appropriately focused.
Note on interlinking: The PR comments from @zanetworker mention that interlinking between the llama-stack documentation and the safety/guardrails documentation needs attention. This assembly includes detecting-pii-by-using-guardrails-with-llama-stack.adoc, which bridges these topics. Verify that the llama-stack assembly (assemblies/using-llama-stack-with-trustyai.adoc) includes appropriate cross-references back to this safety assembly for bidirectional navigation.
19-20: Proper context restoration.
The parent-context conditional blocks follow standard AsciiDoc assembly patterns.
Force-pushed from fa49cb5 to 6fc4169.
Force-pushed from 6fc4169 to f4ca79c.
Actionable comments posted: 0
🧹 Nitpick comments (1)
enabling-ai-safety.adoc (1)
1-20: Ensure cross-references and navigation from related documentation.
The creation of a new dedicated safety documentation entry point is good for organization, but you should verify that:
- The included assemblies contain proper cross-references (xrefs) to related documentation, particularly the LLaMA stack documentation mentioned in the PR comments.
- Navigation paths from parent/related documentation (e.g., monitoring, model deployment guides) link appropriately to this new page or its sections.
- The document is discoverable from the main documentation navigation structure.
This aligns with zanetworker's comment about the need for interlinking between LLaMA stack docs and safety/guardrails documentation.
If you need help auditing or adding cross-references in the included assemblies or related documentation, I can help verify the interlinking structure.
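A minimal sketch of such a cross-reference from the Llama Stack assembly back to the safety docs (the link pattern follows the upstream conditionals quoted later in this review; the exact target URL is an assumption based on the docs preview path):

```asciidoc
ifdef::upstream[]
For more guardrails use cases, see
link:{odhdocshome}/enabling-ai-safety/[Enabling AI safety].
endif::[]
```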
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- assemblies/enabling-ai-safety-with-guardrails.adoc (1 hunks)
- enabling-ai-safety.adoc (1 hunks)
- monitoring-data-science-models.adoc (0 hunks)
💤 Files with no reviewable changes (1)
- monitoring-data-science-models.adoc
🚧 Files skipped from review as they are similar to previous changes (1)
- assemblies/enabling-ai-safety-with-guardrails.adoc
🔇 Additional comments (2)
enabling-ai-safety.adoc (2)
1-14: Front matter and AsciiDoc configuration look correct.
The Jekyll front matter, document attributes, and AsciiDoc configuration are properly structured. The use of `:context: safety` aligns with the assembly-based documentation approach.
18-18: Verify included assembly paths and structure.
The includes reference two new assemblies that split the previous consolidated Guardrails documentation. Ensure that:
- The assembly file paths (assemblies/enabling-ai-safety-with-guardrails.adoc and assemblies/using-guardrails-for-ai-safety.adoc) are correct relative to this file's location.
- The leveloffset=+1 produces the intended heading hierarchy.
Per your PR comments, also verify that interlinking between the LLaMA stack documentation and these Guardrails assemblies is properly handled to support cross-referencing and discoverability.
Also applies to: 20-20
Actionable comments posted: 0
🧹 Nitpick comments (1)
modules/mitigating-prompt-injection-by-using-a-hugging-face-prompt-injection-detector.adoc (1)
43-47: Consider clarifying prerequisite scope.
The prerequisite about configuring the Guardrails Orchestrator service (lines 43–47) is specific to Scenario 1. Given the updated title emphasizes "Mitigating Prompt Injection" more broadly, consider adding a brief clarification that this prerequisite applies to the Orchestrator API scenario, while Scenario 2 supports standalone detection without it. This would help users navigate the document based on their use case.
Example clarification:
```diff
 ifdef::upstream[]
-* You are familiar with how to configure and deploy the Guardrails Orchestrator service. See link:{odhdocshome}/monitoring_data_science_models/#deploying-the-guardrails-orchestrator-service_monitor[Deploying the Guardrails Orchestrator].
+* (For Scenario 1 only) You are familiar with how to configure and deploy the Guardrails Orchestrator service. See link:{odhdocshome}/monitoring_data_science_models/#deploying-the-guardrails-orchestrator-service_monitor[Deploying the Guardrails Orchestrator].
 endif::[]
 ifndef::upstream[]
-* You are familiar with how to configure and deploy the Guardrails Orchestrator service. See link:{rhoaidocshome}{default-format-url}/monitoring_data_science_models/configuring-the-guardrails-orchestrator-service_monitor#deploying-the-guardrails-orchestrator-service_monitor[Deploying the Guardrails Orchestrator]
+* (For Scenario 1 only) You are familiar with how to configure and deploy the Guardrails Orchestrator service. See link:{rhoaidocshome}{default-format-url}/monitoring_data_science_models/configuring-the-guardrails-orchestrator-service_monitor#deploying-the-guardrails-orchestrator-service_monitor[Deploying the Guardrails Orchestrator]
 endif::[]
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- assemblies/using-guardrails-for-ai-safety.adoc (1 hunks)
- modules/mitigating-prompt-injection-by-using-a-hugging-face-prompt-injection-detector.adoc (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- assemblies/using-guardrails-for-ai-safety.adoc
🔇 Additional comments (1)
modules/mitigating-prompt-injection-by-using-a-hugging-face-prompt-injection-detector.adoc (1)
4-5: ID and title changes are safe: no broken cross-references detected.
The old ID is not referenced anywhere in the codebase, and the new ID is correctly defined and included in the using-guardrails-for-ai-safety.adoc assembly. The refactoring aligns with the JTBD approach, broadening the scope from Orchestrator-specific to outcome-focused documentation.
```asciidoc
The following sections describe the Guardrails components, how to deploy them and provide example use cases of how to protect your AI applications using these tools:

Deploy a Guardrails Orchestrator instance::
The guardrails orchestrator is the main networking layer of the guardrails ecosystem, and “orchestrates” the network requests between the user, generative models, and detector servers.
```
generative models
this should be singular - it's only one generative model per orchestrator
```asciidoc
Configure and use the built-in detectors::
The Guardrails framework provides a set of “built-in” detectors out-of-the-box, that provides a number of simple detection algorithms. You can use the following detector with `trustyai_fms` orchestrator server, which is an external provider for Llama Stack that allows you to configure and use the Guardrails Orchestrator and compatible detection models through the Llama Stack API.:
+
* *Regex Detectors*: Pattern-based content detection for structured rule enforcement. These are the built-in detectors in the Guardrails Orchestrator service. Learn more about the link:https://github.com/trustyai-explainability/guardrails-regex-detector[guardrails-regex-detector].
```
There are a number of other built-in detection algorithms beyond just regex - I can write up a doc about them
```asciidoc
Any text classification model from link:https://huggingface.co/ibm-granite/granite-guardian-hap-38m[Huggingface] can be used as a detector model within the Guardrails ecosystem.
+
* *Hugging Face Detectors*: Compatible with most Hugging Face `AutoModelForSequenceClassification` models, such as `granite-guardian-hap-38m` or `deberta-v3-base-prompt-injection-v2`. Learn more about the detector algorithms for the link:https://github.com/trustyai-explainability/guardrails-detectors[FMS Guardrails Orchestrator].
* *vLLM Detector Adapter*: Content detection compatible with Hugging Face `AutoModelForCausalLM` models, for example `ibm-granite/granite-guardian-3.1-2b`. Learn more about link:https://github.com/foundation-model-stack/vllm-detector-adapter[vllm-detector-adapter].
```
vLLM Detector Adapter
This is not a thing, we shouldn't highlight it here
```asciidoc
It is underpinned by the open-source project link:https://github.com/foundation-model-stack/fms-guardrails-orchestrator[FMS-Guardrails Orchestrator] from IBM. You can deploy the Guardrails Orchestrator service through a Custom Resource Definition (CRD) that is managed by the TrustyAI Operator.

The following sections describe the Guardrails components, how to deploy them and provide example use cases of how to protect your AI applications using these tools:
```
I'm not sure I understand the structure of this section: it mixes together component definitions (orchestrator, detectors, etc.) with verbs (configure the guardrails gateway, monitor user inputs, enable telemetry).

Description
How Has This Been Tested?
Merge criteria:
Link to docs preview: https://opendatahub-documentation--1036.org.readthedocs.build/en/1036/enabling-ai-safety/index.html
NOTE: This preview is a basic asciidoc preview. It isn't how the content will look on docs.redhat.com or odh, but it's a nice way to preview content in PRs.
New TOC:
[screenshot of the updated table of contents]