RHOAIENG-37346: Refactor Guardrails for Safety JTBD #1036
base: main
Conversation
Walkthrough
Documentation restructuring reorganizes Guardrails safety content by removing an orchestrator-specific assembly, updating module titles and identifiers for clarity, creating new assemblies and a top-level document that group safety features by use case (PII detection, prompt injection, content moderation), and updating include directives accordingly.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Actionable comments posted: 3
🧹 Nitpick comments (1)
modules/using-a-hugging-face-prompt-injection-detector-with-the-guardrails-orchestrator.adoc (1)
4-5: Update module ID to reflect the new title for consistency.
The title has been updated to emphasize the safety outcome ("Preventing Prompt Injection"), but the ID on line 4 still reflects the old phrasing ("using-a-hugging-face-prompt-injection-detector-with-guardrails-orchestrator"). For consistency and maintainability, consider updating the ID to reflect the new title.

```diff
-[id="using-a-hugging-face-prompt-injection-detector-with-guardrails-orchestrator_{context}"]
+[id="preventing-prompt-injection-using-hugging-face-detector_{context}"]
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- assemblies/configuring-the-guardrails-orchestrator-service.adoc (1 hunks)
- modules/enforcing-configured-safety-pipelines-for-llm-inference-using-guardrails-gateway.adoc.adoc (1 hunks)
- modules/filtering-flagged-content-by-sending-requests-to-the-regex-detector.adoc (1 hunks)
- modules/using-a-hugging-face-prompt-injection-detector-with-the-guardrails-orchestrator.adoc (1 hunks)
🔇 Additional comments (3)
modules/filtering-flagged-content-by-sending-requests-to-the-regex-detector.adoc (1)
4-5: Verify that the main assembly references this updated module ID.
The title and ID have been updated to emphasize the safety outcome ("Filtering flagged content") rather than the operation. Ensure the main assembly file (configuring-the-guardrails-orchestrator-service.adoc) includes this module using the updated ID.
modules/enforcing-configured-safety-pipelines-for-llm-inference-using-guardrails-gateway.adoc.adoc (1)
6-11: Enhanced abstract and prerequisites strengthen production-readiness messaging.
The expanded abstract clearly articulates the use of preset guardrail pipelines for consistent safety policies in production, and the new Prerequisites section appropriately establishes that the guardrails gateway image must be configured beforehand. These changes align well with the PR's focus on safety-outcome messaging.
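Illustratively, the pattern this comment describes looks roughly like the following in AsciiDoc (a sketch; the module's actual wording is not shown in this review):

```asciidoc
[role="_abstract"]
Use preset guardrail pipelines to apply consistent safety policies to LLM
inference in production environments.

.Prerequisites
* You have configured the guardrails gateway image for your deployment.
```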
assemblies/configuring-the-guardrails-orchestrator-service.adoc (1)
4-11: Update assembly ID and verify title reflects safety-outcome JTBD messaging.
The ID and title have been successfully updated from an operational focus ("Configuring") to a safety-outcome focus ("Enable AI safety"). This aligns well with the PR objectives. Ensure external cross-references and inbound links are updated accordingly, as this is a high-traffic assembly page.
Force-pushed from 98b8888 to 75dbb24.
Actionable comments posted: 1
♻️ Duplicate comments (1)
assemblies/configuring-the-guardrails-orchestrator-service.adoc (1)
49-56: Fix critical include file reference with duplicate .adoc extension.
Line 55 must reference a file with one of the recognized AsciiDoc extensions (.asciidoc, .adoc, .ad, .asc, or .txt). The current reference includes a duplicate extension:

```
include::modules/enforcing-configured-safety-pipelines-for-llm-inference-using-guardrails-gateway.adoc.adoc[leveloffset=+1]
```

This will cause the include directive to fail during documentation builds, as the file path is malformed. Additionally, line 54 uses `leveloffset=+2` for the regex detector while lines 52–53 use `leveloffset=+1`, creating an inconsistent heading hierarchy within the same section.
Apply this diff to fix the file reference and standardize leveloffsets:

```diff
 include::modules/guardrails-orchestrator-hap-scenario.adoc[leveloffset=+1]
 include::modules/using-a-hugging-face-prompt-injection-detector-with-the-guardrails-orchestrator.adoc[leveloffset=+1]
-include::modules/filtering-flagged-content-by-sending-requests-to-the-regex-detector.adoc[leveloffset=+2]
-include::modules/enforcing-configured-safety-pipelines-for-llm-inference-using-guardrails-gateway.adoc.adoc[leveloffset=+1]
+include::modules/filtering-flagged-content-by-sending-requests-to-the-regex-detector.adoc[leveloffset=+1]
+include::modules/enforcing-configured-safety-pipelines-for-llm-inference-using-guardrails-gateway.adoc[leveloffset=+1]
```
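As background on why the mixed offsets matter: `leveloffset` shifts every heading in the included file, so sibling includes with different offsets render at different depths. A minimal sketch with hypothetical module names:

```asciidoc
// Each module begins with a document-level title, e.g. "= Module title".

// leveloffset=+1 renders the module title as a level-1 section (==), a peer topic:
include::modules/example-peer-module.adoc[leveloffset=+1]

// leveloffset=+2 renders it as a level-2 section (===), nested one level deeper:
include::modules/example-nested-module.adoc[leveloffset=+2]
```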
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- assemblies/configuring-the-guardrails-orchestrator-service.adoc (1 hunks)
- modules/enforcing-configured-safety-pipelines-for-llm-inference-using-guardrails-gateway.adoc.adoc (1 hunks)
- modules/filtering-flagged-content-by-sending-requests-to-the-regex-detector.adoc (1 hunks)
- modules/using-a-hugging-face-prompt-injection-detector-with-the-guardrails-orchestrator.adoc (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- modules/filtering-flagged-content-by-sending-requests-to-the-regex-detector.adoc
🚧 Files skipped from review as they are similar to previous changes (1)
- modules/enforcing-configured-safety-pipelines-for-llm-inference-using-guardrails-gateway.adoc.adoc
🔇 Additional comments (4)
modules/using-a-hugging-face-prompt-injection-detector-with-the-guardrails-orchestrator.adoc (1)
4-5: Heading change improves the preventive posture of the module.
The updated heading shifts from a neutral/descriptive tone ("Using...") to an action-oriented, safety-focused tone ("Preventing..."), which better aligns with the PR objectives to emphasize AI safety practices. The change is clear and effective for users seeking guidance on prompt injection prevention.
Note: The module ID at line 4 (`using-a-hugging-face-prompt-injection-detector-with-guardrails-orchestrator_{context}`) was not updated to reflect the new heading. If backward compatibility with existing links is not required, consider updating the ID to match the new heading (e.g., `preventing-prompt-injection-with-hugging-face-detector_{context}`). Otherwise, this is a non-issue.
assemblies/configuring-the-guardrails-orchestrator-service.adoc (3)
4-5: Assembly title and ID updated to emphasize safety-first approach.
The assembly-level title and ID changes align well with the PR objectives. The new title "Enabling AI safety with Guardrails" and ID `enable-ai-safety-with-guardrails_{context}` better communicate the purpose and focus of the documentation section.
11-35: Content restructuring improves clarity on Guardrails components.
The reorganization from a concise detector listing to descriptive component descriptions (Deploy, Configure & Use, Monitor, Enable OpenTelemetry) provides users with a clear roadmap of what they can accomplish. This is a solid documentation improvement that sets expectations upfront.
40-47: Verify intentionality of leveloffset hierarchy for detector/gateway modules.
Lines 44–46 use `leveloffset=+2`, creating a sub-subsection level under "Deploying and Configuring Guardrails components," while lines 40–43 use `leveloffset=+1`. This creates a heading hierarchy where built-in detectors, Hugging Face models, and gateway configuration are visually nested deeper than the Orchestrator deployment itself.
If this nesting is intentional (to group detector/gateway topics as child topics of deployment), the current structure is correct. However, if these modules should be peer topics at the same level as the Orchestrator, use `leveloffset=+1` consistently.
Please confirm whether the `leveloffset=+2` on lines 44–46 is intentional to create a sub-topic hierarchy, or should be changed to `leveloffset=+1` for consistency with sibling modules.
```diff
 ifdef::context[:parent-context: {context}]
 [id="using-a-hugging-face-prompt-injection-detector-with-guardrails-orchestrator_{context}"]
-= Using a Hugging Face Prompt Injection detector with the Guardrails Orchestrator
+= Preventing Prompt Injection by using a Hugging Face Prompt Injection detector
```
If this title is OK, I'll go ahead and update the id=, but that creates a kind of pain of needing to update it in multiple places, so I want to be sure the titles are okay from stakeholder POV before I make deep changes.
"Mitigating" is usually a safer word. "Preventing" is stronger and may not be 100% guranteed.
| [id="configuring-the-guardrails-orchestrator-service_{context}"] | ||
| = Configuring the Guardrails Orchestrator service | ||
| [id="enable-ai-safety-with-guardrails_{context}"] | ||
| = Enabling AI safety with Guardrails | ||
|
|
||
| The TrustyAI Guardrails Orchestrator service is a tool to invoke detections on text generation inputs and outputs, as well as standalone detections. |
Do we want to leave this mention of "TrustyAI Guardrails" here? Or remove the "TrustyAI" bit?
So in the spirit of decoupling function from tool: whenever we have to refer to something concrete, we can reference the tool that does the function.
In this case, the phrasing looks okay here, but it will be important to always communicate that "tool x makes function y happen"; in this case, maybe a blurb on how the TrustyAI Orchestrator service helps with enabling AI safety with guardrails.
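A rough sketch of such a blurb (illustrative wording only, drawing on the orchestrator description quoted in this PR):

```asciidoc
[role="_abstract"]
Guardrails help you enforce AI safety policies on generative model inputs and
outputs. The TrustyAI Guardrails Orchestrator service is the tool that makes
this happen: it orchestrates network requests between the user, the generative
model, and detector servers so that configured detections run on each request.
```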
Force-pushed from 24ecbd0 to c694fa9.
Actionable comments posted: 1
♻️ Duplicate comments (1)
modules/enforcing-configured-safety-pipelines-for-llm-inference-using-guardrails-gateway.adoc (1)
4-4: Unresolved: Update ID to follow AsciiDoc lowercase convention.
The ID still contains mixed-case "LLM", which deviates from AsciiDoc ID conventions (all lowercase with hyphens). This was flagged in a previous review and needs to be corrected for consistency across the documentation.
Apply this diff:
-[id="enforcing-configured-safety-pipelines-for-LLM-inference-using-guardrails-gateway_{context}"] +[id="enforcing-configured-safety-pipelines-for-llm-inference-using-guardrails-gateway_{context}"]
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- assemblies/configuring-the-guardrails-orchestrator-service.adoc (1 hunks)
- modules/enforcing-configured-safety-pipelines-for-llm-inference-using-guardrails-gateway.adoc (1 hunks)
- modules/filtering-flagged-content-by-sending-requests-to-the-regex-detector.adoc (1 hunks)
- modules/using-a-hugging-face-prompt-injection-detector-with-the-guardrails-orchestrator.adoc (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- modules/filtering-flagged-content-by-sending-requests-to-the-regex-detector.adoc
- modules/using-a-hugging-face-prompt-injection-detector-with-the-guardrails-orchestrator.adoc
🔇 Additional comments (3)
assemblies/configuring-the-guardrails-orchestrator-service.adoc (3)
7-7: Unresolved: Address user question about "TrustyAI" branding.
A previous reviewer asked whether to retain or remove the "TrustyAI" reference here. Please clarify the documentation's branding approach: should this mention "TrustyAI Guardrails" or just "Guardrails Orchestrator"?
44-44: Verify leveloffset hierarchy for sub-modules.
Lines 44 and 54 use `leveloffset=+2` while most sibling modules in the same sections use `leveloffset=+1`. This creates deeper nesting for those modules. Confirm this is intentional (i.e., the built-in detector and regex detector are genuinely sub-topics) or standardize to `+1` for consistency.
If these should be at the same level as their siblings, apply this diff:

```diff
-include::modules/configuring-the-built-in-detector-and-guardrails-gateway.adoc[leveloffset=+2]
+include::modules/configuring-the-built-in-detector-and-guardrails-gateway.adoc[leveloffset=+1]

-include::modules/filtering-flagged-content-by-sending-requests-to-the-regex-detector.adoc[leveloffset=+2]
+include::modules/filtering-flagged-content-by-sending-requests-to-the-regex-detector.adoc[leveloffset=+1]
```

Also applies to: 54-54
55-55: Fixed: File extension and include reference corrected.
The include statement now correctly references the module file with a single `.adoc` extension. The previous double extension issue has been resolved. ✅
Force-pushed from c694fa9 to c464d0a.
Force-pushed from c464d0a to 5212a44.
@zanetworker @RobGeada PTAL at this PR for the Guardrails refactoring and let me know your thoughts/comments/feedback.

Do we have Llama Stack with guardrails here somewhere? Llama Stack will need to interlink between the Llama Stack docs and our safety docs.
| [id="configuring-the-guardrails-orchestrator-service_{context}"] | ||
| = Configuring the Guardrails Orchestrator service | ||
| [id="enable-ai-safety-with-guardrails_{context}"] | ||
| = Enabling AI safety with Guardrails |
I'd stop at "Enabling AI Safety"; then "Guardrails" can be hover text, metadata, or a subtitle, as discussed.
Potentially Guardrails would be one subtopic in the future (not the only thing to do in AI safety, amongst others like safety validation, safety hub, etc).
I think for the sake of our file architecture, since we have the topmost book as Enabling AI Safety, i.e. the tile you see on the main docs page, we have to keep this name/id distinct, so I think we have to include "with guardrails" so that we have some unique IDs.
I'd then vote for removing (with guardrails) so that when they click on Enable AI safety, they can find examples of guardrails. Just trying to not pin us down to "just" guardrails because there will be more. If other folks have strong opinions, we can discuss how we plan on moving/restructuring for the future.
Ok perfect, that is how we have it then. I think this works.
```diff
 ifdef::context[:parent-context: {context}]
 [id="using-a-hugging-face-prompt-injection-detector-with-guardrails-orchestrator_{context}"]
-= Using a Hugging Face Prompt Injection detector with the Guardrails Orchestrator
+= Preventing Prompt Injection by using a Hugging Face Prompt Injection detector
```
Probably for future reference, but can we decouple "deployment/configuration of the system" from "using the system"? They are usually different personas.
Should we have a section for Guardrail use-cases, where this would be one application of guardrails?
Guardrails Use-Cases
PII & Sensitive Data Detection
- Protecting User Privacy by Detecting Email Addresses (email)
- Preventing Credit Card Leakage with Luhn-Validated Detection (credit-card)
- Safeguarding Social Security Numbers in User Inputs (us-social-security-number)
- Detecting Phone Numbers to Prevent Contact Information Leakage (us-phone-number)
- Identifying IPv4 Addresses for Network Security Compliance (ipv4)
- Detecting IPv6 Addresses in Technical Documentation (ipv6)
- Validating UK Postal Codes for Geographic Data Privacy (uk-post-code)
- Creating Custom PII Detectors with Regex Patterns ($CUSTOM_REGEX)
Prompt Security
- Preventing Prompt Injection by using a Hugging Face Prompt Injection detector
- Preventing Prompt Injection Attacks with DeBERTa Classifier (protectai/deberta-v3-base-prompt-injection-v2)
- Detecting Jailbreak Attempts with Granite Guardian (ibm-granite/granite-guardian-*)
Content Safety & Moderation
- Filtering Toxic Content with Granite Guardian HAP (ibm-granite/granite-guardian-hap-38m)
- Detecting Hate, Abuse, and Profanity in User Messages (HAP models)
- Identifying Harmful Content Across Multiple Risk Categories (Granite Guardian models)
- Blocking Social Bias in LLM Responses (Granite Guardian models)
- Preventing Unethical Behavior Suggestions (Granite Guardian models)
- Filtering Sexual Content from Conversations (Granite Guardian models)
- Detecting Violence and Harmful Instructions (Granite Guardian models)
- Identifying Profanity in User-Generated Content (Granite Guardian models)
...
https://github.com/trustyai-explainability/guardrails-detectors/tree/main/detectors
@zanetworker I am in favor of breaking these up into two sections, and it wouldn't be difficult to do in this PR by adding a new assembly and siphoning off the "Use case" inclusions you've highlighted. In this iteration of the refactor, I tried to group the config and use cases: whereas previously the use cases were somewhat meshed with the config pieces, I grouped them separately into Configs, then Use cases. "Monitoring user inputs with the Guardrails Orchestrator service" is interesting because it is a use case for filtering hateful and profane language but also has a ConfigMap piece; still, it is mostly a scenario piece. I will add another commit shortly that reorganizes the use cases into a separate assembly and retitles "Monitoring user inputs with the Guardrails Orchestrator service" to a variation of your suggested title, "Detecting Hateful and profane language".
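For illustration, the separate use-case assembly could look roughly like this (a sketch assembled from module file names that appear later in this PR; the actual file contents may differ):

```asciidoc
// assemblies/using-guardrails-for-ai-safety.adoc (sketch)
ifdef::context[:parent-context: {context}]
[id="using-guardrails-for-ai-safety_{context}"]
= Using Guardrails for AI safety

// Use-case modules grouped by safety outcome:
include::modules/detecting-pii-by-using-guardrails-with-llama-stack.adoc[leveloffset=+1]
include::modules/mitigating-prompt-injection-by-using-a-hugging-face-prompt-injection-detector.adoc[leveloffset=+1]
include::modules/detecting-hateful-and-profane-language.adoc[leveloffset=+1]

ifdef::parent-context[:context: {parent-context}]
ifndef::parent-context[:!context:]
```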
Regarding the question about Llama Stack in Guardrails: I think it accidentally got deleted in the re-org 😬 so thanks for asking about it. I re-added it.
The Guardrails with TrustyAI content is in both the Guardrails section and a standalone section, Using TrustyAI with Llama Stack. Since the procedure to set up Llama Stack with TrustyAI is in both places, I've reorganized its appearance in the Guardrails section as a Detecting PII use-case scenario, hoping to catch people who intend to do the setup from the TrustyAI with Llama Stack section.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
assemblies/using-llama-stack-with-trustyai.adoc (1)
15-15: Bullet point at line 15 misaligns with the updated module title.
Line 15 still uses the generic "Using the trustyai-fms Guardrails Orchestrator with Llama Stack" wording, but line 20 now includes the PII-focused module detecting-pii-by-using-guardrails-with-llama-stack.adoc. Update the bullet point to accurately reflect that the example focuses on PII detection.
Apply this diff to align the bullet point:

```diff
-* Using the trustyai-fms Guardrails Orchestrator with Llama Stack
+* Detecting personally identifiable information (PII) by using Guardrails with Llama Stack
```

Also applies to: 20-20
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- assemblies/configuring-the-guardrails-orchestrator-service.adoc (0 hunks)
- assemblies/enabling-ai-safety.adoc (1 hunks)
- assemblies/using-guardrails-for-ai-safety.adoc (1 hunks)
- assemblies/using-llama-stack-with-trustyai.adoc (1 hunks)
- modules/detecting-hateful-and-profane-language.adoc (1 hunks)
- modules/detecting-pii-by-using-guardrails-with-llama-stack.adoc (2 hunks)
- modules/enforcing-configured-safety-pipelines-for-llm-inference-using-guardrails-gateway.adoc (1 hunks)
- modules/filtering-flagged-content-by-sending-requests-to-the-regex-detector.adoc (1 hunks)
- modules/using-a-hugging-face-prompt-injection-detector-with-the-guardrails-orchestrator.adoc (1 hunks)
- monitoring-data-science-models.adoc (1 hunks)
💤 Files with no reviewable changes (1)
- assemblies/configuring-the-guardrails-orchestrator-service.adoc
✅ Files skipped from review due to trivial changes (1)
- assemblies/enabling-ai-safety.adoc
🚧 Files skipped from review as they are similar to previous changes (3)
- modules/using-a-hugging-face-prompt-injection-detector-with-the-guardrails-orchestrator.adoc
- modules/enforcing-configured-safety-pipelines-for-llm-inference-using-guardrails-gateway.adoc
- modules/filtering-flagged-content-by-sending-requests-to-the-regex-detector.adoc
🔇 Additional comments (6)
monitoring-data-science-models.adoc (1)
32-34: Clean structural reorganization of the monitoring index.
The change cleanly replaces the outdated single Guardrails Orchestrator assembly with two focused assemblies for AI safety enablement and guardrails usage. The blank-line separator improves readability. Ensure that both referenced assemblies (enabling-ai-safety.adoc and using-guardrails-for-ai-safety.adoc) are present in the repository and that their IDs match the expected context conventions.
modules/detecting-pii-by-using-guardrails-with-llama-stack.adoc (2)
4-5: Well-focused module naming and ID updates.
The renaming from generic "using Guardrails Orchestrator with Llama Stack" to specific "Detecting PII by using Guardrails with Llama Stack" improves discoverability and clarifies the example's scope.
14-14: Description clarification enhances context.
The reword clearly positions the Guardrails Orchestrator as a Llama Stack safety component for PII detection. The narrative flow is improved.
modules/detecting-hateful-and-profane-language.adoc (1)
4-5: Focused and clear module reframing.
The HAP module ID and title are now use-case-specific and more discoverable. The description reword emphasizing HAP detection aligns well with the new name. This follows the positive refactoring pattern established in the PII module.
Also applies to: 8-8
assemblies/using-guardrails-for-ai-safety.adoc (2)
6-17: Well-organized assembly structure with logical subsections.
The three-subsection organization (Detecting PII, Securing Prompts, Moderating Content) provides clear categorization of guardrails use cases. All includes use consistent `leveloffset=+1` formatting. The introductory context at line 6 is appropriately focused.
Note on interlinking: The PR comments from @zanetworker mention that interlinking between the llama-stack documentation and the safety/guardrails documentation needs attention. This assembly includes detecting-pii-by-using-guardrails-with-llama-stack.adoc, which bridges these topics. Verify that the llama-stack assembly (assemblies/using-llama-stack-with-trustyai.adoc) includes appropriate cross-references back to this safety assembly for bidirectional navigation.
19-20: Proper context restoration.
The parent-context conditional blocks follow standard AsciiDoc assembly patterns.
Force-pushed from fa49cb5 to 6fc4169.
Force-pushed from 6fc4169 to f4ca79c.
Actionable comments posted: 0
🧹 Nitpick comments (1)
enabling-ai-safety.adoc (1)
1-20: Ensure cross-references and navigation from related documentation.
The creation of a new dedicated safety documentation entry point is good for organization, but you should verify that:
- The included assemblies contain proper cross-references (xrefs) to related documentation, particularly the LLaMA stack documentation mentioned in the PR comments.
- Navigation paths from parent/related documentation (e.g., monitoring, model deployment guides) link appropriately to this new page or its sections.
- The document is discoverable from the main documentation navigation structure.
This aligns with zanetworker's comment about the need for interlinking between LLaMA stack docs and safety/guardrails documentation.
If you need help auditing or adding cross-references in the included assemblies or related documentation, I can help verify the interlinking structure.
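A minimal sketch of such a cross-reference from the Llama Stack assembly back to the safety docs (the link pattern follows the upstream conditionals quoted later in this review; the exact target URL is an assumption based on the docs preview path):

```asciidoc
ifdef::upstream[]
For more guardrails use cases, see
link:{odhdocshome}/enabling-ai-safety/[Enabling AI safety].
endif::[]
```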
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- assemblies/enabling-ai-safety-with-guardrails.adoc (1 hunks)
- enabling-ai-safety.adoc (1 hunks)
- monitoring-data-science-models.adoc (0 hunks)
💤 Files with no reviewable changes (1)
- monitoring-data-science-models.adoc
🚧 Files skipped from review as they are similar to previous changes (1)
- assemblies/enabling-ai-safety-with-guardrails.adoc
🔇 Additional comments (2)
enabling-ai-safety.adoc (2)
1-14: Front matter and AsciiDoc configuration look correct.
The Jekyll front matter, document attributes, and AsciiDoc configuration are properly structured. The use of `:context: safety` aligns with the assembly-based documentation approach.
18-18: Verify included assembly paths and structure.
The includes reference two new assemblies that split the previous consolidated Guardrails documentation. Ensure that:
- The assembly file paths (assemblies/enabling-ai-safety-with-guardrails.adoc and assemblies/using-guardrails-for-ai-safety.adoc) are correct relative to this file's location.
- The leveloffset=+1 produces the intended heading hierarchy.
Per your PR comments, also verify that interlinking between the LLaMA stack documentation and these Guardrails assemblies is properly handled to support cross-referencing and discoverability.
Also applies to: 20-20
Actionable comments posted: 0
🧹 Nitpick comments (1)
modules/mitigating-prompt-injection-by-using-a-hugging-face-prompt-injection-detector.adoc (1)
43-47: Consider clarifying prerequisite scope.
The prerequisite about configuring the Guardrails Orchestrator service (lines 43–47) is specific to Scenario 1. Given the updated title emphasizes "Mitigating Prompt Injection" more broadly, consider adding a brief clarification that this prerequisite applies to the Orchestrator API scenario, while Scenario 2 supports standalone detection without it. This would help users navigate the document based on their use case.
Example clarification:
```diff
 ifdef::upstream[]
-* You are familiar with how to configure and deploy the Guardrails Orchestrator service. See link:{odhdocshome}/monitoring_data_science_models/#deploying-the-guardrails-orchestrator-service_monitor[Deploying the Guardrails Orchestrator].
+* (For Scenario 1 only) You are familiar with how to configure and deploy the Guardrails Orchestrator service. See link:{odhdocshome}/monitoring_data_science_models/#deploying-the-guardrails-orchestrator-service_monitor[Deploying the Guardrails Orchestrator].
 endif::[]
 ifndef::upstream[]
-* You are familiar with how to configure and deploy the Guardrails Orchestrator service. See link:{rhoaidocshome}{default-format-url}/monitoring_data_science_models/configuring-the-guardrails-orchestrator-service_monitor#deploying-the-guardrails-orchestrator-service_monitor[Deploying the Guardrails Orchestrator]
+* (For Scenario 1 only) You are familiar with how to configure and deploy the Guardrails Orchestrator service. See link:{rhoaidocshome}{default-format-url}/monitoring_data_science_models/configuring-the-guardrails-orchestrator-service_monitor#deploying-the-guardrails-orchestrator-service_monitor[Deploying the Guardrails Orchestrator]
 endif::[]
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- assemblies/using-guardrails-for-ai-safety.adoc (1 hunks)
- modules/mitigating-prompt-injection-by-using-a-hugging-face-prompt-injection-detector.adoc (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- assemblies/using-guardrails-for-ai-safety.adoc
🔇 Additional comments (1)
modules/mitigating-prompt-injection-by-using-a-hugging-face-prompt-injection-detector.adoc (1)
4-5: ID and title changes are safe: no broken cross-references detected.
The old ID is not referenced anywhere in the codebase, and the new ID is correctly defined and included in the using-guardrails-for-ai-safety.adoc assembly. The refactoring aligns with the JTBD approach, broadening the scope from Orchestrator-specific to outcome-focused documentation.
```asciidoc
The following sections describe the Guardrails components, how to deploy them and provide example use cases of how to protect your AI applications using these tools:

Deploy a Guardrails Orchestrator instance::
The guardrails orchestrator is the main networking layer of the guardrails ecosystem, and “orchestrates” the network requests between the user, generative models, and detector servers.
```
generative models
this should be singular - it's only one generative model per orchestrator
```asciidoc
Configure and use the built-in detectors::
The Guardrails framework provides a set of “built-in” detectors out-of-the-box, that provides a number of simple detection algorithms. You can use the following detector with `trustyai_fms` orchestrator server, which is an external provider for Llama Stack that allows you to configure and use the Guardrails Orchestrator and compatible detection models through the Llama Stack API.:
+
* *Regex Detectors*: Pattern-based content detection for structured rule enforcement. These are the built-in detectors in the Guardrails Orchestrator service. Learn more about the link:https://github.com/trustyai-explainability/guardrails-regex-detector[guardrails-regex-detector].
```
There are a number of other built-in detection algorithms beyond just regex - I can write up a doc about them
```asciidoc
Any text classification model from link:https://huggingface.co/ibm-granite/granite-guardian-hap-38m[Huggingface] can be used as a detector model within the Guardrails ecosystem.
+
* *Hugging Face Detectors*: Compatible with most Hugging Face `AutoModelForSequenceClassification` models, such as `granite-guardian-hap-38m` or `deberta-v3-base-prompt-injection-v2`. Learn more about the detector algorithms for the link:https://github.com/trustyai-explainability/guardrails-detectors[FMS Guardrails Orchestrator].
* *vLLM Detector Adapter*: Content detection compatible with Hugging Face `AutoModelForCausalLM` models, for example `ibm-granite/granite-guardian-3.1-2b`. Learn more about link:https://github.com/foundation-model-stack/vllm-detector-adapter[vllm-detector-adapter].
```
vLLM Detector Adapter
This is not a thing, we shouldn't highlight it here
```asciidoc
It is underpinned by the open-source project link:https://github.com/foundation-model-stack/fms-guardrails-orchestrator[FMS-Guardrails Orchestrator] from IBM. You can deploy the Guardrails Orchestrator service through a Custom Resource Definition (CRD) that is managed by the TrustyAI Operator.

The following sections describe the Guardrails components, how to deploy them and provide example use cases of how to protect your AI applications using these tools:
```
I'm not sure I understand the structure of this section: it mixes together component definitions (orchestrator, detectors, etc.) with verbs (configure the guardrails gateway, monitor user inputs, enable telemetry).

Description
How Has This Been Tested?
Merge criteria:
Link to docs preview: https://opendatahub-documentation--1036.org.readthedocs.build/en/1036/enabling-ai-safety/index.html
NOTE: This preview is a basic asciidoc preview. It isn't how the content will look on docs.redhat.com or odh, but it's a nice way to preview content in PRs.
New TOC:
[screenshot of the updated table of contents]