Separated guard and fallback logic into initial and final.#32
Conversation
📝 WalkthroughSummary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings. Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: ASSERTIVE Plan: Pro 📒 Files selected for processing (3)
🧰 Additional context used🧬 Code graph analysis (2)RAGManager/app/agents/nodes/__init__.py (6)
RAGManager/app/agents/nodes/guard_inicial.py (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
RAGManager/app/agents/graph.py (1)
5-19: Import symbols do not exist in codebase—code will fail with ImportError at runtime.The imports on lines 5–19 reference symbols that don't exist:
guard_inicialandguard_finalare imported, but onlyguardis defined inapp.agents.nodes.guardroute_after_guard_inicialandroute_after_guard_finalare imported, but onlyroute_after_guardandroute_after_fallback_finalare defined inapp.agents.routingEither create the missing node and router functions or correct the imports to use the existing symbols (
guard,route_after_guard,route_after_fallback_final).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
RAGManager/app/agents/graph.py(4 hunks)
🔇 Additional comments (1)
RAGManager/app/agents/graph.py (1)
27-37:and
There was a problem hiding this comment.
Actionable comments posted: 4
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (8)
RAGManager/app/agents/nodes/__init__.py(1 hunks)RAGManager/app/agents/nodes/fallback_final.py(1 hunks)RAGManager/app/agents/nodes/fallback_inicial.py(2 hunks)RAGManager/app/agents/nodes/guard_final.py(1 hunks)RAGManager/app/agents/nodes/guard_inicial.py(5 hunks)RAGManager/app/agents/routing.py(2 hunks)RAGManager/app/core/config.py(1 hunks)RAGManager/pyproject.toml(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
RAGManager/app/agents/routing.py (1)
RAGManager/app/agents/state.py (1)
AgentState(6-49)
RAGManager/app/agents/nodes/guard_final.py (1)
RAGManager/app/agents/state.py (1)
AgentState(6-49)
RAGManager/app/agents/nodes/__init__.py (4)
RAGManager/app/agents/nodes/guard_final.py (1)
guard_final(23-74)RAGManager/app/agents/nodes/guard_inicial.py (1)
guard_inicial(24-75)RAGManager/app/agents/nodes/parafraseo.py (1)
parafraseo(6-32)RAGManager/app/agents/nodes/retriever.py (1)
retriever(6-32)
RAGManager/app/agents/nodes/guard_inicial.py (1)
RAGManager/app/agents/state.py (1)
AgentState(6-49)
RAGManager/app/agents/nodes/fallback_final.py (1)
RAGManager/app/agents/state.py (1)
AgentState(6-49)
🔇 Additional comments (8)
RAGManager/app/agents/nodes/fallback_inicial.py (1)
1-1: Good security practice with logging.The decision to explicitly not log the malicious prompt content is appropriate and helps prevent potential log poisoning, sensitive data exposure, and security audit issues.
Also applies to: 28-28
RAGManager/app/agents/nodes/fallback_final.py (1)
1-30: LGTM! Clean simplification of fallback behavior.The refactored
fallback_finalnode correctly implements the new design where risky content detection stops processing. The security-conscious logging (not logging response content) and clear error message align well with the separation of concerns in the guard/fallback flow.RAGManager/app/agents/nodes/__init__.py (1)
8-9: LGTM! Public API correctly updated.The module exports are properly updated to replace the single
guardwithguard_inicialandguard_final, aligning with the architectural changes.RAGManager/app/agents/nodes/guard_inicial.py (1)
1-75: LGTM! Consistent renaming with preserved logic.The renaming from
guardtoguard_inicialis applied consistently throughout the file. The jailbreak detection logic, error handling, and security-conscious logging are all preserved correctly.RAGManager/app/agents/routing.py (1)
6-37: LGTM! Routing functions correctly renamed.The routing functions are consistently renamed from
route_after_guardtoroute_after_guard_inicialand fromroute_after_fallback_finaltoroute_after_guard_final. The logic and return values are preserved correctly.RAGManager/app/agents/nodes/guard_final.py (2)
38-45: LGTM! Empty response handling is correct.The guard correctly treats empty responses as safe, which is appropriate since there's no content to leak PII.
15-20: No changes needed. The DetectPII initialization is correctly configured with the appropriate parameters (pii_entitiesandon_fail). DetectPII does not supportthresholdordeviceparameters—these are specific to other validators like DetectJailbreak. The current implementation aligns with the guardrails-ai DetectPII 0.6.2 API specification.RAGManager/pyproject.toml (1)
20-22: Dependency versions are valid and free from known vulnerabilities.All three specified versions exist on PyPI and contain no known CVEs:
guardrails-ai 0.6.2: Free from CVE-2024-6961 and CVE-2024-45858 (both fixed in earlier releases).presidio-analyzer 2.2.360: No known vulnerabilities.presidio-anonymizer 2.2.360: No known vulnerabilities.Note:
guardrails-ai 0.7.1is available as a newer patch if you wish to consider upgrading for potential improvements.
| def fallback_inicial(state: AgentState) -> AgentState: | ||
| """ | ||
| Fallback Inicial node - Stops processing when malicious content is detected. | ||
|
|
||
| This node: | ||
| 1. Sets error message indicating that the user's intentions break the chatbot's rules | ||
| 2. Stops the flow by routing to END | ||
|
|
||
| Args: | ||
| state: Agent state containing the prompt flagged as malicious | ||
|
|
||
| Returns: | ||
| Updated state with error_message set, ready to route to END | ||
| """ | ||
| updated_state = state.copy() | ||
|
|
||
| # Set error message for malicious content | ||
| updated_state["error_message"] = "The user's intentions break the chatbot's rules." | ||
| logger.warning("Malicious content detected. Stopping processing. Prompt content not logged for security.") | ||
|
|
||
| return updated_state |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Consider enhancing docstring with integration context.
The function implementation is clean and focused. To improve maintainability, consider adding a brief note in the docstring about:
- Which node (guard_inicial) calls this fallback
- Which routing function (route_after_guard_inicial) handles the error_message
This would help future maintainers understand the flow without searching the codebase.
Example enhancement:
def fallback_inicial(state: AgentState) -> AgentState:
"""
Fallback Inicial node - Stops processing when malicious content is detected.
+
+ Called by: guard_inicial when malicious content is detected
+ Routing: route_after_guard_inicial routes to END when error_message is set
This node:
1. Sets error message indicating that the user's intentions break the chatbot's rules
2. Stops the flow by routing to END📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def fallback_inicial(state: AgentState) -> AgentState: | |
| """ | |
| Fallback Inicial node - Stops processing when malicious content is detected. | |
| This node: | |
| 1. Sets error message indicating that the user's intentions break the chatbot's rules | |
| 2. Stops the flow by routing to END | |
| Args: | |
| state: Agent state containing the prompt flagged as malicious | |
| Returns: | |
| Updated state with error_message set, ready to route to END | |
| """ | |
| updated_state = state.copy() | |
| # Set error message for malicious content | |
| updated_state["error_message"] = "The user's intentions break the chatbot's rules." | |
| logger.warning("Malicious content detected. Stopping processing. Prompt content not logged for security.") | |
| return updated_state | |
| def fallback_inicial(state: AgentState) -> AgentState: | |
| """ | |
| Fallback Inicial node - Stops processing when malicious content is detected. | |
| Called by: guard_inicial when malicious content is detected | |
| Routing: route_after_guard_inicial routes to END when error_message is set | |
| This node: | |
| 1. Sets error message indicating that the user's intentions break the chatbot's rules | |
| 2. Stops the flow by routing to END | |
| Args: | |
| state: Agent state containing the prompt flagged as malicious | |
| Returns: | |
| Updated state with error_message set, ready to route to END | |
| """ | |
| updated_state = state.copy() | |
| # Set error message for malicious content | |
| updated_state["error_message"] = "The user's intentions break the chatbot's rules." | |
| logger.warning("Malicious content detected. Stopping processing. Prompt content not logged for security.") | |
| return updated_state |
🤖 Prompt for AI Agents
In RAGManager/app/agents/nodes/fallback_inicial.py around lines 10 to 30, the
docstring is missing integration context; update it to include a brief note
stating that this fallback is invoked by the guard_inicial node and that routing
of the updated state (with error_message) is handled by
route_after_guard_inicial, so maintainers can understand the control flow
without searching the codebase; keep the existing implementation unchanged and
add a short, one- or two-sentence mention of the caller and routing function in
the docstring.
| # Set error message for malicious content | ||
| updated_state["error_message"] = "The user's intentions break the chatbot's rules." |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify routing mechanism and error message integration
# First, examine the file in question
echo "=== Content of fallback_inicial.py ==="
cat -n RAGManager/app/agents/nodes/fallback_inicial.pyRepository: ucudal/reto-xmas-2025-goland-ia-backend
Length of output: 1280
🏁 Script executed:
#!/bin/bash
# Search for routing logic that checks error_message
echo "=== Searching for routing logic that checks error_message ==="
rg -n -C3 'error_message' --type=py | head -100Repository: ucudal/reto-xmas-2025-goland-ia-backend
Length of output: 7845
🏁 Script executed:
#!/bin/bash
# Search for route_after_guard_inicial implementation
echo "=== Searching for route_after_guard_inicial ==="
rg -n 'route_after_guard_inicial' --type=py -A 20 | head -150Repository: ucudal/reto-xmas-2025-goland-ia-backend
Length of output: 4308
🏁 Script executed:
#!/bin/bash
# Search for guard_inicial references and how fallback_inicial integrates
echo "=== Searching for guard_inicial integration ==="
rg -n 'guard_inicial|fallback_inicial' --type=py -B 2 -A 2 | head -150Repository: ucudal/reto-xmas-2025-goland-ia-backend
Length of output: 6408
🏁 Script executed:
#!/bin/bash
# Find the graph/workflow definition to understand routing
echo "=== Searching for graph or workflow setup ==="
fd -e py | xargs rg -l 'add_node.*fallback_inicial|fallback_inicial.*add_node' | head -5Repository: ucudal/reto-xmas-2025-goland-ia-backend
Length of output: 159
The error message is overly generic and overwrites more specific violation context.
The routing mechanism is explicit and correct—route_after_guard_inicial checks the is_malicious flag (set by guard_inicial) to decide whether to route to fallback_inicial, which then sets the error message before routing to END.
However, fallback_inicial sets a generic message that overwrites the more specific message from guard_inicial ("Jailbreak attempt detected. Your request contains content that violates security policies."). Consider whether fallback_inicial should:
- Preserve the error message set by
guard_inicialinstead of replacing it - Pass through the original message or add context about the detection type
🤖 Prompt for AI Agents
In RAGManager/app/agents/nodes/fallback_inicial.py around lines 26-27,
fallback_inicial currently unconditionally overwrites any existing error_message
with a generic string; change it to preserve the more specific message from
guard_inicial by only setting the generic message when no error_message exists
(or append minimal context instead of replacing), e.g., check for an existing
updated_state["error_message"] and leave it unchanged if present, otherwise set
the fallback message (or concatenate a short indicator like " (fallback
detected)" to the existing message).
| except Exception as e: | ||
| # If validation fails due to error, log it but don't block the request | ||
| # This is a safety measure - if Guardrails fails, we allow the request | ||
| # but log the error for monitoring | ||
| logger.error(f"Error during PII detection: {e}") | ||
| updated_state["is_risky"] = False | ||
| updated_state["error_message"] = None |
There was a problem hiding this comment.
Reconsider fail-safe behavior for PII detection.
The exception handler allows requests to proceed when validation fails due to an error (sets is_risky = False). While this prevents blocking legitimate requests when Guardrails fails, it creates a security risk: if PII detection fails silently, sensitive information could be leaked to users.
Consider:
- Failing closed (blocking the request) when PII detection errors occur, especially in production
- Adding metrics/alerting to detect when PII validation is failing
- Using a circuit breaker pattern to degrade gracefully during persistent failures
This contrasts with guard_inicial where failing open (allowing jailbreak attempts through) is more appropriate since false positives are more problematic than false negatives for user input validation.
except Exception as e:
# If validation fails due to error, log it but don't block the request
- # This is a safety measure - if Guardrails fails, we allow the request
+ # SECURITY: Consider failing closed for PII detection in production
# but log the error for monitoring
logger.error(f"Error during PII detection: {e}")
- updated_state["is_risky"] = False
- updated_state["error_message"] = None
+ # For production: fail closed to prevent PII leakage
+ updated_state["is_risky"] = True
+ updated_state["error_message"] = (
+ "Unable to validate response safety. Please try again later."
+ )📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| except Exception as e: | |
| # If validation fails due to error, log it but don't block the request | |
| # This is a safety measure - if Guardrails fails, we allow the request | |
| # but log the error for monitoring | |
| logger.error(f"Error during PII detection: {e}") | |
| updated_state["is_risky"] = False | |
| updated_state["error_message"] = None | |
| except Exception as e: | |
| # If validation fails due to error, log it but don't block the request | |
| # SECURITY: Consider failing closed for PII detection in production | |
| # but log the error for monitoring | |
| logger.error(f"Error during PII detection: {e}") | |
| # For production: fail closed to prevent PII leakage | |
| updated_state["is_risky"] = True | |
| updated_state["error_message"] = ( | |
| "Unable to validate response safety. Please try again later." | |
| ) |
| guardrails_pii_entities: list[str] = Field( | ||
| default=[ | ||
| "EMAIL_ADDRESS", | ||
| "PHONE_NUMBER", | ||
| "CREDIT_CARD", | ||
| "SSN", | ||
| "US_PASSPORT", | ||
| "US_DRIVER_LICENSE", | ||
| "IBAN_CODE", | ||
| "IP_ADDRESS", | ||
| ], | ||
| description="List of PII entity types to detect using DetectPII validator.", | ||
| ) |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
guardrails-ai DetectPII validator version 0.6.2 valid entity types
💡 Result:
Here are the valid PII entity types supported by the Guardrails DetectPII/GuardrailsPII validator (as exposed in the validator hub / repo documentation). [1][2]
- CREDIT_CARD
- CRYPTO
- DATE_TIME
- EMAIL_ADDRESS
- IBAN_CODE
- IP_ADDRESS
- NRP
- LOCATION
- PERSON
- PHONE_NUMBER
- MEDICAL_LICENSE
- URL
- US_BANK_NUMBER
- US_DRIVER_LICENSE
- US_ITIN
- US_PASSPORT
- US_SSN
- UK_NHS
- ES_NIF
- ES_NIE
- IT_FISCAL_CODE
- IT_DRIVER_LICENSE
- IT_VAT_CODE
- IT_PASSPORT
- IT_IDENTITY_CARD
- PL_PESEL
- SG_NRIC_FIN
- SG_UEN
- AU_ABN
- AU_ACN
- AU_TFN
- AU_MEDICARE
- IN_PAN
- IN_AADHAAR
- IN_VEHICLE_REGISTRATION
- IN_VOTER
- IN_PASSPORT
- FI_PERSONAL_IDENTITY_CODE
Sources: Guardrails PII validator docs and DetectPII README.
[1] Guardrails PII validator (guardrails-ai/guardrails_pii).
[2] Guardrails DetectPII (guardrails-ai/detect_pii).
🌐 Web query:
presidio-analyzer 2.2.360 supported PII entity types
💡 Result:
Here are the PII entity types that Presidio Analyzer supports (as listed in the Presidio docs). Global entities, then country/region-specific entities:
Global:
CREDIT_CARD, CRYPTO, DATE_TIME, EMAIL_ADDRESS, IBAN_CODE, IP_ADDRESS, NRP, LOCATION, PERSON, PHONE_NUMBER, MEDICAL_LICENSE, URL. [1]
USA:
US_BANK_NUMBER, US_DRIVER_LICENSE, US_ITIN, US_PASSPORT, US_SSN. [1]
UK:
UK_NHS, UK_NINO. [1]
Spain:
ES_NIF, ES_NIE. [1]
Italy:
IT_FISCAL_CODE, IT_DRIVER_LICENSE, IT_VAT_CODE, IT_PASSPORT, IT_IDENTITY_CARD. [1]
Poland:
PL_PESEL. [1]
Singapore:
SG_NRIC_FIN, SG_UEN. [1]
Australia:
AU_ABN, AU_ACN, AU_TFN, AU_MEDICARE. [1]
India:
IN_PAN, IN_AADHAAR, IN_VEHICLE_REGISTRATION, IN_VOTER, IN_PASSPORT, IN_GSTIN. [1]
Finland:
FI_PERSONAL_IDENTITY_CODE. [1]
Korea:
KR_RRN. [1]
Thai:
TH_TNIN. [1]
For how to retrieve the runtime list programmatically (AnalyzerEngine.get_supported_entities), see the Analyzer API reference. [2]
Sources:
[1] Presidio — Supported entities documentation. [2] Presidio Analyzer Python API (get_supported_entities).
Change SSN to US_SSN in the default PII entity types list.
The entity type SSN is not recognized by guardrails-ai DetectPII validator and presidio-analyzer. The correct entity type is US_SSN. All other current entity types in the list (EMAIL_ADDRESS, PHONE_NUMBER, CREDIT_CARD, US_PASSPORT, US_DRIVER_LICENSE, IBAN_CODE, IP_ADDRESS) are valid.
Additionally, consider including PERSON, LOCATION, DATE_TIME, US_BANK_NUMBER, MEDICAL_LICENSE, and CRYPTO in the default list, as these are all valid and commonly used PII entity types supported by presidio-analyzer.
🤖 Prompt for AI Agents
In RAGManager/app/core/config.py around lines 50 to 62, the default PII entity
list contains "SSN" which is not recognized by the DetectPII validator; replace
"SSN" with "US_SSN" and update the default list to optionally include additional
commonly used presidio-supported entities such as "PERSON", "LOCATION",
"DATE_TIME", "US_BANK_NUMBER", "MEDICAL_LICENSE", and "CRYPTO" if you want
broader detection; ensure the list items remain string literals and the Field
description remains unchanged.
|
@JuanFKurucz I've divided guard and fallback logic into two due to different things being accounted for when processing. First check is for malicious intent, therefore guard_inicial checks for Jailbreak attempts (functionality can be modified or expanded if fit) and guard_final checks for PII information (again, functionality can be modified or expanded if necessary). When fallback nodes are called I used static responses instead of LLM calls, with LLM call's cost in mind. If we want the LLM calls I can revert this change. |
🔍 PR Validation Results
|
Separated guard and fallback logic into initial and final reducing coupling.