Skip to content

Fix/workflow structure#34

Merged
brunoalbin23 merged 2 commits intomainfrom
fix/WorkflowStructure
Dec 15, 2025
Merged

Fix/workflow structure#34
brunoalbin23 merged 2 commits intomainfrom
fix/WorkflowStructure

Conversation

@JPAmorin
Copy link
Copy Markdown
Collaborator

Standing logic had a flaw where malicious messages would be saved to the database without prior checking for intention. Current logic delegates saving chat message (new prompt) and retrieving full chat history to parafraseo node. Endpoint not developed yet.

@JPAmorin JPAmorin self-assigned this Dec 14, 2025
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Dec 14, 2025

📝 Walkthrough

Summary by CodeRabbit

  • Refactoring
    • Restructured agent workflow with two-stage content validation for enhanced consistency
    • Reorganized database operations and chat history retrieval workflow
    • Enhanced paraphrasing logic to incorporate chat history context

✏️ Tip: You can customize this high-level summary in your review settings.

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Title check ❓ Inconclusive The title 'Fix/workflow structure' is vague and generic, using non-descriptive terms that don't clearly convey the specific change or the primary purpose of the pull request. Use a more specific title that describes the actual fix, such as 'Move database operations from agent_host to parafraseo node' or 'Fix malicious content validation before database persistence'.
✅ Passed checks (2 passed)
Check name Status Explanation
Description check ✅ Passed The description clearly explains the problem and the solution, addressing the flaw where malicious messages were saved without validation and how the new logic defers this responsibility to the parafraseo node.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/WorkflowStructure

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown

🔍 PR Validation Results

Check Status
Build ✅ success
Trivy Check Security tab

View detailed results

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b1e20d4 and 72dc726.

📒 Files selected for processing (3)
  • RAGManager/app/agents/graph.py (1 hunks)
  • RAGManager/app/agents/nodes/agent_host.py (2 hunks)
  • RAGManager/app/agents/nodes/parafraseo.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
RAGManager/app/agents/nodes/parafraseo.py (1)
RAGManager/app/agents/state.py (1)
  • AgentState (6-50)
🔇 Additional comments (6)
RAGManager/app/agents/graph.py (2)

21-32: LGTM! Clear documentation of the two-stage validation flow.

The updated docstring accurately describes the new workflow structure where malicious content is validated before any DB operations occur, and again after context building. This aligns with the PR objective to prevent saving malicious messages.


56-82: The guard node reuse pattern is correctly implemented and requires no changes.

The route_after_guard function returns either "malicious" or "continue" based solely on the is_malicious flag. Each add_conditional_edges call defines its own routing dictionary, which maps these return values to appropriate destinations:

  • First guard invocation: "continue" → "parafraseo"
  • Second guard invocation: "continue" → END

This is the correct pattern for conditional edges in LangGraph—the routing function returns a key, and the caller determines how that key is mapped to destinations. The function does not need context awareness because routing differentiation is handled by the conditional edge definitions themselves.

RAGManager/app/agents/nodes/agent_host.py (3)

1-1: LGTM! Documentation updated to reflect deferred DB operations.

The updated docstring clearly indicates that DB operations are now deferred to the parafraseo node, which aligns with the PR objective.


42-45: LGTM! State preparation correctly defers chat_messages population.

The code correctly sets chat_messages to None with a clear comment indicating it will be populated in the parafraseo node after validation.


30-40: Add early return after setting error_message for missing user_id.

When user_id is missing, the code logs an error and sets error_message in the state (line 39), but then continues execution and attempts to set prompt, initial_context, and chat_messages (lines 43-45). This could lead to unexpected behavior downstream.

Apply this diff to return early after detecting the error:

     if not user_id:
         logger.error("user_id is required in state but was not provided")
         updated_state["error_message"] = "user_id is required"
+        return updated_state
-        return updated_state
 
     # Set prompt and initial context (no DB operations)

Likely an incorrect or invalid review comment.

RAGManager/app/agents/nodes/parafraseo.py (1)

11-11: No action required. The model name "gpt-5-nano" is a valid OpenAI model as of December 2025. GPT-5 was released prior to this date, and "gpt-5-nano" is an officially supported model variant.

Comment on lines +16 to +28
Parafraseo node - Saves message to DB, retrieves chat history, and paraphrases user input.

This node:
1. Takes the adjusted text from Fallback Inicial
2. Paraphrases it to improve clarity or adjust format
3. Prepares text for retrieval step
1. Saves the user's message to the chat session in PostgreSQL
2. Retrieves all chat messages for the session (including the newly saved message)
3. Paraphrases the user input using chat history to improve clarity
4. Prepares text for retrieval step

Args:
state: Agent state containing adjusted_text
state: Agent state containing prompt, chat_session_id, and user_id

Returns:
Updated state with paraphrased_text set
Updated state with chat_messages, paraphrased_text set
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Documentation claims functionality not yet implemented.

The docstring states that this node "Saves the user's message to the chat session in PostgreSQL" and "Retrieves all chat messages for the session" (lines 19-20), but the implementation shows this is not yet developed (see TODO at lines 32-44). The function currently uses empty chat history with a warning log.

Update the docstring to reflect the current implementation status:

     """
-    Parafraseo node - Saves message to DB, retrieves chat history, and paraphrases user input.
+    Parafraseo node - Paraphrases user input (chat history integration pending).
 
     This node:
-    1. Saves the user's message to the chat session in PostgreSQL
-    2. Retrieves all chat messages for the session (including the newly saved message)
-    3. Paraphrases the user input using chat history to improve clarity
-    4. Prepares text for retrieval step
+    1. TODO: Save the user's message to the chat session (endpoint not yet implemented)
+    2. TODO: Retrieve all chat messages for the session (endpoint not yet implemented)
+    3. Paraphrases the user input (currently without chat history)
+    4. Prepares text for retrieval step
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Parafraseo node - Saves message to DB, retrieves chat history, and paraphrases user input.
This node:
1. Takes the adjusted text from Fallback Inicial
2. Paraphrases it to improve clarity or adjust format
3. Prepares text for retrieval step
1. Saves the user's message to the chat session in PostgreSQL
2. Retrieves all chat messages for the session (including the newly saved message)
3. Paraphrases the user input using chat history to improve clarity
4. Prepares text for retrieval step
Args:
state: Agent state containing adjusted_text
state: Agent state containing prompt, chat_session_id, and user_id
Returns:
Updated state with paraphrased_text set
Updated state with chat_messages, paraphrased_text set
Parafraseo node - Paraphrases user input (chat history integration pending).
This node:
1. TODO: Save the user's message to the chat session (endpoint not yet implemented)
2. TODO: Retrieve all chat messages for the session (endpoint not yet implemented)
3. Paraphrases the user input (currently without chat history)
4. Prepares text for retrieval step
Args:
state: Agent state containing prompt, chat_session_id, and user_id
Returns:
Updated state with chat_messages, paraphrased_text set
🤖 Prompt for AI Agents
In RAGManager/app/agents/nodes/parafraseo.py around lines 16 to 28, the
docstring inaccurately states that the node saves messages to PostgreSQL and
retrieves session chat history, but the implementation currently leaves those
steps unimplemented (see TODO and a warning about using empty chat history).
Update the docstring to accurately reflect current behavior: state that saving
to DB and retrieving chat history are not implemented yet, that the node logs a
warning and proceeds with an empty chat history, and optionally note the TODO
for future implementation and where to resume work (e.g., DB save and retrieval
functions).

Comment on lines +32 to +44
# TODO: Implement endpoint call to save message and retrieve chat history
# This should:
# 1. Use an LLM or paraphrasing model to rephrase the text
# 2. Improve clarity, adjust tone, or format as needed
# 3. Set paraphrased_text with the result

# Paraphrase the last message using history
# 1. Call an endpoint (not yet developed) that:
# - Saves the current user message to the chat session
# - Retrieves all chat messages for the session (including the newly saved message)
# - Returns the updated chat_messages list
# 2. Update state with chat_messages from the endpoint response
# 3. Handle errors appropriately (session not found, permission denied, etc.)

# Placeholder: For now, we'll use empty chat history
# Once the endpoint is implemented, replace this with the actual endpoint call
updated_state["chat_messages"] = None
logger.warning("Chat history retrieval endpoint not yet implemented - using empty history")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

LGTM! Clear TODO with implementation plan.

The TODO block provides clear guidance on what needs to be implemented, including the endpoint responsibilities and error handling considerations. The placeholder implementation with a warning log is appropriate for the current state.

Once the endpoint is developed, this integration will complete the security fix described in the PR objectives. Would you like me to help draft the endpoint integration code when ready?

Comment on lines +51 to 55
# Use messages from state (will include chat history once endpoint is implemented)
messages = [SystemMessage(content=system_instruction)] + state.get("messages", [])

response = llm.invoke(messages)
updated_state = state.copy() # Create a copy of the state to update
updated_state["paraphrased_text"] = response.content
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Inconsistent state access - should use chat_messages instead of messages.

Line 52 uses state.get("messages", []) to build the paraphrasing context, but line 43 sets updated_state["chat_messages"] = None. Once the endpoint is implemented, the paraphrasing should use the retrieved chat history from chat_messages, not the raw messages from the initial state.

Consider updating the code to use chat_messages for consistency:

-    # Use messages from state (will include chat history once endpoint is implemented)
-    messages = [SystemMessage(content=system_instruction)] + state.get("messages", [])
+    # Use chat_messages from state (will include chat history once endpoint is implemented)
+    chat_msgs = state.get("chat_messages") or state.get("messages", [])
+    messages = [SystemMessage(content=system_instruction)] + chat_msgs

This ensures the paraphrasing will automatically use the retrieved chat history once chat_messages is populated by the endpoint.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Use messages from state (will include chat history once endpoint is implemented)
messages = [SystemMessage(content=system_instruction)] + state.get("messages", [])
response = llm.invoke(messages)
updated_state = state.copy() # Create a copy of the state to update
updated_state["paraphrased_text"] = response.content
# Use chat_messages from state (will include chat history once endpoint is implemented)
chat_msgs = state.get("chat_messages") or state.get("messages", [])
messages = [SystemMessage(content=system_instruction)] + chat_msgs
response = llm.invoke(messages)
updated_state["paraphrased_text"] = response.content
🤖 Prompt for AI Agents
In RAGManager/app/agents/nodes/parafraseo.py around lines 51–55, the code builds
messages using state.get("messages", []) but earlier sets
updated_state["chat_messages"] = None, so switch to using
state.get("chat_messages", []) when composing the messages list; update the
messages assignment to use chat_messages and ensure the rest of the function
references updated_state["paraphrased_text"] unchanged so the paraphrase will
include retrieved chat history once chat_messages is populated.

@brunoalbin23 brunoalbin23 merged commit a55f04b into main Dec 15, 2025
5 checks passed
@brunoalbin23 brunoalbin23 deleted the fix/WorkflowStructure branch December 15, 2025 20:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants