Feat/rag rmq integration #55

Merged
JPAmorin merged 5 commits into main from feat/rag-rmq-integration
Dec 18, 2025

Conversation

@ignaciovalle20 Collaborator

No description provided.

@github-actions

🔍 PR Validation Results

Check          Status
Build          ✅ success
Trivy Check    Security tab

View detailed results

Copilot AI review requested due to automatic review settings December 17, 2025 23:45
Contributor

Copilot AI left a comment

Pull request overview

This PR integrates RabbitMQ message queue functionality into the RAG Manager service to enable asynchronous PDF processing. The integration allows the service to consume MinIO events from a RabbitMQ queue and automatically process uploaded PDF files through the existing pipeline without requiring direct API calls.

Key Changes:

  • Added RabbitMQ consumer that processes MinIO S3 bucket events for automatic PDF processing
  • Integrated the consumer as a daemon thread that starts on application startup
  • Added configuration management for RabbitMQ connection parameters with URL encoding support

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 14 comments.

Show a summary per file
File                                              Description
RAGManager/pyproject.toml                         Added pika>=1.3.0 dependency for RabbitMQ client support
RAGManager/main.py                                Integrated RabbitMQ consumer startup in daemon thread; added health check endpoints
RAGManager/app/workers/pdf_processor_consumer.py  New consumer implementation for processing MinIO events and triggering PDF pipeline
RAGManager/app/workers/__init__.py                New workers package initialization file
RAGManager/app/core/rabbitmq.py                   New RabbitMQ connection handler with connection management and message consumption
RAGManager/app/core/config.py                     Added RabbitMQ configuration settings and connection URL property with credential encoding


Comment on lines +100 to +120
    logger.info("Starting PDF processor consumer")

    try:
        # Create RabbitMQ connection
        rabbitmq = RabbitMQConnection()
        rabbitmq.connect()

        # Start consuming messages
        queue_name = settings.rabbitmq_queue_name
        logger.info(f"Consuming messages from queue: {queue_name}")

        rabbitmq.consume_messages(
            queue_name=queue_name,
            callback=message_callback
        )

    except KeyboardInterrupt:
        logger.info("Consumer interrupted by user")
    except Exception as e:
        logger.error(f"Fatal error in consumer: {e}", exc_info=True)
        raise

Copilot AI Dec 17, 2025

The start_consumer function does not handle connection failures gracefully after startup. If the RabbitMQ connection drops during operation, the consumer will crash and the exception will be raised in the daemon thread, but there's no mechanism to restart it. Consider implementing automatic reconnection logic with exponential backoff to make the consumer more resilient to network issues.

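The reconnection behavior this comment asks for could be sketched as a supervising loop with exponential backoff around the existing startup logic. In the sketch below, `start_once` stands in for the PR's consumer startup, and the delay parameters are illustrative, not values from the PR:

```python
import logging
import time

logger = logging.getLogger(__name__)

def backoff_delays(base=1.0, cap=60.0, attempts=6):
    """Yield exponentially growing delays, capped at `cap` seconds."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= 2

def run_consumer_with_reconnect(start_once, max_attempts=6):
    """Call `start_once` repeatedly, sleeping with backoff after each failure.

    `start_once` is expected to block while consuming and raise on
    connection loss (e.g. pika.exceptions.AMQPConnectionError).
    """
    for delay in backoff_delays(attempts=max_attempts):
        try:
            start_once()
            return  # clean exit (e.g. KeyboardInterrupt handled inside)
        except Exception as e:
            logger.warning(f"Consumer failed ({e}); retrying in {delay:.0f}s")
            time.sleep(delay)
    raise RuntimeError("Consumer failed after maximum reconnection attempts")
```

A daemon thread would then target `run_consumer_with_reconnect(start_consumer)` instead of `start_consumer` directly, so transient broker outages no longer kill the consumer permanently.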
    url = settings.rabbitmq_url
    logger.info(f"Connecting to RabbitMQ at {settings.rabbitmq_host}:{settings.rabbitmq_port}")
    logger.debug(
        f"RabbitMQ URL: amqp://{settings.rabbitmq_user}:***@{settings.rabbitmq_host}:{settings.rabbitmq_port}/"

Copilot AI Dec 17, 2025

The RabbitMQ connection credentials are logged in plain debug output which could expose sensitive information if debug logging is enabled in production. The password should be redacted in all log messages. Consider using '***' for all credential fields in the debug output, not just the password in line 26.

Suggested change
    f"RabbitMQ URL: amqp://{settings.rabbitmq_user}:***@{settings.rabbitmq_host}:{settings.rabbitmq_port}/"
    f"RabbitMQ URL: amqp://***:***@{settings.rabbitmq_host}:{settings.rabbitmq_port}/"

Comment on lines +33 to +34
    consumer_thread = threading.Thread(target=start_consumer, daemon=True)
    consumer_thread.start()

Copilot AI Dec 17, 2025

The RabbitMQ consumer is started in a daemon thread without any error handling or graceful shutdown mechanism. If the consumer thread crashes or the connection fails after startup, the application will continue running but will silently stop processing messages. Consider storing the thread reference and implementing proper error monitoring or a shutdown event handler to detect and log consumer failures.

Comment on lines +33 to +35
    consumer_thread = threading.Thread(target=start_consumer, daemon=True)
    consumer_thread.start()
    logging.info("RabbitMQ consumer started successfully")

Copilot AI Dec 17, 2025

The RabbitMQ connection is not being closed properly on application shutdown. The consumer thread is a daemon thread that will be forcefully terminated when the main process exits, potentially leaving the RabbitMQ connection open. Consider implementing a shutdown event handler that calls rabbitmq.close() to ensure graceful cleanup of resources.

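Addressing this comment and the previous one together, the shutdown wiring might look like the sketch below. The class, event name, and method names are invented for illustration; `connection` stands in for the PR's RabbitMQConnection instance:

```python
import threading

# A stop event the consume loop would check between messages.
stop_event = threading.Event()

class GracefulConsumer:
    """Holds the thread reference and closes the connection on shutdown,
    instead of relying on daemon-thread teardown."""

    def __init__(self, connection):
        self.connection = connection
        self.thread = None

    def start(self, target):
        self.thread = threading.Thread(target=target, daemon=True)
        self.thread.start()

    def shutdown(self, timeout=5.0):
        stop_event.set()                       # signal the consume loop to exit
        if self.connection is not None:
            # Note: real pika connections should be signaled via
            # connection.add_callback_threadsafe() rather than closed
            # directly from another thread; this is a simplification.
            self.connection.close()
        if self.thread is not None:
            self.thread.join(timeout=timeout)  # wait briefly for the loop to end
```

An application shutdown hook (e.g. a FastAPI shutdown event, if that is the framework in use) would call `shutdown()`, and the consume loop would poll `stop_event` so failures and exits are observable rather than silent.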
Comment on lines +72 to +73
    # Set QoS to process one message at a time
    self.channel.basic_qos(prefetch_count=1)

Copilot AI Dec 17, 2025

The QoS setting prefetch_count=1 will process messages sequentially one at a time. While this ensures message ordering and prevents overwhelming the system, it may be inefficient for the PDF processing workload. Consider whether parallel processing of multiple PDFs would be beneficial, and if so, increase the prefetch_count or run multiple consumer instances.

Suggested change
    # Set QoS to process one message at a time
    self.channel.basic_qos(prefetch_count=1)
    # Set QoS prefetch count (default to processing one message at a time)
    prefetch_count = getattr(settings, "rabbitmq_prefetch_count", 1)
    logger.info(f"Setting RabbitMQ QoS prefetch_count={prefetch_count}")
    self.channel.basic_qos(prefetch_count=prefetch_count)

Comment on lines +43 to +51
    def message_callback(ch, method, properties, body):
        """
        Callback function to process RabbitMQ messages.

        Args:
            ch: Channel
            method: Method
            properties: Properties
            body: Message body (bytes)

Copilot AI Dec 17, 2025

The message callback function has generic parameter names (ch, method, properties, body) that don't provide clear type information. While these are conventional in pika, consider adding type hints to improve code clarity and enable better IDE support. For example: ch: pika.channel.Channel, method: pika.spec.Basic.Deliver, properties: pika.spec.BasicProperties, body: bytes.

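The typed signature the review suggests could look like this; the function body is elided (see the PR diff), and the `TYPE_CHECKING` guard is one way to keep the hints without importing pika at runtime in modules that only need the signature:

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # hints only; avoids a hard runtime import of pika here
    import pika

def message_callback(
    ch: pika.channel.Channel,
    method: pika.spec.Basic.Deliver,
    properties: pika.spec.BasicProperties,
    body: bytes,
) -> None:
    """Process one MinIO event message (body elided; see the PR diff)."""
    ...
```

These are the standard pika types for a `basic_consume` callback, so IDEs can resolve attributes like `method.delivery_tag` and `properties.headers`.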
    @property
    def rabbitmq_url(self) -> str:
        """Returns the RabbitMQ connection URL with URL-encoded credentials."""
        from urllib.parse import quote_plus

Copilot AI Dec 17, 2025

The rabbitmq_url property imports urllib.parse.quote_plus inside the method. This import should be moved to the module level for better performance, as the import will be executed every time the property is accessed. Consider moving it to the top of the file with other imports.

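With the import hoisted as the review suggests, the property might look like the sketch below. The `Settings` class here is a minimal stand-in with attribute names assumed from the excerpts in this PR, not the actual config class:

```python
from urllib.parse import quote_plus  # module level, imported once

class Settings:
    """Minimal stand-in for the PR's settings class (illustrative values)."""
    rabbitmq_user = "guest"
    rabbitmq_password = "p@ss/word"
    rabbitmq_host = "localhost"
    rabbitmq_port = 5672

    @property
    def rabbitmq_url(self) -> str:
        # quote_plus escapes reserved characters like '@' and '/' so they
        # cannot break the amqp:// URL structure
        user = quote_plus(self.rabbitmq_user)
        password = quote_plus(self.rabbitmq_password)
        return f"amqp://{user}:{password}@{self.rabbitmq_host}:{self.rabbitmq_port}/"
```

In practice the inline import is only a micro-cost (Python caches modules after the first import), so the change is mainly about convention and readability.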
Comment on lines +71 to +76
        document_id = process_pdf_pipeline(pdf_path)
        logger.info(f"PDF processed successfully: {pdf_path} -> Document ID: {document_id}")

        # Acknowledge the message
        ch.basic_ack(delivery_tag=method.delivery_tag)
        logger.info(f"Message acknowledged for: {pdf_path}")

Copilot AI Dec 17, 2025

The callback function acknowledges messages even when the PDF processing pipeline only returns a mock document_id (when NotImplementedError is caught). This means messages will be permanently removed from the queue even though the processing is not fully complete. Consider implementing a dead-letter queue or requeuing strategy for partial failures until the pipeline is fully implemented.

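The dead-letter setup this comment (and the next one) recommends could be declared as below. The exchange and queue names are illustrative, not from the PR; with this wiring, a `basic_nack(..., requeue=False)` routes the message to the DLQ instead of discarding it:

```python
DLX_NAME = "pdf-processing.dlx"
DLQ_NAME = "pdf-processing.dlq"

def dead_letter_arguments(dlx: str = DLX_NAME) -> dict:
    """Arguments for declaring the main queue so RabbitMQ dead-letters
    rejected messages to `dlx`."""
    return {"x-dead-letter-exchange": dlx}

def declare_dead_lettering(channel, queue_name: str) -> None:
    """Declare the DLX, the DLQ bound to it, and the main queue wired to
    dead-letter into them. `channel` is assumed to be a pika BlockingChannel
    (sketch; not exercised against a real broker)."""
    channel.exchange_declare(exchange=DLX_NAME, exchange_type="fanout", durable=True)
    channel.queue_declare(queue=DLQ_NAME, durable=True)
    channel.queue_bind(queue=DLQ_NAME, exchange=DLX_NAME)
    channel.queue_declare(queue=queue_name, durable=True,
                          arguments=dead_letter_arguments())
```

Messages landing in the DLQ can then be inspected or replayed once the pipeline is fully implemented.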
Comment on lines +86 to +90
    except Exception as e:
        logger.error(f"Error processing message: {e}", exc_info=True)
        # NACK without requeue to avoid infinite loops
        # In production, consider implementing a dead-letter queue
        ch.basic_nack(delivery_tag=method.delivery_tag, requeue=False)

Copilot AI Dec 17, 2025

The consumer lacks retry logic or circuit breaker patterns. If the process_pdf_pipeline encounters transient failures (e.g., temporary database unavailability), the message will be discarded (NACK without requeue). Consider implementing exponential backoff retries for transient errors before giving up, or using a dead-letter exchange for failed messages that can be retried later.

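A minimal retry-decision helper in the spirit of this comment might look like the following; the `x-retry-count` header name, the transient-error classification, and `MAX_RETRIES` are all assumptions for illustration. The callback would call `basic_nack(requeue=True)` when this returns True, and `requeue=False` (ideally into a dead-letter exchange) otherwise:

```python
MAX_RETRIES = 3

def should_retry(properties, exc) -> bool:
    """Decide whether a failed message should be requeued.

    Reads an illustrative 'x-retry-count' header from the message
    properties (a pika BasicProperties in real use) and only retries
    errors classified as transient.
    """
    headers = (properties.headers or {}) if properties else {}
    retries = headers.get("x-retry-count", 0)
    transient = isinstance(exc, (ConnectionError, TimeoutError))
    return transient and retries < MAX_RETRIES
```

Note that a plain `requeue=True` does not increment any counter by itself; incrementing the header requires republishing the message with updated properties, or using a retry-queue/TTL pattern on the broker side.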
@@ -0,0 +1,99 @@
import json

Copilot AI Dec 17, 2025

Import of 'json' is not used.

Suggested change
import json

…' and enhance endpoint validation in MinIO client setup.
@JPAmorin JPAmorin merged commit 9b3b121 into main Dec 18, 2025
@JPAmorin JPAmorin deleted the feat/rag-rmq-integration branch December 18, 2025 00:28

4 participants