-
Notifications
You must be signed in to change notification settings - Fork 1
[CAI-470][CAI-604] t2 Lambda chatbot index #1647
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…l into CAI-527-gemini2.5
…l into CAI-527-gemini2.5
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 55 out of 59 changed files in this pull request and generated 4 comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
if SETTINGS.index_id: | ||
Settings.llm = llm | ||
Settings.embed_model = embed_model | ||
Settings.chunk_size = SETTINGS.chunk_size | ||
Settings.chunk_overlap = SETTINGS.chunk_overlap | ||
Settings.node_parser = SentenceSplitter( | ||
chunk_size=SETTINGS.chunk_size, | ||
chunk_overlap=SETTINGS.chunk_overlap, | ||
) | ||
|
||
redis_vector_store = RedisVectorStore( | ||
redis_client=REDIS_CLIENT, overwrite=False, schema=REDIS_SCHEMA | ||
) | ||
|
||
LOGGER.info("Loading vector index from Redis...") | ||
storage_context = StorageContext.from_defaults( | ||
vector_store=redis_vector_store, | ||
docstore=REDIS_DOCSTORE, | ||
index_store=REDIS_INDEX_STORE, | ||
) | ||
|
||
index = load_index_from_storage( | ||
storage_context=storage_context, index_id=SETTINGS.index_id | ||
) | ||
|
||
return index | ||
else: | ||
raise ValueError( | ||
"No index_id provided or the index_id provided is wrong. Please check out SETTINGS.index_id in your configuration." | ||
) |
Copilot
AI
Oct 14, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message should be more specific about what constitutes a 'wrong' index_id. Consider clarifying whether it means the index doesn't exist or the ID format is invalid.
if SETTINGS.index_id: | |
Settings.llm = llm | |
Settings.embed_model = embed_model | |
Settings.chunk_size = SETTINGS.chunk_size | |
Settings.chunk_overlap = SETTINGS.chunk_overlap | |
Settings.node_parser = SentenceSplitter( | |
chunk_size=SETTINGS.chunk_size, | |
chunk_overlap=SETTINGS.chunk_overlap, | |
) | |
redis_vector_store = RedisVectorStore( | |
redis_client=REDIS_CLIENT, overwrite=False, schema=REDIS_SCHEMA | |
) | |
LOGGER.info("Loading vector index from Redis...") | |
storage_context = StorageContext.from_defaults( | |
vector_store=redis_vector_store, | |
docstore=REDIS_DOCSTORE, | |
index_store=REDIS_INDEX_STORE, | |
) | |
index = load_index_from_storage( | |
storage_context=storage_context, index_id=SETTINGS.index_id | |
) | |
return index | |
else: | |
raise ValueError( | |
"No index_id provided or the index_id provided is wrong. Please check out SETTINGS.index_id in your configuration." | |
) | |
def _is_valid_index_id(index_id: str) -> bool: | |
# Example: index_id must be non-empty and alphanumeric (customize as needed) | |
return bool(index_id) and index_id.isidentifier() | |
def _index_exists(index_id: str) -> bool: | |
# Check if the index exists in Redis by looking for its schema | |
# This assumes the index name is stored as SETTINGS.index_id | |
try: | |
# FT._LIST returns all index names; FT.INFO returns info for a specific index | |
# We'll use FT.INFO to check existence | |
return REDIS_CLIENT.execute_command("FT.INFO", index_id) is not None | |
except Exception: | |
return False | |
if not SETTINGS.index_id: | |
raise ValueError( | |
"No index_id provided. Please set SETTINGS.index_id in your configuration." | |
) | |
if not _is_valid_index_id(SETTINGS.index_id): | |
raise ValueError( | |
f"Invalid index_id format: '{SETTINGS.index_id}'. The index_id must be a valid identifier (alphanumeric and underscores)." | |
) | |
if not _index_exists(SETTINGS.index_id): | |
raise ValueError( | |
f"Index with id '{SETTINGS.index_id}' does not exist in Redis. Please ensure the index is created and available." | |
) | |
Settings.llm = llm | |
Settings.embed_model = embed_model | |
Settings.chunk_size = SETTINGS.chunk_size | |
Settings.chunk_overlap = SETTINGS.chunk_overlap | |
Settings.node_parser = SentenceSplitter( | |
chunk_size=SETTINGS.chunk_size, | |
chunk_overlap=SETTINGS.chunk_overlap, | |
) | |
redis_vector_store = RedisVectorStore( | |
redis_client=REDIS_CLIENT, overwrite=False, schema=REDIS_SCHEMA | |
) | |
LOGGER.info("Loading vector index from Redis...") | |
storage_context = StorageContext.from_defaults( | |
vector_store=redis_vector_store, | |
docstore=REDIS_DOCSTORE, | |
index_store=REDIS_INDEX_STORE, | |
) | |
index = load_index_from_storage( | |
storage_context=storage_context, index_id=SETTINGS.index_id | |
) | |
return index |
Copilot uses AI. Check for mistakes.
|
||
PRODUCTS = get_product_list() | ||
LOGGER = get_logger(__name__) | ||
PRODUCTS = get_product_list() + ["api", "webinars"] |
Copilot
AI
Oct 14, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hard-coding 'api' and 'webinars' in the PRODUCTS list makes it difficult to maintain. Consider moving these to configuration or making them configurable through environment variables.
PRODUCTS = get_product_list() + ["api", "webinars"] | |
PRODUCTS = get_product_list() + getattr(SETTINGS, "EXTRA_PRODUCTS", []) |
Copilot uses AI. Check for mistakes.
except Exception as e: | ||
LOGGER.warning( | ||
f"File {object_key} not in metadata files. Skipping because {e}" | ||
) | ||
continue |
Copilot
AI
Oct 14, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using a broad exception handler makes debugging difficult. Consider catching specific exceptions like ValueError or IndexError to provide more targeted error handling.
Copilot uses AI. Check for mistakes.
LOGGER.info(f"sqs response: {sqs_response}") | ||
if sqs_queue_evaluate is None: | ||
LOGGER.warning( | ||
f"sqs_queue_evaluate is None, cannot send message {evaluation_data}" |
Copilot
AI
Oct 14, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logging evaluation_data in the warning message could expose sensitive information. Consider logging only non-sensitive metadata or a summary instead of the full evaluation data.
f"sqs_queue_evaluate is None, cannot send message {evaluation_data}" | |
f"sqs_queue_evaluate is None, cannot send message. trace_id={evaluation_data.get('trace_id')}, num_messages={len(evaluation_data.get('messages', []) if evaluation_data.get('messages') else [])}" |
Copilot uses AI. Check for mistakes.
Branch is not up to date with base branch@mdciri it seems this Pull Request is not updated with base branch. |
Jira Pull Request LinkThis Pull Request refers to the following Jira issue CAI-470 |
This PR exceeds the recommended size of 800 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size. |
List of Changes
chatbot-index
folderMotivation and Context
The aim is to add, delete, or update only the necessary web pages in the vector index and do not create a new one from scratch all the times.
How Has This Been Tested?
In the development environment
Screenshots (if appropriate):
Types of changes
Checklist: