Skip to content

Conversation

mdciri
Copy link
Collaborator

@mdciri mdciri commented Aug 29, 2025

List of Changes

  • creation of the chatbot-index folder
  • creation of modules and lambda function for the vector index updating
  • update of the github action for the vector index creation

Motivation and Context

The aim is to add, delete, or update only the necessary web pages in the vector index and do not create a new one from scratch all the times.

How Has This Been Tested?

In the development environment

Screenshots (if appropriate):

Types of changes

  • Chore (nothing changes by a user perspective)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.

Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 55 out of 59 changed files in this pull request and generated 4 comments.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Comment on lines +72 to +101
if SETTINGS.index_id:
Settings.llm = llm
Settings.embed_model = embed_model
Settings.chunk_size = SETTINGS.chunk_size
Settings.chunk_overlap = SETTINGS.chunk_overlap
Settings.node_parser = SentenceSplitter(
chunk_size=SETTINGS.chunk_size,
chunk_overlap=SETTINGS.chunk_overlap,
)

redis_vector_store = RedisVectorStore(
redis_client=REDIS_CLIENT, overwrite=False, schema=REDIS_SCHEMA
)

LOGGER.info("Loading vector index from Redis...")
storage_context = StorageContext.from_defaults(
vector_store=redis_vector_store,
docstore=REDIS_DOCSTORE,
index_store=REDIS_INDEX_STORE,
)

index = load_index_from_storage(
storage_context=storage_context, index_id=SETTINGS.index_id
)

return index
else:
raise ValueError(
"No index_id provided or the index_id provided is wrong. Please check out SETTINGS.index_id in your configuration."
)
Copy link

Copilot AI Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message should be more specific about what constitutes a 'wrong' index_id. Consider clarifying whether it means the index doesn't exist or the ID format is invalid.

Suggested change
if SETTINGS.index_id:
Settings.llm = llm
Settings.embed_model = embed_model
Settings.chunk_size = SETTINGS.chunk_size
Settings.chunk_overlap = SETTINGS.chunk_overlap
Settings.node_parser = SentenceSplitter(
chunk_size=SETTINGS.chunk_size,
chunk_overlap=SETTINGS.chunk_overlap,
)
redis_vector_store = RedisVectorStore(
redis_client=REDIS_CLIENT, overwrite=False, schema=REDIS_SCHEMA
)
LOGGER.info("Loading vector index from Redis...")
storage_context = StorageContext.from_defaults(
vector_store=redis_vector_store,
docstore=REDIS_DOCSTORE,
index_store=REDIS_INDEX_STORE,
)
index = load_index_from_storage(
storage_context=storage_context, index_id=SETTINGS.index_id
)
return index
else:
raise ValueError(
"No index_id provided or the index_id provided is wrong. Please check out SETTINGS.index_id in your configuration."
)
def _is_valid_index_id(index_id: str) -> bool:
# Example: index_id must be non-empty and alphanumeric (customize as needed)
return bool(index_id) and index_id.isidentifier()
def _index_exists(index_id: str) -> bool:
# Check if the index exists in Redis by looking for its schema
# This assumes the index name is stored as SETTINGS.index_id
try:
# FT._LIST returns all index names; FT.INFO returns info for a specific index
# We'll use FT.INFO to check existence
return REDIS_CLIENT.execute_command("FT.INFO", index_id) is not None
except Exception:
return False
if not SETTINGS.index_id:
raise ValueError(
"No index_id provided. Please set SETTINGS.index_id in your configuration."
)
if not _is_valid_index_id(SETTINGS.index_id):
raise ValueError(
f"Invalid index_id format: '{SETTINGS.index_id}'. The index_id must be a valid identifier (alphanumeric and underscores)."
)
if not _index_exists(SETTINGS.index_id):
raise ValueError(
f"Index with id '{SETTINGS.index_id}' does not exist in Redis. Please ensure the index is created and available."
)
Settings.llm = llm
Settings.embed_model = embed_model
Settings.chunk_size = SETTINGS.chunk_size
Settings.chunk_overlap = SETTINGS.chunk_overlap
Settings.node_parser = SentenceSplitter(
chunk_size=SETTINGS.chunk_size,
chunk_overlap=SETTINGS.chunk_overlap,
)
redis_vector_store = RedisVectorStore(
redis_client=REDIS_CLIENT, overwrite=False, schema=REDIS_SCHEMA
)
LOGGER.info("Loading vector index from Redis...")
storage_context = StorageContext.from_defaults(
vector_store=redis_vector_store,
docstore=REDIS_DOCSTORE,
index_store=REDIS_INDEX_STORE,
)
index = load_index_from_storage(
storage_context=storage_context, index_id=SETTINGS.index_id
)
return index

Copilot uses AI. Check for mistakes.


PRODUCTS = get_product_list()
LOGGER = get_logger(__name__)
PRODUCTS = get_product_list() + ["api", "webinars"]
Copy link

Copilot AI Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hard-coding 'api' and 'webinars' in the PRODUCTS list makes it difficult to maintain. Consider moving these to configuration or making them configurable through environment variables.

Suggested change
PRODUCTS = get_product_list() + ["api", "webinars"]
PRODUCTS = get_product_list() + getattr(SETTINGS, "EXTRA_PRODUCTS", [])

Copilot uses AI. Check for mistakes.

Comment on lines +102 to +106
except Exception as e:
LOGGER.warning(
f"File {object_key} not in metadata files. Skipping because {e}"
)
continue
Copy link

Copilot AI Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a broad exception handler makes debugging difficult. Consider catching specific exceptions like ValueError or IndexError to provide more targeted error handling.

Copilot uses AI. Check for mistakes.

LOGGER.info(f"sqs response: {sqs_response}")
if sqs_queue_evaluate is None:
LOGGER.warning(
f"sqs_queue_evaluate is None, cannot send message {evaluation_data}"
Copy link

Copilot AI Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logging evaluation_data in the warning message could expose sensitive information. Consider logging only non-sensitive metadata or a summary instead of the full evaluation data.

Suggested change
f"sqs_queue_evaluate is None, cannot send message {evaluation_data}"
f"sqs_queue_evaluate is None, cannot send message. trace_id={evaluation_data.get('trace_id')}, num_messages={len(evaluation_data.get('messages', []) if evaluation_data.get('messages') else [])}"

Copilot uses AI. Check for mistakes.

Copy link
Contributor

Branch is not up to date with base branch

@mdciri it seems this Pull Request is not updated with base branch.
Please proceed with a merge or rebase to solve this.

Copy link
Contributor

github-actions bot commented Oct 16, 2025

Jira Pull Request Link

This Pull Request refers to the following Jira issue CAI-470

Copy link
Contributor

This PR exceeds the recommended size of 800 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants