[CAI-741][CAI-762] Chatbot-index creates new indexes on Redis #2008

Open
mdciri wants to merge 38 commits into main from CAI-741-refactor-for-new-index

Conversation

@mdciri
Collaborator

@mdciri mdciri commented Feb 9, 2026

List of Changes

This pull request introduces several enhancements and new features to the chatbot indexer, with a focus on supporting structured data. The most important changes are support for structured document indexing and a refactor of the document ingestion and index creation workflow to make it more flexible.

  • Added support for indexing structured documents by introducing a new get_structured_docs() function and updating the main document ingestion logic to accept flags for including static, dynamic, API, and structured data.
  • Refactored the index creation process to use command-line arguments for specifying which types of data to include, and updated the GitHub Actions workflow and composite action to pass these options through inputs.
  • Enhanced the Redis index build process to allow for selective cleaning of index data, and made the schema and index ID configurable via environment variables and parameters.
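
As a rough illustration of the flag-based selection described in the list above, here is a minimal sketch using `argparse`. The flag names (`--static`, `--dynamic`, `--api`, `--structured`, `--clean-redis`) and the helper names are assumptions made for this example and may not match the ones used in the PR.

```python
import argparse

# Hypothetical source names; the PR text only says static, dynamic, API and structured data.
SOURCES = ("static", "dynamic", "api", "structured")


def build_parser() -> argparse.ArgumentParser:
    # All flag names below are assumptions, not taken from the PR diff.
    parser = argparse.ArgumentParser(description="Create a vector index on Redis")
    for source in SOURCES:
        parser.add_argument(
            f"--{source}",
            action="store_true",
            help=f"include {source} documents in the index",
        )
    parser.add_argument(
        "--clean-redis",
        action="store_true",
        help="delete existing index data before loading",
    )
    return parser


def selected_sources(args: argparse.Namespace) -> list[str]:
    # Return the document sources that were switched on, in a fixed order.
    return [source for source in SOURCES if getattr(args, source)]
```

With this shape, a workflow input could map onto the flags directly, for example by passing `--structured --clean-redis` to build an index from structured documents only.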

Motivation and Context

This refactor should allow the creation of new vector indexes on Redis.

How Has This Been Tested?

Screenshots (if appropriate):

Types of changes

  • Chore (nothing changes from a user perspective)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.

mdciri and others added 8 commits February 5, 2026 16:23
* Add package.json to parser app

* Add changeset, update package-lock

* Update apps/parser/package.json

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Add package.json to parser app

* Add changeset, update package-lock

* Update apps/parser/package.json

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Add puppeteer dependency in parser app

* Add changeset

* Update .changeset/three-cups-change.md

Co-authored-by: marcobottaro <39835990+marcobottaro@users.noreply.github.com>

* Update apps/parser/package.json

Co-authored-by: marcobottaro <39835990+marcobottaro@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: marcobottaro <39835990+marcobottaro@users.noreply.github.com>
@changeset-bot

changeset-bot bot commented Feb 9, 2026

🦋 Changeset detected

Latest commit: 6c65f57

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
| Name | Type |
| --- | --- |
| chatbot-index | Minor |

Contributor

Copilot AI left a comment

Pull request overview

Refactors the chatbot indexer to support configurable, selective document ingestion (including structured docs) and Redis index creation/cleanup, while updating GitHub Actions inputs to drive the new CLI flags.

Changes:

  • Added structured document ingestion and flag-based document selection for indexing.
  • Refactored Redis index schema/creation/loading to support configurable index_id and targeted cleanup.
  • Updated workflow/action wiring and added Terraform skill documentation files.
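
To make the "configurable index_id and targeted cleanup" point concrete, the following is a minimal sketch under stated assumptions. The environment variable name `CHB_INDEX_ID`, the `<index_id>:*` key layout, and both function names are hypothetical. The store only needs the `scan_iter` and `delete` methods that a redis-py client provides.

```python
import os
from typing import Iterable, Protocol


class KeyValueStore(Protocol):
    # The two redis-py client methods this sketch relies on.
    def scan_iter(self, match: str) -> Iterable[str]: ...
    def delete(self, *names: str) -> int: ...


def resolve_index_id(default: str = "default-index") -> str:
    # CHB_INDEX_ID is a hypothetical variable name, not taken from the PR.
    return os.environ.get("CHB_INDEX_ID", default)


def clean_index(store: KeyValueStore, index_id: str) -> int:
    # Assumes keys are laid out as "<index_id>:<suffix>"; only those keys are
    # deleted, so other indexes on the same Redis instance are left untouched.
    keys = list(store.scan_iter(match=f"{index_id}:*"))
    return store.delete(*keys) if keys else 0
```

Scoping the cleanup to one prefix is what would let several indexes coexist on the same Redis instance while one of them is rebuilt.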

Reviewed changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 10 comments.

Show a summary per file
| File | Description |
| --- | --- |
| apps/parser/package.json | Adds a new parser app package manifest with Puppeteer dependency. |
| apps/chatbot-index/src/modules/vector_index.py | Introduces schema factory + parameterized index build/load/cleanup behavior. |
| apps/chatbot-index/src/modules/settings.py | Switches index_id configuration to environment variable. |
| apps/chatbot-index/src/modules/documents.py | Adds structured docs ingestion and flags to select doc sources. |
| apps/chatbot-index/src/modules/create_vector_index.py | Adds CLI flags to choose which document types to include and whether to clean Redis. |
| apps/chatbot-index/src/modules/codec.py | Adds helper to safely parse JSON-encoded strings. |
| apps/chatbot-index/config/params.yaml | Removes index_id from params configuration. |
| .github/workflows/chatbot_create_index.yaml | Adds workflow input to choose indexing mode; passes inputs to composite action. |
| .github/actions/chatbot/action.yaml | Adds composite action inputs and passes corresponding CLI flags to indexer. |
| .github/skills/terraform-style-guide/SKILL.md | Adds Terraform style guide documentation. |
| .github/skills/terraform-refactor-module/SKILL.md | Adds Terraform module refactor skill documentation. |
| .changeset/three-cups-change.md | Changeset for parser Puppeteer addition. |
| .changeset/eager-colts-smile.md | Changeset for chatbot-index structured docs support. |
| .changeset/cuddly-pumas-cross.md | Changeset for creating parser app. |
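
For the `codec.py` entry in the summary above, a helper that safely parses JSON-encoded strings could look roughly like the sketch below. The function name `safe_json_loads` and its fallback behavior (returning the input unchanged) are assumptions for illustration; the actual helper in the PR may behave differently.

```python
import json
from typing import Any


def safe_json_loads(value: Any) -> Any:
    # Hypothetical helper name. Non-string inputs and strings that are not
    # valid JSON are returned unchanged instead of raising an exception.
    if not isinstance(value, str):
        return value
    try:
        return json.loads(value)
    except json.JSONDecodeError:
        return value
```

Returning the original value on failure keeps callers simple when a field may hold either plain text or a JSON-encoded payload.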

mdciri and others added 6 commits February 10, 2026 14:32
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@mdciri mdciri requested a review from Copilot February 10, 2026 13:44
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 14 out of 15 changed files in this pull request and generated 8 comments.


mdciri and others added 4 commits February 10, 2026 14:48
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…veloper-portal into CAI-741-refactor-for-new-index
mdciri and others added 2 commits February 10, 2026 15:00
Collaborator

@batdevis batdevis Feb 10, 2026

Why use both the AWS client and the AWS resource for S3?

I think it is possible to use only the AWS resource.
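
A minimal sketch of what the resource-only approach could look like, assuming `s3_resource` is the object returned by `boto3.resource("s3")`. The function names are hypothetical and the code under review is not shown here, so this is an illustration rather than a proposed patch.

```python
from typing import Any, Iterator


def list_keys(s3_resource: Any, bucket: str, prefix: str = "") -> Iterator[str]:
    # Resource-level listing: Bucket.objects.filter yields object summaries.
    for summary in s3_resource.Bucket(bucket).objects.filter(Prefix=prefix):
        yield summary.key


def read_object(s3_resource: Any, bucket: str, key: str) -> bytes:
    # Resource-level read: Object.get() returns a dict with a streaming Body.
    return s3_resource.Object(bucket, key).get()["Body"].read()


def low_level_client(s3_resource: Any) -> Any:
    # The resource already wraps a client, so a second one need not be created.
    return s3_resource.meta.client
```

If a client-only call is still needed somewhere, `s3_resource.meta.client` exposes the client that backs the resource, so a single S3 handle can serve both styles.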

@mdciri mdciri changed the title from "[CAI-741][CAI-762] Refactor of chatbot-index to create new indexes on Redis" to "[CAI-741][CAI-762] Chatbot-index creates new indexes on Redis" Feb 10, 2026
import argparse

from src.modules.logger import get_logger
from src.modules.vector_index import DiscoveryVectorIndex
Collaborator

DiscoveryVectorIndex is not present in src.modules.vector_index

Collaborator

I have an idea, you could rename the class : )

@mdciri mdciri requested review from batdevis and Copilot February 10, 2026 15:04
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.


@github-actions
Contributor

github-actions bot commented Feb 11, 2026

Jira Pull Request Link

This Pull Request refers to the following Jira issue CAI-741

@github-actions
Contributor

Branch is not up to date with base branch

@batdevis it seems this Pull Request is not up to date with the base branch.
Please proceed with a merge or rebase to solve this.

@github-actions
Contributor

This PR exceeds the recommended size of 800 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

4 participants