This demo is a more advanced version of the basic demo that is detailed in this article.
Features:
- Fully automated document ingestion pipeline, with two custom ingestion methods: one for public Red Hat documentation and one for generic documents using Docling.
- Support for multiple models, including running them in parallel to highlight differences in content and speed.
- Multi-language support for the UI, the audio, and the generated answers.
- PatternFly 6-based UI, using the Chatbot component.
All the documents are ingested and stored in a Milvus database, using different elements and techniques described below.
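For context only, here is a minimal sketch of how parsed chunks could be written to a Milvus collection with pymilvus. The collection name, vector dimension, service URL, and embedding step are assumptions for illustration, not the demo's actual values.

```python
# Illustrative sketch only: names, dimensions and the embedding step are assumptions,
# not the actual values used by this demo.
from pymilvus import MilvusClient

client = MilvusClient(uri="http://milvus-service:19530")  # hypothetical service URL

# Create a simple collection with the quick-setup schema (id / vector / dynamic fields).
client.create_collection(
    collection_name="rhoai_2_16",   # hypothetical collection name
    dimension=768,                  # must match the embedding model's output size
)

# 'chunks' would be the text fragments produced by the splitters described below,
# and 'embed' any embedding function returning a 768-dimensional vector.
chunks = ["OpenShift AI supports ...", "To create a workbench ..."]
embed = lambda text: [0.0] * 768    # placeholder embedding function

client.insert(
    collection_name="rhoai_2_16",
    data=[
        {"id": i, "vector": embed(text), "text": text}
        for i, text in enumerate(chunks)
    ],
)
```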
The public Red Hat documentation is ingested using a custom parser and splitter. The goal is to extract the most targeted and precise information possible, removing all the unnecessary elements that a "non-content-aware" process would pick up, such as tables of contents, disclaimers, and reference links. Although sometimes useful or even necessary when accessing or reading a document, these elements bring nothing from a context-retrieval perspective in a RAG solution.
More details and examples about this document parsing can be found in the rh-doc-splits-generation folder.
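The real logic lives in that folder; purely as a hedged illustration of the content-aware idea (the tags and selectors below are assumptions, not the actual ones used by the parser), stripping navigation chrome and splitting per section could look like this:

```python
# Hedged illustration of content-aware splitting: the tags removed here are
# assumptions, not the actual selectors used by the rh-doc-splits-generation code.
import requests
from bs4 import BeautifulSoup

def split_documentation_page(url: str) -> list[dict]:
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")

    # Drop elements that carry no value for retrieval (navigation, ToC, footers...).
    for tag in soup.find_all(["nav", "header", "footer", "aside", "script", "style"]):
        tag.decompose()

    # Keep one chunk per section so retrieval stays targeted and precise.
    chunks = []
    for section in soup.find_all("section"):
        title = section.find(["h1", "h2", "h3"])
        chunks.append({
            "title": title.get_text(strip=True) if title else "",
            "content": section.get_text(" ", strip=True),
            "source": url,
        })
    return chunks
```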
Another parser uses Docling to extract information from documents accessible through a public URL. They can be web pages, PDFs, etc.
This implementation uses a remote Docling service API, powered by Docling Serve, to perform the extraction.
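As a rough sketch of that call (the endpoint path, payload shape, and response fields below are assumptions and should be checked against the OpenAPI documentation of the Docling Serve version actually deployed), the backend can POST the source URL to the service and get back the converted content:

```python
# Rough sketch only: the endpoint path and payload format are assumptions and must be
# verified against the OpenAPI docs of the deployed Docling Serve instance.
import requests

DOCLING_SERVE_URL = "https://docling-serve.example.com"  # hypothetical route

def convert_with_docling(source_url: str) -> str:
    response = requests.post(
        f"{DOCLING_SERVE_URL}/v1alpha/convert/source",
        json={"http_sources": [{"url": source_url}]},
        timeout=300,
    )
    response.raise_for_status()
    # The converted document (e.g. Markdown) is then chunked and embedded
    # like any other source before being stored in Milvus.
    return response.json()["document"]["md_content"]
```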
To make it easier to manage the documentation elements to ingest, the collections folder holds their definitions. These are simple JSON files indicating, for each collection and version, the document sources and the ingestion mechanism to use (Red Hat documentation ingester or Docling). Documents that are common across different versions of a given collection can also be added.
For each collection and version, a flag indicates whether it should be added, updated, or removed (see more details in the products-documentation-ingestion folder).
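The actual schema is described in the products-documentation-ingestion folder; as a purely illustrative sketch (field names and values below are assumptions), a collection definition could look like this:

```json
{
  "collection": "openshift-ai",
  "version": "2.16",
  "action": "add",
  "ingester": "redhat-documentation",
  "sources": [
    "https://docs.redhat.com/en/documentation/red_hat_openshift_ai_self-managed/2.16"
  ],
  "common_documents": [
    "https://example.com/product-overview.html"
  ]
}
```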
Whenever a change is merged into the main branch of this folder, a pipeline defined in Tekton is triggered on the OpenShift cluster where the application is deployed. This pipeline handles creating, updating, or removing collections as needed.
The app backend is a simple FastAPI application that handles all communication with the model servers, the vector database, and the client.
The configuration is set in a single config.json file, which makes it easy to keep in a Secret mounted at runtime for deployment on OpenShift.
This is a PatternFly 6 application, connected to the backend through a WebSocket to receive and display the content streamed by the backend.
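As a minimal, hypothetical sketch of this flow (the endpoint path, message format, and the generate_answer helper are assumptions, not the demo's actual backend code), the backend side of the WebSocket could stream tokens like this:

```python
# Minimal, hypothetical sketch of the streaming flow; the endpoint path, message format
# and generate_answer() helper are assumptions, not the demo's actual backend code.
import json
from fastapi import FastAPI, WebSocket

app = FastAPI()

async def generate_answer(query: str, collection: str):
    """Placeholder: would retrieve context from Milvus and stream tokens from a model server."""
    for token in ["Red", " Hat", " OpenShift", " AI", " ..."]:
        yield token

@app.websocket("/ws/query")
async def websocket_query(websocket: WebSocket):
    await websocket.accept()
    while True:
        request = json.loads(await websocket.receive_text())
        # Stream each token back so the UI can render the answer as it is produced.
        async for token in generate_answer(request["query"], request["collection"]):
            await websocket.send_text(json.dumps({"type": "token", "content": token}))
        await websocket.send_text(json.dumps({"type": "done"}))
```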
- The application container image is available at https://quay.io/repository/rh-aiservices-bu/rh-kb-chat.
- Deployment file examples are available in the Deployment folder.
- An example configuration file for accessing the models and the vector database is available here. Once modified with your own values, it must be created as a Secret with:
  `oc create secret generic kb-chatbot --from-file=config.json`
- To display the username at the top-right corner of the application and in chat messages, OAuth protection must be enabled via a sidecar container.
