
Commit fe5263e

Merge branch 'open-edge-platform:main' into rm_datastore
2 parents 7bdd408 + 6661d38 commit fe5263e

File tree: 11 files changed (+181, -92 lines)


README.md

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ Key components of the **Edge AI Libraries**:
 | [Model Registry](microservices/model-registry) | Microservice | [Link](microservices/model-registry/docs/user-guide/get-started.md) | [API Reference](microservices/model-registry/docs/user-guide/api-docs/openapi.yaml) |
 | [Intel® Geti™](https://github.com/open-edge-platform/geti) | Tool | [Link](https://geti.intel.com/) | [Docs](https://docs.geti.intel.com) |
 | [Visual Pipeline and Performance Evaluation Tool](tools/visual-pipeline-and-platform-evaluation-tool) | Tool | [Link](tools/visual-pipeline-and-platform-evaluation-tool/docs/user-guide/get-started.md) | [Build](tools/visual-pipeline-and-platform-evaluation-tool/docs/user-guide/how-to-build-source.md) instructions |
-| [Chat Question and Answer](sample-applications/chat-question-and-answer) | Sample Application | [Link](sample-applications/chat-question-and-answer-core/docs/user-guide/get-started.md) | [Build](sample-applications/chat-question-and-answer/docs/user-guide/build-from-source.md) instructions |
+| [Chat Question and Answer](sample-applications/chat-question-and-answer) | Sample Application | [Link](sample-applications/chat-question-and-answer/docs/user-guide/get-started.md) | [Build](sample-applications/chat-question-and-answer/docs/user-guide/build-from-source.md) instructions |
 | [Chat Question and Answer Core](sample-applications/chat-question-and-answer-core) | Sample Application | [Link](sample-applications/chat-question-and-answer-core/docs/user-guide/get-started.md) | [Build](sample-applications/chat-question-and-answer-core/docs/user-guide/build-from-source.md) instructions |

sample-applications/chat-question-and-answer-core/docs/user-guide/build-from-source.md

Lines changed: 9 additions & 2 deletions
@@ -84,12 +84,19 @@ You should see entries for both `chatqna` and `chatqna-ui`.
 ## Running the Application Container
 After building the images for the `Chat Question-and-Answer Core` application, you can run the application container using `docker compose` by following these steps:
 
-1. Start the Docker containers with the previously built images:
+1. **Set Up Environment Variables**:
+   ```bash
+   export HUGGINGFACEHUB_API_TOKEN=<your-huggingface-token>
+   source scripts/setup_env.sh
+   ```
+   Configure the models to be used (LLM, Embeddings, Rerankers) in the `scripts/setup_env.sh` as needed. Refer to and use the same list of models as documented in [Chat Question-and-Answer](../../../chat-question-and-answer/docs/user-guide/get-started.md#supported-models).
+
+2. Start the Docker containers with the previously built images:
    ```bash
    docker compose -f docker/compose.yaml up
    ```
 
-2. Access the application:
+3. Access the application:
    - Open your web browser and navigate to `http://<host-ip>:5173` to view the application dashboard.
 
 ## Verification
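Once the compose stack from the steps above is up, a quick reachability check against the dashboard URL can stand in for opening a browser. A minimal sketch, assuming the UI answers plain HTTP on port 5173 and that `HOST_IP` is exported in the environment (both assumptions, not part of the documented steps):

```python
import os
import urllib.request

# Assumption: HOST_IP is exported by the setup scripts; fall back to localhost.
host_ip = os.environ.get("HOST_IP", "localhost")
url = f"http://{host_ip}:5173"

try:
    with urllib.request.urlopen(url, timeout=10) as resp:
        print(f"{url} responded with HTTP {resp.status}")
except OSError as exc:
    print(f"{url} is not reachable yet: {exc}")
```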

sample-applications/chat-question-and-answer/docs/user-guide/get-started.md

Lines changed: 1 addition & 1 deletion
@@ -41,7 +41,7 @@ The sample application has been validated with a few models just to validate the
 ### LLM Models validated for each model server
 | Model Server | Models Validated |
 |--------------|-------------------|
-| `TEI` | `Intel/neural-chat-7b-v3-3`, `Qwen/Qwen2.5-7B-Instruct`, `microsoft/Phi-3.5-mini-instruct`, `meta-llama/Llama-3.1-8B-instruct`, `deepseek-ai/DeepSeek-R1-Distill-Qwen-7B` |
+| `vLLM` | `Intel/neural-chat-7b-v3-3`, `Qwen/Qwen2.5-7B-Instruct`, `microsoft/Phi-3.5-mini-instruct`, `meta-llama/Llama-3.1-8B-instruct`, `deepseek-ai/DeepSeek-R1-Distill-Qwen-7B` |
 | `OVMS` | `Intel/neural-chat-7b-v3-3`, `Qwen/Qwen2.5-7B-Instruct`, `microsoft/Phi-3.5-mini-instruct`, `meta-llama/Llama-3.1-8B-instruct`, `deepseek-ai/DeepSeek-R1-Distill-Qwen-7B` |
 | `TGI` | `Intel/neural-chat-7b-v3-3`, `Qwen/Qwen2.5-7B-Instruct`, `microsoft/Phi-3.5-mini-instruct`, `meta-llama/Llama-3.1-8B-instruct`, `deepseek-ai/DeepSeek-R1-Distill-Qwen-7B` |

sample-applications/chat-question-and-answer/docs/user-guide/overview-architecture.md

Lines changed: 4 additions & 4 deletions
@@ -33,7 +33,7 @@ ChatQ&A application is a combination of the core LangChain application logic tha
 ### Application Flow
 
 1. **Input Sources**:
-   - **Documents**: The document ingestion microservice supports ingesting from various document formats. Supported formats are word and pdf.
+   - **Documents**: The document ingestion microservice supports ingesting documents in various formats. Supported formats are word and pdf.
    - **Web pages**: Contents of accessible web pages can also be parsed and used as input for the RAG pipeline.
 2. **Create the context**
    - **Upload input documents and web links**: The UI microservice allows the developer to interact with the ChatQ&A backend. It provides the interface to upload the documents and weblinks on which the RAG pipeline will be executed. The documents are uploaded and stored in object store. MinIO is the database used for object store.
@@ -66,12 +66,12 @@ The application flow is illustrated in the flow diagram below. The diagram shows
 
 2. **Document ingestion microservice**:
    - **What it is**: Document ingestion microservice provides capability to ingest contents from documents and web links, create the necessary context, and retrieve the right context based on user query.
-   - **How it's used**: Document ingestion microservice provides a REST API endpoint that can be used to manage the contents. The ChatQ&A backend uses this API to access its capabilities.
-   - **Benefits**: The core part of the document ingestion microservice is the vector handling capability which is optimized for target deployment hardware. Selection of the vectorDB is based on performance considerations. Rest of the document ingestion microservice can be treated as sample reference implementaiton.
+   - **How it's used**: Document ingestion microservice provides a `documents` REST API endpoint that can be used to manage the contents. The ChatQ&A backend uses this API to access its capabilities.
+   - **Benefits**: The core part of the document ingestion microservice is the vector handling capability which is optimized for target deployment hardware. Selection of the vectorDB is based on performance considerations. Rest of the document ingestion microservice can be treated as sample reference implementation.
 
 3. **ChatQ&A backend microservice**:
    - **What it is**: ChatQ&A backend microservice is a LangChain based implementation of ChatQ&A RAG pipeline providing required handling of the user queries.
-   - **How it’s used**: A REST API endpoint is provided which is used by the UI front end to send user queries and trigger the RAG pipeline.
+   - **How it’s used**: A `streamlog` REST API endpoint is provided which is used by the UI front end to send user queries and trigger the RAG pipeline.
    - **Benefits**: The microservice provides a reference of how LangChain framework is used to implement ChatQ&A using Intel Edge AI inference microservices.
 
 4. **ChatQ&A UI**:
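The `documents` and `streamlog` endpoints named in this hunk can be pictured with a short client sketch. This is illustrative only: the host, ports, route paths, payload fields, and multipart field name below are assumptions, not the documented contract of the ingestion or backend microservices.

```python
import requests

# Hypothetical endpoints; host, ports, and paths are assumptions for illustration.
DOCUMENTS_URL = "http://localhost:8200/documents"   # document ingestion microservice
STREAMLOG_URL = "http://localhost:8100/streamlog"   # ChatQ&A backend microservice

# Upload a document so it can be chunked, embedded, and stored for retrieval.
with open("manual.pdf", "rb") as f:
    upload = requests.post(DOCUMENTS_URL, files={"files": f}, timeout=60)
    upload.raise_for_status()

# Send a user question and stream back the generated answer.
with requests.post(
    STREAMLOG_URL,
    json={"input": "What does the manual cover?"},
    stream=True,
    timeout=120,
) as answer:
    answer.raise_for_status()
    for chunk in answer.iter_content(chunk_size=None, decode_unicode=True):
        print(chunk, end="")
```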

sample-applications/chat-question-and-answer/docs/user-guide/overview.md

Lines changed: 0 additions & 2 deletions
@@ -47,8 +47,6 @@ Refer to the [Get Started](./get-started.md) page to get started with the sample
 
 2. **Generation [Q&A]**: This part allows the user to query the document database and generate responses. The LLM inference microservice, embedding inference microservice, and reranking microservice work together to provide accurate and efficient answers to user queries. When a user submits a question, the embedding model hosted by the chosen model serving (default is OVMS) transforms it into an embedding, enabling semantic comparison with stored document embeddings. The vector database searches for relevant embeddings, returning a ranked list of documents based on semantic similarity. The LLM Inference Microservice generates a context-aware response from the final set of documents. It is possible to use any supported models to run with the applications. Detailed documentation provides full information on validated models and models supported overall.
 
-Further details on the system architecture and customizable options are available [here](./overview-architecture.md).
-
 Detailed hardware and software requirements are available [here](./system-requirements.md).
 
 [This sample application is ready for deployment with Edge Orchestrator. Download the deployment package and follow the instructions](deploy-with-edge-orchestrator.md)

sample-applications/chat-question-and-answer/setup.sh

Lines changed: 7 additions & 0 deletions
@@ -13,6 +13,13 @@ export INDEX_NAME=intel-rag
 export EMBEDDING_ENDPOINT_URL=http://tei-embedding-service
 #Setup the host IP
 export HOST_IP=$(hostname -I | awk '{print $1}')
+# The above command does not work on EMT. Two options:
+# 1. Check with:
+# ip -o route get to 8.8.8.8 | sed -n 's/.*src \([0-9.]\+\).*/\1/p'
+# But this approach could also have an issue based on kind of
+# deployment (airgapped or not). Need to check for a better solution.
+# IP address of 8.8.8.8 is Google address.
+# 2. Eliminate the need for hostname.
 
 # UI ENV variables
 export APP_ENDPOINT_URL=http://$HOST_IP:8100
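As a side note on the EMT caveat recorded in this hunk, the same "which source address routes toward 8.8.8.8" idea can be expressed without `hostname -I`. A minimal sketch in Python, assuming outbound routing is configured (a UDP connect sends no packets, but it still fails without a default route, so the airgapped concern from the comment applies here too):

```python
import socket

def guess_host_ip(probe_addr: str = "8.8.8.8", probe_port: int = 80) -> str:
    """Return the local source IP the routing table would use toward probe_addr."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket only consults the routing table; nothing is sent.
        s.connect((probe_addr, probe_port))
        return s.getsockname()[0]
    except OSError:
        # No usable route (e.g. airgapped deployment): fall back to loopback.
        return "127.0.0.1"
    finally:
        s.close()

print(guess_host_ip())
```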

tools/visual-pipeline-and-platform-evaluation-tool/Dockerfile.vippet

Lines changed: 1 addition & 1 deletion
@@ -32,6 +32,6 @@ RUN pip install -r requirements.txt
 
 ADD diagrams/ /home/dlstreamer/vippet/diagrams
 
-ADD app.py collect.py optimize.py pipeline.py device.py /home/dlstreamer/vippet/
+ADD app.py collect.py optimize.py pipeline.py device.py explore.py /home/dlstreamer/vippet/
 
 CMD ["python", "app.py"]

tools/visual-pipeline-and-platform-evaluation-tool/app.py

Lines changed: 3 additions & 0 deletions
@@ -14,6 +14,7 @@
 from optimize import OptimizationResult, PipelineOptimizer
 from pipeline import SmartNVRPipeline, Transportation2Pipeline
 from device import DeviceDiscovery
+from explore import GstInspector
 
 css_code = """
@@ -93,6 +94,7 @@
 # pipeline = Transportation2Pipeline()
 pipeline = SmartNVRPipeline()
 device_discovery = DeviceDiscovery()
+gst_inspector = GstInspector()
 
 # Download File
 def download_file(url, local_filename):
@@ -576,6 +578,7 @@ def on_run(
         constants=constants,
         param_grid=param_grid,
         channels=(recording_channels, inferencing_channels),
+        elements=gst_inspector.get_elements(),
     )
     collector.collect()
     time.sleep(3)
tools/visual-pipeline-and-platform-evaluation-tool/explore.py

Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
+import subprocess
+from threading import Lock
+
+class GstInspector:
+    """
+    A singleton class to inspect GStreamer elements using gst-inspect-1.0.
+    This class provides a method to retrieve the list of GStreamer elements
+    and their descriptions.
+
+    This is an example of the output from the command:
+
+        videoanalytics: gvaclassify: Object classification (requires GstVideoRegionOfInterestMeta on input)
+        videoanalytics: gvadetect: Object detection (generates GstVideoRegionOfInterestMeta)
+        videoanalytics: gvainference: Generic full-frame inference (generates GstGVATensorMeta)
+
+    Those elements will be returned in a list of tuples:
+
+        [
+            ("videoanalytics", "gvaclassify", "<description>"),
+            ("videoanalytics", "gvadetect", "<description>"),
+            ("videoanalytics", "gvainference", "<description>")
+        ]
+    """
+    _instance = None
+    _lock = Lock()
+
+    def __new__(cls, *args, **kwargs):
+        with cls._lock:
+            if cls._instance is None:
+                cls._instance = super(GstInspector, cls).__new__(cls)
+                cls._instance._initialize()
+            return cls._instance
+
+    def _initialize(self):
+        self.elements = self._get_gst_elements()
+
+    def _get_gst_elements(self):
+        try:
+            result = subprocess.run(
+                ["gst-inspect-1.0"],
+                stdout=subprocess.PIPE,
+                stderr=subprocess.PIPE,
+                text=True,
+                check=True
+            )
+            lines = result.stdout.splitlines()
+            elements = []
+            for line in lines:
+                if ": " in line:
+                    plugin, rest = line.split(": ", 1)
+                    if ": " in rest:
+                        element, description = rest.split(": ", 1)
+                        elements.append((plugin.strip(), element.strip(), description.strip()))
+
+            return sorted(elements)
+
+        except subprocess.CalledProcessError as e:
+            print(f"Error running gst-inspect-1.0: {e}")
+            return []
+
+    def get_elements(self):
+        return self.elements
+
+if __name__ == "__main__":
+    inspector = GstInspector()
+    for element in inspector.get_elements():
+        print(element)

tools/visual-pipeline-and-platform-evaluation-tool/optimize.py

File mode changed: 100644 → 100755

Lines changed: 3 additions & 18 deletions
@@ -31,13 +31,15 @@ def __init__(
         param_grid: Dict[str, List[str]],
         poll_interval: int = 1,
         channels: int | tuple[int, int] = 1,
+        elements: List[tuple[str, str, str]] = [],
     ):
 
         # Initialize class variables
         self.pipeline = pipeline
         self.constants = constants
         self.param_grid = param_grid
         self.poll_interval = poll_interval
+        self.elements = elements
 
         # Set the number of channels
         self.channels = (
@@ -65,29 +67,12 @@ def _iterate_param_grid(self, param_grid: Dict[str, List[str]]):
 
     def optimize(self):
 
-        # Run gst-inspect-1.0 to get the list of elements
-        process = Popen(["gst-inspect-1.0", "va"], stdout=PIPE, stderr=PIPE)
-        elements = process.communicate()[0].decode("utf-8").split("\n")
-
-        # Log the elements
-        self.logger.info("Elements:")
-        self.logger.info(pprint.pformat(elements))
-
-        # Find the available encoder
-        # Note that the selected encoder is the last one on the list.
-        # This is usually vah264lpenc if the encoder is available.
-        # Otherwise, fallback to the only available encoder, usually vah264enc.
-        encoder = [element for element in elements if "vah264enc" in element or "vah264lpenc" in element][-1]
-        encoder = encoder.split(":")[0].strip()
-
-        # Log the encoder
-        self.logger.info(f"Encoder: {encoder}")
 
         for params in self._iterate_param_grid(self.param_grid):
 
            # Evaluate the pipeline with the given parameters, constants, and channels
            _pipeline = self.pipeline.evaluate(
-                self.constants, params, self.regular_channels, self.inference_channels, encoder
+                self.constants, params, self.regular_channels, self.inference_channels, self.elements
            )
 
            # Log the command
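With this change, `optimize()` no longer shells out to `gst-inspect-1.0 va` to pick an encoder; it forwards the `(plugin, element, description)` tuples supplied via the new `elements` argument to `pipeline.evaluate()`. How the pipeline code consumes them is not shown in this commit, but a sketch of the equivalent selection over those tuples, mirroring the removed prefer-`vah264lpenc`-else-`vah264enc` logic, might look like this (the helper name is hypothetical):

```python
from typing import List, Tuple

def pick_h264_encoder(elements: List[Tuple[str, str, str]]) -> str:
    """Choose a VA-API H.264 encoder from (plugin, element, description) tuples.

    Mirrors the logic removed from optimize(): prefer the low-power
    vah264lpenc when present, otherwise fall back to vah264enc.
    Hypothetical helper, not part of this commit.
    """
    names = {element for _, element, _ in elements}
    for candidate in ("vah264lpenc", "vah264enc"):
        if candidate in names:
            return candidate
    raise RuntimeError("No VA-API H.264 encoder reported by gst-inspect-1.0")

# Example with tuples shaped like GstInspector.get_elements() output:
sample = [
    ("va", "vah264enc", "VA-API H.264 encoder"),
    ("va", "vah264lpenc", "VA-API low-power H.264 encoder"),
]
print(pick_h264_encoder(sample))  # -> vah264lpenc
```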
