
Commit addede3

Fix for issues identified during RC1 validation (10Nov) (open-edge-platform#1209)

Authored-by: bharaghahteeyeoh
Co-authored-by: Hoong Tee, Yeoh <[email protected]>
1 parent 47628eb

11 files changed: +60 additions, -33 deletions

microservices/document-ingestion/pgvector/docker/compose-dev.yaml

Lines changed: 2 additions & 1 deletion

@@ -9,7 +9,8 @@ services:
 http_proxy: ${http_proxy}
 https_proxy: ${https_proxy}
 no_proxy: ${no_proxy}
-image: intel/document-ingestion:1.2.0-dev
+#TODO: Configure image version as an env parameter
+image: intel/document-ingestion:1.2.2-dev
 environment:
 DEFAULT_BUCKET: "intel.gai.dev.test"
 OBJECT_PREFIX: "test"
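
The dev compose file now pins the `1.2.2-dev` tag until the TODO to make the image version an env parameter lands. An optional sanity check, sketched under the assumption that the commands run from `microservices/document-ingestion/pgvector` with the required environment variables already exported:

```bash
# Pull the pinned dev image and confirm the resolved compose config references it
docker pull intel/document-ingestion:1.2.2-dev
docker compose -f docker/compose.yaml -f docker/compose-dev.yaml config | grep "image:"
```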

microservices/document-ingestion/pgvector/docker/compose.yaml

Lines changed: 4 additions & 3 deletions

@@ -24,7 +24,8 @@ services:
 http_proxy: ${http_proxy}
 https_proxy: ${https_proxy}
 no_proxy: ${no_proxy}
-image: intel/document-ingestion:1.2.0
+#TODO: Configure image version as an env parameter
+image: intel/document-ingestion:1.2.2
 environment:
 http_proxy: ${http_proxy}
 https_proxy: ${https_proxy}

@@ -41,8 +42,8 @@ services:
 MINIO_HOST: ${MINIO_HOST:-minio-server}
 MINIO_API_PORT: ${MINIO_API_PORT:-9000}
 # Raise error if following required env vars is not set
-MINIO_ACCESS_KEY: ${MINIO_ACCESS_KEY:?error}
-MINIO_SECRET_KEY: ${MINIO_SECRET_KEY:?error}
+MINIO_ROOT_USER: ${MINIO_USER:?error}
+MINIO_ROOT_PASSWORD: ${MINIO_PASSWD:?error}
 ports:
 - "${DATAPREP_HOST_PORT:-8000}:8000"
 depends_on:
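
The `:?error` substitutions make the new MinIO credential variables mandatory: compose aborts with an error if `MINIO_USER` or `MINIO_PASSWD` is unset. A minimal sketch of setting them before validating the config (the values below are placeholders, not project defaults):

```bash
# Required credentials; compose fails fast via ":?error" if these are missing
export MINIO_USER=minioadmin        # placeholder value
export MINIO_PASSWD=changeme123     # placeholder value

# With these (and any other required variables) set, the config renders;
# without them, this command errors out instead of starting with empty credentials
docker compose -f docker/compose.yaml config
```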

microservices/document-ingestion/pgvector/docs/get-started.md

Lines changed: 1 addition & 0 deletions

@@ -99,6 +99,7 @@ This method provides the fastest way to get started with the microservice.
 source ./run.sh --conf
 # This will output docker compose configs with all the environment variables resolved. The user can verify whether they are configured correctly.
 ```
+The valid configuration will ensure the latest prebuilt image from `intel` registry is downloaded. The scripts take care of this.
 5. **Start the Microservices**:
 There are different options provided to start the microservices.
 ```bash

microservices/document-ingestion/pgvector/run.sh

Lines changed: 25 additions & 11 deletions

@@ -118,7 +118,7 @@ if [ "$1" = "--nosetup" ] && [ "$#" -eq 1 ]; then
 
 # Verify the configuration of docker compose
 elif [ "$1" = "--conf" ] && [ "$#" -eq 1 ]; then
-    docker compose config
+    docker compose -f docker/compose.yaml config
 
 # Stop and remove containers and networks (basic down)
 elif [ "$1" = "--down" ] && [ "$#" -eq 1 ]; then

@@ -184,28 +184,42 @@ elif [ "$#" -eq 0 ]; then
 # Remove all project-related Docker images
 elif [ "$1" = "--clean" ] && [ "$#" -eq 1 ]; then
     echo "Removing all ${PROJECT_NAME} related Docker images..."
-    docker images --filter "label=project=${PROJECT_NAME}" -q | xargs -r docker rmi -f
-    # Fallback: also remove legacy images that may not have labels
-    docker images | grep "${PROJECT_NAME}" | awk '{print $3}' | xargs -r docker rmi -f
-    docker images | grep "intel/document-ingestion" | awk '{print $3}' | xargs -r docker rmi -f
+
+    # Use docker compose to remove all images from services
+    docker compose -f docker/compose.yaml down --rmi all 2>/dev/null || true
+
+    # Also remove dev environment images if exists
+    if [ -f "docker/compose-dev.yaml" ]; then
+        docker compose -f docker/compose.yaml -f docker/compose-dev.yaml down --rmi all 2>/dev/null || true
+    fi
+
+    # Remove any remaining labeled images
+    docker images --filter "label=project=${PROJECT_NAME}" -q | xargs -r docker rmi -f 2>/dev/null || true
+
     echo "Cleanup completed!"
 
 # Remove specific service image using labels
 elif [ "$1" = "--clean" ] && [ "$2" = "dataprep" ] && [ "$#" -eq 2 ]; then
     echo "Removing dataprep service images..."
     docker images --filter "label=project=${PROJECT_NAME}" --filter "label=service=dataprep" -q | xargs -r docker rmi -f
     # Fallback: also remove legacy images that may not have labels
-    docker images | grep "intel/document-ingestion" | awk '{print $3}' | xargs -r docker rmi -f
+    docker images | grep "intel/document-ingestion" | awk '{print $3}' | xargs -r docker rmi -f 2>/dev/null || true
     echo "Dataprep images removed!"
 
 # Complete cleanup - stop containers, remove containers, networks, volumes, and images
 elif [ "$1" = "--purge" ] && [ "$#" -eq 1 ]; then
     echo "Performing complete cleanup..."
-    docker compose -f docker/compose.yaml down --volumes --remove-orphans
-    docker images --filter "label=project=${PROJECT_NAME}" -q | xargs -r docker rmi -f
-    # Fallback cleanup for legacy images
-    docker images | grep "${PROJECT_NAME}" | awk '{print $3}' | xargs -r docker rmi -f
-    docker images | grep "intel/document-ingestion" | awk '{print $3}' | xargs -r docker rmi -f
+
+    # Stop everything and remove all resources including images
+    docker compose -f docker/compose.yaml down --rmi all --volumes --remove-orphans 2>/dev/null || true
+
+    if [ -f "docker/compose-dev.yaml" ]; then
+        docker compose -f docker/compose.yaml -f docker/compose-dev.yaml down --rmi all --volumes --remove-orphans 2>/dev/null || true
+    fi
+
+    # Clean any remaining labeled images
+    docker images --filter "label=project=${PROJECT_NAME}" -q | xargs -r docker rmi -f 2>/dev/null || true
+
     echo "Complete cleanup finished!"
 
 else
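
With these changes, the cleanup paths go through `docker compose down --rmi all` first and only then fall back to label-filtered `docker rmi`, and the `2>/dev/null || true` guards make repeated runs tolerant of already-removed images. Typical invocations of the updated script, sketched from the flags shown above and the documented pattern of sourcing `run.sh`:

```bash
cd microservices/document-ingestion/pgvector

source ./run.sh --conf    # render the resolved docker/compose.yaml config for verification
source ./run.sh --clean   # remove project images via compose, then any leftover labeled images
source ./run.sh --purge   # also remove containers, networks, and volumes before image cleanup
```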

sample-applications/chat-question-and-answer/docs/user-guide/get-started.md

Lines changed: 7 additions & 5 deletions

@@ -41,11 +41,13 @@ The sample application has been validated with a few models just to validate the
 ### LLM Models validated for each model server
 | Model Server | Models Validated |
 |--------------|-------------------|
-| `vLLM` | `Intel/neural-chat-7b-v3-3`, `Qwen/Qwen2.5-7B-Instruct`, `microsoft/Phi-3.5-mini-instruct`, `meta-llama/Llama-3.1-8B-instruct`, `deepseek-ai/DeepSeek-R1-Distill-Qwen-7B` |
+| `vLLM` (deprecated) | `Intel/neural-chat-7b-v3-3`, `Qwen/Qwen2.5-7B-Instruct`, `microsoft/Phi-3.5-mini-instruct`, `meta-llama/Llama-3.1-8B-instruct`, `deepseek-ai/DeepSeek-R1-Distill-Qwen-7B` |
 | `OVMS` | `Intel/neural-chat-7b-v3-3`, `Qwen/Qwen2.5-7B-Instruct`, `microsoft/Phi-3.5-mini-instruct`, `meta-llama/Llama-3.1-8B-instruct`, `deepseek-ai/DeepSeek-R1-Distill-Qwen-7B` |
-| `TGI` | `Intel/neural-chat-7b-v3-3`, `Qwen/Qwen2.5-7B-Instruct`, `microsoft/Phi-3.5-mini-instruct`, `meta-llama/Llama-3.1-8B-instruct`, `deepseek-ai/DeepSeek-R1-Distill-Qwen-7B` |
+| `TGI` (deprecated) | `Intel/neural-chat-7b-v3-3`, `Qwen/Qwen2.5-7B-Instruct`, `microsoft/Phi-3.5-mini-instruct`, `meta-llama/Llama-3.1-8B-instruct`, `deepseek-ai/DeepSeek-R1-Distill-Qwen-7B` |
 
-Note: Limited validation was done on DeepSeek model.
+**Note:**
+1. Limited validation was done on DeepSeek model.
+2. Effective 2025.2.0 release, support for vLLM and TGI is deprecated. The functionality is not guaranteed to work and the user is advised not to use them. Should there be a strong requirement for the same, please raise an issue in github.
 
 ### Reranker Models validated
 | Model Server | Models Validated |

@@ -98,7 +100,7 @@ Visit https://huggingface.co/settings/tokens to get your token.
 export LLM_MODEL=Qwen/Qwen2.5-7B-Instruct
 export EMBEDDING_MODEL_NAME=Alibaba-NLP/gte-large-en-v1.5
 export RERANKER_MODEL=BAAI/bge-reranker-base
-export DEVICE="CPU" #Options: CPU for VLLM and TGI. GPU is only enabled for openvino model server(OVMS) .
+export DEVICE="CPU" #Options: GPU is enabled for openvino model server(OVMS) .
 export OTLP_ENDPOINT_TRACE=<otlp-endpoint-trace> # Optional. Set only if there is an OTLP endpoint available or can be ignored
 export OTLP_ENDPOINT=<otlp-endpoint> # Optional. Set only if there is an OTLP endpoint available or can be ignored
 ```

@@ -111,7 +113,7 @@ Visit https://huggingface.co/settings/tokens to get your token.
 export TAG=2.0.0
 source setup.sh llm=<model-server> embed=<embedding>
 # Below are the options
-# model-server: VLLM , OVMS, TGI
+# model-server: VLLM (deprecated), OVMS, TGI (deprecated)
 # embedding: OVMS, TEI
 ```
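
Since vLLM and TGI are deprecated effective 2025.2.0, the supported path uses OVMS for the LLM. A minimal sketch of that default setup using the values shown in the diff above; the `llm=OVMS embed=OVMS` combination is assumed from the documented option lists:

```bash
export LLM_MODEL=Qwen/Qwen2.5-7B-Instruct
export EMBEDDING_MODEL_NAME=Alibaba-NLP/gte-large-en-v1.5
export RERANKER_MODEL=BAAI/bge-reranker-base
export DEVICE="CPU"   # GPU is enabled only for the OpenVINO Model Server (OVMS)
export TAG=2.0.0

# OVMS remains the supported model server; VLLM and TGI are deprecated
source setup.sh llm=OVMS embed=OVMS
```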

sample-applications/chat-question-and-answer/docs/user-guide/index.rst

Lines changed: 1 addition & 1 deletion

@@ -32,7 +32,7 @@ Technical Architecture
 
 The ChatQ&A sample application includes the following components:
 
-- **LLM inference microservice**: Intel's optimized `OpenVINO Model Server (OVMS) <https://github.com/openvinotoolkit/model_server>`__ is used to efficiently run large language models on Intel hardware. Developers also have other model serving options if required. vLLM with OpenVINO backend and TGI are the options provided.
+- **LLM inference microservice**: Intel's optimized `OpenVINO Model Server (OVMS) <https://github.com/openvinotoolkit/model_server>`__ is used to efficiently run large language models on Intel hardware. Developers also have other model serving options if required. vLLM with OpenVINO backend and TGI are the options provided. (*Note: vLLM and TGI are deprecated effective 2025.2.0 release.*)
 - **Embedding inference microservice**: Intel's optimized `OpenVINO Model Server (OVMS) <https://github.com/openvinotoolkit/model_server>`__ and Huggingface `Text Embeddings Inference <https://github.com/huggingface/text-embeddings-inference>`__ microservice are the options provided to run embedding models efficiently on target Intel hardware. OVMS is the default option due to performance benefits on Intel hardware.
 - **Reranking inference microservice**: Huggingface `Text Embeddings Inference <https://github.com/huggingface/text-embeddings-inference>`__ microservice is the model serving choice available.
 - **Document ingestion microservice**: The sample `document ingestion <https://github.com/open-edge-platform/edge-ai-libraries/tree/release-1.2.0/microservices/document-ingestion/pgvector>`__ microservice allows ingestion of common document formats like PDF and DOC, and contents from web links. It supports a REST endpoint to ingest the documents. The ingestion process creates embeddings of the documents and stores them in the preferred vector database. The modular architecture allows users to customize the vector database. The sample application uses `PGvector <https://github.com/pgvector/pgvector>`__ database. The raw documents are stored in the `MinIO <https://github.com/minio/minio>`__ datastore, which is also customizable.

sample-applications/chat-question-and-answer/docs/user-guide/overview-architecture.md

Lines changed: 2 additions & 2 deletions

@@ -96,10 +96,10 @@ The application flow is illustrated in the flow diagram below. The diagram shows
 The ChatQ&A sample application is designed with modularity in mind, allowing developers to:
 1. **Change inference microservices**:
    - The default option is OVMS for LLM and TEI for embeddings and reranker.
-   - Use other model servers like vLLM with OpenVINO backend, and TGI to host LLM models.
+   - (*Deprecated effective 2025.2.0*) Use other model servers like vLLM with OpenVINO backend, and TGI to host LLM models.
    - Mandatory requirement is OpenAI API compliance. Note that other model servers are not guaranteed to provide same performance as default options.
 2. **Load different LLM, Embedding, and Reranker models**:
-   - Use different models from Hugging Face OpenVINO model hub or vLLM model hub. The models are passed as a parameter to corresponding model servers.
+   - Use different models from Hugging Face OpenVINO model hub or vLLM model hub. The models are passed as a parameter to corresponding model servers. (*vLLM support is deprecated effective 2025.2.0*)
 3. **Use other GenAI frameworks like Haystack and LlamaIndex**:
    - Integrate the inference microservices into an application backend developed on other frameworks similar to the LangChain integration provided in this sample application.
 4. **Deploy on diverse Intel target hardware and deployment scenarios**:

sample-applications/chat-question-and-answer/docs/user-guide/overview.md

Lines changed: 1 addition & 1 deletion

@@ -26,7 +26,7 @@ Key features include:
 
 The ChatQ&A sample application includes the following components:
 
-- **LLM inference microservice**: Intel's optimized [OpenVINO Model Server (OVMS)](https://github.com/openvinotoolkit/model_server) is used to efficiently run large language models on Intel hardware. Developers also have other model serving options if required. vLLM with OpenVINO backend and TGI are the options provided.
+- **LLM inference microservice**: Intel's optimized [OpenVINO Model Server (OVMS)](https://github.com/openvinotoolkit/model_server) is used to efficiently run large language models on Intel hardware. Developers also have other model serving options if required. vLLM with OpenVINO backend and TGI are the options provided. (*vLLM and TGI support is deprecated effective 2025.2.0*)
 - **Embedding inference microservice**: Intel's optimized [OpenVINO Model Server (OVMS)](https://github.com/openvinotoolkit/model_server) and Huggingface [Text Embeddings Inference](https://github.com/huggingface/text-embeddings-inference) microservice are the options provided to run embedding models efficiently on target Intel hardware. OVMS is the default option due to performance benefits on Intel hardware.
 - **Reranking inference microservice**: Huggingface [Text Embeddings Inference](https://github.com/huggingface/text-embeddings-inference) microservice is the model serving choice available.
 - **Document ingestion microservice**: The sample [document ingestion](../../../../microservices/document-ingestion/) microservice allows ingestion of common document formats like PDF and DOC, and contents from web links. It supports a REST endpoint to ingest the documents. The ingestion process creates embeddings of the documents and stores them in the preferred vector database. The modular architecture allows users to customize the vector database. The sample application uses [PGvector](https://github.com/pgvector/pgvector) database. The raw documents are stored in the [MinIO](https://github.com/minio/minio) datastore, which is also customizable.

sample-applications/chat-question-and-answer/docs/user-guide/release-notes.md

Lines changed: 1 addition & 1 deletion

@@ -16,7 +16,7 @@
 - The upload button is temporarily disabled during chat response generation to prevent delays. File or link uploads trigger embedding generation, which runs on the same OVMS server as the LLM, potentially slowing response streaming if both run together.
 - Chat data is stored in localStorage for session continuity. After container restarts, old chats may reappear — clear your browser’s localStorage to start fresh.
 - Limited validation done on EMT-S due to EMT-S issues. Not recommended to use ChatQnA modular on EMT-S until full validation is completed.
-- TGI on EMT 3.0 on Core&trade; configuration has a long startup time due to resource constraints. Alternative is to use TGI only on Xeon® based systems.
+- TGI on EMT 3.0 on Core&trade; configuration has a long startup time due to resource constraints. Alternative is to use TGI only on Xeon® based systems. (*Low priority as TGI and vLLM is deprecated effective 2025.2.0*)
 - DeepSeek/Phi Models are observed, at times, to continue generating response in an endless loop. Close the browser and restart in such cases.
 
 ## Previous Releases

sample-applications/chat-question-and-answer/docs/user-guide/system-requirements.md

Lines changed: 2 additions & 2 deletions

@@ -20,8 +20,8 @@ This page provides detailed hardware, software, and platform requirements to hel
 ## Minimum Configuration
 The recommended minimum configuration depends on the model serving used.
 - For OVMS based deployment, recommendation for memory is 64GB and storage is 128 GB. This is applicable for both Ubuntu and EMT 3.0.
-- For vLLM based deployment, recommendation for memory is 128GB. Minimum storage is 128GB, but check based on the model configuration. Memory configuration can be reduced by changing the default KV_CACHE_SPACE to a lower value. Lower KV_CACHE has impact on the performance and accuracy of the pipeline. This is applicable for both Ubuntu and EMT 3.0.
-- For TGI based deloyment on EMT 3.0, recommendation is to run it on Xeon® based systems. TGI on Core&trade; is observed to take a long time to startup with no guarantee that it will be functional. No such limitations on Ubuntu based systems for TGI.
+- For vLLM based deployment, recommendation for memory is 128GB. Minimum storage is 128GB, but check based on the model configuration. Memory configuration can be reduced by changing the default KV_CACHE_SPACE to a lower value. Lower KV_CACHE has impact on the performance and accuracy of the pipeline. This is applicable for both Ubuntu and EMT 3.0. (*vLLM is deprecated effective 2025.2.0*)
+- For TGI based deloyment on EMT 3.0, recommendation is to run it on Xeon® based systems. TGI on Core&trade; is observed to take a long time to startup with no guarantee that it will be functional. No such limitations on Ubuntu based systems for TGI. (*TGI is deprecated effective 2025.2.0*)
 
 Further requirements is dependent on the specific configuration of the application like KV cache, context size etc. Any changes to the default parameters of the sample application should be assessed for memory and storage implications. Raise a git issue in case of any required support for smaller configurations.
