Here is the output of the successful deployment:
PLAY RECAP **************************************************************************************************************************************************
master1 : ok=71 changed=18 unreachable=0 failed=0 skipped=109 rescued=0 ignored=1
Tuesday 28 October 2025 23:12:39 +0000 (0:00:00.106) 0:00:44.212 *******
===============================================================================
utils : Wait for node-topology-optimizer pods to be ready (single/multi-node support) ---------------------------------------------------------------- 5.68s
utils : Get detailed CPU information ----------------------------------------------------------------------------------------------------------------- 4.26s
utils : Get accurate CPU counts per NUMA node -------------------------------------------------------------------------------------------------------- 4.22s
utils : Check for AMX support ------------------------------------------------------------------------------------------------------------------------ 4.19s
utils : Get total number of NUMA nodes --------------------------------------------------------------------------------------------------------------- 4.18s
utils : Get total number of sockets ------------------------------------------------------------------------------------------------------------------ 4.18s
Transfer Dependency keycloak-realmcreationfile ------------------------------------------------------------------------------------------------------- 3.71s
utils : Check for AVX-512 support -------------------------------------------------------------------------------------------------------------------- 2.92s
Update vLLM Helm chart values with optimized CPU and memory settings --------------------------------------------------------------------------------- 1.02s
Deploy CPU based LLM model llama 8b Installation ---------------------------------------------------------------------------------------------------- 0.55s
inference-tools : Ensure Python pip module is installed ---------------------------------------------------------------------------------------------- 0.52s
inference-tools : Install Kubernetes Python SDK ------------------------------------------------------------------------------------------------------ 0.52s
inference-tools : Install Deployment Client ---------------------------------------------------------------------------------------------------------- 0.50s
inference-tools : Ensure jq is installed ------------------------------------------------------------------------------------------------------------- 0.49s
Create/Update Kubernetes Secret for Hugging Face Token ----------------------------------------------------------------------------------------------- 0.43s
Fetch the keycloak client secret --------------------------------------------------------------------------------------------------------------------- 0.39s
Delete Ingress resource Llama8b from default namespace ----------------------------------------------------------------------------------------------- 0.36s
Delete Ingress resource Llama8b from auth-apisix namespace ------------------------------------------------------------------------------------------- 0.35s
utils : Delete node-topology-optimizer daemonset ----------------------------------------------------------------------------------------------------- 0.35s
utils : Enhanced CPU topology and socket detection for optimal NRI balloon policy -------------------------------------------------------------------- 0.34s
Inference LLM Model is deployed successfully.
-------------------------------------------------------------------------------------
| AI LLM Model Deployment Complete! |
| The model is transitioning to a state ready for Inference. |
| This may take some time depending on system resources and other factors. |
| Please standby... |
--------------------------------------------------------------------------------------
Accessing Deployed Models for Inference
https://github.com/opea-project/Enterprise-Inference/blob/main/docs/accessing-deployed-models.md
Please refer to this comprehensive guide for detailed instructions.
And here is the command I issued that returns the error:
curl -k ${BASE_URL}/Meta-Llama-3.1-8B-Instruct-vllmcpu/v1/completions -X POST -d '{"model": "meta-llama/Meta-Llama-3.1-8B-Instruct", "prompt": "What is Deep Learning?", "max_tokens": 25, "temperature": 0}' -H 'Content-Type: application/json' -H "Authorization: Bearer $TOKEN"
Error:
{"error":"Unable to find matching target resource method","error_description":"For more on this error consult the server log at the debug level."}
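For what it's worth, that error text looks like a Keycloak-style "unknown path" response rather than a vLLM one, which would suggest the gateway is routing the request to the auth server instead of the model endpoint. A hedged debugging sketch (it assumes BASE_URL and TOKEN are already exported as in the accessing-deployed-models guide; -v shows which backend actually answers):

```shell
# Hedged sketch: re-issue the same request verbosely to see which backend
# responds. BASE_URL and TOKEN are assumed to be set already; the path is
# unchanged from the failing command above.
payload='{"model": "meta-llama/Meta-Llama-3.1-8B-Instruct", "prompt": "What is Deep Learning?", "max_tokens": 25, "temperature": 0}'
curl -k -v "${BASE_URL}/Meta-Llama-3.1-8B-Instruct-vllmcpu/v1/completions" \
  -X POST \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer ${TOKEN}" \
  -d "$payload"
```

The -v output should show the Server header and any redirects, which helps tell a gateway/route mismatch apart from a model-side failure.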
Here is a bit more detail on the state of the Kubernetes cluster.
kubectl get nodes
Error:
E1028 23:48:07.183507 1680236 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp 127.0.0.1:8080: connect: connection refused"
E1028 23:48:07.185027 1680236 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp 127.0.0.1:8080: connect: connection refused"
E1028 23:48:07.186393 1680236 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp 127.0.0.1:8080: connect: connection refused"
E1028 23:48:07.187612 1680236 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp 127.0.0.1:8080: connect: connection refused"
E1028 23:48:07.189079 1680236 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp 127.0.0.1:8080: connect: connection refused"
The connection to the server localhost:8080 was refused - did you specify the right host or port?
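The non-root failure above usually means kubectl found no kubeconfig for that user and fell back to its http://localhost:8080 default. A minimal sketch of the standard remedy, assuming the kubeadm-default admin kubeconfig path (adjust /etc/kubernetes/admin.conf to your install):

```shell
# Hedged sketch, not the confirmed fix: copy the admin kubeconfig into the
# invoking user's home so kubectl stops falling back to localhost:8080.
# /etc/kubernetes/admin.conf is an assumed (kubeadm-default) path.
mkdir -p "$HOME/.kube"
sudo cp /etc/kubernetes/admin.conf "$HOME/.kube/config"
sudo chown "$(id -u):$(id -g)" "$HOME/.kube/config"
kubectl get nodes   # should now reach the API server without sudo
```

This matches the symptom that sudo works: root is presumably picking up a kubeconfig (via KUBECONFIG or root's home) that the regular user cannot see.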
But if I use sudo, the command succeeds:
sudo kubectl get nodes
Output:
NAME STATUS ROLES AGE VERSION
master1 Ready control-plane 67m v1.31.4
And if I query the pods, they all show as running.
sudo kubectl get pods --all-namespaces
Output:
NAMESPACE NAME READY STATUS RESTARTS AGE
auth-apisix auth-apisix-77784f9df6-mlwvb 1/1 Running 0 61m
auth-apisix auth-apisix-etcd-0 1/1 Running 0 61m
auth-apisix auth-apisix-ingress-controller-57889db898-gms9t 1/1 Running 0 61m
default keycloak-0 1/1 Running 0 63m
default keycloak-postgresql-0 1/1 Running 0 63m
default vllm-llama-8b-cpu-6766887fc5-fnbzx 1/1 Running 0 57m
ingress-nginx ingress-nginx-controller-77674b4c66-rv8td 1/1 Running 0 64m
kube-system calico-kube-controllers-5db5978889-rrf8z 1/1 Running 0 65m
kube-system calico-node-f9vtc 1/1 Running 0 65m
kube-system coredns-d665d669-vfq9j 1/1 Running 0 65m
kube-system dns-autoscaler-597dccb9b9-xgbqc 1/1 Running 0 65m
kube-system kube-apiserver-master1 1/1 Running 1 65m
kube-system kube-controller-manager-master1 1/1 Running 2 65m
kube-system kube-proxy-brzlx 1/1 Running 0 65m
kube-system kube-scheduler-master1 1/1 Running 1 65m
kube-system kubernetes-dashboard-7f4d4b895-g5wxv 1/1 Running 0 65m
kube-system kubernetes-metrics-scraper-6d4c5d99f9-nmkv6 1/1 Running 0 65m
kube-system nodelocaldns-5fslh 1/1 Running 0 65m
kube-system nri-resource-policy-balloons-chj5q 1/1 Running 0 46m
kube-system registry-d29bm 1/1 Running 0 65m
local-path-storage local-path-provisioner-68b545849f-x88cd 1/1 Running 0 65m
observability alertmanager-observability-kube-prometh-alertmanager-0 2/2 Running 0 59m
observability logs-stack-loki-chunks-cache-0 2/2 Running 0 58m
observability logs-stack-loki-results-cache-0 2/2 Running 0 58m
observability logs-stack-minio-0 1/1 Running 0 58m
observability logs-stack-otelcol-logs-agent-2rkw6 1/1 Running 0 58m
observability loki-backend-0 2/2 Running 2 (58m ago) 58m
observability loki-canary-l2fn6 1/1 Running 0 58m
observability loki-read-65f4d5454-5s9hq 1/1 Running 0 58m
observability loki-write-0 1/1 Running 0 58m
observability observability-grafana-7cfbbc56d9-h98m5 3/3 Running 0 59m
observability observability-kube-prometh-operator-7f9f746444-mrv7k 1/1 Running 0 59m
observability observability-kube-state-metrics-5d5dbd6d87-txksl 1/1 Running 0 59m
observability observability-prometheus-node-exporter-kq7rb 1/1 Running 0 59m
observability prometheus-observability-kube-prometh-prometheus-0 2/2 Running 0 59m
observability prometheus-observability-kube-prometh-prometheus-1 2/2 Running 0 59m