
Model endpoint does not work after server is deployed successfully on single node #29

@bconsolvo

Description

Here is the output of the successful deployment:

PLAY RECAP **************************************************************************************************************************************************
master1                    : ok=71   changed=18   unreachable=0    failed=0    skipped=109  rescued=0    ignored=1   

Tuesday 28 October 2025  23:12:39 +0000 (0:00:00.106)       0:00:44.212 ******* 
=============================================================================== 
utils : Wait for node-topology-optimizer pods to be ready (single/multi-node support) ---------------------------------------------------------------- 5.68s
utils : Get detailed CPU information ----------------------------------------------------------------------------------------------------------------- 4.26s
utils : Get accurate CPU counts per NUMA node -------------------------------------------------------------------------------------------------------- 4.22s
utils : Check for AMX support ------------------------------------------------------------------------------------------------------------------------ 4.19s
utils : Get total number of NUMA nodes --------------------------------------------------------------------------------------------------------------- 4.18s
utils : Get total number of sockets ------------------------------------------------------------------------------------------------------------------ 4.18s
Transfer Dependency keycloak-realmcreationfile ------------------------------------------------------------------------------------------------------- 3.71s
utils : Check for AVX-512 support -------------------------------------------------------------------------------------------------------------------- 2.92s
Update vLLM Helm chart values with optimized CPU and memory settings --------------------------------------------------------------------------------- 1.02s
Deploy CPU based LLM model  llama 8b Installation ---------------------------------------------------------------------------------------------------- 0.55s
inference-tools : Ensure Python pip module is installed ---------------------------------------------------------------------------------------------- 0.52s
inference-tools : Install Kubernetes Python SDK ------------------------------------------------------------------------------------------------------ 0.52s
inference-tools : Install Deployment Client ---------------------------------------------------------------------------------------------------------- 0.50s
inference-tools : Ensure jq is installed ------------------------------------------------------------------------------------------------------------- 0.49s
Create/Update Kubernetes Secret for Hugging Face Token ----------------------------------------------------------------------------------------------- 0.43s
Fetch the keycloak client secret --------------------------------------------------------------------------------------------------------------------- 0.39s
Delete Ingress resource Llama8b from default namespace ----------------------------------------------------------------------------------------------- 0.36s
Delete Ingress resource Llama8b from auth-apisix namespace ------------------------------------------------------------------------------------------- 0.35s
utils : Delete node-topology-optimizer daemonset ----------------------------------------------------------------------------------------------------- 0.35s
utils : Enhanced CPU topology and socket detection for optimal NRI balloon policy -------------------------------------------------------------------- 0.34s
Inference LLM Model is deployed successfully.
-------------------------------------------------------------------------------------
|  AI LLM Model Deployment Complete!                                                |
|  The model is transitioning to a state ready for Inference.                       |
|  This may take some time depending on system resources and other factors.         |
|  Please standby...                                                                |
--------------------------------------------------------------------------------------

Accessing Deployed Models for Inference
https://github.com/opea-project/Enterprise-Inference/blob/main/docs/accessing-deployed-models.md

Please refer to this comprehensive guide for detailed instructions.

And here is the command I run that returns the error:

curl -k "${BASE_URL}/Meta-Llama-3.1-8B-Instruct-vllmcpu/v1/completions" \
  -X POST \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer ${TOKEN}" \
  -d '{"model": "meta-llama/Meta-Llama-3.1-8B-Instruct", "prompt": "What is Deep Learning?", "max_tokens": 25, "temperature": 0}'

Error:

{"error":"Unable to find matching target resource method","error_description":"For more on this error consult the server log at the debug level."}
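For what it's worth, that error body reads like a Keycloak (RESTEasy) message rather than a vLLM one, which would suggest the gateway is matching the request to Keycloak instead of the model service. A hedged check, comparing the request path against the routes the cluster actually exposes (kubectl call guarded in case it is not usable on this machine):

```shell
# Sanity-check the path being requested against the configured routes.
# MODEL_PATH is taken verbatim from the curl command above.
MODEL_PATH="/Meta-Llama-3.1-8B-Instruct-vllmcpu/v1/completions"
echo "Request path: ${MODEL_PATH}"
# List configured ingress paths across namespaces, if kubectl works here:
command -v kubectl >/dev/null 2>&1 && sudo kubectl get ingress -A || true
```

If no ingress path matches the `/Meta-Llama-3.1-8B-Instruct-vllmcpu` prefix, the gateway falling through to Keycloak would explain the response.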

Here is a little more detail on the state of the Kubernetes server.

kubectl get nodes

Error:

E1028 23:48:07.183507 1680236 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp 127.0.0.1:8080: connect: connection refused"
E1028 23:48:07.185027 1680236 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp 127.0.0.1:8080: connect: connection refused"
E1028 23:48:07.186393 1680236 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp 127.0.0.1:8080: connect: connection refused"
E1028 23:48:07.187612 1680236 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp 127.0.0.1:8080: connect: connection refused"
E1028 23:48:07.189079 1680236 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp 127.0.0.1:8080: connect: connection refused"
The connection to the server localhost:8080 was refused - did you specify the right host or port?

But if I use sudo, I do get the nodes:

sudo kubectl get nodes

Output:

NAME      STATUS   ROLES           AGE   VERSION
master1   Ready    control-plane   67m   v1.31.4
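This asymmetry (plain kubectl refused on localhost:8080, sudo kubectl fine) usually means the non-root user has no kubeconfig, so kubectl falls back to its unconfigured `http://localhost:8080` default. A minimal sketch of the usual fix, assuming the admin kubeconfig lives at the typical kubeadm/kubespray path:

```shell
# Copy the cluster-admin kubeconfig into the current user's home so plain
# kubectl can reach the API server. /etc/kubernetes/admin.conf is an
# assumed location; adjust if your cluster puts it elsewhere.
mkdir -p "$HOME/.kube"
if sudo -n test -f /etc/kubernetes/admin.conf 2>/dev/null; then
    sudo cp /etc/kubernetes/admin.conf "$HOME/.kube/config"
    sudo chown "$(id -u):$(id -g)" "$HOME/.kube/config"
fi
```

After this, `kubectl get nodes` should work without sudo.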

And if I query the pods, they all show as running.

sudo kubectl get pods --all-namespaces

Output:

NAMESPACE            NAME                                                     READY   STATUS    RESTARTS      AGE
auth-apisix          auth-apisix-77784f9df6-mlwvb                             1/1     Running   0             61m
auth-apisix          auth-apisix-etcd-0                                       1/1     Running   0             61m
auth-apisix          auth-apisix-ingress-controller-57889db898-gms9t          1/1     Running   0             61m
default              keycloak-0                                               1/1     Running   0             63m
default              keycloak-postgresql-0                                    1/1     Running   0             63m
default              vllm-llama-8b-cpu-6766887fc5-fnbzx                       1/1     Running   0             57m
ingress-nginx        ingress-nginx-controller-77674b4c66-rv8td                1/1     Running   0             64m
kube-system          calico-kube-controllers-5db5978889-rrf8z                 1/1     Running   0             65m
kube-system          calico-node-f9vtc                                        1/1     Running   0             65m
kube-system          coredns-d665d669-vfq9j                                   1/1     Running   0             65m
kube-system          dns-autoscaler-597dccb9b9-xgbqc                          1/1     Running   0             65m
kube-system          kube-apiserver-master1                                   1/1     Running   1             65m
kube-system          kube-controller-manager-master1                          1/1     Running   2             65m
kube-system          kube-proxy-brzlx                                         1/1     Running   0             65m
kube-system          kube-scheduler-master1                                   1/1     Running   1             65m
kube-system          kubernetes-dashboard-7f4d4b895-g5wxv                     1/1     Running   0             65m
kube-system          kubernetes-metrics-scraper-6d4c5d99f9-nmkv6              1/1     Running   0             65m
kube-system          nodelocaldns-5fslh                                       1/1     Running   0             65m
kube-system          nri-resource-policy-balloons-chj5q                       1/1     Running   0             46m
kube-system          registry-d29bm                                           1/1     Running   0             65m
local-path-storage   local-path-provisioner-68b545849f-x88cd                  1/1     Running   0             65m
observability        alertmanager-observability-kube-prometh-alertmanager-0   2/2     Running   0             59m
observability        logs-stack-loki-chunks-cache-0                           2/2     Running   0             58m
observability        logs-stack-loki-results-cache-0                          2/2     Running   0             58m
observability        logs-stack-minio-0                                       1/1     Running   0             58m
observability        logs-stack-otelcol-logs-agent-2rkw6                      1/1     Running   0             58m
observability        loki-backend-0                                           2/2     Running   2 (58m ago)   58m
observability        loki-canary-l2fn6                                        1/1     Running   0             58m
observability        loki-read-65f4d5454-5s9hq                                1/1     Running   0             58m
observability        loki-write-0                                             1/1     Running   0             58m
observability        observability-grafana-7cfbbc56d9-h98m5                   3/3     Running   0             59m
observability        observability-kube-prometh-operator-7f9f746444-mrv7k     1/1     Running   0             59m
observability        observability-kube-state-metrics-5d5dbd6d87-txksl        1/1     Running   0             59m
observability        observability-prometheus-node-exporter-kq7rb             1/1     Running   0             59m
observability        prometheus-observability-kube-prometh-prometheus-0       2/2     Running   0             59m
observability        prometheus-observability-kube-prometh-prometheus-1       2/2     Running   0             59m
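Separately, the deployment banner warns that the model keeps loading after the playbook finishes, and a vLLM pod can report `1/1 Running` while weights are still being loaded. A quick hedged check (namespace and deployment name inferred from the pod listing above; adjust if yours differ):

```shell
# Tail the vLLM server logs to see whether the OpenAI-compatible server
# has finished starting. NS and DEPLOY are inferred from the pod name
# vllm-llama-8b-cpu-... in the listing above. Guarded so it is a no-op
# where kubectl is unavailable.
NS=default
DEPLOY=vllm-llama-8b-cpu
if command -v kubectl >/dev/null 2>&1; then
    sudo kubectl logs -n "$NS" "deploy/$DEPLOY" --tail=20
fi
```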
