
Model endpoint does not work after server is deployed successfully on single node #29

@bconsolvo

Description

Here is the output of the successful deployment:

PLAY RECAP **************************************************************************************************************************************************
master1                    : ok=71   changed=18   unreachable=0    failed=0    skipped=109  rescued=0    ignored=1   

Tuesday 28 October 2025  23:12:39 +0000 (0:00:00.106)       0:00:44.212 ******* 
=============================================================================== 
utils : Wait for node-topology-optimizer pods to be ready (single/multi-node support) ---------------------------------------------------------------- 5.68s
utils : Get detailed CPU information ----------------------------------------------------------------------------------------------------------------- 4.26s
utils : Get accurate CPU counts per NUMA node -------------------------------------------------------------------------------------------------------- 4.22s
utils : Check for AMX support ------------------------------------------------------------------------------------------------------------------------ 4.19s
utils : Get total number of NUMA nodes --------------------------------------------------------------------------------------------------------------- 4.18s
utils : Get total number of sockets ------------------------------------------------------------------------------------------------------------------ 4.18s
Transfer Dependency keycloak-realmcreationfile ------------------------------------------------------------------------------------------------------- 3.71s
utils : Check for AVX-512 support -------------------------------------------------------------------------------------------------------------------- 2.92s
Update vLLM Helm chart values with optimized CPU and memory settings --------------------------------------------------------------------------------- 1.02s
Deploy CPU based LLM model  llama 8b Installation ---------------------------------------------------------------------------------------------------- 0.55s
inference-tools : Ensure Python pip module is installed ---------------------------------------------------------------------------------------------- 0.52s
inference-tools : Install Kubernetes Python SDK ------------------------------------------------------------------------------------------------------ 0.52s
inference-tools : Install Deployment Client ---------------------------------------------------------------------------------------------------------- 0.50s
inference-tools : Ensure jq is installed ------------------------------------------------------------------------------------------------------------- 0.49s
Create/Update Kubernetes Secret for Hugging Face Token ----------------------------------------------------------------------------------------------- 0.43s
Fetch the keycloak client secret --------------------------------------------------------------------------------------------------------------------- 0.39s
Delete Ingress resource Llama8b from default namespace ----------------------------------------------------------------------------------------------- 0.36s
Delete Ingress resource Llama8b from auth-apisix namespace ------------------------------------------------------------------------------------------- 0.35s
utils : Delete node-topology-optimizer daemonset ----------------------------------------------------------------------------------------------------- 0.35s
utils : Enhanced CPU topology and socket detection for optimal NRI balloon policy -------------------------------------------------------------------- 0.34s
Inference LLM Model is deployed successfully.
-------------------------------------------------------------------------------------
|  AI LLM Model Deployment Complete!                                                |
|  The model is transitioning to a state ready for Inference.                       |
|  This may take some time depending on system resources and other factors.         |
|  Please standby...                                                                |
--------------------------------------------------------------------------------------

Accessing Deployed Models for Inference
https://github.com/opea-project/Enterprise-Inference/blob/main/docs/accessing-deployed-models.md

Please refer to this comprehensive guide for detailed instructions.

And here is the command I run that returns the error:

curl -k "${BASE_URL}/Meta-Llama-3.1-8B-Instruct-vllmcpu/v1/completions" \
  -X POST \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer ${TOKEN}" \
  -d '{"model": "meta-llama/Meta-Llama-3.1-8B-Instruct", "prompt": "What is Deep Learning?", "max_tokens": 25, "temperature": 0}'

Error:

{"error":"Unable to find matching target resource method","error_description":"For more on this error consult the server log at the debug level."}
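For what it's worth, that error body reads like a Keycloak (RESTEasy) message rather than a vLLM one, which would suggest the gateway is matching the request to Keycloak instead of the model service. A hedged check, comparing the request path against the routes the cluster actually exposes (kubectl call guarded in case it is not usable on this machine):

```shell
# Sanity-check the path being requested against the configured routes.
# MODEL_PATH is taken verbatim from the curl command above.
MODEL_PATH="/Meta-Llama-3.1-8B-Instruct-vllmcpu/v1/completions"
echo "Request path: ${MODEL_PATH}"
# List configured ingress paths across namespaces, if kubectl works here:
command -v kubectl >/dev/null 2>&1 && sudo kubectl get ingress -A || true
```

If no ingress path matches the `/Meta-Llama-3.1-8B-Instruct-vllmcpu` prefix, the gateway falling through to Keycloak would explain the response.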

Here is a little more detail on the state of the Kubernetes server.

kubectl get nodes

Error:

E1028 23:48:07.183507 1680236 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp 127.0.0.1:8080: connect: connection refused"
E1028 23:48:07.185027 1680236 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp 127.0.0.1:8080: connect: connection refused"
E1028 23:48:07.186393 1680236 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp 127.0.0.1:8080: connect: connection refused"
E1028 23:48:07.187612 1680236 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp 127.0.0.1:8080: connect: connection refused"
E1028 23:48:07.189079 1680236 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"http://localhost:8080/api?timeout=32s\": dial tcp 127.0.0.1:8080: connect: connection refused"
The connection to the server localhost:8080 was refused - did you specify the right host or port?

But if I use sudo, I do get the nodes:

sudo kubectl get nodes

Output:

NAME      STATUS   ROLES           AGE   VERSION
master1   Ready    control-plane   67m   v1.31.4
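This asymmetry (plain kubectl refused on localhost:8080, sudo kubectl fine) usually means the non-root user has no kubeconfig, so kubectl falls back to its unconfigured `http://localhost:8080` default. A minimal sketch of the usual fix, assuming the admin kubeconfig lives at the typical kubeadm/kubespray path:

```shell
# Copy the cluster-admin kubeconfig into the current user's home so plain
# kubectl can reach the API server. /etc/kubernetes/admin.conf is an
# assumed location; adjust if your cluster puts it elsewhere.
mkdir -p "$HOME/.kube"
if sudo -n test -f /etc/kubernetes/admin.conf 2>/dev/null; then
    sudo cp /etc/kubernetes/admin.conf "$HOME/.kube/config"
    sudo chown "$(id -u):$(id -g)" "$HOME/.kube/config"
fi
```

After this, `kubectl get nodes` should work without sudo.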

And if I query the pods, they all show as running.

sudo kubectl get pods --all-namespaces

Output:

NAMESPACE            NAME                                                     READY   STATUS    RESTARTS      AGE
auth-apisix          auth-apisix-77784f9df6-mlwvb                             1/1     Running   0             61m
auth-apisix          auth-apisix-etcd-0                                       1/1     Running   0             61m
auth-apisix          auth-apisix-ingress-controller-57889db898-gms9t          1/1     Running   0             61m
default              keycloak-0                                               1/1     Running   0             63m
default              keycloak-postgresql-0                                    1/1     Running   0             63m
default              vllm-llama-8b-cpu-6766887fc5-fnbzx                       1/1     Running   0             57m
ingress-nginx        ingress-nginx-controller-77674b4c66-rv8td                1/1     Running   0             64m
kube-system          calico-kube-controllers-5db5978889-rrf8z                 1/1     Running   0             65m
kube-system          calico-node-f9vtc                                        1/1     Running   0             65m
kube-system          coredns-d665d669-vfq9j                                   1/1     Running   0             65m
kube-system          dns-autoscaler-597dccb9b9-xgbqc                          1/1     Running   0             65m
kube-system          kube-apiserver-master1                                   1/1     Running   1             65m
kube-system          kube-controller-manager-master1                          1/1     Running   2             65m
kube-system          kube-proxy-brzlx                                         1/1     Running   0             65m
kube-system          kube-scheduler-master1                                   1/1     Running   1             65m
kube-system          kubernetes-dashboard-7f4d4b895-g5wxv                     1/1     Running   0             65m
kube-system          kubernetes-metrics-scraper-6d4c5d99f9-nmkv6              1/1     Running   0             65m
kube-system          nodelocaldns-5fslh                                       1/1     Running   0             65m
kube-system          nri-resource-policy-balloons-chj5q                       1/1     Running   0             46m
kube-system          registry-d29bm                                           1/1     Running   0             65m
local-path-storage   local-path-provisioner-68b545849f-x88cd                  1/1     Running   0             65m
observability        alertmanager-observability-kube-prometh-alertmanager-0   2/2     Running   0             59m
observability        logs-stack-loki-chunks-cache-0                           2/2     Running   0             58m
observability        logs-stack-loki-results-cache-0                          2/2     Running   0             58m
observability        logs-stack-minio-0                                       1/1     Running   0             58m
observability        logs-stack-otelcol-logs-agent-2rkw6                      1/1     Running   0             58m
observability        loki-backend-0                                           2/2     Running   2 (58m ago)   58m
observability        loki-canary-l2fn6                                        1/1     Running   0             58m
observability        loki-read-65f4d5454-5s9hq                                1/1     Running   0             58m
observability        loki-write-0                                             1/1     Running   0             58m
observability        observability-grafana-7cfbbc56d9-h98m5                   3/3     Running   0             59m
observability        observability-kube-prometh-operator-7f9f746444-mrv7k     1/1     Running   0             59m
observability        observability-kube-state-metrics-5d5dbd6d87-txksl        1/1     Running   0             59m
observability        observability-prometheus-node-exporter-kq7rb             1/1     Running   0             59m
observability        prometheus-observability-kube-prometh-prometheus-0       2/2     Running   0             59m
observability        prometheus-observability-kube-prometh-prometheus-1       2/2     Running   0             59m
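Separately, the deployment banner warns that the model keeps loading after the playbook finishes, and a vLLM pod can report `1/1 Running` while weights are still being loaded. A quick hedged check (namespace and deployment name inferred from the pod listing above; adjust if yours differ):

```shell
# Tail the vLLM server logs to see whether the OpenAI-compatible server
# has finished starting. NS and DEPLOY are inferred from the pod name
# vllm-llama-8b-cpu-... in the listing above. Guarded so it is a no-op
# where kubectl is unavailable.
NS=default
DEPLOY=vllm-llama-8b-cpu
if command -v kubectl >/dev/null 2>&1; then
    sudo kubectl logs -n "$NS" "deploy/$DEPLOY" --tail=20
fi
```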
