-
Notifications
You must be signed in to change notification settings - Fork 109
Description
What happened?
I have been following the instructions from How to run Llama-3-8B with Kubernetes, but I am unable to get it working. The container is in the RUNNING state, and I have added the -s 0.0.0.0:8080 flag to open the service (learned from LlamaEdge). However, I am still unable to access it, and it does not seem to be working as expected.
Here is a summary of the steps I followed:
I applied the Kubernetes deployment YAML to start the Llama API (runtime and wasm-sandboxer all setup in right ways):
apiVersion: apps/v1
kind: Deployment
metadata:
name: llama
labels:
app: llama
spec:
replicas: 1
selector:
matchLabels:
app: llama
template:
metadata:
labels:
app: llama
spec:
containers:
- command:
- llama-api-server.wasm
args: ["--prompt-template", "llama-3-chat", "--ctx-size", "4096", "--model-name", "Llama-3-8B", "-s", "0.0.0.0:8080"]
env:
- name: io.kuasar.wasm.nn_preload
value: default:GGML:AUTO:Meta-Llama-3-8B-Instruct-Q5_K_M.gguf
image: docker.io/kuasario/llama-api-server:v1
name: llama-api-server
runtimeClassName: kuasar-wasm- The container is running successfully (kubectl get pods shows RUNNING, It means that all the prerequisites have been met ).
- I am using the correct -s 0.0.0.0:8080 flag in the command arguments.
First, I did not use the SERVICE strategy, and instead opted for a lightweight testing approach. I used port forwarding to expose the container's 8080 port on my local machine's 8000 port. Then, I temporarily used the curl command mentioned in LlamaEdge to make API requests, with a slight modification to the command. However, the result was:
- This error suggests that the application inside the container is not successfully listening on the 8080 port or the port forwarding is failing due to some internal issue with the container setup.
- The container is indeed in the running state, However, the logs are empty, and even after testing with the "--log-all" option, there are still no logs, which could indicate that the service has not been started or encountered an error.
- Since this is a Wasm sandbox, it is impossible to directly enter the container to check if the service has started properly.
Here are a few things that could be happening:
- The service inside the container might not be correctly binding to the 8080 port.
- The port forwarding might not be properly set up, or Kubernetes networking might be blocking the connection to the port.
- There could be an issue with the Wasm runtime or the wasmedge environment not correctly handling the service.
Could you provide guidance on troubleshooting this issue further, or is there something I'm missing in the configuration or setup?
What did you expect to happen?
I hope to successfully use Kuasar as the runtime to deploy this service.
How can we reproduce it (as minimally and precisely as possible)?
You just need to repeat the steps in How to run Llama-3-8B with Kubernetes. I strictly followed the recommended versions of containerd and wasmedge as specified in the repository, yet it still doesn't work.
Anything else we need to know?
In the above test, I used the -s parameter to specify the server listening port for the application inside the container. The original configuration did not include this parameter, but after checking, it seems that the default port is 8080. However, the test results remain the same.
Dev environment
No response



