
Unable to access the Llama API after following the deployment instructions #184

@lzaeh

Description


What happened?

I have been following the instructions in How to run Llama-3-8B with Kubernetes, but I cannot get it working. The container is in the Running state, and I added the -s 0.0.0.0:8080 flag to expose the service (learned from LlamaEdge), but I am still unable to access it.

Here is a summary of the steps I followed:

I applied the Kubernetes deployment YAML below to start the Llama API server (the runtime and wasm-sandboxer are both set up correctly):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: llama
  labels:
    app: llama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: llama
  template:
    metadata:
      labels:
        app: llama
    spec:
      containers:
      - command:
        - llama-api-server.wasm
        args: ["--prompt-template", "llama-3-chat", "--ctx-size", "4096", "--model-name", "Llama-3-8B", "-s", "0.0.0.0:8080"]
        env:
        - name: io.kuasar.wasm.nn_preload
          value: default:GGML:AUTO:Meta-Llama-3-8B-Instruct-Q5_K_M.gguf
        image: docker.io/kuasario/llama-api-server:v1
        name: llama-api-server
      runtimeClassName: kuasar-wasm
  • The container is running successfully (kubectl get pods shows Running), which means the prerequisites have been met.
  • I am using the correct -s 0.0.0.0:8080 flag in the command arguments.
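For reference, the steps above amount to roughly the following commands (a sketch; llama-deployment.yaml is an assumed filename for the YAML above):

```shell
# Apply the Deployment and confirm the pod reaches Running
kubectl apply -f llama-deployment.yaml
kubectl get pods -l app=llama

# Running only means the sandbox started; it does not prove the
# wasm module bound port 8080. Check Events for runtime errors:
kubectl describe pod -l app=llama
```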

First, instead of creating a Kubernetes Service, I opted for a lightweight test: I used port forwarding to expose the container's port 8080 on my local machine's port 8000, then made API requests with a slightly modified version of the curl command from the LlamaEdge docs. The result was:
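The port-forward test looked roughly like this (a sketch; the /v1/chat/completions path follows the LlamaEdge example, and the model name matches the --model-name argument in the YAML above):

```shell
# Forward local port 8000 to the pod's port 8080
kubectl port-forward deployment/llama 8000:8080 &

# Hit the OpenAI-compatible chat endpoint exposed by llama-api-server
curl -X POST http://localhost:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"messages":[{"role":"user","content":"Hello"}],"model":"Llama-3-8B"}'
```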

[Screenshots: the curl requests fail with connection errors]

  • This error suggests that the application inside the container is not listening on port 8080, or that port forwarding is failing due to an internal issue with the container setup.
  • The container is indeed in the Running state, but its logs are empty; even after testing with the --log-all option there are still no logs, which could indicate that the service never started or encountered an error early on.
  • Since this is a Wasm sandbox, it is impossible to enter the container directly to check whether the service started properly.
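Without exec access to the sandbox, the checks available from outside are roughly these (a sketch; the container ID must be looked up first):

```shell
# Logs via kubectl (empty in my case, even with --log-all):
kubectl logs deployment/llama

# Lower-level sandbox state via the CRI, bypassing kubectl:
crictl ps                  # is the wasm container actually running?
crictl logs <container-id> # any output the runtime captured
```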

Here are a few things that could be happening:

  • The service inside the container might not be binding to port 8080.
  • Port forwarding might not be set up properly, or Kubernetes networking might be blocking the connection to the port.
  • There could be an issue with the Wasm runtime or the wasmedge environment not handling the service correctly.
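One way to separate these possibilities is to curl the pod IP directly from another pod, bypassing port-forward entirely (a sketch; <pod-ip> must be filled in, and the /v1/models path is assumed from the LlamaEdge API):

```shell
# Get the pod IP
kubectl get pod -l app=llama -o wide

# curl it from a throwaway pod on the cluster network
kubectl run curl-test --rm -it --image=curlimages/curl --restart=Never -- \
  curl -v http://<pod-ip>:8080/v1/models
# Connection refused here        => the wasm service never bound 8080.
# Works here but not forwarded   => the port-forward path is at fault.
```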

Could you provide guidance on troubleshooting this issue further, or is there something I'm missing in the configuration or setup?

What did you expect to happen?

I expected to be able to deploy this service successfully with Kuasar as the runtime.

How can we reproduce it (as minimally and precisely as possible)?

Repeat the steps in How to run Llama-3-8B with Kubernetes. I strictly followed the versions of containerd and wasmedge recommended in the repository, yet it still does not work.

Anything else we need to know?

In the test above, I used the -s parameter to specify the listening address for the server inside the container. The original configuration did not include this parameter, but after checking, the default port appears to be 8080 anyway. Either way, the test results are the same.
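To isolate whether the problem is in Kuasar or in the wasm module itself, the same module can be run directly with WasmEdge outside Kubernetes, mirroring the env and args from the Deployment (a sketch, assuming the .wasm and .gguf files are in the current directory):

```shell
# Run llama-api-server.wasm standalone; if this binds 8080 and answers
# curl locally, the module is fine and the issue lies in the Kuasar path.
wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:Meta-Llama-3-8B-Instruct-Q5_K_M.gguf \
  llama-api-server.wasm \
  --prompt-template llama-3-chat --ctx-size 4096 \
  --model-name Llama-3-8B -s 0.0.0.0:8080
```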

Dev environment

No response

Metadata

Labels: kind/bug
