This repository has been archived by the owner on Jun 22, 2024. It is now read-only.

How does applying a model from URL work? #4

Open
@braheezy

Description

Hello! I am an absolute LLM noob, so I apologize if these are rather basic questions. I am loving LocalAI so far, and it has been incredibly easy to get running with models from the gallery.

I wanted to try a model whose definition does not contain a URL, such as Vicuna or Koala. The instructions indicate that a POST request should be sent, using the koala.yaml configuration file from this repository, and that I should supply URI(s) to the actual model files to use, probably from HuggingFace:

curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
     "url": "github:go-skynet/model-gallery/koala.yaml",
     "name": "koala",
     "overrides": { "parameters": {"model": "koala.bin" } },
     "files": [
        {
            "uri": "https://huggingface.co/xxxx",
            "sha256": "xxx",
            "filename": "koala.bin"
        }
     ]
   }'
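
Throughout, $LOCALAI points at my LocalAI instance, along the lines of:

export LOCALAI=http://localhost:8080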

So I went to HuggingFace, searched for koala, and reviewed one of the top results. It appears to have the model split across multiple files:

  • pytorch_model-00001-of-00002.bin
  • pytorch_model-00002-of-00002.bin

Presumably both of these files are needed. I couldn't find examples of how to handle model .bin files that are split across multiple files, and some light research suggests I can't simply cat the shards together.
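
From that light research, my rough understanding (and I may well be wrong) is that the shards first have to be converted into a single ggml-format file, e.g. with the convert script that ships with llama.cpp. An untested sketch of what I think that looks like, where the repo URL is a placeholder and the script name and flags are my assumptions:

git lfs install
git clone https://huggingface.co/xxxx/koala koala/
# the converter reads every pytorch_model-*-of-*.bin shard in the directory
# (plus config.json and the tokenizer) and writes one combined ggml file
python3 llama.cpp/convert.py koala/ --outfile koala.bin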

I found this repository that seems to host a single koala model file. So I tried that:

curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
     "url": "github:go-skynet/model-gallery/koala.yaml",
     "name": "koala",
     "overrides": { "parameters": {"model": "koala.bin" } },
     "files": [
        {
            "uri": "https://huggingface.co/4bit/koala-13B-GPTQ-4bit-128g/resolve/main/koala-13B-4bit-128g.safetensors",
            "sha256": "${SHA}",
            "filename": "koala.bin"
        }
     ]
   }'

(I downloaded the file first and calculated the SHA256 myself, then ran this command, and LocalAI downloaded the model a second time. Is that the intended workflow?)
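
For completeness, this is how I produced the checksum (sha256sum is from GNU coreutils):

curl -LO https://huggingface.co/4bit/koala-13B-GPTQ-4bit-128g/resolve/main/koala-13B-4bit-128g.safetensors
# the printed hash is what I substituted for ${SHA} above
sha256sum koala-13B-4bit-128g.safetensors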

After the job finished processing, I was able to see the new model defined:

$ curl -q $LOCALAI/v1/models | jq '.'
{
  "object": "list",
  "data": [
    {
      "id": "ggml-gpt4all-j",
      "object": "model"
    },
    {
      "id": "koala.bin",
      "object": "model"
    }
  ]
}

I proceeded to place prompt-templates/koala.tmpl into the models/ directory, which I believe should leave the layout looking roughly like this (my assumption; the apply step may also write config files I'm not aware of):
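
models/
├── ggml-gpt4all-j
├── koala.bin
└── koala.tmpl

I then tried to call the model and got a 500 error: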

$ curl $LOCALAI/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "koala.bin",
     "messages": [{"role": "user", "content": "How are you?"}],
     "temperature": 0.9 
   }'
{"error":{"code":500,"message":"could not load model - all backends returned error: 12 errors occurred:\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\n","type":""}}

I am sure I took a wrong turn at some point. Any advice? Thanks!
