This repository has been archived by the owner on Jun 22, 2024. It is now read-only.

How does applying a model from URL work? #4

Open
@braheezy

Description

Hello! I am an absolute LLM noob, so I apologize if these are rather basic questions. I am loving LocalAI so far, and it has been incredibly easy to get running with models from the gallery.

I wanted to try a model whose definition does not contain a URL, such as Vicuna or Koala. The instructions indicate that a POST request should be sent, using the koala.yaml configuration file from this repository, and that I should supply URI(s) to the actual model files to use, probably from HuggingFace:

curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
     "url": "github:go-skynet/model-gallery/koala.yaml",
     "name": "koala",
     "overrides": { "parameters": {"model": "koala.bin" } },
     "files": [
        {
            "uri": "https://huggingface.co/xxxx",
            "sha256": "xxx",
            "filename": "koala.bin"
        }
     ]
   }'
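
Throughout, $LOCALAI points at my LocalAI instance, along the lines of:

export LOCALAI=http://localhost:8080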

So I went to HuggingFace, searched for koala, and reviewed one of the top results. It appears to have the model split across multiple files:

  • pytorch_model-00001-of-00002.bin
  • pytorch_model-00002-of-00002.bin

Presumably both of these files are needed. I couldn't find examples of how to handle model .bin files that are split across multiple files, and some light research suggests I can't simply cat the shards together.
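
From that light research, my rough understanding (and I may well be wrong) is that the shards first have to be converted into a single ggml-format file, e.g. with the convert script that ships with llama.cpp. An untested sketch of what I think that looks like, where the repo URL is a placeholder and the script name and flags are my assumptions:

git lfs install
git clone https://huggingface.co/xxxx/koala koala/
# the converter reads every pytorch_model-*-of-*.bin shard in the directory
# (plus config.json and the tokenizer) and writes one combined ggml file
python3 llama.cpp/convert.py koala/ --outfile koala.bin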

I found this repository that seems to host a single koala model file. So I tried that:

curl $LOCALAI/models/apply -H "Content-Type: application/json" -d '{
     "url": "github:go-skynet/model-gallery/koala.yaml",
     "name": "koala",
     "overrides": { "parameters": {"model": "koala.bin" } },
     "files": [
        {
            "uri": "https://huggingface.co/4bit/koala-13B-GPTQ-4bit-128g/resolve/main/koala-13B-4bit-128g.safetensors",
            "sha256": "${SHA}",
            "filename": "koala.bin"
        }
     ]
   }'

(I downloaded the file first and calculated the SHA256 myself, then ran this command, and LocalAI downloaded the model a second time. Is that the intended workflow?)
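
For completeness, this is how I produced the checksum (sha256sum is from GNU coreutils):

curl -LO https://huggingface.co/4bit/koala-13B-GPTQ-4bit-128g/resolve/main/koala-13B-4bit-128g.safetensors
# the printed hash is what I substituted for ${SHA} above
sha256sum koala-13B-4bit-128g.safetensors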

After the job finished processing, I was able to see the new model defined:

$ curl -q $LOCALAI/v1/models | jq '.'
{
  "object": "list",
  "data": [
    {
      "id": "ggml-gpt4all-j",
      "object": "model"
    },
    {
      "id": "koala.bin",
      "object": "model"
    }
  ]
}

I proceeded to place prompt-templates/koala.tmpl into the models/ directory, which I believe should leave the layout looking roughly like this (my assumption; the apply step may also write config files I'm not aware of):
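
models/
├── ggml-gpt4all-j
├── koala.bin
└── koala.tmpl

I then tried to call the model and got a 500 error: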

$ curl $LOCALAI/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "koala.bin",
     "messages": [{"role": "user", "content": "How are you?"}],
     "temperature": 0.9 
   }'
{"error":{"code":500,"message":"could not load model - all backends returned error: 12 errors occurred:\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\t* failed loading model\n\n","type":""}}

I am sure I took a wrong turn at some point. Any advice? Thanks!
