Description
LocalAI version:
v1.30.0 (latest).
Environment, CPU architecture, OS, and Version:
Windows Server 2022, Intel Xeon E5-2670 v2, NVIDIA GeForce GTX 1070.
Describe the bug
LocalAI runs inference on the CPU instead of the GPU; CUDA utilization stays at 0% when calling the chat completion endpoint.
To Reproduce
Expected behavior
Logs
Additional context
Configuration is in `.env`:
Docker Compose:
```yaml
version: '3.6'

services:
  api:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    image: quay.io/go-skynet/local-ai:master-cublas-cuda12
    tty: true # enable colorized logs
    restart: always # should this be on-failure ?
    ports:
      - 8080:8080
    env_file:
      - .env
    volumes:
      - ./models:/models
    command: ["/usr/bin/local-ai"]
```
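
For context: with the cublas image, GPU offloading also has to be enabled per model in the model definition YAML, otherwise layers stay on the CPU. A minimal sketch of such a file (the model name and file below are placeholders, not taken from my setup):

```yaml
# models/gpt-3.5-turbo.yaml — hypothetical model definition; name and file are placeholders
name: gpt-3.5-turbo
parameters:
  model: llama-2-7b-chat.Q4_K_M.gguf  # placeholder GGUF file under ./models
f16: true        # use 16-bit floats on the GPU
gpu_layers: 35   # layers to offload to the GPU; 0 keeps everything on the CPU
```

In my case I am not sure whether the missing `gpu_layers` setting explains the 0% CUDA usage, or whether the container simply does not see the GPU.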