Local LLM Playground

This workspace spins up an Ollama container and provides a tiny TypeScript API server for sending prompts to the Ollama API.

Prerequisites

  • Docker + Docker Compose v2
  • Node.js 18+ (for native fetch support)
  • NVIDIA GPU with proprietary drivers (optional, but recommended for speed)

Enable GPU on Ubuntu

  1. Install the latest NVIDIA driver

    sudo apt update
    sudo ubuntu-drivers autoinstall
    sudo reboot
  2. Install Docker Engine (skip if you already have it)

    sudo apt update
    sudo apt install -y ca-certificates curl gnupg
    sudo install -m 0755 -d /etc/apt/keyrings
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
    echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | sudo tee /etc/apt/sources.list.d/docker.list >/dev/null
    sudo apt update
    sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
  3. Install the NVIDIA Container Toolkit (lets Docker expose GPUs to containers)

    distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
    curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit.gpg
    curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
      sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit.gpg] https://#' | \
      sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
    sudo apt update
    sudo apt install -y nvidia-container-toolkit
    sudo nvidia-ctk runtime configure --runtime=docker
    sudo systemctl restart docker
  4. Verify GPU access from Docker

    docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi

Once these steps succeed, docker compose up -d will start Ollama with GPU acceleration automatically (see docker-compose.yml).
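
For reference, GPU access in Compose v2 is declared with a device reservation. A minimal sketch of what this repo's docker-compose.yml likely contains (the service name matches the exec command used below; the image, port, and everything else here are assumptions):

services:
  local-ollama:
    image: ollama/ollama          # official Ollama image (assumed)
    ports:
      - "11434:11434"             # matches the default OLLAMA_HOST below
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia      # requires the NVIDIA Container Toolkit from step 3
              count: all
              capabilities: [gpu]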

Start Ollama

docker compose up -d

Pull at least one model before sending prompts. Note that the API server defaults to gemma3:latest (see Environment overrides below), so either pull gemma3 or set OLLAMA_MODEL to whichever model you pull (example below: llama3).

docker compose exec local-ollama ollama pull llama3
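
To confirm the download finished, list the models Ollama has available:

docker compose exec local-ollama ollama list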

Install dependencies

npm install

Run the API server

npm run serve

For hot reload during development:

npm run dev

Both commands respect PORT (default 3000).

Call the endpoint

curl -s \
	-X POST http://localhost:3000/api/prompt \
	-H "Content-Type: application/json" \
	-d '{"prompt":"Write a haiku about local models."}'

Environment overrides

  • OLLAMA_HOST (default http://localhost:11434)
  • OLLAMA_MODEL (default gemma3:latest)
  • PORT (default 3000)

Override example:

PORT=4000 OLLAMA_MODEL=mistral npm run serve
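
For reference, the defaults above map to a config block like this inside the server (a sketch only; the variable names and defaults are taken from this README, the surrounding code is assumed):

// resolve configuration from the environment, falling back to documented defaults
const OLLAMA_HOST = process.env.OLLAMA_HOST ?? "http://localhost:11434"; // Ollama API base URL
const OLLAMA_MODEL = process.env.OLLAMA_MODEL ?? "gemma3:latest";        // model used for prompts
const PORT = Number(process.env.PORT ?? 3000);                           // HTTP port for the API server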
