This repository contains the source code and configuration files for an LLM inference engine based on llama.cpp, built to run on AMD Strix Halo hardware with the Vulkan backend.
- Base Container Image: I plan to run the inference engine inside a container, so the base image is built with all the necessary dependencies, such as the AMD drivers and the Vulkan SDK.
- LLM Inference Engine: The main application that utilizes llama.cpp to perform inference. (in progress)
- Monitoring Service: A metrics collection tool that sends data to Langfuse or Grafana. (in progress)
docker build -f base.dockerfile -t llm-inference-base .

The base Docker image is automatically built and pushed to GitHub Container Registry (GHCR) when you push a tag created on the main branch.
# Create and push a tag (with v prefix)
git tag v1.0.0
git push origin v1.0.0
# OR create a tag without v prefix
git tag 1.0.0
git push origin 1.0.0

This will trigger the GitHub Actions workflow that:
- Builds the base Docker image
- Pushes it to ghcr.io/avikantsrivastava/llm-inference-engine/base with the tag name (e.g., v1.0.0 or 1.0.0)
- Also tags it as latest if on the default branch
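The tagging rule described above can be sketched as a small shell function. This is an illustration only, not the workflow's actual implementation: the function name image_refs and its second argument (whether the tag sits on the default branch) are hypothetical and would be supplied by the caller.

```shell
#!/bin/sh
# Hypothetical helper, not part of this repository's workflow.
IMAGE="ghcr.io/avikantsrivastava/llm-inference-engine/base"

# Print the image references that would be pushed for a Git tag.
#   $1: tag name, with or without the v prefix (e.g. v1.0.0 or 1.0.0)
#   $2: "true" if the tag is on the default branch (assumed input)
image_refs() {
  tag="$1"
  on_default="${2:-false}"
  # The image is always tagged with the Git tag name as-is.
  printf '%s:%s\n' "$IMAGE" "$tag"
  # The latest tag is added only for the default branch.
  if [ "$on_default" = "true" ]; then
    printf '%s:latest\n' "$IMAGE"
  fi
}

image_refs v1.0.0 true
```

Calling image_refs 1.0.0 false prints only the versioned reference, which mirrors the case where latest is not updated.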
You can also manually trigger the workflow from the GitHub Actions tab:
- Go to Actions → "Build and Push Base Docker Image"
- Click "Run workflow"
- Optionally specify a custom tag name (defaults to latest)
# Pull the latest image
docker pull ghcr.io/avikantsrivastava/llm-inference-engine/base:latest
# Pull a specific version (with v prefix)
docker pull ghcr.io/avikantsrivastava/llm-inference-engine/base:v1.0.0
# Pull a specific version (without v prefix)
docker pull ghcr.io/avikantsrivastava/llm-inference-engine/base:1.0.0

Note: The image is published as a public package and can be pulled without authentication.