Commit 5bd34a2

Merge branch 'main' into dependabot/github_actions/actions-all-e1eb8bf241
2 parents f182b38 + 534dd3d commit 5bd34a2

25 files changed

Lines changed: 568 additions & 69 deletions

AGENTS.md

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+# AGENTS.md
+
+## Build & Test Commands
+
+- `make test` — run all unit tests (`go test -v ./... -race -coverprofile=coverage.txt -covermode=atomic`)
+- `make lint` — run linter (`golangci-lint run -v ./... --timeout 5m`)
+- `make build-aikit` — build the AIKit Docker image via `docker buildx`
+- `make build-test-model` — build a test model image from a YAML aikitfile
+- Run `go mod tidy` after changing dependencies; CI verifies `go.mod`/`go.sum` are clean
+
+## Code Style & Formatting
+
+- golangci-lint v2 with formatters: `gofmt`, `gofumpt`, `goimports`, `gci` (import ordering)
+- Key linters enforced: `errcheck`, `errorlint`, `gosec`, `govet`, `staticcheck`, `revive`, `goconst`, `gocritic`, `godot`, `forcetypeassert`, `unconvert`, `unused`, `whitespace`, `misspell` (US locale)
+- Max line length: 200 characters
+- End every comment with a period (enforced by `godot`)
+- All files must end with a newline and have no trailing whitespace (pre-commit hooks)
+
+## Commit Conventions
+
+- PR titles must follow conventional commits: `feat`, `fix`, `build`, `chore`, `ci`, `docs`, `perf`, `refactor`, `revert`, `style`, `test`
+- Pre-commit hooks run: `gitleaks` (secret scanning), `golangci-lint`, `shellcheck`, `typos`
+
+## Go Conventions
+
+- Module path: `github.com/kaito-project/aikit`
+- Go 1.24.3 minimum, toolchain go1.26.1
+- Use `github.com/pkg/errors` for error wrapping (not `fmt.Errorf` with `%w`)
+- Logging via `github.com/sirupsen/logrus`
+- YAML parsing via `gopkg.in/yaml.v2`
+
+## Architecture Quick Reference
+
+- `cmd/frontend/` — BuildKit frontend entrypoint
+- `pkg/aikit/config/` — aikitfile YAML config structs and parsing
+- `pkg/aikit2llb/` — converts aikitfile configs to BuildKit LLB (inference/ and finetune/ subdirs)
+- `pkg/build/` — build orchestration and validation
+- `pkg/packager/` — OCI artifact packaging following CNCF ModelPack spec
+- `models/` — pre-made model YAML configs
+- `runners/` — runner definition YAMLs (llama-cpp-cpu, llama-cpp-cuda, vllm-cuda, diffusers-cuda)
+- `test/` — test aikitfile YAML fixtures

CLAUDE.md

Lines changed: 0 additions & 41 deletions
This file was deleted.

CLAUDE.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+AGENTS.md

CONTRIBUTING.md

Lines changed: 3 additions & 0 deletions
@@ -57,6 +57,9 @@ This will automatically run linting and formatting checks before each commit.

 ## Building AIKit

+> [!TIP]
+> Build targets default to multi-platform (`linux/amd64,linux/arm64`). For local development, pass your host architecture to speed up builds and avoid multi-platform issues — e.g. `make build-aikit PLATFORMS=linux/amd64`. You should also use the `default` buildx builder (`docker buildx use default`) so that locally built images are available to subsequent builds via the `#syntax=` directive.
+
 ### Build the AIKit Binary

 ```bash

Makefile

Lines changed: 5 additions & 0 deletions
@@ -51,6 +51,11 @@ run-test-model:
 run-test-model-gpu:
 	docker run --rm -p 8080:8080 --gpus all ${REGISTRY}${REPOSITORY}/${TEST_IMAGE_NAME}:${TAG}

+.PHONY: run-test-model-rocm
+run-test-model-rocm:
+	docker run --rm -p 8080:8080 --device /dev/kfd --device /dev/dri --group-add video --group-add $$(stat -c '%g' /dev/dri/renderD128) \
+		${REGISTRY}${REPOSITORY}/${TEST_IMAGE_NAME}:${TAG}
+
 .PHONY: run-test-model-applesilicon
 run-test-model-applesilicon:
 	podman run --rm -p 8080:8080 --device /dev/dri ${REGISTRY}${REPOSITORY}/${TEST_IMAGE_NAME}:${TAG}

README.md

Lines changed: 11 additions & 3 deletions
@@ -29,7 +29,7 @@ AIKit offers three main capabilities:
 - 🦙 Support for GGUF ([`llama`](https://github.com/ggerganov/llama.cpp)) and GGML ([`llama-ggml`](https://github.com/ggerganov/llama.cpp)) models
 - 🚢 [Kubernetes deployment ready](https://kaito-project.github.io/aikit/docs/kubernetes)
 - 📚 Supports multiple models with a single image
-- 🖥️ Supports [AMD64 and ARM64](https://kaito-project.github.io/aikit/docs/create-images#multi-platform-support) CPUs and [GPU-accelerated inferencing with NVIDIA GPUs](https://kaito-project.github.io/aikit/docs/gpu)
+- 🖥️ Supports [AMD64 and ARM64](https://kaito-project.github.io/aikit/docs/create-images#multi-platform-support) CPUs and [GPU-accelerated inferencing with NVIDIA CUDA and AMD ROCm support](https://kaito-project.github.io/aikit/docs/gpu)
 - 🔐 Ensure [supply chain security](https://kaito-project.github.io/aikit/docs/security) with SBOMs, Provenance attestations, and signed images
 - 🌈 Supports air-gapped environments with self-hosted, local, or any remote container registries to store model images for inference on the edge.

@@ -107,9 +107,9 @@ If it doesn't include a specific model, you can always [create your own images](
 ### NVIDIA CUDA

 > [!NOTE]
-> To enable GPU acceleration, please see [GPU Acceleration](https://kaito-project.github.io/aikit/docs/gpu).
+> To enable NVIDIA GPU acceleration, please see [GPU Acceleration](https://kaito-project.github.io/aikit/docs/gpu).
 >
-> Please note that only difference between CPU and GPU section is the `--gpus all` flag in the command to enable GPU acceleration.
+> Published pre-made GPU images include NVIDIA CUDA libraries. For the NVIDIA CUDA commands below, the only difference from the CPU section is the `--gpus all` flag.

 | Model | Optimization | Parameters | Command | Model Name | License |
 | --------------- | ------------- | ---------- | -------------------------------------------------------------------------------------- | ------------------------ | --------------------------------------------------------------------------------------------------------------------------- |
@@ -127,6 +127,14 @@ If it doesn't include a specific model, you can always [create your own images](
 | 🤖 GPT-OSS | | 120B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/kaito-project/aikit/gpt-oss:120b` | `gpt-oss-120b` | [Apache 2.0](https://choosealicense.com/licenses/apache-2.0/) |


+### AMD ROCm (experimental)
+
+> [!NOTE]
+> AMD GPU acceleration is currently available for custom `llama-cpp` images built with `runtime: rocm`. Published pre-made model images are currently CUDA-based, so for AMD GPUs please [create your own image](https://kaito-project.github.io/aikit/docs/create-images) and follow the ROCm instructions in [GPU Acceleration](https://kaito-project.github.io/aikit/docs/gpu).
+>
+> ROCm support currently applies to the `llama-cpp` backend on `linux/amd64`.
+
+
 ### Apple Silicon (experimental)

 > [!NOTE]

charts/aikit/values.yaml

Lines changed: 1 addition & 1 deletion
@@ -81,7 +81,7 @@ postInstall:
 enabled: true
 image:
 repository: registry.k8s.io/kubectl
-tag: v1.35.2
+tag: v1.35.4
 pullPolicy: IfNotPresent
 pullSecrets: []
 podSecurity: ["pod-security.kubernetes.io/audit=restricted",

pkg/aikit2llb/inference/backend.go

Lines changed: 21 additions & 1 deletion
@@ -81,7 +81,7 @@ func getBackendTag(backend, runtime string, platform specs.Platform) string {
 	baseTag := getBackendVersion(backend, runtime, platform)
 	backendName := getEffectiveBackend(backend, runtime, platform)

-	// Handle Apple Silicon - use Vulkan llama-cpp
+	// Handle Apple Silicon - use Vulkan llama-cpp.
 	if runtime == utils.RuntimeAppleSilicon {
 		return fmt.Sprintf("%s-%s", baseTag, vulkanLlamaCppBackend)
 	}
@@ -101,6 +101,12 @@ func getBackendTag(backend, runtime string, platform specs.Platform) string {
 		}
 	}

+	// Handle ROCm runtime.
+	if runtime == utils.RuntimeROCm && platform.Architecture == utils.PlatformAMD64 {
+		return fmt.Sprintf("%s-gpu-rocm-hipblas-llama-cpp", localAIROCmBackendVersion)
+	}
+
+	// Handle CPU runtime (default).
 	return fmt.Sprintf("%s-cpu-llama-cpp", baseTag)
 }

@@ -131,6 +137,12 @@ func getBackendName(backend, runtime string, platform specs.Platform) string {
 		}
 	}

+	// Handle ROCm runtime
+	if runtime == utils.RuntimeROCm && platform.Architecture == utils.PlatformAMD64 {
+		// Only llama-cpp backend is supported for ROCm
+		return "hipblas-llama-cpp"
+	}
+
 	// Handle CPU runtime (default)
 	return cpuLlamaCppBackend
 }
@@ -220,6 +232,14 @@ func installBackends(c *config.InferenceConfig, platform specs.Platform, s llb.S
 			cpuConfig.Runtime = "cpu" // Use CPU runtime to force CPU backend installation
 			merge = installBackend(backend, &cpuConfig, platform, s, merge)
 		}
+
+		// For llama-cpp backend with ROCm runtime, also install the CPU version for fallback
+		if backend == utils.BackendLlamaCpp && c.Runtime == utils.RuntimeROCm && platform.Architecture == utils.PlatformAMD64 {
+			// Create a modified config with CPU runtime to install the CPU version
+			cpuConfig := *c
+			cpuConfig.Runtime = "cpu" // Use CPU runtime to force CPU backend installation
+			merge = installBackend(backend, &cpuConfig, platform, s, merge)
+		}
 	}

 	return merge
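
Note on the ROCm branch in `getBackendTag`: the tag is built from `localAIROCmBackendVersion` rather than from `baseTag`, and the branch only applies when the architecture is amd64. The sketch below isolates that decision as a small standalone program. It is a simplified illustration, not the repository's function: `llamaCppTag` is a hypothetical name, the literals `"rocm"` and `"amd64"` are assumed stand-ins for `utils.RuntimeROCm` and `utils.PlatformAMD64`, and the CUDA and Apple Silicon branches are left out.

```go
package main

import "fmt"

const (
	// Assumed stand-ins for utils.RuntimeROCm and utils.PlatformAMD64.
	runtimeROCm = "rocm"
	archAMD64   = "amd64"
	// Value of localAIROCmBackendVersion as it appears in the convert.go hunk.
	rocmBackendVersion = "rocm7"
)

// llamaCppTag returns the llama-cpp backend tag for the ROCm and CPU cases only.
// baseTag plays the role of the version-derived tag used for the CPU backend.
func llamaCppTag(baseTag, runtime, arch string) string {
	if runtime == runtimeROCm && arch == archAMD64 {
		return fmt.Sprintf("%s-gpu-rocm-hipblas-llama-cpp", rocmBackendVersion)
	}
	// Every other combination falls through to the CPU tag in this sketch.
	return fmt.Sprintf("%s-cpu-llama-cpp", baseTag)
}

func main() {
	fmt.Println(llamaCppTag("v4.0.0", "rocm", "amd64")) // rocm7-gpu-rocm-hipblas-llama-cpp
	fmt.Println(llamaCppTag("v4.0.0", "rocm", "arm64")) // v4.0.0-cpu-llama-cpp
}
```

In this sketch, requesting ROCm on a non-amd64 platform quietly yields the CPU tag. The diff's condition suggests the same fall-through, which matches the README statement that ROCm support applies to `linux/amd64`.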

pkg/aikit2llb/inference/backend_test.go

Lines changed: 9 additions & 0 deletions
@@ -98,6 +98,15 @@ func TestGetBackendTag(t *testing.T) {
 			},
 			want: fmt.Sprintf("%s-gpu-nvidia-cuda-12-llama-cpp", localAILlamaCppBackendVersion),
 		},
+		{
+			name: "ROCm llama-cpp",
+			backend: utils.BackendLlamaCpp,
+			runtime: utils.RuntimeROCm,
+			platform: specs.Platform{
+				Architecture: utils.PlatformAMD64,
+			},
+			want: fmt.Sprintf("%s-gpu-rocm-hipblas-llama-cpp", localAIROCmBackendVersion),
+		},
 		{
 			name: "Empty backend name defaults to CPU llama-cpp",
 			backend: "",

pkg/aikit2llb/inference/convert.go

Lines changed: 48 additions & 2 deletions
@@ -17,16 +17,22 @@ const (
 	localAIBinaryVersion = "v4.0.0"
 	localAILlamaCppBackendVersion = localAIBinaryVersion
 	localAILegacyBackendVersion = "v3.12.1"
+	localAIROCmBackendVersion = "rocm7"
 	localAIRepo = "ghcr.io/kaito-project/aikit/localai:"
 	cudaVersion = "12-5"
+	rocmVersion = "7.2"
 )

 // Aikit2LLB converts an InferenceConfig to an LLB state.
 func Aikit2LLB(c *config.InferenceConfig, platform *specs.Platform) (llb.State, *specs.Image, error) {
 	var merge, state llb.State
-	if c.Runtime == utils.RuntimeAppleSilicon {
+	switch c.Runtime {
+	case utils.RuntimeAppleSilicon:
 		state = llb.Image(utils.AppleSiliconBase, llb.Platform(*platform))
-	} else {
+	case utils.RuntimeROCm:
+		// Use Ubuntu 24.04 for ROCm to match noble repository
+		state = llb.Image(utils.Ubuntu24Base, llb.Platform(*platform))
+	default:
 		state = llb.Image(utils.UbuntuBase, llb.Platform(*platform))
 	}
 	base := getBaseImage(c, platform)
@@ -55,6 +61,11 @@ func Aikit2LLB(c *config.InferenceConfig, platform *specs.Platform) (llb.State,
 		state, merge = installCuda(c, state, merge)
 	}

+	// install rocm if runtime is rocm and architecture is amd64
+	if c.Runtime == utils.RuntimeROCm && platform.Architecture == utils.PlatformAMD64 {
+		state, merge = installRocm(c, state, merge)
+	}
+
 	// install backend dependencies
 	merge = installBackends(c, *platform, state, merge)

@@ -67,6 +78,10 @@ func getBaseImage(c *config.InferenceConfig, platform *specs.Platform) llb.State
 	if c.Runtime == utils.RuntimeAppleSilicon {
 		return llb.Image(utils.AppleSiliconBase, llb.Platform(*platform))
 	}
+	if c.Runtime == utils.RuntimeROCm {
+		// Use Ubuntu 24.04 for ROCm to match noble repository.
+		return llb.Image(utils.Ubuntu24Base, llb.Platform(*platform))
+	}
 	if len(c.Backends) > 0 {
 		return llb.Image(utils.UbuntuBase, llb.Platform(*platform))
 	}
@@ -155,6 +170,37 @@ func installCuda(c *config.InferenceConfig, s llb.State, merge llb.State) (llb.S
 	return s, llb.Merge([]llb.State{merge, diff})
 }

+func installRocm(c *config.InferenceConfig, s llb.State, merge llb.State) (llb.State, llb.State) {
+	savedState := s
+
+	// Set up ROCm repository
+	s = s.Run(utils.Sh("apt-get update && apt-get install --no-install-recommends -y ca-certificates curl gnupg"), llb.IgnoreCache).Root()
+
+	// Add ROCm GPG key and repository
+	s = s.Run(utils.Sh("curl -fsSL https://repo.radeon.com/rocm/rocm.gpg.key | gpg --dearmor -o /etc/apt/trusted.gpg.d/rocm.gpg")).Root()
+	s = s.Run(utils.Shf("echo 'deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/rocm.gpg] https://repo.radeon.com/rocm/apt/%s/ noble main' >> /etc/apt/sources.list.d/rocm.list", rocmVersion)).Root()
+	s = s.Run(utils.Shf("echo 'deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/rocm.gpg] https://repo.radeon.com/graphics/%s/ubuntu noble main' >> /etc/apt/sources.list.d/rocm.list", rocmVersion)).Root()
+	rocmPinning := `
+Package: *
+Pin: release o=repo.radeon.com
+Pin-Priority: 600
+`
+	s = s.Run(utils.Shf("echo '%s' > /etc/apt/preferences.d/repo-radeon-pin-600", rocmPinning)).Root()
+	s = s.Run(utils.Sh("apt-get update"), llb.IgnoreCache).Root()
+
+	// install rocm libraries and pciutils for gpu detection when using the default
+	// llama-cpp backend or when it is configured explicitly
+	if len(c.Backends) == 0 || slices.Contains(c.Backends, utils.BackendLlamaCpp) {
+		s = s.Run(utils.Sh("apt-get install -y pciutils rocm && apt-get clean")).Root()
+	}
+
+	// hipblaslt soname compatibility: backend may be linked against .so.0 while ROCm 7.2 ships .so.1
+	s = s.Run(utils.Sh("set -e; cd /opt/rocm/lib; [ -e libhipblaslt.so.0 ] || ln -sf libhipblaslt.so.1 libhipblaslt.so.0")).Root()
+
+	diff := llb.Diff(savedState, s)
+	return s, llb.Merge([]llb.State{merge, diff})
+}
+
 // addLocalAI adds the LocalAI binary to the image.
 func addLocalAI(c *config.InferenceConfig, s llb.State, merge llb.State, platform specs.Platform) (llb.State, llb.State, error) {
 	artifactVersion := getLocalAIArtifactVersion(c, platform)
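
Note on `installRocm`: the two `echo ... >> /etc/apt/sources.list.d/rocm.list` steps append one apt source line each, both derived from `rocmVersion`. The sketch below reproduces only that string formatting so the resulting file contents are easy to inspect. It is an illustration under assumptions: `rocmAptSources` is a hypothetical helper that does not exist in the commit, and it does not touch BuildKit or run any shell command.

```go
package main

import "fmt"

// rocmVersion mirrors the constant added in the convert.go hunk.
const rocmVersion = "7.2"

// rocmAptSources returns the two apt source lines that the diff appends to
// /etc/apt/sources.list.d/rocm.list for the given ROCm version.
func rocmAptSources(version string) []string {
	const keyring = "/etc/apt/trusted.gpg.d/rocm.gpg"
	return []string{
		fmt.Sprintf("deb [arch=amd64 signed-by=%s] https://repo.radeon.com/rocm/apt/%s/ noble main", keyring, version),
		fmt.Sprintf("deb [arch=amd64 signed-by=%s] https://repo.radeon.com/graphics/%s/ubuntu noble main", keyring, version),
	}
}

func main() {
	// Print the lines exactly as they would appear in rocm.list.
	for _, line := range rocmAptSources(rocmVersion) {
		fmt.Println(line)
	}
}
```

Both lines name the `noble` suite, which is consistent with the diff's choice of `utils.Ubuntu24Base` as the base image for the ROCm runtime.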
