
Commit d8622cc

Merge pull request #256 from kerthcet/cleanup/field-update
Release v0.1.0
2 parents 82833b7 + b0c7456 · commit d8622cc

17 files changed, +2,413 −457 lines changed

.github/ISSUE_TEMPLATE/new-release.md

+15 −13

@@ -2,34 +2,36 @@
 name: New Release
 about: Propose a new release
 title: Release v0.x.0
-labels: ''
-assignees: ''
-
+labels: ""
+assignees: ""
 ---

 ## Release Checklist
+
 <!--
 Please do not remove items from the checklist
 -->
+
 - [ ] All [OWNERS](https://github.com/inftyai/llmaz/blob/main/OWNERS) must LGTM the release proposal
 - [ ] Prepare the image and files
-  - [ ] Run `PLATFORMS=linux/amd64 make image-push GIT_TAG=$VERSION` to build and push an image.
+  - [ ] Run `PLATFORMS=linux/amd64 make image-push GIT_TAG=$VERSION` to build and push an image.
   - [ ] Run `make artifacts GIT_TAG=$VERSION` to generate the artifact.
-  - [ ] Update `chart/Chart.yaml` and `docs/installation.md`, the helm version is different with the app version.
+  - [ ] Update helm chats and documents
+    - [ ] Update `chart/Chart.yaml` and `docs/installation.md`, the helm version is different with the app version.
   - [ ] Run `make helm-package` to package the helm chart and update the index.yaml.
 - [ ] Submit a PR and merge it.
 - [ ] An OWNER [prepares a draft release](https://github.com/inftyai/llmaz/releases)
   - [ ] Create a new tag
   - [ ] Write the change log into the draft release which should include below items if any:
-    ```
-    🚀 **Major Features**:
-    ✨ **Features**:
-    🐛 **Bugs**:
-    ♻️ **Cleanups**:
-    ```
+    ```
+    🚀 **Major Features**:
+    ✨ **Features**:
+    🐛 **Bugs**:
+    ♻️ **Cleanups**:
+    ```
   - [ ] Upload the files to the draft release.
-    - [ ] `manifests.yaml` under artifacts
-    - [ ] new generated helm chart `*.zip` file
+    - [ ] `manifests.yaml` under artifacts
+    - [ ] new generated helm chart `*.zip` file
 - [ ] Publish the draft release prepared at the [Github releases page](https://github.com/inftyai/llmaz/releases)
 - [ ] Publish the helm chart
   - [ ] Run `git checkout gh-pages`
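The chart-versioning step above is the one this commit itself exercises: the helm chart version and the application version are bumped independently, as the `chart/Chart.yaml` diff below shows. A minimal sketch of the two fields that step updates, with the values this commit lands on:

```yaml
# chart/Chart.yaml (after this commit) — the helm chart version and the app
# version move independently, which is why the checklist calls out both.
version: 0.0.6     # chart version, bumped on any chart/template change
appVersion: 0.1.0  # version of llmaz being deployed, matches the v0.1.0 release tag
```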

Makefile

+1 −1

@@ -290,7 +290,7 @@ HELMIFY ?= $(LOCALBIN)/helmify
 .PHONY: helmify
 helmify: $(HELMIFY) ## Download helmify locally if necessary.
 $(HELMIFY): $(LOCALBIN)
-	test -s $(LOCALBIN)/helmify || GOBIN=$(LOCALBIN) go install github.com/arttor/helmify/cmd/helmify@latest
+	test -s $(LOCALBIN)/helmify || GOBIN=$(LOCALBIN) go install github.com/arttor/helmify/cmd/helmify@v0.4.17

 .PHONY: helm
 helm: manifests kustomize helmify

chart/Chart.yaml

+2 −2

@@ -13,9 +13,9 @@ type: application
 # This is the chart version. This version number should be incremented each time you make changes
 # to the chart and its templates, including the app version.
 # Versions are expected to follow Semantic Versioning (https://semver.org/)
-version: 0.0.5
+version: 0.0.6
 # This is the version number of the application being deployed. This version number should be
 # incremented each time you make changes to the application. Versions are not expected to
 # follow Semantic Versioning. They should reflect the version the application is using.
 # It is recommended to use it with quotes.
-appVersion: 0.0.9
+appVersion: 0.1.0

chart/crds/backendruntime-crd.yaml

+1,095 −6
Large diffs are not rendered by default.

chart/crds/openmodel-crd.yaml

+62 −54

@@ -55,60 +55,66 @@ spec:
                 FamilyName represents the model type, like llama2, which will be auto injected
                 to the labels with the key of `llmaz.io/model-family-name`.
               type: string
-            inferenceFlavors:
-              description: |-
-                InferenceFlavors represents the accelerator requirements to serve the model.
-                Flavors are fungible following the priority represented by the slice order.
-              items:
-                description: |-
-                  Flavor defines the accelerator requirements for a model and the necessary parameters
-                  in autoscaling. Right now, it will be used in two places:
-                  - Pod scheduling with node selectors specified.
-                  - Cluster autoscaling with essential parameters provided.
-                properties:
-                  name:
-                    description: Name represents the flavor name, which will be
-                      used in model claim.
-                    type: string
-                  nodeSelector:
-                    additionalProperties:
-                      type: string
-                    description: |-
-                      NodeSelector represents the node candidates for Pod placements, if a node doesn't
-                      meet the nodeSelector, it will be filtered out in the resourceFungibility scheduler plugin.
-                      If nodeSelector is empty, it means every node is a candidate.
-                    type: object
-                  params:
-                    additionalProperties:
-                      type: string
-                    description: |-
-                      Params stores other useful parameters and will be consumed by the autoscaling components
-                      like cluster-autoscaler, Karpenter.
-                      E.g. when scaling up nodes with 8x Nvidia A00, the parameter can be injected with
-                      instance-type: p4d.24xlarge for AWS.
-                    type: object
-                  requests:
-                    additionalProperties:
-                      anyOf:
-                      - type: integer
-                      - type: string
-                      pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$
-                      x-kubernetes-int-or-string: true
+            inferenceConfig:
+              description: InferenceConfig represents the inference configurations
+                for the model.
+              properties:
+                flavors:
+                  description: |-
+                    Flavors represents the accelerator requirements to serve the model.
+                    Flavors are fungible following the priority represented by the slice order.
+                  items:
                     description: |-
-                      Requests defines the required accelerators to serve the model for each replica,
-                      like <nvidia.com/gpu: 8>. For multi-hosts cases, the requests here indicates
-                      the resource requirements for each replica. This may change in the future.
-                      Not recommended to set the cpu and memory usage here:
-                      - if using playground, you can define the cpu/mem usage at backendConfig.
-                      - if using inference service, you can define the cpu/mem at the container resources.
-                      However, if you define the same accelerator requests at playground/service as well,
-                      the requests here will be covered.
+                      Flavor defines the accelerator requirements for a model and the necessary parameters
+                      in autoscaling. Right now, it will be used in two places:
+                      - Pod scheduling with node selectors specified.
+                      - Cluster autoscaling with essential parameters provided.
+                    properties:
+                      name:
+                        description: Name represents the flavor name, which will
+                          be used in model claim.
+                        type: string
+                      nodeSelector:
+                        additionalProperties:
+                          type: string
+                        description: |-
+                          NodeSelector represents the node candidates for Pod placements, if a node doesn't
+                          meet the nodeSelector, it will be filtered out in the resourceFungibility scheduler plugin.
+                          If nodeSelector is empty, it means every node is a candidate.
+                        type: object
+                      params:
+                        additionalProperties:
+                          type: string
+                        description: |-
+                          Params stores other useful parameters and will be consumed by cluster-autoscaler / Karpenter
+                          for autoscaling or be defined as model parallelism parameters like TP or PP size.
+                          E.g. with autoscaling, when scaling up nodes with 8x Nvidia A00, the parameter can be injected
+                          with <INSTANCE-TYPE: p4d.24xlarge> for AWS.
+                          Preset parameters: TP, PP, INSTANCE-TYPE.
+                        type: object
+                      requests:
+                        additionalProperties:
+                          anyOf:
+                          - type: integer
+                          - type: string
+                          pattern: ^(\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))(([KMGTPE]i)|[numkMGTPE]|([eE](\+|-)?(([0-9]+(\.[0-9]*)?)|(\.[0-9]+))))?$
+                          x-kubernetes-int-or-string: true
+                        description: |-
+                          Requests defines the required accelerators to serve the model for each replica,
+                          like <nvidia.com/gpu: 8>. For multi-hosts cases, the requests here indicates
+                          the resource requirements for each replica, usually equals to the TP size.
+                          Not recommended to set the cpu and memory usage here:
+                          - if using playground, you can define the cpu/mem usage at backendConfig.
+                          - if using inference service, you can define the cpu/mem at the container resources.
+                          However, if you define the same accelerator requests at playground/service as well,
+                          the requests will be overwritten by the flavor requests.
+                        type: object
+                    required:
+                    - name
                     type: object
-                required:
-                - name
-                type: object
-              maxItems: 8
-              type: array
+                  maxItems: 8
+                  type: array
+              type: object
             source:
              description: |-
                Source represents the source of the model, there're several ways to load
@@ -158,8 +164,10 @@ spec:
                   type: object
                 uri:
                   description: |-
-                    URI represents a various kinds of model sources following the uri protocol, e.g.:
-                    - OSS: oss://<bucket>.<endpoint>/<path-to-your-model>
+                    URI represents a various kinds of model sources following the uri protocol, protocol://<address>, e.g.
+                    - oss://<bucket>.<endpoint>/<path-to-your-model>
+                    - ollama://llama3.3
+                    - host://<path-to-your-model>
                   type: string
               type: object
             required:
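To make the schema change concrete, here is a minimal sketch of an OpenModel manifest written against the new layout. The field names (`familyName`, `inferenceConfig.flavors` with `name`/`nodeSelector`/`params`/`requests`, and `source.uri`) come from the CRD diff above; the API version, model and flavor names, node label, and request values are illustrative assumptions, not taken from this commit:

```yaml
# Hypothetical OpenModel using the new inferenceConfig block
# (formerly the top-level inferenceFlavors field).
apiVersion: llmaz.io/v1alpha1      # assumed group/version for the OpenModel CRD
kind: OpenModel
metadata:
  name: llama3-3                   # illustrative model name
spec:
  familyName: llama3               # auto-injected as the `llmaz.io/model-family-name` label
  source:
    uri: ollama://llama3.3         # one of the newly documented uri forms
  inferenceConfig:
    flavors:                       # fungible, tried in slice order, at most 8 entries
      - name: a100                 # required, referenced from a model claim
        nodeSelector:
          example.com/gpu-type: a100     # hypothetical node label for Pod placement
        params:
          TP: "1"                        # preset parallelism parameter (TP / PP / INSTANCE-TYPE)
          INSTANCE-TYPE: p4d.24xlarge    # consumed by cluster-autoscaler / Karpenter
        requests:
          nvidia.com/gpu: 1              # accelerators per replica, usually equals the TP size
```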

0 commit comments

Comments
 (0)