Commit c3994a2

switch granite-7b for 3.1-8b-instruct (#214)
* switch granite-7b for 3.1-8b-instruct, adjust max-length
* add RedHatAI for HF org
1 parent 8f21778 commit c3994a2

19 files changed (+42, -75 lines)

.github/.wordlist.txt

Lines changed: 1 addition & 0 deletions
```diff
@@ -209,6 +209,7 @@ README
 readonly
 recog
 redhat
+RedHatAI
 repo
 repoURL
 RespectIgnoreDifferences
```

README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -48,7 +48,7 @@ Useful link: [https://redhat-scholars.github.io/build-course/rhs-build-course/de
 
 - Python 3.11
 - Nodejs > 18
-- An existing instance of an LLM served through an OpenAI compatible API at `INFERENCE_SERVER_URL`. This application is based on Granite-7b-Instruct Prompt format. You will need to modify this format if you are using a different model.
+- An existing instance of an LLM served through an OpenAI compatible API at `INFERENCE_SERVER_URL`. This application is based on Granite-3.1-8B-Instruct Prompt format. You will need to modify this format if you are using a different model.
 
 ### Installation
 
```
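The prerequisite above assumes an OpenAI-compatible server. As a minimal sketch of what such a client call looks like (the model name, default URL, and `build_chat_request` helper here are illustrative assumptions, not code from this repo), the request body for `/chat/completions` can be built like this:

```python
import json
import os


def build_chat_request(model: str, user_message: str, max_tokens: int = 512) -> dict:
    """Build the JSON body for an OpenAI-compatible /chat/completions call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
    }


# Base URL comes from the INFERENCE_SERVER_URL variable named in the
# prerequisite; the localhost fallback is a hypothetical default.
base_url = os.environ.get("INFERENCE_SERVER_URL", "http://localhost:8080/v1")
endpoint = f"{base_url}/chat/completions"
payload = build_chat_request("granite-3-1-8b-instruct", "Summarize this claim.")
print(endpoint)
print(json.dumps(payload)[:60])
```

Posting `payload` to `endpoint` (with any HTTP client) is all the application needs, which is why swapping models only requires changing the URL, the model name, and the prompt format.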

bootstrap/granite-modelcar-image/Containerfile

Lines changed: 0 additions & 23 deletions
This file was deleted.

bootstrap/granite-modelcar-image/README.md

Lines changed: 0 additions & 15 deletions
This file was deleted.

bootstrap/ic-shared-app/deployment-app.yaml

Lines changed: 2 additions & 2 deletions
```diff
@@ -29,9 +29,9 @@ spec:
             fieldRef:
               fieldPath: metadata.namespace
         - name: INFERENCE_SERVER_URL
-          value: http://granite-7b-instruct-predictor.ic-shared-llm.svc.cluster.local:8080/v1
+          value: http://granite-3-1-8b-instruct-predictor.ic-shared-llm.svc.cluster.local:8080/v1
         - name: MODEL_NAME
-          value: 'granite-7b-instruct'
+          value: 'granite-3-1-8b-instruct'
         - name: MAX_TOKENS
           value: '512'
         - name: TOP_P
```
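The deployment injects the model endpoint and name into the app through environment variables. A minimal sketch of how an application might read them (the `config` dict and fallback defaults are assumptions for illustration; the defaults mirror the values set in this deployment):

```python
import os

# Hypothetical config loader; in the cluster these variables are set by
# bootstrap/ic-shared-app/deployment-app.yaml, so the defaults rarely apply.
config = {
    "inference_server_url": os.environ.get(
        "INFERENCE_SERVER_URL",
        "http://granite-3-1-8b-instruct-predictor.ic-shared-llm.svc.cluster.local:8080/v1",
    ),
    "model_name": os.environ.get("MODEL_NAME", "granite-3-1-8b-instruct"),
    "max_tokens": int(os.environ.get("MAX_TOKENS", "512")),
}
print(config["model_name"])
```

Keeping the app's `MODEL_NAME` in sync with the server's `--served-model-name` matters: an OpenAI-compatible server rejects requests whose `model` field does not match a served model name.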

bootstrap/ic-shared-llm/inference-service-granite-modelcar.yaml

Lines changed: 12 additions & 3 deletions
```diff
@@ -2,15 +2,15 @@ apiVersion: serving.kserve.io/v1beta1
 kind: InferenceService
 metadata:
   annotations:
-    openshift.io/display-name: granite-7b-instruct
+    openshift.io/display-name: granite-3-1-8b-instruct
     serving.knative.openshift.io/enablePassthrough: 'true'
     sidecar.istio.io/inject: 'true'
     sidecar.istio.io/rewriteAppHTTPProbers: 'true'
     argocd.argoproj.io/sync-wave: "2"
     serving.kserve.io/deploymentMode: RawDeployment
     argocd.argoproj.io/compare-options: IgnoreExtraneous
     argocd.argoproj.io/sync-options: Prune=false
-  name: granite-7b-instruct
+  name: granite-3-1-8b-instruct
   namespace: ic-shared-llm
   labels:
     opendatahub.io/dashboard: 'true'
@@ -19,6 +19,15 @@ spec:
     maxReplicas: 1
     minReplicas: 1
     model:
+      args:
+        - '--port=8080'
+        - '--model=/mnt/models'
+        - '--served-model-name=granite-3-1-8b-instruct'
+        - '--max-model-len=15000'
+        - '--dtype=half'
+        - '--enable-auto-tool-choice'
+        - '--tool-call-parser'
+        - granite
       modelFormat:
         name: vLLM
       name: ''
@@ -32,7 +41,7 @@ spec:
           memory: 8Gi
           nvidia.com/gpu: '1'
       runtime: vllm
-      storageUri: oci://quay.io/rh-aiservices-bu/granite-7b-instruct-modelcar:0.2
+      storageUri: oci://registry.redhat.io/rhelai1/modelcar-granite-3-1-8b-instruct:1.5
     tolerations:
       - effect: NoSchedule
         key: nvidia.com/gpu
```
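The `--dtype=half` flag loads weights in 16-bit floats, which roughly halves GPU memory versus fp32 and is why an 8B-class model fits on a single GPU here. A back-of-the-envelope sketch (the ~8.1e9 parameter count is an approximation for an 8B-class model, not an exact figure from this repo):

```python
# Rough GPU memory needed for the model weights alone, ignoring KV cache
# and activation overhead (which --max-model-len also influences).
params = 8.1e9                # approximate parameter count, assumption
bytes_per_param_fp16 = 2      # --dtype=half: 16-bit floats
bytes_per_param_fp32 = 4      # full precision, for comparison

fp16_gib = params * bytes_per_param_fp16 / 2**30
fp32_gib = params * bytes_per_param_fp32 / 2**30
print(f"fp16 weights: ~{fp16_gib:.1f} GiB, fp32 weights: ~{fp32_gib:.1f} GiB")
```

The remaining GPU memory budget is shared with the KV cache, which is why `--max-model-len=15000` caps the context length rather than using the model's full window.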

bootstrap/ic-shared-llm/serving-runtime-vllm-granite-modelcar.yaml

Lines changed: 2 additions & 7 deletions
```diff
@@ -19,19 +19,14 @@ spec:
     prometheus.io/path: /metrics
     prometheus.io/port: '8080'
   containers:
-    - args:
-        - '--port=8080'
-        - '--model=/mnt/models'
-        - '--served-model-name={{.Name}}'
-        - '--distributed-executor-backend=mp'
-      command:
+    - command:
         - python
         - '-m'
        - vllm.entrypoints.openai.api_server
      env:
        - name: HF_HOME
          value: /tmp/hf_home
-      image: 'quay.io/modh/vllm@sha256:b51fde66f162f1a78e8c027320dddf214732d5345953b1599a84fe0f0168c619'
+      image: 'quay.io/modh/vllm:rhoai-2.19-cuda'
      name: kserve-container
      ports:
        - containerPort: 8080
```

content/modules/ROOT/pages/03-04-comparing-model-servers.adoc

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,7 +1,7 @@
 = Comparing two LLMs
 include::_attributes.adoc[]
 
-So far, for this {ic-lab}, we have used the model https://huggingface.co/ibm-granite/granite-7b-instruct[Granite 7B Instruct,window=_blank]. Although lighter than other models, it is still quite heavy and we need a large GPU to run it. Would we get as good results with a smaller model running on a CPU only? Let's try!
+So far, for this {ic-lab}, we have used the model https://huggingface.co/RedHatAI/granite-3.1-8b-instruct[Granite 3.1 8B Instruct,window=_blank]. Although lighter than other models, it is still quite heavy and we need a large GPU to run it. Would we get as good results with a smaller model running on a CPU only? Let's try!
 
 In this exercise, we'll pitch our previous model against a much smaller LLM called https://huggingface.co/google/flan-t5-large[flan-t5-large,window=_blank]. We'll compare the results and see if the smaller model is good enough for our use case.
 
```
content/modules/ROOT/pages/06-01-potential-imp-ref.adoc

Lines changed: 1 addition & 1 deletion
```diff
@@ -39,7 +39,7 @@ If you want to read what **we** thought could be improved, read below! (response
 ** Mismatch in license plate, if visible in the picture.
 * We've only scratched the surface with gitops and Data Science pipelines here
 ** There was no performance testing done. If too many users connect at the same time, it might overwhelm either the app, the database, the LLM, etc...
-* Currently, most simple changes would probably end up breaking the application. And the person who, for example decides to change Granite-7B for Flan-T5-Large would not necessarily realize that.
+* Currently, most simple changes would probably end up breaking the application. And the person who, for example decides to change Granite-3.1-8B-Instruct for Flan-T5-Large would not necessarily realize that.
 ** It would be critical to have multiple instances (Dev/Test/UAT/Prod) of the application.
 ** It would also be required to have integration pipelines run in these environments to confirm that changes made do not break the overall application.
 * We could ask the LLM to start writing a response to the customer.
```

lab-materials/02/02-05-validating.ipynb

Lines changed: 1 addition & 1 deletion
```diff
@@ -69,7 +69,7 @@
     "    services_to_check = [\n",
     "        (\"minio.ic-shared-minio.svc.cluster.local\", 9000, \"Minio\"),\n",
     "        (\"claimdb.ic-shared-db.svc.cluster.local\", 5432, \"Postgres Database\"),\n",
-    "        (\"granite-7b-instruct-predictor.ic-shared-llm.svc.cluster.local\", 8080, \"LLM Service\"),\n",
+    "        (\"granite-3-1-8b-instruct-predictor.ic-shared-llm.svc.cluster.local\", 8080, \"LLM Service\"),\n",
     "        (\"llm-flant5.ic-shared-llm.svc.cluster.local\", 3000, \"LLM Service-FlanT5\"),\n",
     "        (\"modelmesh-serving.ic-shared-img-det.svc.cluster.local\", 8033, \"ModelMesh\"),\n",
     "        (\"vectordb-milvus.ic-shared-milvus.svc.cluster.local\", 19530, \"Milvus Vector DB\"),\n",
```
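The validation notebook checks each `(host, port)` pair by attempting a TCP connection. A self-contained sketch of that check (the function name and the in-cluster hostname in the comment are illustrative; the notebook's own implementation may differ):

```python
import socket


def check_service(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Hypothetical usage against the renamed predictor service:
# check_service("granite-3-1-8b-instruct-predictor.ic-shared-llm.svc.cluster.local", 8080)
```

Because the check resolves the Kubernetes service DNS name, renaming the InferenceService (granite-7b-instruct to granite-3-1-8b-instruct) silently changes the predictor hostname, which is exactly why this notebook entry had to be updated in the same commit.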

0 commit comments
