
Commit 7127338

docs: revert changes to lora example
1 parent 21035cd commit 7127338

1 file changed

Lines changed: 78 additions & 78 deletions


examples/fine-tuning/lora/lora_sft-distributed.ipynb

@@ -2,6 +2,7 @@
 "cells": [
 {
 "cell_type": "markdown",
+"id": "bb4d8595-c34f-4ab0-a0d7-f41d6aec95cd",
 "metadata": {},
 "source": [
 "## LoRA/QLoRA Fine-Tuning with Kubeflow Trainer and Training Hub on OpenShift AI\n",
@@ -32,22 +33,24 @@
 "```sql\n",
 "SELECT AVG(salary) FROM employees WHERE department = 'engineering'\n",
 "```"
-],
-"id": "bb4d8595-c34f-4ab0-a0d7-f41d6aec95cd"
+]
 },
 {
 "cell_type": "markdown",
+"id": "ea83ac4b-06f1-4c5d-b9fa-372cd3dd5ad2",
 "metadata": {},
 "source": [
 "## Setup\n",
 "\n",
 "First, import the required dependencies."
-],
-"id": "ea83ac4b-06f1-4c5d-b9fa-372cd3dd5ad2"
+]
 },
 {
 "cell_type": "code",
+"execution_count": null,
+"id": "bccf4f5f-244b-4283-a9a1-765f1ff5a89c",
 "metadata": {},
+"outputs": [],
 "source": [
 "# Standard library imports\n",
 "import json\n",
@@ -57,14 +60,14 @@
 "\n",
 "from datasets import load_dataset\n",
 "from kubernetes import client as k8s"
-],
-"execution_count": null,
-"outputs": [],
-"id": "bccf4f5f-244b-4283-a9a1-765f1ff5a89c"
+]
 },
 {
 "cell_type": "code",
+"execution_count": null,
+"id": "79ee4db1-b144-4c93-92f9-37a195033649",
 "metadata": {},
+"outputs": [],
 "source": [
 "# Configure logging to show only essential information\n",
 "logging.basicConfig(\n",
@@ -79,22 +82,22 @@
 "logging.getLogger(\"torch\").setLevel(logging.WARNING)\n",
 "\n",
 "print(\"✅ Logging configured for notebook environment\")"
-],
-"execution_count": null,
-"outputs": [],
-"id": "79ee4db1-b144-4c93-92f9-37a195033649"
+]
 },
 {
 "cell_type": "markdown",
+"id": "fdef7bfb-c5b8-49bc-ae03-b1d19bdd6541",
 "metadata": {},
 "source": [
 "## Authenticate to your OpenShift Cluster"
-],
-"id": "fdef7bfb-c5b8-49bc-ae03-b1d19bdd6541"
+]
 },
 {
 "cell_type": "code",
+"execution_count": null,
+"id": "88ce03ef-189e-409d-a8e0-8dd486de83e9",
 "metadata": {},
+"outputs": [],
 "source": [
 "api_server = \"<REPLACE WITH OPENSHIFT SERVER>\"\n",
 "token = \"<REPLACE WITH OPENSHIFT TOKEN>\"\n",
@@ -106,13 +109,11 @@
 "# configuration.verify_ssl = False\n",
 "configuration.api_key = {\"authorization\": f\"Bearer {token}\"}\n",
 "api_client = k8s.ApiClient(configuration)"
-],
-"execution_count": null,
-"outputs": [],
-"id": "88ce03ef-189e-409d-a8e0-8dd486de83e9"
+]
 },
 {
 "cell_type": "markdown",
+"id": "5219c28e-5d66-4fdb-b746-a7616336a50e",
 "metadata": {},
 "source": [
 "## 1. Load and Explore the Dataset\n",
@@ -122,23 +123,25 @@
 " Natural language questions\n",
 " Database schema context (CREATE TABLE statements)\n",
 " Corresponding SQL queries"
-],
-"id": "5219c28e-5d66-4fdb-b746-a7616336a50e"
+]
 },
 {
 "cell_type": "code",
+"execution_count": null,
+"id": "1d953abf-5777-4a2a-a6e2-541a2a405202",
 "metadata": {},
+"outputs": [],
 "source": [
 "# Load the dataset\n",
 "dataset = load_dataset(\"b-mc2/sql-create-context\", split=\"train\")"
-],
-"execution_count": null,
-"outputs": [],
-"id": "1d953abf-5777-4a2a-a6e2-541a2a405202"
+]
 },
 {
 "cell_type": "code",
+"execution_count": null,
+"id": "1ad43a62-ae35-4f07-8380-ec7998d8b377",
 "metadata": {},
+"outputs": [],
 "source": [
 "# Converting the format of the intial messages.\n",
 "def convert_to_messages(example):\n",
@@ -168,24 +171,24 @@
 "sample_converted = convert_to_messages(dataset[0])\n",
 "print(\"Converted format:\")\n",
 "print(json.dumps(sample_converted, indent=2))"
-],
-"execution_count": null,
-"outputs": [],
-"id": "1ad43a62-ae35-4f07-8380-ec7998d8b377"
+]
 },
 {
 "cell_type": "markdown",
+"id": "5940a0b6-4ab7-413e-96da-4c5d48acad53",
 "metadata": {},
 "source": [
 "## 2. Prepare Training Data\n",
 "\n",
 "Training Hub expects data in the chat template format with a messages field containing the conversation. We'll convert each example into a user message (question + context) and an assistant message (SQL query)."
-],
-"id": "5940a0b6-4ab7-413e-96da-4c5d48acad53"
+]
 },
 {
 "cell_type": "code",
+"execution_count": null,
+"id": "a2ac0f3e-d897-43ba-9f3e-14369ba1bef7",
 "metadata": {},
+"outputs": [],
 "source": [
 "# Training Dataset Preparation.\n",
 "print(f\"Dataset size: {len(dataset)} examples\")\n",
@@ -227,15 +230,13 @@
 "print(f\"Training data saved to: {training_file}\")\n",
 "print(f\"File size: {training_file.stat().st_size / 1024:.1f} KB\")\n",
 "\n",
-"data_path = f\"/opt/app-root/src/{PVC_PATH}/lora_text_sql_output/train_data.jsonl\"\n",
+"data_path = f\"{PVC_PATH}/lora_text_sql_output/train_data.jsonl\"\n",
 "print(data_path)"
-],
-"execution_count": null,
-"outputs": [],
-"id": "a2ac0f3e-d897-43ba-9f3e-14369ba1bef7"
+]
 },
 {
 "cell_type": "markdown",
+"id": "9de91052-984d-4b2b-b60e-8c6d3086b94c",
 "metadata": {},
 "source": [
 "## 3. Configure and Run LoRA Training\n",
@@ -251,12 +252,14 @@
 "\n",
 " load_in_4bit: Enable 4-bit quantization to reduce memory\n",
 " bnb_4bit_quant_type: Quantization type ('nf4' recommended)\n"
-],
-"id": "9de91052-984d-4b2b-b60e-8c6d3086b94c"
+]
 },
 {
 "cell_type": "code",
+"execution_count": null,
+"id": "8016ae6f-a01a-4714-a72b-9080acad89c4",
 "metadata": {},
+"outputs": [],
 "source": [
 "# Training configuration\n",
 "MODEL_NAME = \"Qwen/Qwen2.5-1.5B-Instruct\"\n",
@@ -332,53 +335,53 @@
 " \"checkpoint_at_epoch\": 2,\n",
 "}\n",
 "params"
-],
-"execution_count": null,
-"outputs": [],
-"id": "8016ae6f-a01a-4714-a72b-9080acad89c4"
+]
 },
 {
 "cell_type": "markdown",
+"id": "2302cfd9-42f2-4b94-b7c5-dbb4e4ac559b",
 "metadata": {},
 "source": [
 "## Training with LORA SFT and Kubeflow Trainer\n",
 "Launch a training job via Kubeflow Trainer with configured hyperparameters."
-],
-"id": "2302cfd9-42f2-4b94-b7c5-dbb4e4ac559b"
+]
 },
 {
 "cell_type": "code",
+"execution_count": null,
+"id": "3eace042-f662-4b36-9fb0-c6f543c4efb3",
 "metadata": {},
+"outputs": [],
 "source": [
 "from kubeflow.common.types import KubernetesBackendConfig\n",
 "from kubeflow.trainer import TrainerClient\n",
 "from kubeflow.trainer.rhai import TrainingHubAlgorithms, TrainingHubTrainer\n",
 "\n",
 "backend_cfg = KubernetesBackendConfig(client_configuration=api_client.configuration)\n",
 "client = TrainerClient(backend_cfg)"
-],
-"execution_count": null,
-"outputs": [],
-"id": "3eace042-f662-4b36-9fb0-c6f543c4efb3"
+]
 },
 {
 "cell_type": "code",
+"execution_count": null,
+"id": "bb066eae-39ce-4c20-91e0-86aa7afde30a",
 "metadata": {},
+"outputs": [],
 "source": [
 "for runtime in client.list_runtimes():\n",
 " if runtime.name == \"training-hub\":\n",
 " th_runtime = runtime\n",
 " print(\"Found runtime: \" + str(th_runtime))"
-],
-"execution_count": null,
-"outputs": [],
-"id": "bb066eae-39ce-4c20-91e0-86aa7afde30a"
+]
 },
 {
 "cell_type": "code",
+"execution_count": null,
+"id": "76ec647f-0c8b-48bb-9acc-69bda8315c1e",
 "metadata": {
 "scrolled": true
 },
+"outputs": [],
 "source": [
 "from kubeflow.trainer.options.kubernetes import (\n",
 " ContainerOverride,\n",
@@ -438,35 +441,35 @@
 ")\n",
 "\n",
 "print(job_name)"
-],
-"execution_count": null,
-"outputs": [],
-"id": "76ec647f-0c8b-48bb-9acc-69bda8315c1e"
+]
 },
 {
 "cell_type": "code",
+"execution_count": null,
+"id": "f5dbda75-8ed3-4872-b4c3-e7d495e7d20b",
 "metadata": {},
+"outputs": [],
 "source": [
 "# Follow job logs\n",
 "logs = client.get_job_logs(job_name, follow=True)\n",
 "for line in logs:\n",
 " print(line)"
-],
-"execution_count": null,
-"outputs": [],
-"id": "f5dbda75-8ed3-4872-b4c3-e7d495e7d20b"
+]
 },
 {
 "cell_type": "markdown",
+"id": "3c663d1d-d87c-43d0-95e9-530307252fab",
 "metadata": {},
 "source": [
 "## Loading the Model from the Desired Checkpoints."
-],
-"id": "3c663d1d-d87c-43d0-95e9-530307252fab"
+]
 },
 {
 "cell_type": "code",
+"execution_count": null,
+"id": "7e73e162-3046-45a3-86d7-c7464c07d6ed",
 "metadata": {},
+"outputs": [],
 "source": [
 "import glob\n",
 "import os\n",
@@ -509,14 +512,14 @@
 " model.eval()\n",
 " tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)\n",
 " print(\"Loaded model with HuggingFace/PEFT (CPU compatible)\")"
-],
-"execution_count": null,
-"outputs": [],
-"id": "7e73e162-3046-45a3-86d7-c7464c07d6ed"
+]
 },
 {
 "cell_type": "code",
+"execution_count": null,
+"id": "aa4c95b7-75e1-444c-8245-58ef4f043507",
 "metadata": {},
+"outputs": [],
 "source": [
 "def generate_sql(question: str, schema: str, max_tokens: int = 256) -> str:\n",
 " \"\"\"\n",
@@ -564,24 +567,24 @@
 " )\n",
 "\n",
 " return response.strip()"
-],
-"execution_count": null,
-"outputs": [],
-"id": "aa4c95b7-75e1-444c-8245-58ef4f043507"
+]
 },
 {
 "cell_type": "markdown",
+"id": "3a44ae5a-521b-4335-bc29-e1cb915ee8fa",
 "metadata": {},
 "source": [
 "## Test the Trained Model\n",
 "\n",
 "Let's load the trained model and test it on some SQL generation examples."
-],
-"id": "3a44ae5a-521b-4335-bc29-e1cb915ee8fa"
+]
 },
 {
 "cell_type": "code",
+"execution_count": null,
+"id": "4f60325a-4216-4bb6-a095-17923831b28c",
 "metadata": {},
+"outputs": [],
 "source": [
 "# Test with examples from the dataset\n",
 "test_examples = [\n",
@@ -610,20 +613,17 @@
 " sql = generate_sql(example[\"question\"], example[\"schema\"])\n",
 " print(f\"Generated SQL: {sql}\")\n",
 " print(\"-\" * 60)"
-],
-"execution_count": null,
-"outputs": [],
-"id": "4f60325a-4216-4bb6-a095-17923831b28c"
+]
 },
 {
 "cell_type": "markdown",
+"id": "6bd68750-7e65-4a7b-ad11-80658ed68ba7",
 "metadata": {},
 "source": [
 "## Final Analysis and Summary\n",
 "In this notebook, we demonstrated how LORA/QLORA can be used fine tuning Qwen 2.5 1.5B Instruct model, \n",
 "we were able to fine tune the model to understand natural languages to sql queries generation."
-],
-"id": "6bd68750-7e65-4a7b-ad11-80658ed68ba7"
+]
 }
 ],
 "metadata": {
@@ -647,4 +647,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 5
-}
+}
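
Apart from the `data_path` change, the diff above mostly moves the `id`, `execution_count`, and `outputs` keys of each cell from after `source` into alphabetical position. That ordering is what serializing notebook JSON with sorted keys produces. The following is a minimal sketch using only Python's standard `json` module; it is not the tooling this commit actually used. The helper name `canonicalize_notebook` and the sample cell are hypothetical, and the one-space indent is an assumption about the file's formatting.

```python
import json


def canonicalize_notebook(text: str) -> str:
    """Re-serialize notebook JSON with alphabetically sorted keys.

    Hypothetical helper for illustration only; indent=1 is an assumption.
    """
    notebook = json.loads(text)
    return json.dumps(notebook, sort_keys=True, indent=1, ensure_ascii=False)


# A sample cell whose keys are in the "before" order seen in the diff.
raw = json.dumps(
    {
        "cells": [
            {
                "cell_type": "code",
                "metadata": {},
                "source": ["import json\n"],
                "execution_count": None,
                "outputs": [],
                "id": "bccf4f5f-244b-4283-a9a1-765f1ff5a89c",
            }
        ],
        "nbformat": 4,
        "nbformat_minor": 5,
    }
)

canonical = canonicalize_notebook(raw)
cell_keys = list(json.loads(canonical)["cells"][0].keys())
print(cell_keys)
```

The printed key order matches the "after" side of each cell in the diff, which suggests that the 78 additions and 78 deletions are largely serialization churn rather than content changes.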
