Commit 0abd70e

chore(trainer): add data and model initializers guide
Add docs/source/train/initializers.rst covering dataset and model initializers for the container backend (added in kubeflow#188, parallelised in kubeflow#313). Includes per-type code examples, combined usage, ContainerBackendConfig options, and debugging via get_job_logs().

Signed-off-by: Ayush Petwal <ayushpetwal.0105@gmail.com>
1 parent 07bd008 commit 0abd70e

4 files changed, +210 -0 lines changed

docs/source/index.rst

Lines changed: 1 addition & 0 deletions
@@ -147,6 +147,7 @@ Getting Involved
    train/custom-training
    train/distributed
    train/runtimes
+   train/initializers
    train/options
    train/api

docs/source/train/api.rst

Lines changed: 27 additions & 0 deletions
@@ -23,6 +23,33 @@ Trainers
    :members:
    :show-inheritance:

+Initializers
+------------
+
+.. autoclass:: kubeflow.trainer.Initializer
+   :members:
+   :show-inheritance:
+
+.. autoclass:: kubeflow.trainer.HuggingFaceDatasetInitializer
+   :members:
+   :show-inheritance:
+
+.. autoclass:: kubeflow.trainer.S3DatasetInitializer
+   :members:
+   :show-inheritance:
+
+.. autoclass:: kubeflow.trainer.DataCacheInitializer
+   :members:
+   :show-inheritance:
+
+.. autoclass:: kubeflow.trainer.HuggingFaceModelInitializer
+   :members:
+   :show-inheritance:
+
+.. autoclass:: kubeflow.trainer.S3ModelInitializer
+   :members:
+   :show-inheritance:
+
 Backend Configurations
 ----------------------

docs/source/train/index.rst

Lines changed: 6 additions & 0 deletions
@@ -67,6 +67,12 @@ Guides

       Understand pre-configured environments for PyTorch, TensorFlow, etc.

+   .. grid-item-card:: Data and Model Initializers
+      :link: initializers
+      :link-type: doc
+
+      Download datasets and pre-trained models before training starts.
+
 Common Patterns
 ---------------

docs/source/train/initializers.rst

Lines changed: 176 additions & 0 deletions
@@ -0,0 +1,176 @@
Data and Model Initializers
===========================

Initializers are pre-training containers that download datasets and pre-trained
models before your training job starts. You declare *what* to fetch; the SDK
runs the download as a separate step and makes the data available to your
training container.

.. note::

   Initializers are supported on the **Container backend** and the
   **Kubernetes backend**. They have no effect on ``LocalProcessBackend``.

Available Initializers
----------------------

.. list-table::
   :header-rows: 1
   :widths: 20 20 60

   * - Kind
     - Source
     - Class
   * - Dataset
     - HuggingFace Hub
     - ``HuggingFaceDatasetInitializer``
   * - Dataset
     - S3-compatible
     - ``S3DatasetInitializer``
   * - Dataset
     - Distributed cache
     - ``DataCacheInitializer``
   * - Model
     - HuggingFace Hub
     - ``HuggingFaceModelInitializer``
   * - Model
     - S3-compatible
     - ``S3ModelInitializer``

Pass them via the ``Initializer`` wrapper to ``client.train()``. When both
``dataset`` and ``model`` are set they download **in parallel**, so the total
wait is roughly that of the slower download rather than the sum of both.

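
To see why parallel downloads cost roughly the slower of the two rather than
their sum, here is a minimal, self-contained sketch. It does not call the SDK:
``fake_download`` and ``run_in_parallel`` are hypothetical helpers that only
sleep, and the durations are made up for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(name: str, seconds: float) -> str:
    """Hypothetical stand-in for an initializer download; it only sleeps."""
    time.sleep(seconds)
    return name


def run_in_parallel(durations: dict[str, float]) -> tuple[list[str], float]:
    """Run every simulated download concurrently and time the whole batch."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(durations)) as pool:
        futures = [pool.submit(fake_download, n, s) for n, s in durations.items()]
        finished = [f.result() for f in futures]
    return finished, time.perf_counter() - start


if __name__ == "__main__":
    done, elapsed = run_in_parallel({"dataset": 0.2, "model": 0.5})
    # Elapsed time tracks the slower task (0.5 s), not the sum (0.7 s).
    print(done, f"{elapsed:.2f}s")
```

The real initializers run as separate containers rather than threads, so this
only illustrates the timing argument, not the mechanism.
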
Dataset Initializers
--------------------

**HuggingFace Hub:**

.. code-block:: python

    from kubeflow.trainer import TrainerClient, CustomTrainer
    from kubeflow.trainer import Initializer, HuggingFaceDatasetInitializer
    from kubeflow.trainer.backends.container.types import ContainerBackendConfig


    def train():
        # Placeholder training function; replace with your own logic.
        ...


    client = TrainerClient(backend_config=ContainerBackendConfig())
    client.train(
        initializer=Initializer(
            dataset=HuggingFaceDatasetInitializer(
                storage_uri="hf://username/my-dataset",
                access_token="hf_...",  # required for private repos
            )
        ),
        trainer=CustomTrainer(func=train),
    )

The dataset is available inside the training container at ``/workspace/dataset``.

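
Inside the training function it can help to confirm what the initializer
actually downloaded before loading anything. The sketch below is one way to do
that with the standard library only; ``list_dataset_files`` is a hypothetical
helper, and the layout of files under ``/workspace/dataset`` depends on the
source repository.

```python
from pathlib import Path


def list_dataset_files(root: str = "/workspace/dataset") -> list[str]:
    """Return the relative paths of all files under the dataset directory."""
    base = Path(root)
    if not base.is_dir():
        raise FileNotFoundError(f"dataset directory not found: {base}")
    return sorted(
        p.relative_to(base).as_posix() for p in base.rglob("*") if p.is_file()
    )
```

Calling ``list_dataset_files()`` at the top of ``train`` and printing the
result puts the file listing into the training step's logs.
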
**S3-compatible storage:**

.. code-block:: python

    from kubeflow.trainer import Initializer, S3DatasetInitializer

    # Reuses `client` and `train` from the previous example.
    client.train(
        initializer=Initializer(
            dataset=S3DatasetInitializer(
                storage_uri="s3://my-bucket/datasets/my-dataset",
                endpoint="https://minio.example.com",  # omit for AWS S3
                access_key_id="...",
                secret_access_key="...",
                region="us-east-1",
            )
        ),
        trainer=CustomTrainer(func=train),
    )

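
Rather than hard-coding credentials in a script, you may prefer to read them
from the environment. The following is a sketch, not part of the SDK:
``s3_kwargs_from_env`` is a hypothetical helper, the variable names follow the
usual AWS conventions, and the ``us-east-1`` fallback is an assumption. The
returned keys match the keyword arguments used in the example above.

```python
import os
from typing import Mapping, Optional


def s3_kwargs_from_env(env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Build S3 initializer keyword arguments from AWS-style variables."""
    env = os.environ if env is None else env
    kwargs = {
        "access_key_id": env["AWS_ACCESS_KEY_ID"],
        "secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
        # Assumed fallback region; adjust to your deployment.
        "region": env.get("AWS_DEFAULT_REGION", "us-east-1"),
    }
    # Only set an endpoint for non-AWS stores such as MinIO.
    endpoint = env.get("AWS_ENDPOINT_URL")
    if endpoint:
        kwargs["endpoint"] = endpoint
    return kwargs
```

It could then be used as
``S3DatasetInitializer(storage_uri="s3://my-bucket/datasets/my-dataset", **s3_kwargs_from_env())``.
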
Model Initializers
------------------

**HuggingFace Hub:**

.. code-block:: python

    from kubeflow.trainer import Initializer, HuggingFaceModelInitializer

    # `fine_tune` is your training function; `client` is created as shown earlier.
    client.train(
        initializer=Initializer(
            model=HuggingFaceModelInitializer(
                storage_uri="hf://meta-llama/Llama-3.2-1B",
                access_token="hf_...",
            )
        ),
        trainer=CustomTrainer(func=fine_tune),
    )

Model weights are available at ``/workspace/model-weights``. By default,
redundant formats (``*.msgpack``, ``*.h5``, ``*.bin``, ``*.pt``, ``*.pth``)
are skipped. Pass ``ignore_patterns=[]`` to download everything.

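
To get a feel for which files the default patterns exclude, the sketch below
applies them to a list of filenames with glob-style matching from the standard
library. This assumes the initializer matches patterns in the same glob style;
``files_to_download`` is a hypothetical helper, not an SDK function, and the
filenames are illustrative.

```python
from fnmatch import fnmatch

# Patterns copied from the paragraph above.
DEFAULT_IGNORE_PATTERNS = ["*.msgpack", "*.h5", "*.bin", "*.pt", "*.pth"]


def files_to_download(filenames, ignore_patterns=DEFAULT_IGNORE_PATTERNS):
    """Keep only filenames that match none of the ignore patterns."""
    return [
        name
        for name in filenames
        if not any(fnmatch(name, pattern) for pattern in ignore_patterns)
    ]
```

With an empty pattern list nothing is filtered, which mirrors the effect of
``ignore_patterns=[]``.
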
**S3-compatible storage:**

.. code-block:: python

    from kubeflow.trainer import Initializer, S3ModelInitializer

    client.train(
        initializer=Initializer(
            model=S3ModelInitializer(
                storage_uri="s3://my-models/llama-3.2-1b",
                access_key_id="...",
                secret_access_key="...",
                region="us-east-1",
            )
        ),
        trainer=CustomTrainer(func=fine_tune),
    )

Using Both Together
-------------------

.. code-block:: python

    from kubeflow.trainer import (
        Initializer,
        HuggingFaceDatasetInitializer,
        HuggingFaceModelInitializer,
    )

    client.train(
        initializer=Initializer(
            dataset=HuggingFaceDatasetInitializer(storage_uri="hf://tatsu-lab/alpaca"),
            model=HuggingFaceModelInitializer(
                storage_uri="hf://meta-llama/Llama-3.2-1B",
                access_token="hf_...",
            ),
        ),
        trainer=CustomTrainer(func=fine_tune),
    )

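
A malformed ``storage_uri`` only surfaces once the initializer container runs,
so a quick local check before submitting can save a round trip. The sketch
below is an assumption-laden illustration: ``split_storage_uri`` is a
hypothetical helper, and it only knows the ``hf://`` and ``s3://`` schemes that
appear in this guide's examples.

```python
from urllib.parse import urlsplit

KNOWN_SCHEMES = {"hf", "s3"}  # schemes used in this guide's examples


def split_storage_uri(uri: str) -> tuple[str, str, str]:
    """Split a storage URI into (scheme, owner or bucket, remaining path)."""
    parts = urlsplit(uri)
    if parts.scheme not in KNOWN_SCHEMES:
        raise ValueError(f"unsupported scheme in {uri!r}")
    if not parts.netloc:
        raise ValueError(f"missing owner or bucket in {uri!r}")
    return parts.scheme, parts.netloc, parts.path.lstrip("/")
```

For ``hf://`` URIs the second element is the Hub owner and the third the
repository name; for ``s3://`` URIs they are the bucket and the key prefix.
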
Container Backend Configuration
-------------------------------

Override the default initializer images or increase the download timeout via
``ContainerBackendConfig``:

.. code-block:: python

    from kubeflow.trainer import TrainerClient
    from kubeflow.trainer.backends.container.types import ContainerBackendConfig

    client = TrainerClient(
        backend_config=ContainerBackendConfig(
            dataset_initializer_image="ghcr.io/kubeflow/trainer/dataset-initializer:v0.4.0",
            model_initializer_image="ghcr.io/kubeflow/trainer/model-initializer:v0.4.0",
            initializer_timeout=1800,  # seconds, default 600
        )
    )

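
When choosing a value for ``initializer_timeout``, a back-of-the-envelope
estimate from payload size and bandwidth is usually enough. The helper below is
a hypothetical sketch, not an SDK function; the 1.5x safety factor and the
figures in the usage note are assumptions.

```python
import math


def estimate_timeout_seconds(
    size_gb: float, mbit_per_s: float, safety: float = 1.5
) -> int:
    """Estimate a download timeout from payload size and sustained bandwidth."""
    if size_gb <= 0 or mbit_per_s <= 0:
        raise ValueError("size and bandwidth must be positive")
    size_megabits = size_gb * 1000 * 8  # decimal gigabytes -> megabits
    return math.ceil(size_megabits / mbit_per_s * safety)
```

For example, a 15 GB download over a sustained 100 Mbit/s link takes about
1200 seconds, so ``estimate_timeout_seconds(15, 100)`` suggests 1800.
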
Debugging
---------

Fetch logs from a specific initializer step:

.. code-block:: python

    # `job_name` is the value returned by client.train().
    for line in client.get_job_logs(job_name, step="dataset-initializer"):
        print(line)

    for line in client.get_job_logs(job_name, step="model-initializer"):
        print(line)
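
Initializer logs can be long, so filtering for likely failures is often the
fastest way to find the cause. The sketch below works on any iterable of log
lines; ``suspicious_lines`` is a hypothetical helper and the marker list is
purely illustrative, not something the SDK defines.

```python
from typing import Iterable, Iterator

ERROR_MARKERS = ("error", "traceback", "denied", "not found")  # illustrative only


def suspicious_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield log lines that contain any error marker, case-insensitively."""
    for line in lines:
        lowered = line.lower()
        if any(marker in lowered for marker in ERROR_MARKERS):
            yield line.rstrip("\n")
```

It could wrap the calls above, for example
``suspicious_lines(client.get_job_logs(job_name, step="dataset-initializer"))``.
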
