Commit 3f702a2

doc-update-quality-assessment (#112)

* doc-update-quality-assessment
* prep-for-dita-conversion
* two minor adjustments
* doc-update incorporate peer review feedback
* Update workshop/docs/modules/ROOT/nav.adoc

Co-authored-by: RHRolun <[email protected]>

1 parent d21f627 commit 3f702a2

32 files changed: +504 -333 lines changed

workshop/docs/modules/ROOT/nav.adoc

Lines changed: 10 additions & 4 deletions
@@ -23,9 +23,15 @@
 * 4. Data Science Pipelines
 // ** xref:enabling-data-science-pipelines.adoc[1. Enable Pipelines]
 ** xref:automating-workflows-with-pipelines.adoc[1. Workbench Pipeline Editor]
+*** xref:creating-a-pipeline.adoc[1. Creating a pipeline]
+*** xref:adding-nodes-to-your-pipeline.adoc[2. Adding nodes to your pipeline]
+*** xref:specifying-the-training-file-as-a-dependency.adoc[3. Specify the training file as a dependency]
+*** xref:creating-and-storing-the-onnx-output-file.adoc[4. Creating and storing the ONNX-formatted output file]
+*** xref:configuring-the-connection-to-storage.adoc[5. Configuring the connection to storage]
+*** xref:running-your-pipeline.adoc[6. Running your pipeline]
 ** xref:running-a-pipeline-generated-from-python-code.adoc[2. Python Pipelines]
 
-
-* 5. Distributed Training
-** xref:distributed-jobs-with-ray.adoc[1. Distributed Jobs with Ray]
-** xref:distributed-jobs-with-kfto.adoc[2. Distributed Jobs with the Training Operator]
+* 5. Distributing Training
+** xref:distributing-training-jobs.adoc[1. Distributing Training Jobs]
+*** xref:distributing-training-jobs-with-ray.adoc[1. Distributing Training Jobs with Ray]
+*** xref:distributing-training-jobs-with-kfto.adoc[2. Distributing Training Jobs with the Training Operator]
adding-nodes-to-your-pipeline.adoc

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+:_module-type: PROCEDURE
+
+[id='adding-nodes-to-your-pipeline']
+= Adding nodes to your pipeline
+
+[role="_abstract"]
+Add steps, or *nodes*, to your pipeline for the `1_experiment_train.ipynb` and `2_save_model.ipynb` notebooks.
+
+.Prerequisites
+
+* You created a pipeline file as described in xref:creating-a-pipeline.adoc[Creating a pipeline].
+
+.Procedure
+
+. From the JupyterLab file-browser panel, drag the `1_experiment_train.ipynb` and `2_save_model.ipynb` notebooks onto the pipeline canvas.
++
+image::pipelines/wb-pipeline-drag-drop.png[Drag and Drop Notebooks]
+
+. Click the output port of `1_experiment_train.ipynb` and drag a connecting line to the input port of `2_save_model.ipynb`.
++
+image::pipelines/wb-pipeline-connect-nodes.png[Connect Nodes, 400]
+
+. Save the pipeline.
+
+.Verification
+
+* Your pipeline has two nodes.
+
+.Next step
+
+xref:specifying-the-training-file-as-a-dependency.adoc[Specify the training file as a dependency]
automating-workflows-with-pipelines.adoc

Lines changed: 6 additions & 222 deletions
@@ -1,227 +1,11 @@
+:_module-type: CONCEPT
+
 [id='automating-workflows-with-pipelines']
 = Automating workflows with data science pipelines
 
-In previous sections of this {deliverable}, you used a notebook to train and save your model. Optionally, you can automate these tasks by using {productname-long} pipelines. Pipelines offer a way to automate the execution of multiple notebooks and Python code. By using pipelines, you can execute long training jobs or retrain your models on a schedule without having to manually run them in a notebook.
-
-In this section, you create a simple pipeline by using the GUI pipeline editor. The pipeline uses the notebook that you used in previous sections to train a model and then save it to S3 storage.
-
-Your completed pipeline should look like the one in the `6 Train Save.pipeline` file.
-
-To explore the pipeline editor, complete the steps in the following procedure to create your own pipeline. Alternately, you can skip the following procedure and instead run the `6 Train Save.pipeline` file.
-
-== Prerequisites
-
-* You configured a pipeline server as described in xref:enabling-data-science-pipelines.adoc[Enabling data science pipelines].
-* If you configured the pipeline server after you created your workbench, you stopped and then started your workbench.
-
-== Create a pipeline
-
-. Open your workbench's JupyterLab environment. If the launcher is not visible, click *+* to open it.
-+
-image::pipelines/wb-pipeline-launcher.png[Pipeline buttons, 400]
-
-. Click *Pipeline Editor*.
-+
-image::pipelines/wb-pipeline-editor-button.png[Pipeline Editor button, 75]
-+
-You've created a blank pipeline.
-
-. Set the default runtime image for when you run your notebook or Python code.
-
-.. In the pipeline editor, click *Open Panel*.
-+
-image::pipelines/wb-pipeline-panel-button-loc.png[Open Panel,400]
-
-.. Select the *Pipeline Properties* tab.
-+
-image::pipelines/wb-pipeline-properties-tab.png[Pipeline Properties Tab]
-
-.. In the *Pipeline Properties* panel, scroll down to *Generic Node Defaults* and *Runtime Image*. Set the value to `Tensorflow with Cuda and Python 3.11 (UBI 9)`.
-+
-image::pipelines/wb-pipeline-runtime-image.png[Pipeline Runtime Image0, 400]
-
-. Select *File* -> *Save Pipeline*.
-
-== Add nodes to your pipeline
-
-Add some steps, or *nodes* in your pipeline. Your two nodes will use the `1_experiment_train.ipynb` and `2_save_model.ipynb` notebooks.
-
-. From the file-browser panel, drag the `1_experiment_train.ipynb` and `2_save_model.ipynb` notebooks onto the pipeline canvas.
-+
-image::pipelines/wb-pipeline-drag-drop.png[ Drag and Drop Notebooks]
-
-. Click the output port of `1_experiment_train.ipynb` and drag a connecting line to the input port of `2_save_model.ipynb`.
-+
-image::pipelines/wb-pipeline-connect-nodes.png[Connect Nodes, 400]
-
-. Save the pipeline.
-
-== Specify the training file as a dependency
-
-Set node properties to specify the training file as a dependency.
-
-NOTE: If you don't set this file dependency, the file is not included in the node when it runs and the training job fails.
-
-. Click the `1_experiment_train.ipynb` node.
-+
-image::pipelines/wb-pipeline-node-1.png[Select Node 1, 150]
-
-. In the *Properties* panel, click the *Node Properties* tab.
-
-. Scroll down to the *File Dependencies* section and then click *Add*.
-+
-image::pipelines/wb-pipeline-node-1-file-dep.png[Add File Dependency, 500]
-
-. Set the value to `data/*.csv` which contains the data to train your model.
-
-. Select the *Include Subdirectories* option.
-+
-image::pipelines/wb-pipeline-node-1-file-dep-form.png[Set File Dependency Value, 300]
-
-. *Save* the pipeline.
-
-== Create and store the ONNX-formatted output file
-
-In node 1, the notebook creates the `models/fraud/1/model.onnx` file. In node 2, the notebook uploads that file to the S3 storage bucket. You must set `models/fraud/1/model.onnx` file as the output file for both nodes.
-
-. Select node 1.
-
-. Select the *Node Properties* tab.
-
-. Scroll down to the *Output Files* section, and then click *Add*.
-
-. Set the value to `models/fraud/1/model.onnx`.
-+
-image::pipelines/wb-pipeline-node-1-file-output-form.png[Set file dependency value, 400]
-
-. Repeat steps 2-4 for node 2.
-
-. *Save* the pipeline.
-
-== Configure the connection to the S3 storage bucket
-
-In node 2, the notebook uploads the model to the S3 storage bucket.
-
-You must set the S3 storage bucket keys by using the secret created by the `My Storage` connection that you set up in the storing-data-with-connections.adoc[Storing data with connections] section of this {deliverable}.
-
-You can use this secret in your pipeline nodes without having to save the information in your pipeline code. This is important, for example, if you want to save your pipelines - without any secret keys - to source control.
-
-The secret is named `aws-connection-my-storage`.
-
-[NOTE]
-====
-If you named your connection something other than `My Storage`, you can obtain the secret name in the {productname-short} dashboard by hovering over the help (?) icon in the *Connections* tab.
-
-image::pipelines/dsp-dc-secret-name.png[My Storage Secret Name, 400]
-====
-
-The `aws-connection-my-storage` secret includes the following fields:
-
-* `AWS_ACCESS_KEY_ID`
-* `AWS_DEFAULT_REGION`
-* `AWS_S3_BUCKET`
-* `AWS_S3_ENDPOINT`
-* `AWS_SECRET_ACCESS_KEY`
-
-You must set the secret name and key for each of these fields.
-
-.Procedure
-
-. Remove any pre-filled environment variables.
-
-.. Select node 2, and then select the *Node Properties* tab.
-+
-Under *Additional Properties*, note that some environment variables have been pre-filled. The pipeline editor inferred that you need them from the notebook code.
-+
-Since you don't want to save the value in your pipelines, remove all of these environment variables.
-
-.. Click *Remove* for each of the pre-filled environment variables.
-+
-image::pipelines/wb-pipeline-node-remove-env-var.png[Remove Env Var]
-
-. Add the S3 bucket and keys by using the Kubernetes secret.
-
-.. Under *Kubernetes Secrets*, click *Add*.
-+
-image::pipelines/wb-pipeline-add-kube-secret.png[Add Kubernetes Secret]
-
-.. Enter the following values and then click *Add*.
-+
-* *Environment Variable*: `AWS_ACCESS_KEY_ID`
-** *Secret Name*: `aws-connection-my-storage`
-** *Secret Key*: `AWS_ACCESS_KEY_ID`
-+
-image::pipelines/wb-pipeline-kube-secret-form.png[Secret Form, 400]
-
-. Repeat Step 2 for each of the following Kubernetes secrets:
-
-* *Environment Variable*: `AWS_SECRET_ACCESS_KEY`
-** *Secret Name*: `aws-connection-my-storage`
-** *Secret Key*: `AWS_SECRET_ACCESS_KEY`
-
-* *Environment Variable*: `AWS_S3_ENDPOINT`
-** *Secret Name*: `aws-connection-my-storage`
-** *Secret Key*: `AWS_S3_ENDPOINT`
-
-* *Environment Variable*: `AWS_DEFAULT_REGION`
-** *Secret Name*: `aws-connection-my-storage`
-** *Secret Key*: `AWS_DEFAULT_REGION`
-
-* *Environment Variable*: `AWS_S3_BUCKET`
-** *Secret Name*: `aws-connection-my-storage`
-** *Secret Key*: `AWS_S3_BUCKET`
-
-. Select *File* -> *Save Pipeline As* to save and rename the pipeline. For example, rename it to `My Train Save.pipeline`.
-
-== Run the Pipeline
-
-Upload the pipeline on your cluster and run it. You can do so directly from the pipeline editor. You can use your own newly created pipeline or the pipeline in the provided `6 Train Save.pipeline` file.
-
-.Procedure
-
-. Click the play button in the toolbar of the pipeline editor.
-+
-image::pipelines/wb-pipeline-run-button.png[Pipeline Run Button, 300]
-
-. Enter a name for your pipeline.
-. Verify that the *Runtime Configuration:* is set to `Data Science Pipeline`.
-. Click *OK*.
-+
-[NOTE]
-====
-If you see an error message stating that "no runtime configuration for Data Science Pipeline is defined", you might have created your workbench before the pipeline server was available.
-
-To address this situation, you must verify that you configured the pipeline server and then restart the workbench.
-
-Follow these steps in the {productname-short} dashboard:
-
-. Check the status of the pipeline server:
-.. In your Fraud Detection project, click the *Pipelines* tab.
-** If you see the *Configure pipeline server* option, follow the steps in xref:enabling-data-science-pipelines.adoc[Enabling data science pipelines].
-** If you see the *Import a pipeline* option, the pipeline server is configured. Continue to the next step.
-. Restart your Fraud Detection workbench:
-.. Click the *Workbenches* tab.
-.. Click *Stop* and then click *Stop workbench*.
-.. After the workbench status is *Stopped*, click *Start*.
-.. Wait until the workbench status is *Running*.
-. Return to your workbench's JupyterLab environment and run the pipeline.
-====
-
-. In the {productname-short} dashboard, open your data science project and expand the newly created pipeline.
-+
-image::pipelines/dsp-pipeline-complete.png[New pipeline expanded, 800]
-
-. Click *View runs*.
-+
-image::pipelines/dsp-view-run.png[View runs for selected pipeline, 500]
-
-. Click your run and then view the pipeline run in progress.
-+
-image::pipelines/pipeline-run-complete.png[Pipeline run progress, 800]
-
-The result should be a `models/fraud/1/model.onnx` file in your S3 bucket which you can serve, just like you did manually in the xref:preparing-a-model-for-deployment.adoc[Preparing a model for deployment] section.
-
+[role="_abstract"]
+Earlier, you used a notebook to train and save your model. Optionally, you can automate these tasks by using {productname-long} pipelines. Pipelines offer a way to automate the execution of many notebooks and Python code. By using pipelines, you can run long training jobs or retrain your models on a schedule without having to manually run them in a notebook.
 
-.Next step
+To explore the pipeline editor, complete the steps in the following procedures to create your own pipeline.
 
-(Optional) xref:running-a-pipeline-generated-from-python-code.adoc[Running a data science pipeline generated from Python code]
+Alternately, you can skip the following procedures and instead run the `6 Train Save.pipeline` file.
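
The procedure removed from this page (and now split into the per-step pages added in this commit) has node 1 create `models/fraud/1/model.onnx` and sets that path as an output file on both nodes. As a rough illustration of the export step in `1_experiment_train.ipynb`, here is a minimal sketch, assuming a Keras model and the `tf2onnx` converter; the workshop's actual notebook code may differ:

[source,python]
----
# Minimal sketch: write the trained model to the ONNX path that the pipeline
# nodes declare as an output file. Assumes TensorFlow and tf2onnx are available
# in the runtime image; the real 1_experiment_train.ipynb may differ.
import os

import tensorflow as tf
import tf2onnx

# Stand-in for the fraud-detection model trained earlier in the notebook.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(5,)),  # five input features is an assumption
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

output_path = "models/fraud/1/model.onnx"  # must match the node's Output Files setting
os.makedirs(os.path.dirname(output_path), exist_ok=True)

# Convert the Keras model to ONNX and write it where node 2 expects to find it.
tf2onnx.convert.from_keras(
    model,
    input_signature=[tf.TensorSpec([None, 5], tf.float32, name="input")],
    output_path=output_path,
)
----

Declaring the same path under *Output Files* on both nodes is how the file produced in node 1's container becomes available to node 2 when the pipeline runs.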
Lines changed: 4 additions & 1 deletion
@@ -1,15 +1,18 @@
+:_module-type: CONCEPT
+
 = Fraud Detection Workshop with {productname-long}
 :page-layout: home
 :!sectids:
 
 [.text-center.strong]
 == Conclusion
 
+[role="_abstract"]
 Congratulations. In this {deliverable}, you learned how to incorporate data science, artificial intelligence, and machine learning into an OpenShift development workflow.
 
 You used an example fraud detection model and completed the following tasks:
 
 * Explored a pre-trained fraud detection model by using a Jupyter notebook.
 * Deployed the model by using {productname-short} model serving.
 * Refined and trained the model by using automated pipelines.
-* Learned how to train the model by using Ray, a distributed computing framework.
+* Learned how to train the model by using distributed computing frameworks.
configuring-the-connection-to-storage.adoc

Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
+:_module-type: PROCEDURE
+
+[id='configuring-the-connection-to-storage']
+= Configuring the connection to storage
+
+[role="_abstract"]
+In node 2, the notebook uploads the model to the S3 storage bucket. You must set the S3 storage bucket keys by using the secret created by the `My Storage` connection that you set up in xref:storing-data-with-connections.adoc[Storing data with connections].
+
+You can use this secret in your pipeline nodes without having to save the information in your pipeline code. This is important, for example, if you want to save your pipelines - without any secret keys - to source control.
+
+The name of the secret is `aws-connection-my-storage`.
+
+[NOTE]
+====
+If you named your connection something other than `My Storage`, you can obtain the secret name in the {productname-short} dashboard by hovering over the help (?) icon in the *Connections* tab.
+
+image::pipelines/dsp-dc-secret-name.png[My Storage Secret Name, 400]
+====
+
+The `aws-connection-my-storage` secret includes the following fields:
+
+* `AWS_ACCESS_KEY_ID`
+* `AWS_DEFAULT_REGION`
+* `AWS_S3_BUCKET`
+* `AWS_S3_ENDPOINT`
+* `AWS_SECRET_ACCESS_KEY`
+
+You must set the secret name and key for each of these fields.
+
+.Prerequisites
+
+* You created the `My Storage` connection, as described in xref:storing-data-with-connections.adoc[Storing data with connections].
+
+* You set the `models/fraud/1/model.onnx` file as the output file for both nodes in your pipeline, as described in
+xref:creating-and-storing-the-onnx-output-file.adoc[Creating and storing the ONNX-formatted output file].
+
+.Procedure
+
+. Remove any pre-filled environment variables.
+
+.. Select node 2, and then select the *Node Properties* tab.
++
+Under *Additional Properties*, note that some environment variables have been pre-filled. The pipeline editor inferred that you need them from the notebook code.
++
+Because you do not want to save the values in your pipelines, remove all of these environment variables.
+
+.. Click *Remove* for each of the pre-filled environment variables.
++
+image::pipelines/wb-pipeline-node-remove-env-var.png[Remove Env Var]
+
+. Add the S3 bucket and keys by using the Kubernetes secret.
+
+.. Under *Kubernetes Secrets*, click *Add*.
++
+image::pipelines/wb-pipeline-add-kube-secret.png[Add Kubernetes Secret]
+
+.. Enter the following values and then click *Add*.
++
+* *Environment Variable*: `AWS_ACCESS_KEY_ID`
+** *Secret Name*: `aws-connection-my-storage`
+** *Secret Key*: `AWS_ACCESS_KEY_ID`
++
+image::pipelines/wb-pipeline-kube-secret-form.png[Secret Form, 400]
+
+. Repeat Step 2 for each of the following Kubernetes secrets:
+
+* *Environment Variable*: `AWS_SECRET_ACCESS_KEY`
+** *Secret Name*: `aws-connection-my-storage`
+** *Secret Key*: `AWS_SECRET_ACCESS_KEY`
+
+* *Environment Variable*: `AWS_S3_ENDPOINT`
+** *Secret Name*: `aws-connection-my-storage`
+** *Secret Key*: `AWS_S3_ENDPOINT`
+
+* *Environment Variable*: `AWS_DEFAULT_REGION`
+** *Secret Name*: `aws-connection-my-storage`
+** *Secret Key*: `AWS_DEFAULT_REGION`
+
+* *Environment Variable*: `AWS_S3_BUCKET`
+** *Secret Name*: `aws-connection-my-storage`
+** *Secret Key*: `AWS_S3_BUCKET`
+
+. Select *File* -> *Save Pipeline As* to save and rename the pipeline. For example, rename it to `My Train Save.pipeline`.
+
+.Verification
+
+* You set the S3 storage bucket keys by using the secret created by the `My Storage` connection.
+
+.Next step
+
+xref:running-your-pipeline.adoc[Running your pipeline]
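
To see why each of these five variables is needed, it helps to look at how node 2's notebook typically consumes them. The following is a minimal sketch, assuming `boto3` and the environment variable names listed above; the actual `2_save_model.ipynb` code may differ:

[source,python]
----
# Minimal sketch: build an S3 client from the environment variables injected by
# the aws-connection-my-storage Kubernetes secret, then upload the model file
# produced by node 1. Assumes boto3 is installed; the real 2_save_model.ipynb
# notebook may differ.
import os

import boto3

# Populated at run time from the Kubernetes secret configured on node 2.
s3 = boto3.client(
    "s3",
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
    endpoint_url=os.environ["AWS_S3_ENDPOINT"],
    region_name=os.environ["AWS_DEFAULT_REGION"],
)

# Upload the ONNX file to the bucket named in the secret, keeping the same key.
s3.upload_file(
    "models/fraud/1/model.onnx",
    os.environ["AWS_S3_BUCKET"],
    "models/fraud/1/model.onnx",
)
----

Because the values are injected from the secret at run time, none of them end up in the `.pipeline` file, which is what makes it safe to commit the pipeline, without any secret keys, to source control.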
