Convert a pipeline
==================

- *skl2onnx* converts any machine learning pipeline into
- *ONNX* pipelines. Every transformer or predictors is converted
- into one or multiple nodes into the *ONNX* graph.
+ *skl2onnx* converts any machine learning pipeline into an
+ *ONNX* pipeline. Every transformer or predictor is converted
+ into one or multiple nodes in the *ONNX* graph.
Any `ONNX backend <https://github.com/onnx/onnx/blob/main/docs/ImplementingAnOnnxBackend.md>`_
can then use this graph to compute equivalent outputs for the same inputs.

@@ -17,8 +17,8 @@ Convert complex pipelines
=========================

*scikit-learn* introduced
- `ColumnTransformer <https://scikit-learn.org/stable/modules/generated/sklearn.compose.ColumnTransformer.html>`_
- useful to build complex pipelines such as the following one:
+ `ColumnTransformer <https://scikit-learn.org/stable/modules/generated/sklearn.compose.ColumnTransformer.html>`_,
+ useful for building complex pipelines such as the following one:

::

@@ -56,7 +56,7 @@ useful to build complex pipelines such as the following one:
('classifier', classifier)
])

- Which we can represents as:
+ Which we can represent as:

.. blockdiag::

@@ -112,15 +112,15 @@ Parser, shape calculator, converter

.. index:: parser, shape calculator, converter

- Three kinds of functions are involved into the conversion
+ Three kinds of functions are involved in the conversion
of a *scikit-pipeline*. Each of them is called in the following
order:

* **parser(scope, model, inputs, custom_parser)**:
- the parser builds the expected outputs of a model,
- as the resulting graph must contain unique names,
- *scope* contains all names already given,
- *model* is the model to convert,
+ The parser builds the expected outputs of a model.
+ As the resulting graph must contain unique names,
+ *scope* contains all names already given.
+ *model* is the model to convert.
*inputs* are the *inputs* the model receives
in the *ONNX* graph. It is a list of
:class:`Variable <skl2onnx.common._topology.Variable>`.
@@ -130,32 +130,32 @@ order:
machine learned problems. The shape calculator
changes the shapes and types for each of them
depending on the model and is called after all
- outputs were defined (topology). This steps defines
+ outputs are defined (topology). This step defines
the number of outputs and their types for every node
and sets them to a default shape ``[None, None]``
which the output node has one row and no known
columns yet.
- * **shape_calculator(model):**
+ * **shape_calculator(model)**:
The shape calculator changes the shape
of the outputs created by the parser. Once this function
returned its results, the graph structure is fully defined
and cannot be changed. The shape calculator should
not change types. Many runtimes are implemented in C++
and do not support implicit casts. A change of type
might make the runtime fail due to a type mismatch
- between two consecutive nodes produces by two different
+ between two consecutive nodes produced by two different
converters.
- * **converter(scope, operator, container):**
+ * **converter(scope, operator, container)**:
The converter converts the transformers or predictors into
- *ONNX* nodes. Each node can an *ONNX*
+ *ONNX* nodes. Each node can be an *ONNX*
`operator <https://github.com/onnx/onnx/blob/main/docs/Operators.md>`_ or
`ML operator <https://github.com/onnx/onnx/blob/main/docs/Operators.md>`_ or
custom *ONNX* operators.

As *sklearn-onnx* may convert pipelines with model coming from other libraries,
the library must handle parsers, shape calculators or converters coming
- from other packages. This can be done is two ways. The first one
- consists in calling function :func:`convert_sklearn <skl2onnx.convert_sklearn>`
+ from other packages. This can be done in two ways. The first one
+ consists of calling function :func:`convert_sklearn <skl2onnx.convert_sklearn>`
by mapping the model type to a specific parser, a specific shape calculator
or a specific converter. It is possible to avoid these specifications
by registering the new parser or shape calculator or converter
@@ -169,13 +169,13 @@ One example follows.
New converters in a pipeline
============================

- Many libraries implement *scikit-learn* API and their models can
+ Many libraries implement the *scikit-learn* API and their models can
be included in *scikit-learn* pipelines. However, *sklearn-onnx* cannot
- a pipeline which include a model such as *XGBoost* or *LightGbm*
+ convert a pipeline which includes a model such as *XGBoost* or *LightGBM*
if it does not know the corresponding converters: it needs to be registered.
- That's the purpose of function :func:`skl2onnx.update_registered_converter`.
+ That's the purpose of the function :func:`skl2onnx.update_registered_converter`.
The following example shows how to register a new converter or
- or update an existing one. Four elements are registered:
+ update an existing one. Four elements are registered:

* the model class
* an alias, usually the class name prefixed by the library name
@@ -193,23 +193,22 @@ The following lines shows what these four elements are for a random forest:
calculate_linear_classifier_output_shapes,
convert_sklearn_random_forest_classifier)

- See example :ref:`example-lightgbm` to see a complete example
- with a *LightGbm* model.
+ See :ref:`example-lightgbm` for a complete example with a *LightGBM* model.

Titanic example
===============

The first example was a simplified pipeline coming from *scikit-learn*'s documentation:
`Column Transformer with Mixed Types <https://scikit-learn.org/stable/auto_examples/compose/plot_column_transformer_mixed_types.html#sphx-glr-auto-examples-compose-plot-column-transformer-mixed-types-py>`_.
The full story is available in a runnable example: :ref:`example-complex-pipeline`
- which also shows up some mistakes that a user could come accross
+ which also shows some mistakes that a user could come across
when trying to convert a pipeline.

Parameterize the conversion
===========================

- Most of the converter do not require specific options
- to convert a *scikit-learn* model. It always produces the same
+ Most of the converters do not require specific options
+ to convert a *scikit-learn* model and produce the same
results. However, in some cases, the conversion cannot produce
a model which returns the exact same results. The user may want
to optimize the conversion by giving the converter additional
@@ -220,16 +219,16 @@ pipeline. That why the option mechanism was implemented:
Investigate discrepencies
=========================

- A wrong converter may introduce introduce discrepencies
- in a converter pipeline but it is not alway easy to
+ A wrong converter may introduce discrepancies
+ in a converted pipeline but it is not always easy to
isolate the source of the differences. The function
:func:`collect_intermediate_steps
<skl2onnx.helpers.collect_intermediate_steps>`
- may then be used to investigate each component independently.
- The following piece of code is extracted from unit test
+ may be used to investigate each component independently.
+ The following piece of code is taken from unit test
`test_investigate.py <https://github.com/onnx/sklearn-onnx/
blob/main/tests/test_investigate.py>`_ and converts
- a pipeline and each of its components independently.
+ a pipeline and each of its components independently:

::
235
234
0 commit comments