Commit 5060c8c

Merge branch 'main' into ch
2 parents 689f6c4 + 2edf2b3 commit 5060c8c

1 file changed: +31 -32 lines changed

docs/pipeline.rst

@@ -5,9 +5,9 @@
 Convert a pipeline
 ==================
 
-*skl2onnx* converts any machine learning pipeline into
-*ONNX* pipelines. Every transformer or predictors is converted
-into one or multiple nodes into the *ONNX* graph.
+*skl2onnx* converts any machine learning pipeline into an
+*ONNX* pipeline. Every transformer or predictor is converted
+into one or multiple nodes in the *ONNX* graph.
 Any `ONNX backend <https://github.com/onnx/onnx/blob/main/docs/ImplementingAnOnnxBackend.md>`_
 can then use this graph to compute equivalent outputs for the same inputs.
 
@@ -17,8 +17,8 @@ Convert complex pipelines
 =========================
 
 *scikit-learn* introduced
-`ColumnTransformer <https://scikit-learn.org/stable/modules/generated/sklearn.compose.ColumnTransformer.html>`_
-useful to build complex pipelines such as the following one:
+`ColumnTransformer <https://scikit-learn.org/stable/modules/generated/sklearn.compose.ColumnTransformer.html>`_,
+useful for building complex pipelines such as the following one:
 
 ::
 
@@ -56,7 +56,7 @@ useful to build complex pipelines such as the following one:
 ('classifier', classifier)
 ])
 
-Which we can represents as:
+Which we can represent as:
 
 .. blockdiag::
 
@@ -112,15 +112,15 @@ Parser, shape calculator, converter
 
 .. index:: parser, shape calculator, converter
 
-Three kinds of functions are involved into the conversion
+Three kinds of functions are involved in the conversion
 of a *scikit-pipeline*. Each of them is called in the following
 order:
 
 * **parser(scope, model, inputs, custom_parser)**:
-the parser builds the expected outputs of a model,
-as the resulting graph must contain unique names,
-*scope* contains all names already given,
-*model* is the model to convert,
+The parser builds the expected outputs of a model.
+As the resulting graph must contain unique names,
+*scope* contains all names already given.
+*model* is the model to convert.
 *inputs* are the *inputs* the model receives
 in the *ONNX* graph. It is a list of
 :class:`Variable <skl2onnx.common._topology.Variable>`.
@@ -130,32 +130,32 @@ order:
 machine learned problems. The shape calculator
 changes the shapes and types for each of them
 depending on the model and is called after all
-outputs were defined (topology). This steps defines
+outputs are defined (topology). This step defines
 the number of outputs and their types for every node
 and sets them to a default shape ``[None, None]``
 which the output node has one row and no known
 columns yet.
-* **shape_calculator(model):**
+* **shape_calculator(model)**:
 The shape calculator changes the shape
 of the outputs created by the parser. Once this function
 returned its results, the graph structure is fully defined
 and cannot be changed. The shape calculator should
 not change types. Many runtimes are implemented in C++
 and do not support implicit casts. A change of type
 might make the runtime fail due to a type mismatch
-between two consecutive nodes produces by two different
+between two consecutive nodes produced by two different
 converters.
-* **converter(scope, operator, container):**
+* **converter(scope, operator, container)**:
 The converter converts the transformers or predictors into
-*ONNX* nodes. Each node can an *ONNX*
+*ONNX* nodes. Each node can be an *ONNX*
 `operator <https://github.com/onnx/onnx/blob/main/docs/Operators.md>`_ or
 `ML operator <https://github.com/onnx/onnx/blob/main/docs/Operators.md>`_ or
 custom *ONNX* operators.
 
 As *sklearn-onnx* may convert pipelines with model coming from other libraries,
 the library must handle parsers, shape calculators or converters coming
-from other packages. This can be done is two ways. The first one
-consists in calling function :func:`convert_sklearn <skl2onnx.convert_sklearn>`
+from other packages. This can be done in two ways. The first one
+consists of calling function :func:`convert_sklearn <skl2onnx.convert_sklearn>`
 by mapping the model type to a specific parser, a specific shape calculator
 or a specific converter. It is possible to avoid these specifications
 by registering the new parser or shape calculator or converter
@@ -169,13 +169,13 @@ One example follows.
 New converters in a pipeline
 ============================
 
-Many libraries implement *scikit-learn* API and their models can
+Many libraries implement the *scikit-learn* API and their models can
 be included in *scikit-learn* pipelines. However, *sklearn-onnx* cannot
-a pipeline which include a model such as *XGBoost* or *LightGbm*
+convert a pipeline which includes a model such as *XGBoost* or *LightGBM*
 if it does not know the corresponding converters: it needs to be registered.
-That's the purpose of function :func:`skl2onnx.update_registered_converter`.
+That's the purpose of the function :func:`skl2onnx.update_registered_converter`.
 The following example shows how to register a new converter or
-or update an existing one. Four elements are registered:
+update an existing one. Four elements are registered:
 
 * the model class
 * an alias, usually the class name prefixed by the library name
@@ -193,23 +193,22 @@ The following lines shows what these four elements are for a random forest:
 calculate_linear_classifier_output_shapes,
 convert_sklearn_random_forest_classifier)
 
-See example :ref:`example-lightgbm` to see a complete example
-with a *LightGbm* model.
+See :ref:`example-lightgbm` for a complete example with a *LightGBM* model.
 
 Titanic example
 ===============
 
 The first example was a simplified pipeline coming from *scikit-learn*'s documentation:
 `Column Transformer with Mixed Types <https://scikit-learn.org/stable/auto_examples/compose/plot_column_transformer_mixed_types.html#sphx-glr-auto-examples-compose-plot-column-transformer-mixed-types-py>`_.
 The full story is available in a runnable example: :ref:`example-complex-pipeline`
-which also shows up some mistakes that a user could come accross
+which also shows some mistakes that a user could come across
 when trying to convert a pipeline.
 
 Parameterize the conversion
 ===========================
 
-Most of the converter do not require specific options
-to convert a *scikit-learn* model. It always produces the same
+Most of the converters do not require specific options
+to convert a *scikit-learn* model and produce the same
 results. However, in some cases, the conversion cannot produce
 a model which returns the exact same results. The user may want
 to optimize the conversion by giving the converter additional
@@ -220,16 +219,16 @@ pipeline. That why the option mechanism was implemented:
 Investigate discrepencies
 =========================
 
-A wrong converter may introduce introduce discrepencies
-in a converter pipeline but it is not alway easy to
+A wrong converter may introduce discrepancies
+in a converted pipeline but it is not always easy to
 isolate the source of the differences. The function
 :func:`collect_intermediate_steps
 <skl2onnx.helpers.collect_intermediate_steps>`
-may then be used to investigate each component independently.
-The following piece of code is extracted from unit test
+may be used to investigate each component independently.
+The following piece of code is taken from unit test
 `test_investigate.py <https://github.com/onnx/sklearn-onnx/
 blob/main/tests/test_investigate.py>`_ and converts
-a pipeline and each of its components independently.
+a pipeline and each of its components independently:
 
 ::
 