Finally, we show a case where sequential filtering steps are performed in Spark before visualizing in R and running a decision tree model in Python.
-
Note that in this case, we register `getTargetForPyClassifcation` and `getTargetForPyClassification` as *non-data* functions. In this particular pipeline there is no main *path* as such, as the pipeline branches into two paths, one in R and the other in Python. In such cases, using `outAsIn` or the `dataFunction` parameter with formula semantics is simply a **question of convenience**. If the first argument of a *non-data* function is of a data frame class in R, Python (pandas), or Spark, the package automatically performs type conversions when environments are switched (R -> Spark, Spark -> Python, and so on).
+
Note that in this case, `getTargetForPyClassifcation` and `getTargetForPyClassification` have been registered as *data* functions. Type conversions between R, Spark, and Python for data functions are performed automatically by the package.
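The automatic conversion described above amounts to a dispatch on the class of a function's first argument: when the pipeline switches engines, the incoming data frame is converted to the target engine's type before the function is called. The sketch below illustrates that idea only; it is a hypothetical simplification in Python, and the class names, `CONVERTERS` registry, and `call_in_engine` helper are assumptions, not the package's actual API.

```python
# Hypothetical sketch of engine-switch type conversion. A real pipeline
# would use SparkR / pyspark / Arrow serialization; here lightweight
# stand-in classes make the dispatch mechanism visible.

class RDataFrame:        # stand-in for an R data.frame
    def __init__(self, rows): self.rows = rows

class SparkDataFrame:    # stand-in for a Spark DataFrame
    def __init__(self, rows): self.rows = rows

class PandasDataFrame:   # stand-in for a pandas DataFrame
    def __init__(self, rows): self.rows = rows

# Assumed converter registry keyed by (source type, target engine).
CONVERTERS = {
    (RDataFrame, "spark"):      lambda df: SparkDataFrame(df.rows),
    (SparkDataFrame, "python"): lambda df: PandasDataFrame(df.rows),
    (SparkDataFrame, "r"):      lambda df: RDataFrame(df.rows),
}

TARGET_TYPE = {"r": RDataFrame, "spark": SparkDataFrame, "python": PandasDataFrame}

def call_in_engine(func, first_arg, engine, *args):
    """If first_arg is not already the engine's data-frame type,
    convert it, then call func -- mirroring the automatic conversion
    performed for registered data functions."""
    wanted = TARGET_TYPE[engine]
    if not isinstance(first_arg, wanted):
        first_arg = CONVERTERS[(type(first_arg), engine)](first_arg)
    return func(first_arg, *args)

# Example: a Python step receives Spark output as a pandas-like frame.
def count_rows(df):
    assert isinstance(df, PandasDataFrame)
    return len(df.rows)

spark_out = SparkDataFrame([{"Species": "setosa"}, {"Species": "virginica"}])
print(call_in_engine(count_rows, spark_out, "python"))  # prints 2
```

The design point this sketches is why registering functions as *data* functions is convenient: the conversion happens once, at the engine boundary, instead of inside every user function.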
```{r}
pipelineObj %>>% filterData_spark(condition = "Species == 'setosa' or Species == 'virginica'") %>>%