---
title: "R interface for MLeap"
output:
  github_document:
    fig_width: 9
    fig_height: 5
---
```{r setup, include = FALSE}
library(mleap)
```
[Build Status](https://travis-ci.org/rstudio/mleap) [Coverage Status](https://codecov.io/github/rstudio/mleap?branch=master) [CRAN Status](https://cran.r-project.org/package=mleap)
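**mleap** is a [sparklyr](http://spark.rstudio.com/) extension that provides an interface to [MLeap](https://github.com/combust/mleap), a serialization format and execution engine for machine learning pipelines. It allows us to take Spark ML pipelines fitted with sparklyr to production, where they can be scored without a running Spark cluster.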
## Getting started
**mleap** can be installed from CRAN via
```{r eval = FALSE}
install.packages("mleap")
```
or, for the latest development version from GitHub, using
```{r eval = FALSE}
devtools::install_github("rstudio/mleap")
```
Once **mleap** is installed, we can install its external dependencies, Maven and the MLeap libraries, using
```{r eval = FALSE}
library(mleap)
install_maven()
# Alternatively, if you already have Maven installed, you can
# set options(maven.home = "path/to/maven")
install_mleap()
```
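If Maven is already on your `PATH`, the value for the `maven.home` option mentioned in the comment above can be derived from the location of the `mvn` executable. The sketch below is an assumption, not part of **mleap**: `maven_home_from_mvn()` is a hypothetical helper, and it assumes the standard layout where `mvn` lives in `<maven home>/bin` on a Unix-like system.

```r
# Hypothetical helper (not part of mleap): derive a Maven home directory
# from the path of the `mvn` executable, assuming <maven home>/bin/mvn.
maven_home_from_mvn <- function(mvn_path) {
  # Sys.which() returns "" when the program is not found
  if (!nzchar(mvn_path)) return(NA_character_)
  # Resolve symlinks such as /usr/bin/mvn -> /usr/share/maven/bin/mvn
  resolved <- normalizePath(mvn_path, winslash = "/", mustWork = FALSE)
  # Strip the trailing "bin/mvn" to get the Maven home directory
  dirname(dirname(resolved))
}

# Look up `mvn` on the PATH and, if found, point mleap at it
mvn_path <- unname(Sys.which("mvn"))
maven_home <- maven_home_from_mvn(mvn_path)
if (!is.na(maven_home)) {
  options(maven.home = maven_home)
}
```

If `maven_home` is `NA`, Maven was not found and `install_maven()` remains the simpler route.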
We can now export Spark ML pipelines from sparklyr.
```{r, message = FALSE}
library(sparklyr)
sc <- spark_connect(master = "local", version = "2.3.0")
mtcars_tbl <- sdf_copy_to(sc, mtcars, overwrite = TRUE)
# Create a pipeline and fit it
pipeline <- ml_pipeline(sc) %>%
ft_binarizer("hp", "big_hp", threshold = 100) %>%
ft_vector_assembler(c("big_hp", "wt", "qsec"), "features") %>%
ml_gbt_regressor(label_col = "mpg")
pipeline_model <- ml_fit(pipeline, mtcars_tbl)
# A transformed data frame with the appropriate schema is required
# for exporting the pipeline model
transformed_tbl <- ml_transform(pipeline_model, mtcars_tbl)
# Export model
model_path <- file.path(tempdir(), "mtcars_model.zip")
ml_write_bundle(pipeline_model, transformed_tbl, model_path)
# Disconnect from Spark
spark_disconnect(sc)
```
At this point, we can share `mtcars_model.zip` with our deployment/implementation engineers, who can then embed the model in another application. See the [MLeap docs](http://mleap-docs.combust.ml/) for details.
We also provide R functions for testing that the saved models behave as expected. Here we load the previously saved model:
```{r}
model <- mleap_load_bundle(model_path)
model
```
We can retrieve the schema associated with the model:
```{r}
mleap_model_schema(model)
```
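The schema can also be used to check a data frame before scoring it. This is a minimal sketch, assuming the data frame returned by `mleap_model_schema()` has a `name` column and an `io` column that marks model inputs with `"input"` (inspect the printed schema to confirm this layout for your version). `check_model_inputs()` is a hypothetical helper, not part of **mleap**.

```r
# Hypothetical helper (not part of mleap): stop with an informative error
# if `data` lacks any column that the schema marks as a model input.
# Assumes `schema` has `name` and `io` columns.
check_model_inputs <- function(schema, data) {
  # which() drops NA entries in `io`
  input_cols <- schema$name[which(schema$io == "input")]
  missing_cols <- setdiff(input_cols, names(data))
  if (length(missing_cols) > 0) {
    stop("Missing input column(s): ", paste(missing_cols, collapse = ", "),
         call. = FALSE)
  }
  invisible(TRUE)
}

# Example with a hand-written schema of the assumed shape
example_schema <- data.frame(
  name = c("qsec", "hp", "wt", "prediction"),
  io = c("input", "input", "input", "output"),
  stringsAsFactors = FALSE
)
check_model_inputs(example_schema, data.frame(qsec = 16.2, hp = 101, wt = 2.68))
```

With the model loaded above, `check_model_inputs(mleap_model_schema(model), newdata)` could be called just before `mleap_transform()`.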
Then we create a new data frame to be scored and make predictions with our model:
```{r}
newdata <- tibble::tribble(
~qsec, ~hp, ~wt,
16.2, 101, 2.68,
18.1, 99, 3.08
)
# Transform the data frame
transformed_df <- mleap_transform(model, newdata)
dplyr::glimpse(transformed_df)
```
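To test that the exported bundle reproduces Spark's results, one option is to compare MLeap's predictions with the predictions Spark computed for the same rows. The sketch below assumes both sides are numeric vectors in the same row order; `predictions_match()` is a hypothetical helper built on base R's `all.equal()`, and the vectors in the example are toy values, not model output.

```r
# Hypothetical helper (not part of mleap): TRUE when two prediction
# vectors have the same length and agree within a relative tolerance.
predictions_match <- function(expected, actual, tolerance = 1e-6) {
  length(expected) == length(actual) &&
    isTRUE(all.equal(as.numeric(expected), as.numeric(actual),
                     tolerance = tolerance))
}

# Toy vectors standing in for Spark and MLeap predictions
expected_pred <- c(21.0, 22.8, 18.7)
actual_pred <- expected_pred + 1e-9
predictions_match(expected_pred, actual_pred)
```

In the workflow above, the Spark side could be collected with `dplyr::pull(transformed_tbl, prediction)` before `spark_disconnect(sc)` is called (`prediction` is the default output column of `ml_gbt_regressor()`), and the MLeap side with `mleap_transform(model, mtcars[, c("qsec", "hp", "wt")])$prediction`. Spark does not guarantee row order in general, so sort or join on a key first if the order may differ.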