Commit ab5cfd2

Merge pull request #47 from Mu-Sigma/develop
Corrections to vignettes
2 parents d53b6ad + a09e945 commit ab5cfd2

2 files changed: +8 additions, −8 deletions

vignettes/Interoperable_Pipelines.Rmd
Lines changed: 1 addition & 1 deletion

@@ -231,7 +231,7 @@ opWithFilter %>>% getOutputById(2)
 Finally, we show a case, where sequential filtering steps are performed in Spark, before visualizing in R, and running a decision tree model in Python.
 
-Note, that in this case, we register `getTargetForPyClassifcation` and `getTargetForPyClassification` as *non-data* functions. In this particular pipeline, there is no main *path* as such, as the pipeline branches into 2 paths - one in R and the other in Python. In such cases, using `outAsIn` or the `dataFunction` parameter with formula semantics is just a **question of convenience**. If the first argument of a *non-data* function is of a data frame class in R, Python (Pandas) or Spark, the package automatically performs type conversions when environments are switched (R -> Spark, Spark -> Python, and so on).
+Note, that in this case, `getTargetForPyClassifcation` and `getTargetForPyClassification` have been registered as *data* functions. Type conversions between R, Spark and Python for data functions are performed automatically by the package.
 
 ```{r}
 pipelineObj %>>% filterData_spark(condition = "Species == 'setosa' or Species == 'virginica'") %>>%
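The data/non-data distinction in the corrected paragraph is made when a function is registered with the package. A minimal sketch of what that registration might look like, assuming the `registerFunction` API of analysisPipelines with its `isDataFunction` and `firstArgClass` parameters (check the signature in your installed version; the function body here is a hypothetical stand-in, not the vignette's actual implementation):

```r
library(analysisPipelines)

# Hypothetical user-defined function whose first argument is a data frame
getTargetForPyClassification <- function(df) {
  # ... derive a target column for a downstream Python classifier ...
  df
}

# Registering with isDataFunction = TRUE marks the first argument as the
# data being passed through the pipeline, so the package can convert it
# between R, Spark and Python (Pandas) data frames when engines switch
registerFunction("getTargetForPyClassification",
                 isDataFunction = TRUE,
                 firstArgClass  = "data.frame")
```

With `isDataFunction = FALSE`, the function would instead be treated as *non-data*, and chaining its data argument would rely on `outAsIn` or the `dataFunction` parameter, as the pre-correction paragraph described.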

vignettes/Streaming_pipelines_for_working_Apache_Spark_Structured_Streaming.Rmd
Lines changed: 7 additions & 7 deletions

@@ -51,9 +51,9 @@ knitr::opts_chunk$set(
 library(analysisPipelines)
 library(SparkR)
-## Define these variables as per the configuration of your machine. This is just an example.
+## Define these variables as per the configuration of your machine. The below example is just illustrative.
-sparkHome <- "/Users/naren/softwares/spark-2.3.1-bin-hadoop2.7/"
+sparkHome <- "/path/to/spark/directory/"
 sparkMaster <- "local[1]"
 sparkPackages <- c("org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.1")
 # Set spark home variable if not present

@@ -81,10 +81,10 @@ This example illustrates usage of pipelines for a streaming application. In this
 Read streaming data from Kafka.
 
 ```{r}
-## Define these variables as per the configuration of your machine. This is just an example.
+## Define these variables as per the configuration of your machine. The below example is just illustrative.
-kafkaBootstrapServers <- "172.25.0.144:9092,172.25.0.98:9092,172.25.0.137:9092"
-consumerTopic <- "netlogo"
+kafkaBootstrapServers <- "192.168.0.256:9092,192.168.0.257:9092,192.168.0.258:9092"
+consumerTopic <- "topic1"
 streamObj <- read.stream(source = "kafka", kafka.bootstrap.servers = kafkaBootstrapServers, subscribe = consumerTopic, startingOffsets="earliest")
 printSchema(streamObj)
 ```

@@ -95,7 +95,7 @@ Users can define their own functions and use it as a part of the pipeline. These
 
 ```{r}
-# Function to convert datatype json struct to colums
+# Function to convert datatype json struct to columns
 convertStructToDf <- function(streamObj) {
 streamObj <- SparkR::select(streamObj,list(getField(streamObj$`jsontostructs(value)`,"bannerId"),
 getField(streamObj$`jsontostructs(value)`,"mobile"),

@@ -131,7 +131,7 @@ castDfColumns <- function(streamObj) {
 return (streamObj)
 }
-# Function to convert datatype json struct to colums
+# Function to convert datatype json struct to columns
 convertDfToKafkaKeyValuePairs <- function (streamObj, kafkaKey) {
 streamObj <- SparkR::toJSON(streamObj)
 streamObj$key <- kafkaKey
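Putting the hunks above together, the Kafka read path in this vignette can be sketched end to end as follows. This assumes a running Spark installation with the `spark-sql-kafka` package on the classpath and a reachable Kafka broker; the server addresses and topic name are placeholders, and `read.stream`/`printSchema` are standard SparkR functions:

```r
library(SparkR)

## Placeholder configuration -- adapt to your cluster
kafkaBootstrapServers <- "192.168.0.256:9092"
consumerTopic <- "topic1"

## Read the Kafka topic as a streaming SparkDataFrame
streamObj <- read.stream(source = "kafka",
                         kafka.bootstrap.servers = kafkaBootstrapServers,
                         subscribe = consumerTopic,
                         startingOffsets = "earliest")

## Inspect the inferred schema before wiring the stream into a pipeline;
## Kafka sources expose key/value columns plus topic/partition/offset metadata
printSchema(streamObj)
```

Helper functions such as `convertStructToDf` and `convertDfToKafkaKeyValuePairs` in the diff then reshape `streamObj` between the Kafka key/value layout and a columnar SparkDataFrame before the stream is written back out.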

0 commit comments