README.md (+1 −1)

@@ -39,7 +39,7 @@ Explore the following resources to get started with Dagger:
* [Reference](https://odpf.github.io/dagger/docs/reference/overview) contains details about configurations, metrics and other aspects of Dagger.
* [Contribute](https://odpf.github.io/dagger/docs/contribute/contribution) contains resources for anyone who wants to contribute to Dagger.
* [Usecase](https://odpf.github.io/dagger/docs/usecase/overview) describes example use cases which can be solved via Dagger.
* [Examples](https://odpf.github.io/dagger/docs/examples/overview) contains tutorials to try out some of Dagger's features with real-world use cases.

## Running locally

Please follow this [Dagger Quickstart Guide](https://odpf.github.io/dagger/docs/guides/quickstart) to set up a locally running Dagger consuming from Kafka, or to set up Docker Compose for Dagger.
In this example, we will count the number of booking orders (as Kafka records) in every 30-second interval. By the end of this example, we will understand how to use Dagger to aggregate data over a specified time window.
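
To give a feel for what the example runs, a tumbling-window count in Flink SQL looks roughly like the sketch below. This is a minimal sketch, not the exact query shipped with the example; the table name `data_stream` and the `rowtime` attribute are assumptions for illustration.

```sql
-- A minimal sketch (assumed names), not the example's exact query:
-- count records per 30-second tumbling window.
SELECT
  count(1) AS booking_count,                               -- orders seen in this window
  TUMBLE_END(rowtime, INTERVAL '30' SECOND) AS window_end  -- when the window closes
FROM
  data_stream                                              -- assumed Kafka-backed table
GROUP BY
  TUMBLE(rowtime, INTERVAL '30' SECOND)
```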

## Before Trying This Example

1. **You must have Docker installed.** You can follow [this guide](https://docs.docker.com/get-docker/) to install and set up Docker on your local machine.
2. Clone the Dagger repository to your local machine:

```shell
git clone https://github.com/odpf/dagger.git
```

## Steps

Following are the steps for setting up Dagger in Docker Compose:

1. `cd` into the aggregation directory:
```shell
cd dagger/quickstart/examples/aggregation/tumble_window
```
2. Run this command to spin up the Docker Compose setup:
```shell
docker compose up
```
Hang on for a while as it installs all the required dependencies and starts all the required services. After a while, we should see the output of the Dagger SQL query in the terminal, which will be the count of booking orders in every 30-second interval.
3. Run this command to gracefully shut down the Docker Compose setup:
```shell
docker compose down
```
This will stop all services and remove all the containers.

Congratulations, we are now able to use Dagger to perform aggregation over a tumble window!
In this example, we will use the DeDuplication Transformer in Dagger to remove the booking orders (as Kafka records) having duplicate `order_number`. By the end of this example, we will understand how to use Dagger to remove duplicate data from a Kafka source.
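
Note that Dagger ships deduplication as a transformer wired up through its processor configuration rather than as hand-written SQL. Purely to illustrate the idea, a keyed deduplication expressed in plain Flink SQL might look like the sketch below; the table and field names are assumptions.

```sql
-- Illustrative only: in Dagger the DeDuplication Transformer does this for you.
-- Keep the first record seen for each order_number and drop the rest.
SELECT order_number, service_type, event_timestamp
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY order_number ORDER BY rowtime ASC) AS row_num
  FROM data_stream                   -- assumed Kafka-backed table
)
WHERE row_num = 1                    -- first occurrence only
```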

## Before Trying This Example

1. **You must have Docker installed.** You can follow [this guide](https://docs.docker.com/get-docker/) to install and set up Docker on your local machine.
2. Clone the Dagger repository to your local machine:

```shell
git clone https://github.com/odpf/dagger.git
```

## Steps

Following are the steps for setting up Dagger in Docker Compose:

1. `cd` into the aggregation directory:
```shell
cd dagger/quickstart/examples/aggregation/tumble_window
```
2. Run this command to spin up the Docker Compose setup:
```shell
docker compose up
```
Hang on for a while as it installs all the required dependencies and starts all the required services. After a while, we should see the output of the Dagger SQL query in the terminal, which will be the booking logs without any duplicate `order_number`.
3. Run this command to gracefully shut down the Docker Compose setup:
```shell
docker compose down
```
This will stop and remove all the containers.

Congratulations, we are now able to use Dagger to remove duplicate data from a Kafka source!
In this example, we will use a User-Defined Function (UDF) in Dagger to compute the distance between the driver pickup location and the driver dropoff location for each booking log (as a Kafka record). By the end of this example, we will understand how to use Dagger UDFs to add more functionality and simplify our queries.
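
As a rough sketch of how a distance UDF simplifies such a query, consider the call below; the UDF name `Distance`, its argument order, and the location field names are assumptions for illustration, so check the example's actual query for the real names.

```sql
-- A sketch, assuming a scalar UDF Distance(lat1, lon1, lat2, lon2):
SELECT
  order_number,
  Distance(
    driver_pickup_location.latitude,  driver_pickup_location.longitude,
    driver_dropoff_location.latitude, driver_dropoff_location.longitude
  ) AS booking_distance               -- computed without hand-rolled distance math in SQL
FROM data_stream                      -- assumed Kafka-backed table
```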

## Before Trying This Example

1. **You must have Docker installed.** You can follow [this guide](https://docs.docker.com/get-docker/) to install and set up Docker on your local machine.
2. Clone the Dagger repository to your local machine:

```shell
git clone https://github.com/odpf/dagger.git
```

## Steps

Following are the steps for setting up Dagger in Docker Compose:

1. `cd` into the aggregation directory:
```shell
cd dagger/quickstart/examples/aggregation/tumble_window
```
2. Run this command to spin up the Docker Compose setup:
```shell
docker compose up
```
Hang on for a while as it installs all the required dependencies and starts all the required services. After a while, we should see the output of the Dagger SQL query in the terminal, which will be the distance between the driver pickup location and the driver dropoff location for each booking log.
3. Run this command to gracefully shut down the Docker Compose setup:
```shell
docker compose down
```
This will stop and remove all the containers.

Congratulations, we are now able to use a Dagger UDF to calculate distance easily!
In this example, we will use Dagger post-processors to enrich the payment transaction logs (from a Kafka source) in the input stream with user profile information from an external source, i.e. Elasticsearch, so that each record carries the user profile information. By the end of this example, we will be able to use Dagger to enrich our data stream from Kafka with the data on any remote Elasticsearch server.
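
The enrichment itself is wired up in Dagger's post-processor configuration rather than in the SQL; the query typically just surfaces the key used for the external lookup. A minimal sketch, with assumed table and field names:

```sql
-- A sketch: the SQL side only selects the lookup key (here customer_id);
-- the Elasticsearch call is configured separately in the post-processor config.
SELECT customer_id, order_number, event_timestamp
FROM data_stream                      -- assumed Kafka-backed table
```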

## Before Trying This Example

1. **You must have Docker installed.** You can follow [this guide](https://docs.docker.com/get-docker/) to install and set up Docker on your local machine.
2. Clone the Dagger repository to your local machine:

```shell
git clone https://github.com/odpf/dagger.git
```

## Steps

Following are the steps for setting up Dagger in Docker Compose:

1. `cd` into the enrichment directory:
```shell
cd dagger/quickstart/examples/enrichment/elasticsearch_enrichment
```
2. Run this command to spin up the Docker Compose setup:
```shell
docker compose up
```
Hang on for a while as it installs all the required dependencies and starts all the required services. After a while, we should see the output of the Dagger SQL query in the terminal, which will be the enriched booking log with the customer profile information.
3. Run this command to gracefully shut down the Docker Compose setup:
```shell
docker compose down
```
This will stop all services and remove all the containers.

Congratulations, we are now able to use Dagger to enrich our data stream from Kafka with the data on any remote Elasticsearch server!
In this example, we will use inner joins in Dagger to join the data streams from two different Kafka topics and count the number of booking logs in every 30-second interval from both sources combined, for each service type. By the end of this example, we will understand how to use inner joins to combine 2 or more Kafka streams.
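
As a rough sketch of the shape of such a query, the windowed inner join below combines two Kafka-backed tables and counts per service type; the table names, join key, and join window are assumptions for illustration, not the example's exact query.

```sql
-- A sketch (assumed names): join two streams on order_number within a
-- 30-second band, then count per service type in 30-second tumbling windows.
SELECT
  s1.service_type,
  count(1) AS booking_count,
  TUMBLE_END(s1.rowtime, INTERVAL '30' SECOND) AS window_end
FROM data_stream_0 AS s1
INNER JOIN data_stream_1 AS s2
  ON s1.order_number = s2.order_number
  AND s1.rowtime BETWEEN s2.rowtime - INTERVAL '30' SECOND
                     AND s2.rowtime + INTERVAL '30' SECOND
GROUP BY s1.service_type, TUMBLE(s1.rowtime, INTERVAL '30' SECOND)
```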

## Before Trying This Example

1. **You must have Docker installed.** You can follow [this guide](https://docs.docker.com/get-docker/) to install and set up Docker on your local machine.
2. Clone the Dagger repository to your local machine:

```shell
git clone https://github.com/odpf/dagger.git
```

## Steps

Following are the steps for setting up Dagger in Docker Compose:

1. `cd` into the aggregation directory:
```shell
cd dagger/quickstart/examples/aggregation/tumble_window
```
2. Run this command to spin up the Docker Compose setup:
```shell
docker compose up
```
Hang on for a while as it installs all the required dependencies and starts all the required services. After a while, we should see the output of the Dagger SQL query in the terminal, which will be the number of booking logs in every 30-second interval from both the Kafka sources combined, for each service type.
3. Run this command to gracefully shut down the Docker Compose setup:
```shell
docker compose down
```
This will stop and remove all the containers.

Congratulations, we are now able to use Dagger to combine 2 or more Kafka streams!
docs/docs/guides/quickstart.md (+56 −30)

@@ -1,14 +1,67 @@
# Dagger Quickstart

There are 2 ways to set up and get Dagger running on your machine in no time:
1. **[Docker Compose Setup](quickstart.md#docker-compose-setup)** - recommended for beginners
2. **[Local Installation Setup](quickstart.md#local-installation-setup)** - for more advanced use cases

## Docker Compose Setup

### Prerequisites

1. **You must have Docker installed.**

Following are the steps for setting up Dagger in Docker Compose:
1. Clone the Dagger repository to your local machine:

```shell
git clone https://github.com/odpf/dagger.git
```

2. `cd` into the docker-compose directory:
```shell
cd dagger/quickstart/docker-compose
```
3. Run this command to spin up the Docker Compose setup:
```shell
docker compose up
```
This will spin up Docker containers for Kafka, Zookeeper, Stencil, the kafka-producer, and the Dagger.
4. Run this command to gracefully stop all the Docker containers. This will save the container state and help speed up the setup next time. All the Kafka records and topics will also be saved:
```shell
docker compose stop
```
To start the containers from their saved state, run this command:
```shell
docker compose start
```
5. Run this command to gracefully remove all the containers. This will delete all the Kafka topics and saved data as well:
```shell
docker compose down
```

### Workflow

Following are the containers that are created, in chronological order, when you run `docker compose up`:

1. **Zookeeper** - A container for the Zookeeper service is created, listening on port 2187. Zookeeper is a service required by the Kafka server.
2. **Kafka** - A container for the Kafka server is created and exposed on port 29094. This serves as the input data source for the Dagger.
3. **init-kafka** - This container creates the Kafka topic `dagger-test-topic-v1` from which the Dagger will pull the Kafka messages.
4. **Stencil** - Compiles the proto file and creates a proto descriptor. It also sets up an HTTP server serving the proto descriptors required by Dagger to parse the Kafka messages.
5. **kafka-producer** - Runs a script that generates random Kafka messages and sends one message to the Kafka topic every second.
6. **Dagger** - Clones the Dagger GitHub repository and builds the jar. It then creates an in-memory Flink cluster, uploads the Dagger job jar, and starts the job.
The Dagger environment variables are present in the `local.properties` file inside the `quickstart/docker-compose/resources` directory. The Dagger runs a simple aggregation query which counts the number of bookings, i.e. Kafka messages, in every 30-second interval. The output will be visible in the logs in the terminal itself. You can edit this query (the `FLINK_SQL_QUERY` variable) in the same `local.properties` file.
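
For instance, to change the aggregation you could swap the value of `FLINK_SQL_QUERY` for a variant like the sketch below, which adds a group key; the `service_type` field and the `data_stream` table name are assumptions about the quickstart's booking schema, not values confirmed by this guide.

```sql
-- A hypothetical variant: count bookings per service_type in each
-- 30-second tumbling window, instead of one global count.
SELECT
  service_type,
  count(1) AS booking_count,
  TUMBLE_END(rowtime, INTERVAL '30' SECOND) AS window_end
FROM data_stream                      -- assumed input table name
GROUP BY service_type, TUMBLE(rowtime, INTERVAL '30' SECOND)
```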

## Local Installation Setup

### Prerequisites

1. **Your Java version is Java 8**: Dagger as of now works only with Java 8. Some features might not work with older or later versions.
2. Your **Kafka** version is **3.0.0** or a minor version of it.
3. You have **kcat** installed: We will use kcat to push messages to Kafka from the CLI. You can follow the installation steps [here](https://github.com/edenhill/kcat). Ensure the version you install is 1.7.0 or a minor version of it.
4. You have **protobuf** installed: We will use protobuf to push messages encoded in protobuf format to the Kafka topic. You can follow the installation steps for macOS [here](https://formulae.brew.sh/formula/protobuf). For other OSes, please download the corresponding release from [here](https://github.com/protocolbuffers/protobuf/releases). Please note, this quickstart has been written to work with [3.17.3](https://github.com/protocolbuffers/protobuf/releases/tag/v3.17.3) of protobuf. Compatibility with other versions is unknown.
5. You have **Python 2.7+** and **simple-http-server** installed: We will use Python along with simple-http-server to spin up a mock Stencil server which can serve the proto descriptors to Dagger. To install **simple-http-server**, please follow these [installation steps](https://pypi.org/project/simple-http-server/).

### Quickstart

1. Clone the Dagger repository to your local machine

@@ -52,7 +105,7 @@ The Stencil client being used in Dagger will fetch it by calling this URL. This
After some initialization logs, you should see the output of the SQL query getting printed.

### Troubleshooting

1. **I am pushing messages to the Kafka topic but not seeing any output in the logs.**

@@ -65,30 +118,3 @@ After some initialization logs, you should see the output of the SQL query getti
2. **I see an exception `java.lang.RuntimeException: Unable to retrieve any partitions with KafkaTopicsDescriptor: Topic Regex Pattern`**

This can happen if the topic configured under `STREAMS` -> `SOURCE_KAFKA_TOPIC_NAMES` in `local.properties` is new and you have not pushed any messages to it yet. Ensure that you have pushed at least one message to the topic before you start Dagger.