Commit 203524c

docs: dagger examples using docker compose setup (#214)
1 parent 0e36063 commit 203524c

43 files changed

+1944
-33
lines changed

README.md

+1-1
Original file line numberDiff line numberDiff line change
@@ -39,7 +39,7 @@ Explore the following resources to get started with Dagger:
3939
* [Reference](https://odpf.github.io/dagger/docs/reference/overview) contains details about configurations, metrics and other aspects of Dagger.
4040
* [Contribute](https://odpf.github.io/dagger/docs/contribute/contribution) contains resources for anyone who wants to contribute to Dagger.
4141
* [Usecase](https://odpf.github.io/dagger/docs/usecase/overview) describes examples use cases which can be solved via Dagger.
42-
42+
* [Examples](https://odpf.github.io/dagger/docs/examples/overview) contains tutorials to try out some of Dagger's features with real-world use cases.
4343
## Running locally
4444

4545
Please follow this [Dagger Quickstart Guide](https://odpf.github.io/dagger/docs/guides/quickstart) for setting up a local running Dagger consuming from Kafka or to set up a Docker Compose for Dagger.
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,36 @@
1+
# Data Aggregation using a Tumble Window
2+
3+
## About this example
4+
In this example, we will count the number of booking orders (as Kafka records) in every 30-second interval. By the end of this example, we will understand how to use Dagger to aggregate data over a specified time window.
5+
6+
7+
## Before Trying This Example
8+
9+
10+
1. **You must have Docker installed**. You can follow [this guide](https://docs.docker.com/get-docker/) to install and set up Docker on your local machine.
11+
2. Clone the Dagger repository to your local machine:
12+
13+
```shell
14+
git clone https://github.com/odpf/dagger.git
15+
```
16+
17+
## Steps
18+
19+
Follow these steps to set up Dagger with Docker Compose:
20+
21+
1. Change into the tumble window example directory:
22+
```shell
23+
cd dagger/quickstart/examples/aggregation/tumble_window
24+
```
25+
2. Run this command to start the Docker Compose services:
26+
```shell
27+
docker compose up
28+
```
29+
This may take a while, as it installs all the required dependencies and starts all the required services. We should then see the output of the Dagger SQL query in the terminal: the count of booking orders in every 30-second interval.
30+
3. Run this command to shut down the Docker Compose services gracefully:
31+
```shell
32+
docker compose down
33+
```
34+
This will stop all services and remove all the containers.
35+
36+
Congratulations, we are now able to use Dagger for performing aggregation over a tumble window!
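
To make the idea of a tumble window concrete, here is a minimal Python sketch of counting events in fixed, non-overlapping 30-second windows. It is only an illustration of the concept and not how Dagger or Flink implements windowing; the function name and the plain list of timestamps are assumptions made for this example.

```python
from collections import Counter

WINDOW_SECONDS = 30  # window size used in this example


def count_per_tumble_window(event_times, window_seconds=WINDOW_SECONDS):
    """Count events per fixed, non-overlapping window.

    event_times: iterable of integer event timestamps in seconds
                 (a hypothetical stand-in for Kafka record timestamps).
    Returns a dict mapping each window start time to its event count.
    """
    counts = Counter()
    for ts in event_times:
        # Every timestamp belongs to exactly one window: the one
        # starting at the largest multiple of window_seconds <= ts.
        window_start = (ts // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    sample = [0, 5, 29, 30, 31, 61]
    print(count_per_tumble_window(sample))  # {0: 3, 30: 2, 60: 1}
```

Because the windows do not overlap, each booking order is counted exactly once.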
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,36 @@
1+
# Removing duplicate records using Transformers
2+
3+
## About this example
4+
In this example, we will use the DeDuplication Transformer in Dagger to remove booking orders (as Kafka records) that have a duplicate `order_number`. By the end of this example, we will understand how to use Dagger to remove duplicate data from a Kafka source.
5+
6+
7+
## Before Trying This Example
8+
9+
10+
1. **You must have Docker installed**. You can follow [this guide](https://docs.docker.com/get-docker/) to install and set up Docker on your local machine.
11+
2. Clone the Dagger repository to your local machine:
12+
13+
```shell
14+
git clone https://github.com/odpf/dagger.git
15+
```
16+
17+
## Steps
18+
19+
Follow these steps to set up Dagger with Docker Compose:
20+
21+
1. Change into the example directory:
22+
```shell
23+
cd dagger/quickstart/examples/aggregation/tumble_window
24+
```
25+
2. Run this command to start the Docker Compose services:
26+
```shell
27+
docker compose up
28+
```
29+
This may take a while, as it installs all the required dependencies and starts all the required services. We should then see the output of the Dagger SQL query in the terminal: the booking logs without any duplicate `order_number`.
30+
3. Run this command to shut down the Docker Compose services gracefully:
31+
```shell
32+
docker compose down
33+
```
34+
This will stop and remove all the containers.
35+
36+
Congratulations, we are now able to use Dagger to remove duplicate data from a Kafka source!
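
As a rough mental model of deduplication by key, the Python sketch below keeps only the first record seen for each `order_number`. It is an assumption-laden illustration, not the DeDuplication Transformer's implementation: the function name and sample records are hypothetical, and it ignores concerns a streaming job has to handle, such as expiring old keys from state.

```python
def drop_duplicate_orders(records, key="order_number"):
    """Return records in arrival order, keeping the first one per key.

    records: iterable of dicts (hypothetical stand-ins for Kafka records).
    """
    seen = set()
    unique = []
    for record in records:
        value = record[key]
        if value in seen:
            continue  # a record with this key was already emitted
        seen.add(value)
        unique.append(record)
    return unique


if __name__ == "__main__":
    bookings = [
        {"order_number": "A1", "status": "CREATED"},
        {"order_number": "B2", "status": "CREATED"},
        {"order_number": "A1", "status": "CREATED"},  # duplicate
    ]
    for booking in drop_duplicate_orders(bookings):
        print(booking)
```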
+36
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,36 @@
1+
# Distance computation using Java UDF
2+
3+
## About this example
4+
In this example, we will use a User-Defined Function (UDF) in Dagger to compute the distance between the driver pickup location and the driver dropoff location for each booking log (as a Kafka record). By the end of this example, we will understand how to use Dagger UDFs to add more functionality and simplify our queries.
5+
6+
7+
## Before Trying This Example
8+
9+
10+
1. **You must have Docker installed**. You can follow [this guide](https://docs.docker.com/get-docker/) to install and set up Docker on your local machine.
11+
2. Clone the Dagger repository to your local machine:
12+
13+
```shell
14+
git clone https://github.com/odpf/dagger.git
15+
```
16+
17+
## Steps
18+
19+
Follow these steps to set up Dagger with Docker Compose:
20+
21+
1. Change into the example directory:
22+
```shell
23+
cd dagger/quickstart/examples/aggregation/tumble_window
24+
```
25+
2. Run this command to start the Docker Compose services:
26+
```shell
27+
docker compose up
28+
```
29+
This may take a while, as it installs all the required dependencies and starts all the required services. We should then see the output of the Dagger SQL query in the terminal: the distance between the driver pickup location and the driver dropoff location for each booking log.
30+
3. Run this command to shut down the Docker Compose services gracefully:
31+
```shell
32+
docker compose down
33+
```
34+
This will stop and remove all the containers.
35+
36+
Congratulations, we are now able to use Dagger UDF to calculate distance easily!
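
For intuition about what a distance function between two coordinates might compute, here is a small Python sketch using the haversine (great-circle) formula. This is an assumption: the text above does not say which formula the Dagger UDF uses, and the function name and Earth radius constant are chosen only for this illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points.

    All arguments are in decimal degrees.
    """
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


if __name__ == "__main__":
    # Hypothetical pickup and dropoff coordinates.
    pickup = (-6.2000, 106.8167)
    dropoff = (-6.9147, 107.6098)
    print(round(haversine_km(*pickup, *dropoff), 2), "km")
```

Wrapping this kind of calculation in a UDF keeps the SQL query short, since the query only needs to call the function with the pickup and dropoff coordinates.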
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,35 @@
1+
# Stream enrichment using ElasticSearch source
2+
3+
## About this example
4+
In this example, we will use Dagger post-processors to enrich the payment transaction logs in the input stream (from a Kafka source) with user profile information from an external source, i.e. Elasticsearch, so that each record carries the user profile information. By the end of this example, we will be able to use Dagger to enrich our data stream from Kafka with data from any remote Elasticsearch server.
5+
6+
## Before Trying This Example
7+
8+
9+
1. **You must have Docker installed**. You can follow [this guide](https://docs.docker.com/get-docker/) to install and set up Docker on your local machine.
10+
2. Clone the Dagger repository to your local machine:
11+
12+
```shell
13+
git clone https://github.com/odpf/dagger.git
14+
```
15+
16+
## Steps
17+
18+
Follow these steps to set up Dagger with Docker Compose:
19+
20+
1. Change into the Elasticsearch enrichment example directory:
21+
```shell
22+
cd dagger/quickstart/examples/enrichment/elasticsearch_enrichment
23+
```
24+
2. Run this command to start the Docker Compose services:
25+
```shell
26+
docker compose up
27+
```
28+
This may take a while, as it installs all the required dependencies and starts all the required services. We should then see the output of the Dagger SQL query in the terminal: the enriched booking log with the customer profile information.
29+
3. Run this command to shut down the Docker Compose services gracefully:
30+
```shell
31+
docker compose down
32+
```
33+
This will stop all services and remove all the containers.
34+
35+
Congratulations, we are now able to use Dagger to enrich our data stream from Kafka with the data on any remote ElasticSearch server.
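
Conceptually, enrichment means looking up extra data for each incoming record and attaching it. The Python sketch below uses an in-memory dict as a stand-in for the Elasticsearch lookup; it is not Dagger's post-processor, and the field names `customer_id` and `customer_profile` are hypothetical.

```python
def enrich(records, profiles, key="customer_id"):
    """Attach the matching profile (or None) to each record.

    records:  iterable of dicts (hypothetical stand-ins for Kafka records).
    profiles: dict mapping key values to profile dicts; a stand-in for
              a lookup against an external store such as Elasticsearch.
    """
    enriched = []
    for record in records:
        profile = profiles.get(record[key])  # None if no profile exists
        # Build a new dict so the input record is left unchanged.
        enriched.append({**record, "customer_profile": profile})
    return enriched


if __name__ == "__main__":
    profiles = {"c-1": {"name": "Asha", "tier": "gold"}}
    stream = [
        {"customer_id": "c-1", "amount": 120},
        {"customer_id": "c-9", "amount": 75},
    ]
    for row in enrich(stream, profiles):
        print(row)
```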
+36
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,36 @@
1+
# Joining two Kafka topics using Inner join
2+
3+
## About this example
4+
In this example, we will use inner joins in Dagger to join the data streams from two different Kafka topics and count the number of booking logs in every 30-second interval, from both sources combined, for each service type. By the end of this example, we will understand how to use inner joins to combine two or more Kafka streams.
5+
6+
7+
## Before Trying This Example
8+
9+
10+
1. **You must have Docker installed**. You can follow [this guide](https://docs.docker.com/get-docker/) to install and set up Docker on your local machine.
11+
2. Clone the Dagger repository to your local machine:
12+
13+
```shell
14+
git clone https://github.com/odpf/dagger.git
15+
```
16+
17+
## Steps
18+
19+
Follow these steps to set up Dagger with Docker Compose:
20+
21+
1. Change into the example directory:
22+
```shell
23+
cd dagger/quickstart/examples/aggregation/tumble_window
24+
```
25+
2. Run this command to start the Docker Compose services:
26+
```shell
27+
docker compose up
28+
```
29+
This may take a while, as it installs all the required dependencies and starts all the required services. We should then see the output of the Dagger SQL query in the terminal: the number of booking logs in every 30-second interval, from both Kafka sources combined, for each service type.
30+
3. Run this command to shut down the Docker Compose services gracefully:
31+
```shell
32+
docker compose down
33+
```
34+
This will stop and remove all the containers.
35+
36+
Congratulations, we are now able to use Dagger to combine two or more Kafka streams!
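
To illustrate what an inner join followed by a per-group count does, here is a minimal Python sketch over two in-memory lists. It is not Dagger's or Flink's join: it leaves out the 30-second window entirely, and the join key `order_number`, the `service_type` field and the function names are assumptions made for this example.

```python
from collections import Counter, defaultdict


def inner_join(left, right, key):
    """Return (left_row, right_row) pairs whose key values match.

    Rows without a match on the other side are dropped, which is
    what distinguishes an inner join from an outer join.
    """
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    return [(l, r) for l in left for r in index.get(l[key], [])]


def count_by_service_type(pairs):
    """Count joined pairs per service type of the left-hand row."""
    return dict(Counter(l["service_type"] for l, _ in pairs))


if __name__ == "__main__":
    topic_a = [
        {"order_number": "A1", "service_type": "ride"},
        {"order_number": "B2", "service_type": "food"},
    ]
    topic_b = [{"order_number": "A1"}, {"order_number": "C3"}]
    joined = inner_join(topic_a, topic_b, "order_number")
    print(count_by_service_type(joined))  # {'ride': 1}
```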

docs/docs/examples/overview.md

+9
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,9 @@
1+
# Overview
2+
3+
The following example tutorials will help you quickly try out some of Dagger's most useful features with real-world use cases:
4+
5+
- [Data Aggregation using a Tumble Window](../examples/aggregation_tumble_window.md)
6+
- [Removing duplicate records using Transformers](../examples/deduplication_transformer.md)
7+
- [Distance computation using Java UDF](../examples/distance_java_udf.md)
8+
- [Stream enrichment using ElasticSearch source](../examples/elasticsearch_enrichment.md)
9+
- [Joining two Kafka topics using Inner join](../examples/kafka_inner_join.md)

docs/docs/guides/quickstart.md

+56-30
Original file line numberDiff line numberDiff line change
@@ -1,14 +1,67 @@
11
# Dagger Quickstart
22

3-
## Prerequisites
3+
There are two ways to get Dagger running on your machine:
4+
1. **[Docker Compose Setup](quickstart.md#docker-compose-setup)** - recommended for beginners
5+
2. **[Local Installation Setup](quickstart.md#local-installation-setup)** - for more advanced use cases
6+
7+
## Docker Compose Setup
8+
9+
### Prerequisites
10+
11+
1. **You must have Docker installed**
12+
13+
Follow these steps to set up Dagger with Docker Compose:
14+
1. Clone the Dagger repository to your local machine:
15+
16+
```shell
17+
git clone https://github.com/odpf/dagger.git
18+
```
19+
2. Change into the docker-compose directory:
20+
```shell
21+
cd dagger/quickstart/docker-compose
22+
```
23+
3. Run this command to start the Docker Compose services:
24+
```shell
25+
docker compose up
26+
```
27+
This will spin up Docker containers for Kafka, Zookeeper, Stencil, kafka-producer and Dagger.
28+
4. Run this command to gracefully stop all the Docker containers. This preserves the container state, which speeds up the next start. All the Kafka records and topics are preserved as well:
29+
```shell
30+
docker compose stop
31+
```
32+
To start the containers again from their saved state, run this command:
33+
```shell
34+
docker compose start
35+
```
36+
5. Run this command to gracefully stop and remove all the containers. This also deletes all the Kafka topics and saved data:
37+
```shell
38+
docker compose down
39+
```
40+
41+
### Workflow
42+
43+
The following containers are created, in chronological order, when you run `docker compose up`:
44+
45+
1. **Zookeeper** - The container for the Zookeeper service is created and listens on port 2187. Zookeeper is a service required by the Kafka server.
46+
2. **Kafka** - The container for the Kafka server is created and exposed on port 29094. This serves as the input data source for Dagger.
47+
3. **init-kafka** - This container creates the Kafka topic `dagger-test-topic-v1`, from which Dagger will pull the Kafka messages.
48+
4. **Stencil** - It compiles the proto file and creates a proto descriptor. It also sets up an HTTP server that serves the proto descriptors Dagger requires to parse the Kafka messages.
49+
5. **kafka-producer** - It runs a script that generates random Kafka messages and sends one message to the Kafka topic every second.
50+
6. **Dagger** - Clones the Dagger GitHub repository and builds the jar. It then creates an in-memory Flink cluster, uploads the Dagger job jar and starts the job.
51+
52+
The Dagger environment variables are defined in the `local.properties` file inside the `quickstart/docker-compose/resources` directory. Dagger runs a simple aggregation query that counts the number of bookings, i.e. Kafka messages, in every 30-second interval. The output is visible in the terminal logs. You can edit this query (the `FLINK_SQL_QUERY` variable) in the same `local.properties` file.
53+
54+
## Local Installation Setup
55+
56+
### Prerequisites
457

558
1. **Your Java version is Java 8**: Dagger as of now works only with Java 8. Some features might not work with older or later versions.
659
2. Your **Kafka** version is **3.0.0** or a minor version of it
760
3. You have **kcat** installed: We will use kcat to push messages to Kafka from the CLI. You can follow the installation steps [here](https://github.com/edenhill/kcat). Ensure the version you install is 1.7.0 or a minor version of it.
861
4. You have **protobuf** installed: We will use protobuf to push messages encoded in protobuf format to the Kafka topic. You can follow the installation steps for macOS [here](https://formulae.brew.sh/formula/protobuf). For other operating systems, please download the corresponding release from [here](https://github.com/protocolbuffers/protobuf/releases). Please note that this quickstart has been written to work with version [3.17.3](https://github.com/protocolbuffers/protobuf/releases/tag/v3.17.3) of protobuf. Compatibility with other versions is unknown.
962
5. You have **Python 2.7+** and **simple-http-server** installed: We will use Python along with simple-http-server to spin up a mock Stencil server which can serve the proto descriptors to Dagger. To install **simple-http-server**, please follow these [installation steps](https://pypi.org/project/simple-http-server/).
1063

11-
## Quickstart
64+
### Quickstart
1265

1366
1. Clone Dagger repository into your local
1467

@@ -52,7 +105,7 @@ The Stencil client being used in Dagger will fetch it by calling this URL. This
52105

53106
After some initialization logs, you should see the output of the SQL query getting printed.
54107

55-
## Troubleshooting
108+
### Troubleshooting
56109

57110
1. **I am pushing messages to the kafka topic but not seeing any output in the logs.**
58111

@@ -65,30 +118,3 @@ After some initialization logs, you should see the output of the SQL query getti
65118
2. **I see an exception `java.lang.RuntimeException: Unable to retrieve any partitions with KafkaTopicsDescriptor: Topic Regex Pattern`**
66119

67120
This can happen if the topic configured under `STREAMS` -> `SOURCE_KAFKA_TOPIC_NAMES` in `local.properties` is new and you have not pushed any messages to it yet. Ensure that you have pushed at least one message to the topic before you start Dagger.
68-
69-
## Docker Compose Setup
70-
71-
### Prerequisites
72-
73-
1. **You must have docker installed**
74-
75-
Following are the steps for setting up dagger in docker compose -
76-
1. Clone Dagger repository into your local
77-
78-
```shell
79-
git clone https://github.com/odpf/dagger.git
80-
```
81-
2. cd into the docker-compose directory:
82-
```shell
83-
cd dagger/quickstart/docker-compose
84-
```
85-
3. fire this command to spin up the docker compose:
86-
```shell
87-
docker compose up
88-
```
89-
This will spin up docker containers for the kafka, zookeeper, stencil, kafka-producer and the dagger.
90-
4. fire this command to gracefully close the docker compose:
91-
```shell
92-
docker compose down
93-
```
94-

docs/docs/intro.md

+1
Original file line numberDiff line numberDiff line change
@@ -41,3 +41,4 @@ Explore the following resources to get started with Dagger:
4141
- [Reference](./reference/overview.md) contains details about configurations, metrics and other aspects of Dagger.
4242
- [Contribute](./contribute/contribution.md) contains resources for anyone who wants to contribute to Dagger.
4343
- [Usecase](./usecase/overview.md) describes examples use cases which can be solved via Dagger.
44+
- [Examples](./examples/overview.md) contains tutorials to try out some of Dagger's features with real-world use cases.

docs/sidebars.js

+12
Original file line numberDiff line numberDiff line change
@@ -62,6 +62,18 @@ module.exports = {
6262
"reference/udfs"
6363
],
6464
},
65+
{
66+
type: "category",
67+
label: "Examples",
68+
items: [
69+
"examples/overview",
70+
"examples/aggregation_tumble_window",
71+
"examples/deduplication_transformer",
72+
"examples/distance_java_udf",
73+
"examples/elasticsearch_enrichment",
74+
"examples/kafka_inner_join"
75+
],
76+
},
6577
{
6678
type: "category",
6779
label: "Contribute",
