Commit 1ee28ab

chore: rename to raystack (#226)
- SSL/TLS support for Dagger Kafka Source
- Add support for kafka producer config linger.ms in kafka sink
- feat: stencil schema auto refresh and fix typehandler bug
- fix: [longbow] excluded module google-cloud-bigtable from minimaljar
- feat: bump version to 0.9.6
1 parent 203524c commit 1ee28ab
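
The linger.ms item above refers to the standard Kafka producer batching setting. As a point of reference only, the sketch below shows what linger.ms controls on a plain Kafka producer; the Dagger-side sink configuration key that forwards this value is part of the change but is not visible on this page, and the broker address and topic here are placeholders.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

// Sketch only: the upstream Kafka producer property that the new sink config exposes.
public class LingerMsSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // linger.ms: how long the producer waits to batch records before sending;
        // higher values trade a little latency for better batching and throughput.
        props.put(ProducerConfig.LINGER_MS_CONFIG, "50");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("example-topic", "key", "value"));
        }
    }
}
```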

File tree

766 files changed: +5709 −4100 lines


.github/workflows/core_dependencies.yml renamed to .github/workflows/dependencies-publish.yml

+1-1
@@ -1,4 +1,4 @@
-name: core-dependencies
+name: dependencies-publish
 
 on:
   workflow_dispatch

CHANGELOG.md

-52
This file was deleted.

README.md

+35-27
@@ -1,57 +1,63 @@
 # Dagger
-![build workflow](https://github.com/odpf/dagger/actions/workflows/build.yml/badge.svg)
-![package workflow](https://github.com/odpf/dagger/actions/workflows/package.yml/badge.svg)
+
+![build workflow](https://github.com/raystack/dagger/actions/workflows/build.yml/badge.svg)
+![package workflow](https://github.com/raystack/dagger/actions/workflows/package.yml/badge.svg)
 [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg?logo=apache)](LICENSE)
-[![Version](https://img.shields.io/github/v/release/odpf/dagger?logo=semantic-release)](https://github.com/odpf/dagger/releases/latest)
+[![Version](https://img.shields.io/github/v/release/raystack/dagger?logo=semantic-release)](https://github.com/raystack/dagger/releases/latest)
 
 Dagger or Data Aggregator is an easy-to-use, configuration over code, cloud-native framework built on top of Apache Flink
-for stateful processing of data. With Dagger, you don't need to write custom applications or complicated code to process
+for stateful processing of data. With Dagger, you don't need to write custom applications or complicated code to process
 data as a stream. Instead, you can write SQL queries and UDFs to do the processing and analysis on streaming data.
 
 ![](docs/static/img/overview/dagger_overview.png)
 
 ## Key Features
+
 Discover why to use Dagger
 
 - **Processing:** Dagger can transform, aggregate, join and enrich streaming data, both real-time and historical.
 - **Scale:** Dagger scales in an instant, both vertically and horizontally for high performance streaming sink and zero data drops.
 - **Extensibility:** Add your own sink to dagger with a clearly defined interface or choose from already provided ones. Use Kafka and/or Parquet Files as stream sources.
 - **Flexibility:** Add custom business logic in form of plugins \(UDFs, Transformers, Preprocessors and Post Processors\) independent of the core logic.
-- **Metrics:** Always know what’s going on with your deployment with built-in [monitoring](https://odpf.github.io/dagger/docs/reference/metrics) of throughput, response times, errors and more.
+- **Metrics:** Always know what’s going on with your deployment with built-in [monitoring](https://raystack.github.io/dagger/docs/reference/metrics) of throughput, response times, errors and more.
 
 ## What problems Dagger solves?
-* Map reduce -> [SQL](https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/sql.html)
-* Enrichment -> [Post Processors](https://odpf.github.io/dagger/docs/advance/post_processor)
-* Aggregation -> [SQL](https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/sql.html), [UDFs](https://odpf.github.io/dagger/docs/guides/use_udf)
-* Masking -> [Hash Transformer](https://odpf.github.io/dagger/docs/reference/transformers#HashTransformer)
-* Deduplication -> [Deduplication Transformer](https://odpf.github.io/dagger/docs/reference/transformers#DeDuplicationTransformer)
-* Realtime long window processing -> [Longbow](https://odpf.github.io/dagger/docs/advance/longbow)
 
-To know more, follow the detailed [documentation](https://odpf.github.io/dagger/).
+- Map reduce -> [SQL](https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/sql.html)
+- Enrichment -> [Post Processors](https://raystack.github.io/dagger/docs/advance/post_processor)
+- Aggregation -> [SQL](https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/sql.html), [UDFs](https://raystack.github.io/dagger/docs/guides/use_udf)
+- Masking -> [Hash Transformer](https://raystack.github.io/dagger/docs/reference/transformers#HashTransformer)
+- Deduplication -> [Deduplication Transformer](https://raystack.github.io/dagger/docs/reference/transformers#DeDuplicationTransformer)
+- Realtime long window processing -> [Longbow](https://raystack.github.io/dagger/docs/advance/longbow)
+
+To know more, follow the detailed [documentation](https://raystack.github.io/dagger/).
 
 ## Usage
 
 Explore the following resources to get started with Dagger:
 
-* [Guides](https://odpf.github.io/dagger/docs/guides/overview) provides guidance on [creating Dagger](https://odpf.github.io/dagger/docs/guides/create_dagger) with different sinks.
-* [Concepts](https://odpf.github.io/dagger/docs/concepts/overview) describes all important Dagger concepts.
-* [Advance](https://odpf.github.io/dagger/docs/advance/overview) contains details regarding advance features of Dagger.
-* [Reference](https://odpf.github.io/dagger/docs/reference/overview) contains details about configurations, metrics and other aspects of Dagger.
-* [Contribute](https://odpf.github.io/dagger/docs/contribute/contribution) contains resources for anyone who wants to contribute to Dagger.
-* [Usecase](https://odpf.github.io/dagger/docs/usecase/overview) describes examples use cases which can be solved via Dagger.
-* [Examples](https://odpf.github.io/dagger/docs/examples/overview) contains tutorials to try out some of Dagger's features with real-world usecases
+- [Guides](https://raystack.github.io/dagger/docs/guides/overview) provides guidance on [creating Dagger](https://raystack.github.io/dagger/docs/guides/create_dagger) with different sinks.
+- [Concepts](https://raystack.github.io/dagger/docs/concepts/overview) describes all important Dagger concepts.
+- [Advance](https://raystack.github.io/dagger/docs/advance/overview) contains details regarding advance features of Dagger.
+- [Reference](https://raystack.github.io/dagger/docs/reference/overview) contains details about configurations, metrics and other aspects of Dagger.
+- [Contribute](https://raystack.github.io/dagger/docs/contribute/contribution) contains resources for anyone who wants to contribute to Dagger.
+- [Usecase](https://raystack.github.io/dagger/docs/usecase/overview) describes examples use cases which can be solved via Dagger.
+- [Examples](https://raystack.github.io/dagger/docs/examples/overview) contains tutorials to try out some of Dagger's features with real-world usecases
+
 ## Running locally
 
-Please follow this [Dagger Quickstart Guide](https://odpf.github.io/dagger/docs/guides/quickstart) for setting up a local running Dagger consuming from Kafka or to set up a Docker Compose for Dagger.
+Please follow this [Dagger Quickstart Guide](https://raystack.github.io/dagger/docs/guides/quickstart) for setting up a local running Dagger consuming from Kafka or to set up a Docker Compose for Dagger.
 
-**Note:** Sample configuration for running a basic dagger can be found [here](https://odpf.github.io/dagger/docs/guides/create_dagger#common-configurations). For detailed configurations, refer [here](https://odpf.github.io/dagger/docs/reference/configuration).
+**Note:** Sample configuration for running a basic dagger can be found [here](https://raystack.github.io/dagger/docs/guides/create_dagger#common-configurations). For detailed configurations, refer [here](https://raystack.github.io/dagger/docs/reference/configuration).
 
-Find more detailed steps on local setup [here](https://odpf.github.io/dagger/docs/guides/create_dagger).
+Find more detailed steps on local setup [here](https://raystack.github.io/dagger/docs/guides/create_dagger).
 
 ## Running on cluster
-Refer [here](https://odpf.github.io/dagger/docs/guides/deployment) for details regarding Dagger deployment.
 
-## Running tests
+Refer [here](https://raystack.github.io/dagger/docs/guides/deployment) for details regarding Dagger deployment.
+
+## Running tests
+
 ```sh
 # Running unit tests
 $ ./gradlew clean test
@@ -67,12 +73,14 @@ $ ./gradlew clean
 
 Development of Dagger happens in the open on GitHub, and we are grateful to the community for contributing bug fixes and improvements. Read below to learn how you can take part in improving Dagger.
 
-Read our [contributing guide](https://odpf.github.io/dagger/docs/contribute/contribution) to learn about our development process, how to propose bug fixes and improvements, and how to build and test your changes to Dagger.
+Read our [contributing guide](https://raystack.github.io/dagger/docs/contribute/contribution) to learn about our development process, how to propose bug fixes and improvements, and how to build and test your changes to Dagger.
 
-To help you get your feet wet and get you familiar with our contribution process, we have a list of [good first issues](https://github.com/odpf/dagger/labels/good%20first%20issue) that contain bugs which have a relatively limited scope. This is a great place to get started.
+To help you get your feet wet and get you familiar with our contribution process, we have a list of [good first issues](https://github.com/raystack/dagger/labels/good%20first%20issue) that contain bugs which have a relatively limited scope. This is a great place to get started.
 
 ## Credits
-This project exists thanks to all the [contributors](https://github.com/odpf/dagger/graphs/contributors).
+
+This project exists thanks to all the [contributors](https://github.com/raystack/dagger/graphs/contributors).
 
 ## License
+
 Dagger is [Apache 2.0](LICENSE) licensed.

build.gradle

+1-1
@@ -15,7 +15,7 @@ subprojects {
     apply plugin: 'idea'
     apply plugin: 'checkstyle'
 
-    group 'io.odpf'
+    group 'org.raystack'
 
     checkstyle {
         toolVersion '7.6.1'

dagger-common/build.gradle

+2-2
@@ -58,6 +58,7 @@ dependencies {
     compileOnly group: 'org.apache.flink', name: 'flink-table', version: flinkVersion
     compileOnly group: 'org.apache.flink', name: 'flink-table-api-java-bridge_2.11', version: flinkVersion
     compileOnly group: 'org.apache.flink', name: 'flink-connector-kafka_2.11', version: flinkVersion
+    compileOnly 'org.raystack:stencil:0.4.0'
 
     dependenciesCommonJar ('org.apache.hadoop:hadoop-client:2.8.3') {
         exclude module:"commons-cli"
@@ -67,7 +68,6 @@ dependencies {
     dependenciesCommonJar 'org.apache.flink:flink-metrics-dropwizard:' + flinkVersion
     dependenciesCommonJar 'org.apache.flink:flink-json:' + flinkVersion
     dependenciesCommonJar 'com.jayway.jsonpath:json-path:2.4.0'
-    dependenciesCommonJar 'io.odpf:stencil:0.2.1'
     dependenciesCommonJar 'com.google.code.gson:gson:2.8.2'
     dependenciesCommonJar 'org.apache.parquet:parquet-column:1.12.2'
 
@@ -127,7 +127,7 @@ publishing {
     repositories {
         maven {
             name = "GitHubPackages"
-            url = "https://maven.pkg.github.com/odpf/dagger"
+            url = "https://maven.pkg.github.com/raystack/dagger"
             credentials {
                 username = System.getenv("GITHUB_ACTOR")
                 password = System.getenv("GITHUB_TOKEN")

dagger-common/src/main/java/io/odpf/dagger/common/serde/typehandler/TypeHandlerFactory.java

-57
This file was deleted.

dagger-common/src/main/java/io/odpf/dagger/common/configuration/Configuration.java renamed to dagger-common/src/main/java/org/raystack/dagger/common/configuration/Configuration.java

+1-1
@@ -1,4 +1,4 @@
-package io.odpf.dagger.common.configuration;
+package org.raystack.dagger.common.configuration;
 
 import org.apache.flink.api.java.utils.ParameterTool;
 

dagger-common/src/main/java/io/odpf/dagger/common/core/Constants.java renamed to dagger-common/src/main/java/org/raystack/dagger/common/core/Constants.java

+12-2
@@ -1,14 +1,24 @@
-package io.odpf.dagger.common.core;
+package org.raystack.dagger.common.core;
 
 public class Constants {
     public static final String SCHEMA_REGISTRY_STENCIL_ENABLE_KEY = "SCHEMA_REGISTRY_STENCIL_ENABLE";
     public static final boolean SCHEMA_REGISTRY_STENCIL_ENABLE_DEFAULT = false;
     public static final String SCHEMA_REGISTRY_STENCIL_URLS_KEY = "SCHEMA_REGISTRY_STENCIL_URLS";
     public static final String SCHEMA_REGISTRY_STENCIL_URLS_DEFAULT = "";
     public static final String SCHEMA_REGISTRY_STENCIL_FETCH_TIMEOUT_MS = "SCHEMA_REGISTRY_STENCIL_FETCH_TIMEOUT_MS";
-    public static final Integer SCHEMA_REGISTRY_STENCIL_FETCH_TIMEOUT_MS_DEFAULT = 60000;
+    public static final Integer SCHEMA_REGISTRY_STENCIL_FETCH_TIMEOUT_MS_DEFAULT = 10000;
     public static final String SCHEMA_REGISTRY_STENCIL_FETCH_HEADERS_KEY = "SCHEMA_REGISTRY_STENCIL_FETCH_HEADERS";
     public static final String SCHEMA_REGISTRY_STENCIL_FETCH_HEADERS_DEFAULT = "";
+    public static final String SCHEMA_REGISTRY_STENCIL_CACHE_AUTO_REFRESH_KEY = "SCHEMA_REGISTRY_STENCIL_CACHE_AUTO_REFRESH";
+    public static final boolean SCHEMA_REGISTRY_STENCIL_CACHE_AUTO_REFRESH_DEFAULT = false;
+    public static final String SCHEMA_REGISTRY_STENCIL_CACHE_TTL_MS_KEY = "SCHEMA_REGISTRY_STENCIL_CACHE_TTL_MS";
+    public static final Long SCHEMA_REGISTRY_STENCIL_CACHE_TTL_MS_DEFAULT = 900000L;
+    public static final String SCHEMA_REGISTRY_STENCIL_REFRESH_STRATEGY_KEY = "SCHEMA_REGISTRY_STENCIL_REFRESH_STRATEGY";
+    public static final String SCHEMA_REGISTRY_STENCIL_REFRESH_STRATEGY_DEFAULT = "LONG_POLLING";
+    public static final String SCHEMA_REGISTRY_STENCIL_FETCH_BACKOFF_MIN_MS_KEY = "SCHEMA_REGISTRY_STENCIL_FETCH_BACKOFF_MIN_MS";
+    public static final Long SCHEMA_REGISTRY_STENCIL_FETCH_BACKOFF_MIN_MS_DEFAULT = 60000L;
+    public static final String SCHEMA_REGISTRY_STENCIL_FETCH_RETRIES_KEY = "SCHEMA_REGISTRY_STENCIL_FETCH_RETRIES";
+    public static final Integer SCHEMA_REGISTRY_STENCIL_FETCH_RETRIES_DEFAULT = 4;
 
     public static final String UDF_TELEMETRY_GROUP_KEY = "udf";
     public static final String GAUGE_ASPECT_NAME = "value";
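
The new Stencil cache auto-refresh constants above are plain key/value settings. A minimal sketch of reading them, assuming the standard Flink ParameterTool mechanism that Dagger's Configuration class wraps; the argument style and surrounding wiring are illustrative only, not the exact Dagger entry point:

```java
import org.apache.flink.api.java.utils.ParameterTool;

// Illustrative only: reads the new auto-refresh keys back with their defaults.
// How Dagger threads these values into the Stencil client is not part of this diff.
public class StencilAutoRefreshConfigSketch {
    public static void main(String[] args) {
        ParameterTool params = ParameterTool.fromArgs(new String[]{
                "--SCHEMA_REGISTRY_STENCIL_ENABLE", "true",
                "--SCHEMA_REGISTRY_STENCIL_CACHE_AUTO_REFRESH", "true",
                "--SCHEMA_REGISTRY_STENCIL_CACHE_TTL_MS", "900000",
                "--SCHEMA_REGISTRY_STENCIL_REFRESH_STRATEGY", "LONG_POLLING"
        });

        boolean autoRefresh = params.getBoolean("SCHEMA_REGISTRY_STENCIL_CACHE_AUTO_REFRESH", false);
        long cacheTtlMs = params.getLong("SCHEMA_REGISTRY_STENCIL_CACHE_TTL_MS", 900000L);
        String strategy = params.get("SCHEMA_REGISTRY_STENCIL_REFRESH_STRATEGY", "LONG_POLLING");

        System.out.printf("auto refresh=%s, ttl=%dms, strategy=%s%n", autoRefresh, cacheTtlMs, strategy);
    }
}
```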

dagger-common/src/main/java/io/odpf/dagger/common/core/DaggerContext.java renamed to dagger-common/src/main/java/org/raystack/dagger/common/core/DaggerContext.java

+3-3
@@ -1,7 +1,7 @@
-package io.odpf.dagger.common.core;
+package org.raystack.dagger.common.core;
 
-import io.odpf.dagger.common.configuration.Configuration;
-import io.odpf.dagger.common.exceptions.DaggerContextException;
+import org.raystack.dagger.common.configuration.Configuration;
+import org.raystack.dagger.common.exceptions.DaggerContextException;
 import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
 import org.apache.flink.table.api.EnvironmentSettings;
 import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

dagger-common/src/main/java/org/raystack/dagger/common/core/FieldDescriptorCache.java

+58

@@ -0,0 +1,58 @@
+package org.raystack.dagger.common.core;
+
+import com.google.protobuf.Descriptors;
+
+import java.io.Serializable;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+
+public class FieldDescriptorCache implements Serializable {
+    private final Map<String, Integer> fieldDescriptorIndexMap = new HashMap<>();
+    private final Map<String, Integer> protoDescriptorArityMap = new HashMap<>();
+
+    public FieldDescriptorCache(Descriptors.Descriptor descriptor) {
+
+        cacheFieldDescriptorMap(descriptor);
+    }
+
+    public void cacheFieldDescriptorMap(Descriptors.Descriptor descriptor) {
+
+        if (protoDescriptorArityMap.containsKey(descriptor.getFullName())) {
+            return;
+        }
+        List<Descriptors.FieldDescriptor> descriptorFields = descriptor.getFields();
+        protoDescriptorArityMap.putIfAbsent(descriptor.getFullName(), descriptorFields.size());
+
+        for (Descriptors.FieldDescriptor fieldDescriptor : descriptorFields) {
+            fieldDescriptorIndexMap.putIfAbsent(fieldDescriptor.getFullName(), fieldDescriptor.getIndex());
+        }
+
+        for (Descriptors.FieldDescriptor fieldDescriptor : descriptorFields) {
+            if (fieldDescriptor.getType().toString().equals("MESSAGE")) {
+                cacheFieldDescriptorMap(fieldDescriptor.getMessageType());
+
+            }
+        }
+    }
+
+    public int getOriginalFieldIndex(Descriptors.FieldDescriptor fieldDescriptor) {
+        if (!fieldDescriptorIndexMap.containsKey(fieldDescriptor.getFullName())) {
+            throw new IllegalArgumentException("The Field Descriptor " + fieldDescriptor.getFullName() + " was not found in the cache");
+        }
+        return fieldDescriptorIndexMap.get(fieldDescriptor.getFullName());
+    }
+
+    public boolean containsField(String fieldName) {
+
+        return fieldDescriptorIndexMap.containsKey(fieldName);
+    }
+
+    public int getOriginalFieldCount(Descriptors.Descriptor descriptor) {
+        if (!protoDescriptorArityMap.containsKey(descriptor.getFullName())) {
+            throw new IllegalArgumentException("The Proto Descriptor " + descriptor.getFullName() + " was not found in the cache");
+        }
+        return protoDescriptorArityMap.get(descriptor.getFullName());
+    }
+}
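
For orientation, a short usage sketch of the class added above: warm the cache once from a message descriptor, then answer index and arity lookups from the precomputed maps. The descriptor argument would come from any generated protobuf class; nothing beyond the FieldDescriptorCache API shown in the diff is assumed.

```java
import com.google.protobuf.Descriptors;

import org.raystack.dagger.common.core.FieldDescriptorCache;

public class FieldDescriptorCacheSketch {
    // Pass e.g. MyGeneratedMessage.getDescriptor(); any protobuf Descriptor works.
    public static void printFieldIndices(Descriptors.Descriptor descriptor) {
        // The constructor recursively caches the descriptor and all nested message types.
        FieldDescriptorCache cache = new FieldDescriptorCache(descriptor);

        int arity = cache.getOriginalFieldCount(descriptor);
        for (Descriptors.FieldDescriptor field : descriptor.getFields()) {
            // Lookups are served from the in-memory maps rather than the descriptor itself.
            int index = cache.getOriginalFieldIndex(field);
            System.out.printf("%s -> index %d of %d fields%n", field.getFullName(), index, arity);
        }
    }
}
```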
