[docs](ecosystem) add Kafka Connect SMT Examples #2323

61 changes: 61 additions & 0 deletions docs/ecosystem/doris-kafka-connector.md

### Loading Data with Kafka Connect Single Message Transforms

For example, consider data in the following format:
```json
{
"registertime": 1513885135404,
"userid": "User_9",
"regionid": "Region_3",
"gender": "MALE"
}
```
To add a hard-coded column that does not exist in the Kafka topic data, use the InsertField transform. Likewise, to convert a bigint timestamp into a formatted time string, use TimestampConverter.

```shell
curl -i http://127.0.0.1:8083/connectors -H "Content-Type: application/json" -X POST -d '{
"name": "insert_field_transform",
"config": {
"connector.class": "org.apache.doris.kafka.connector.DorisSinkConnector",
"tasks.max": "1",
"topics": "users",
"doris.topic2table.map": "users:kf_users",
"buffer.count.records": "10",
"buffer.flush.time": "11",
"buffer.size.bytes": "5000000",
"doris.urls": "127.0.0.1:8030",
"doris.user": "root",
"doris.password": "123456",
"doris.http.port": "8030",
"doris.query.port": "9030",
"doris.database": "testdb",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "false",
"transforms": "InsertField,TimestampConverter",
"transforms.InsertField.type": "org.apache.kafka.connect.transforms.InsertField$Value",
"transforms.InsertField.static.field": "repo",
"transforms.InsertField.static.value": "Apache Doris",
"transforms.TimestampConverter.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
"transforms.TimestampConverter.field": "registertime",
"transforms.TimestampConverter.format": "yyyy-MM-dd HH:mm:ss.SSS",
"transforms.TimestampConverter.target.type": "string"
}
}'
```
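After the connector has been created, its state can be checked through the standard Kafka Connect REST API, using the same worker address as in the example above (adjust the connector name to match the `name` field in your request):

```shell
# Query the connector and task status from the Connect worker's REST API
curl -s http://127.0.0.1:8083/connectors/insert_field_transform/status
```

A healthy connector reports `"state": "RUNNING"` for both the connector and its tasks.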

After the InsertField and TimestampConverter transformations, the data becomes:
```json
{
  "userid": "User_9",
  "regionid": "Region_3",
  "gender": "MALE",
  "repo": "Apache Doris",
  "registertime": "2017-12-21 19:38:55.404"
}
```
The `repo` field is the hard-coded column added by InsertField, and `registertime` has been converted from a Unix millisecond timestamp to a formatted string (TimestampConverter applies the format in UTC).
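The timestamp conversion can be reproduced outside of Kafka Connect to sanity-check the `format` pattern; a minimal sketch, assuming GNU coreutils `date`:

```shell
# Reproduce TimestampConverter's formatting of the sample record's
# registertime (epoch milliseconds), rendered in UTC as the SMT does.
ms=1513885135404
secs=$((ms / 1000))   # whole seconds since the epoch
frac=$((ms % 1000))   # remaining milliseconds
formatted="$(date -u -d "@${secs}" '+%Y-%m-%d %H:%M:%S').$(printf '%03d' "$frac")"
echo "$formatted"     # yyyy-MM-dd HH:mm:ss.SSS in UTC
```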

For more examples of Kafka Connect Single Message Transforms (SMT), please refer to the [SMT documentation](https://docs.confluent.io/cloud/current/connectors/transforms/overview.html).


## FAQ
**1. The following error occurs when reading JSON type data:**