DBZ-4616 README edits to Debezium RHOAS/RHOSR tutorial #1

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
113 changes: 69 additions & 44 deletions debezium-openshift-registry-avro/README.md
Original file line number Diff line number Diff line change
@@ -1,35 +1,47 @@
# Debezium - Avro serialization with Red Hat OpenShift Service Registry

This tutorial demonstrates how to use [Debezium](https://debezium.io/) to monitor a MySQL database.
As you make changes to data in the database, the changes are reflected in the change event records that Debezium emits to [Red Hat OpenShift Streams for Apache Kafka](https://www.redhat.com/es/technologies/cloud-computing/openshift/openshift-streams-for-apache-kafka).

Debezium provides connectors for multiple database types.
In this tutorial, we'll use the Debezium [MySQL connector](https://debezium.io/documentation/reference/1.3/connectors/mysql.html).

## Red Hat OpenShift Service Registry

[Red Hat OpenShift Service Registry](https://www.redhat.com/es/technologies/cloud-computing/openshift/openshift-service-registry) is a fully hosted and managed service that provides an API and schema registry for microservices. By using OpenShift Service Registry, development teams can easily publish, discover, and reuse APIs and schemas.
Red Hat OpenShift Service Registry is included in the following services at no additional charge:

- [Red Hat OpenShift API Management](https://www.redhat.com/es/technologies/cloud-computing/openshift/openshift-api-management)
- [Red Hat OpenShift Streams for Apache Kafka](https://www.redhat.com/es/technologies/cloud-computing/openshift/openshift-streams-for-apache-kafka)

## Debezium schema serialization

By default, the Debezium JSON converter includes the entire message schema in each change event, resulting in records that are very verbose.
To reduce the size of the event records, you can instead serialize record keys and values by using [Apache Avro](https://avro.apache.org/).
To use Apache Avro serialization, you must deploy a schema registry that manages the Avro message schemas and their versions.

You can use the OpenShift Service Registry Avro converter in your Debezium connector by specifying it in the connector configuration.
This converter maps Kafka Connect schemas to Avro schemas.
The converter then uses the Avro schemas to serialize the record keys and values into Avro’s compact binary form.

### Prerequisites

- Docker is installed and running.

This tutorial uses Docker and the Linux container images to run the essential local services.
You should use the latest version of Docker.
For more information, see the [Docker Engine installation documentation](https://docs.docker.com/engine/installation/).

- [kcat](https://github.com/edenhill/kcat)

- [kcctl](https://github.com/kcctl/kcctl)

- [Red Hat OpenShift Application Services CLI (`rhoas`)](https://github.com/redhat-developer/app-services-cli/releases/latest)

- [jq](https://stedolan.github.io/jq/), for JSON processing

- A [Red Hat Developer account](https://developers.redhat.com/about)

As part of the developer program for OpenShift Streams for Apache Kafka, anyone with a Red Hat account can create a Kafka instance free of charge.
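
If you do not already have a Kafka instance, you can create one from the command line. The following is a minimal sketch that assumes the `rhoas` CLI listed in the prerequisites; the instance name is illustrative, and the exact flags can vary between CLI versions:

```bash
# Log in to your Red Hat account (opens a browser-based flow).
rhoas login

# Create a Kafka instance; the name "debezium-kafka" is just an example.
rhoas kafka create --name debezium-kafka

# List your instances and wait until the new one reports a ready status.
rhoas kafka list
```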


## Starting the local services

In this demo, you run a MySQL database and a Kafka Connect cluster locally on your machine.
We will use Docker Compose to start the required services, so there is no need to install anything beyond the prerequisites listed above.

To start the local services, complete the following steps:

1. Clone the following repository:

```bash
git clone https://github.com/hguerrero/debezium-examples.git
```

1. Change to the `debezium-openshift-registry-avro` directory:

```bash
cd debezium-examples/debezium-openshift-registry-avro
```

1. In the Docker Compose file, update the environment variables for the Kafka Connect service with your connection details:

```yaml
...
KAFKA_CONNECT_SASL_USERNAME: <kafka-sa-client-id>
KAFKA_CONNECT_SASL_PASSWORD_FILE: cpass
```

> You need your Kafka bootstrap server and the client ID of the service account that you use to connect. The container image reads the password from a local file called `cpass`.

1. Open the provided `cpass` file and **replace the placeholder** with your service account secret.

```
...
```

1. Start the environment:

```bash
docker-compose up -d
```

The preceding command starts the following components:

- Single node Kafka Connect cluster
- MySQL database (ready for CDC)
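
To verify that both containers are up before you continue, you can check them from the terminal. This is a quick sketch; it assumes the Compose project in this directory and the default Kafka Connect REST port that the tutorial uses later:

```bash
# Both containers should report an "Up" state.
docker-compose ps

# The Kafka Connect REST API answers on port 8083 with its version information.
curl -s http://localhost:8083/ | jq
```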

## Apicurio converters

The open source [Apicurio Registry project](https://www.apicur.io/registry/) is the original community of OpenShift Service Registry.
Apicurio Registry provides Kafka Connect converters for Apache Avro and JSON Schema.
You configure Avro serialization for the Debezium connector by specifying the converter and schema registry properties in the connector configuration.
The connector configuration explicitly sets the (de)serializers to use Avro and specifies the location of the Apicurio Registry.

> The container image used in this environment includes all of the libraries that are required to access the Debezium MySQL connector and Apicurio Registry converters.

### Configure the converters

Add the following lines to the connector configuration to specify the **key** and **value** converters and their corresponding registry configuration.
Replace the values in angle brackets (`< >`) with the information for your OpenShift services.

```json
"key.converter": "io.apicurio.registry.utils.converter.AvroConverter",
...
"value.converter.apicurio.registry.auto-register": "true"
```

> The compatibility mode enables you to use tooling from other providers to deserialize and reuse the schemas in the Apicurio service registry.
> The configuration also includes the information that the serializer requires to authenticate with the service registry by using a service account.
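
For example, because the registry exposes a Confluent-compatible endpoint in addition to its own API, Confluent-style tooling can read the registered schemas. The following sketch is only an illustration: the `/apis/ccompat/v6` path and basic authentication with a service account are assumptions, so adjust them to your instance:

```bash
# List the subjects that the Avro converter registers, using the
# Confluent-compatible API exposed by the Service Registry (assumed path).
curl -s -u "<sr-client-id>:<sr-client-secret>" \
  "https://<your-registry-url>/apis/ccompat/v6/subjects" | jq
```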

### Create the topics in Red Hat OpenShift Streams for Apache Kafka

Automatic topic creation is disabled in OpenShift Streams.
You must manually create the topics that Debezium requires.

1. Create the following topics in your Kafka cluster:

| Topic name | Partitions | Retention time | Retention size |
| --- | --- | --- | --- |
| ... | ... | ... | ... |
| debezium-cluster-status | 1 | 604800000 ms (7 days) | Unlimited |
| schema-changes.inventory | 1 | 604800000 ms (7 days) | Unlimited |

After you create the topics, the list of topics in the console resembles the following example:

![topics-openshift-streams-debezium.png](topics-openshift-streams-debezium.png)

**NOTE:** Set the **Cleanup policy** of the topics that have the `debezium-cluster-` prefix to `compact`.
If you fail to set this property, errors are reported to the Kafka Connect log, and the connector is unable to start.

![compact policy](cleanup-policy-debezium.png)
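
If you prefer the CLI to the web console, the topics can also be created with `rhoas`. The following is only a sketch; the flag names are assumptions, so confirm them with `rhoas kafka topic create --help`:

```bash
# Internal Kafka Connect topics need the compact cleanup policy.
rhoas kafka topic create --name debezium-cluster-status \
  --partitions 1 --retention-ms 604800000 --cleanup-policy compact

# The database history topic uses the default delete policy.
rhoas kafka topic create --name schema-changes.inventory \
  --partitions 1 --retention-ms 604800000
```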

### Configure database history

The Debezium connector for MySQL records all DDL statements, along with the position in the binlog where each DDL statement appears, in a separate [database history Kafka topic](https://debezium.io/documentation/reference/stable/connectors/mysql.html#mysql-schema-history-topic).
To enable the connector to record the database history, you must provide it with access to the target Kafka cluster.
Add the following connection details to the connector configuration to provide the connector with access to OpenShift Streams.
As with the converter details that you added earlier, replace values in angle brackets (`< >`) with the information for your OpenShift services.

```json
"database.history.kafka.topic": "schema-changes.inventory",
"database.history.kafka.topic": "schema-changes.inventory",
"database.history.kafka.bootstrap.servers": "<your-boostrap-server>",
"database.history.producer.security.protocol": "SASL_SSL",
"database.history.producer.sasl.mechanism": "PLAIN",
...
"database.history.consumer.sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=<kafka-sa-client-id> password=<kafka-sa-client-secret>;",
```

As you can see in the preceding example, you must configure producer authentication and consumer authentication independently.

> You can review the complete configuration in the `dbz-mysql-openshift-registry-avro.json` file in the main folder.
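
Before you register the connector in the next section, you can quickly confirm that the file is still valid JSON after your edits; a small check using the `jq` prerequisite:

```bash
# jq exits with a non-zero status if the file contains a JSON syntax error.
jq . dbz-mysql-openshift-registry-avro.json > /dev/null && echo "configuration is valid JSON"
```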

### Create the connector

The configuration for the connector is now ready.
Let's add it to the Kafka Connect cluster to start the task that captures changes from the database.
We will use `kcctl`, a command-line client for Kafka Connect, to register, examine, delete, and restart connectors.

1. Configure the `kcctl` context:

```sh
kcctl config set-context --cluster http://localhost:8083 local
```

1. Register the connector by using `kcctl`:

```bash
kcctl apply -f dbz-mysql-openshift-registry-avro.json
```
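
After you apply the configuration, you can confirm that the connector was registered and that its task is running. The following sketch uses `kcctl`; `<connector-name>` is a placeholder for the `name` field that is defined in the configuration file:

```bash
# List the registered connectors.
kcctl get connectors

# Show the state of the connector and its task.
kcctl describe connector <connector-name>
```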

### Check the Service Registry

Open the *Red Hat OpenShift Service Registry* console.
You should be able to find all of the schema artifacts, as shown in the following figure.

![registry-debezium-artifacts.png](registry-debezium-artifacts.png)
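
If you want to verify the artifacts from the terminal instead of the console, you can query the registry REST API. This sketch assumes the v2 search endpoint and basic authentication with a service account; adjust the URL and credentials for your instance:

```bash
# List the IDs of the artifacts that the Avro converter registered.
curl -s -u "<sr-client-id>:<sr-client-secret>" \
  "https://<your-registry-url>/apis/registry/v2/search/artifacts" | jq '.artifacts[].id'
```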

### Check the data

We will use the `kcat` CLI utility to query information from the OpenShift Streams Kafka cluster.

1. Set the environment variables in your terminal session with your cluster information:

2. Query the cluster metadata:

```sh
...
-X sasl.password="$CLIENT_SECRET" -L
```

The command returns output similar to the following example:

```sh
Metadata for all topics (from broker -1: sasl_ssl://kafkaesque-c-isn-bhfjlsl-g-dana.bf2.kafka.rhcloud.com:443/bootstrap):
...
partition 0, leader 1, replicas: 1,2,0, isrs: 1,2,0
```




3. Check the records on the `customers` topic:

```sh
kcat -b $BOOTSTRAP_SERVER \
  ...
-t avro.inventory.customers -C -e
```

You should see the following four scrambled records in the terminal:

```sh
...
% Reached end of topic avro.inventory.customers [0] at offset 4
```

> The records are scrambled because we are using Avro for the serialization.
Because the `kcat` utility expects strings, it is unable to convert the records correctly.
Complete the next step to improve the way that the records are rendered.

4. Now that we can see the records, we can ask `kcat` to connect to the OpenShift Service Registry so that it can query the schema and correctly deserialize the Avro records:

```sh
kcat -b $BOOTSTRAP_SERVER \
  ...
```

> OpenShift Service Registry also supports basic authentication, which is why we pass the credentials in the format `https://username:password@URL`.

The records are now presented in a nicely formatted JSON structure:

```json
...
```

## Summary


Although Debezium makes it easy to capture database changes and record them in Kafka, one of the more critical decisions you have to make is *how* to serialize those change events in Kafka.
Debezium provides key and value *converters* that you can use to specify how to serialize events.
The *Red Hat OpenShift Service Registry* enables you to store externalized schema versions to minimize payload volumes.