Change data capture demo: From Oracle to MongoDB using #Debezium, #KafkaConnect, and #Camel

snoussi/cdc-demo


Assumptions

  • Change data capture flow:
    • Source: from an Oracle 19c database to a Kafka topic, using Debezium
    • Sink: from a Kafka topic to a MongoDB collection
  • No transformation is applied to the captured change payloads
  • Change payloads are stored as-is in MongoDB
  • We don’t apply change types (inserts, updates, deletes, …) on the MongoDB side
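Since no transformation is applied, what lands in MongoDB is the raw Debezium change-event envelope. A minimal sketch of such an event (the field names follow the Debezium envelope; the values are illustrative, not captured output):

```shell
# Hypothetical, simplified Debezium change event (illustrative values only).
# This is roughly the shape of the document stored unmodified in MongoDB.
cat > /tmp/sample-event.json <<'EOF'
{
  "before": null,
  "after": {"ID": 1001, "NAME": "Rick Grimes", "CITY": "Alexandria"},
  "source": {"connector": "oracle", "table": "CUSTOMERS"},
  "op": "c",
  "ts_ms": 1700000000000
}
EOF
# "op" encodes the change type: c = create, u = update, d = delete, r = snapshot read.
python3 -c 'import json; print(json.load(open("/tmp/sample-event.json"))["op"])'
```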

High-Level Architecture

Option 1 - Using Kafka Connect for the Sink - Simple & Easy

Option 2 - Using Apache Camel for the Sink - Advanced Integration patterns

Installation and configuration

First clone this repository to your local machine

git clone https://github.com/snoussi/cdc-demo.git

Log in to your OpenShift cluster and create a new project/namespace named cdc-demo to hold the demo

oc new-project cdc-demo

Oracle installation

We will opt for a local container installation of the Oracle database, using the following steps:

  1. Make sure that podman is up and running

  2. Log in to container-registry.oracle.com; make sure you have an account and have accepted the Oracle Standard Terms and Restrictions for this container image

podman login container-registry.oracle.com
  3. Pull the image
podman pull container-registry.oracle.com/database/enterprise:19.3.0.0
  4. Start an Oracle Database server instance
podman run -d \
  --name dbz_oracle19 \
  -p 1521:1521 \
  -e ORACLE_SID=ORCLCDB \
  -e ORACLE_PDB=ORCLPDB1 \
  -e ORACLE_PWD=oraclepw \
  container-registry.oracle.com/database/enterprise:19.3.0.0
  5. Tail the container’s database log until the installation completes successfully
podman logs -f dbz_oracle19
  6. Open an SQLPlus terminal to the Oracle database container.
podman exec -it dbz_oracle19 sqlplus sys/oraclepw as sysdba
  7. Oracle LogMiner - Archive logs configuration: execute the following SQL command inside the SQLPlus terminal to check whether archive logging is enabled
SELECT LOG_MODE FROM V$DATABASE;

If the column contains ARCHIVELOG, archive logging is enabled and you can skip the next step. If the column contains NOARCHIVELOG, archive logging isn’t enabled and further configuration is necessary (see next step).

  8. Oracle LogMiner - Archive logs configuration: execute the following SQL commands inside the SQLPlus terminal
ALTER SYSTEM SET db_recovery_file_dest_size = 10G;
ALTER SYSTEM SET db_recovery_file_dest = '/opt/oracle/oradata/ORCLCDB' scope=spfile; 
SHUTDOWN IMMEDIATE
STARTUP MOUNT
ALTER DATABASE ARCHIVELOG;
ALTER DATABASE OPEN;
ARCHIVE LOG LIST;

The final output from the SQLPlus terminal should show the following:

Database log mode        Archive Mode
Automatic archival         Enabled
Archive destination        USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence     4
Next log sequence to archive   6
Current log sequence         6
  9. Oracle LogMiner - Redo logs configuration: there are two log mining strategies for Debezium’s Oracle connector. The strategy controls how the connector interacts with Oracle LogMiner and how the connector ingests schema and table changes:

redo_log_catalog: the data dictionary is written periodically to the redo logs, causing a higher generation of archive logs over time. This setting enables tracking DDL changes, so it is the ideal strategy if a table’s schema may change.

online_catalog: the data dictionary is not written periodically to the redo logs, leaving archive log generation unchanged. Oracle LogMiner mines changes substantially faster, but at the cost of not tracking DDL changes. This is the ideal strategy if a table’s schema remains constant.

When using the online_catalog mode, you can safely skip this step entirely.
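The strategy is selected through the connector’s log.mining.strategy property. A sketch of where it would go in the KafkaConnector configuration used later in this demo (online_catalog is an assumption here, reasonable because the demo table’s schema does not change):

```yaml
spec:
  config:
    # redo_log_catalog (default): tracks DDL changes, generates more archive logs.
    # online_catalog: substantially faster mining, but schema changes are not tracked.
    log.mining.strategy: online_catalog
```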

  10. Oracle LogMiner - Redo logs configuration: execute the following SQL command inside the SQLPlus terminal to check the redo log size
SELECT GROUP#, BYTES/1024/1024 SIZE_MB, STATUS FROM V$LOG ORDER BY 1;

The output below shows three log groups, each consuming 200MB of space per log. The status associated with each group is also important, as it represents the current state of that log.

    GROUP#    SIZE_MB STATUS
---------- ---------- ----------------
       1        200 INACTIVE
       2        200 CURRENT
       3        200 UNUSED

Now, execute the following SQL to determine the filenames and locations of the redo logs.

SELECT GROUP#, MEMBER FROM V$LOGFILE ORDER BY 1, 2;
    GROUP# MEMBER
---------- ---------------------------------------------------
       1 /opt/oracle/oradata/ORCLCDB/redo01.log
       2 /opt/oracle/oradata/ORCLCDB/redo02.log
       3 /opt/oracle/oradata/ORCLCDB/redo03.log

Now, execute the following SQL to drop and re-create log group 1 with a size of 500MB. We will reuse the log file name from the MEMBER column of the V$LOGFILE view.

ALTER DATABASE CLEAR LOGFILE GROUP 1;
ALTER DATABASE DROP LOGFILE GROUP 1;
ALTER DATABASE ADD LOGFILE GROUP 1 ('/opt/oracle/oradata/ORCLCDB/redo01.log') size 500M REUSE;

Execute the following SQL command inside the SQLPlus terminal to check the redo log size again

SELECT GROUP#, BYTES/1024/1024 SIZE_MB, STATUS FROM V$LOG ORDER BY 1;
    GROUP#    SIZE_MB STATUS
---------- ---------- ----------------
       1        500 UNUSED
       2        200 CURRENT
       3        200 UNUSED
  11. Oracle LogMiner - Supplemental Logging configuration: For Debezium to interface with LogMiner and work with chained rows and various storage arrangements, database supplemental logging must be enabled at a minimum level. To enable this level, execute the following SQL in the SQLPlus terminal:
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
  12. Oracle LogMiner - User setup: a user account requires specific permissions to access the LogMiner APIs and gather data from the captured tables. Execute the following SQL in the SQLPlus terminal:
CONNECT sys/oraclepw@ORCLCDB as sysdba;
CREATE TABLESPACE logminer_tbs DATAFILE '/opt/oracle/oradata/ORCLCDB/logminer_tbs.dbf' SIZE 25M REUSE AUTOEXTEND ON MAXSIZE UNLIMITED;
CONNECT sys/oraclepw@ORCLPDB1 as sysdba;
CREATE TABLESPACE logminer_tbs DATAFILE '/opt/oracle/oradata/ORCLCDB/ORCLPDB1/logminer_tbs.dbf' SIZE 25M REUSE AUTOEXTEND ON MAXSIZE UNLIMITED;
CONNECT sys/oraclepw@ORCLCDB as sysdba;
CREATE USER c##dbzuser IDENTIFIED BY dbz DEFAULT TABLESPACE LOGMINER_TBS QUOTA UNLIMITED ON LOGMINER_TBS CONTAINER=ALL;
GRANT CREATE SESSION TO c##dbzuser CONTAINER=ALL;
GRANT SET CONTAINER TO c##dbzuser CONTAINER=ALL;
GRANT SELECT ON V_$DATABASE TO c##dbzuser CONTAINER=ALL;
GRANT FLASHBACK ANY TABLE TO c##dbzuser CONTAINER=ALL;
GRANT SELECT ANY TABLE TO c##dbzuser CONTAINER=ALL;
GRANT SELECT_CATALOG_ROLE TO c##dbzuser CONTAINER=ALL;
GRANT EXECUTE_CATALOG_ROLE TO c##dbzuser CONTAINER=ALL;
GRANT SELECT ANY TRANSACTION TO c##dbzuser CONTAINER=ALL;
GRANT LOGMINING TO c##dbzuser CONTAINER=ALL; 
GRANT SELECT ANY DICTIONARY TO c##dbzuser CONTAINER=ALL;
GRANT CREATE TABLE TO c##dbzuser CONTAINER=ALL;
GRANT LOCK ANY TABLE TO c##dbzuser CONTAINER=ALL;
GRANT CREATE SEQUENCE TO c##dbzuser CONTAINER=ALL;
GRANT EXECUTE ON DBMS_LOGMNR TO c##dbzuser CONTAINER=ALL;
GRANT EXECUTE ON DBMS_LOGMNR_D TO c##dbzuser CONTAINER=ALL;
GRANT SELECT ON V_$LOG TO c##dbzuser CONTAINER=ALL;
GRANT SELECT ON V_$LOG_HISTORY TO c##dbzuser CONTAINER=ALL;
GRANT SELECT ON V_$LOGMNR_LOGS TO c##dbzuser CONTAINER=ALL;
GRANT SELECT ON V_$LOGMNR_CONTENTS TO c##dbzuser CONTAINER=ALL;
GRANT SELECT ON V_$LOGMNR_PARAMETERS TO c##dbzuser CONTAINER=ALL;
GRANT SELECT ON V_$LOGFILE TO c##dbzuser CONTAINER=ALL;
GRANT SELECT ON V_$ARCHIVED_LOG TO c##dbzuser CONTAINER=ALL;
GRANT SELECT ON V_$ARCHIVE_DEST_STATUS TO c##dbzuser CONTAINER=ALL;
GRANT SELECT ON V_$TRANSACTION TO c##dbzuser CONTAINER=ALL;
  13. You can refer to Debezium for Oracle - Part 1: Installation and Setup for further explanations

  14. Create a sample table to use during the demo by executing the following SQL in the SQLPlus terminal:

CONNECT sys/oraclepw@ORCLCDB as sysdba;
CREATE USER c##raa IDENTIFIED BY raapwd DEFAULT TABLESPACE users TEMPORARY TABLESPACE temp;
GRANT CONNECT, RESOURCE TO c##raa;
GRANT CREATE TABLE TO c##raa;
ALTER USER c##raa QUOTA UNLIMITED ON users;
ALTER USER c##raa DEFAULT TABLESPACE users;
ALTER USER c##raa TEMPORARY TABLESPACE temp;
CONNECT c##raa/raapwd@ORCLCDB;
CREATE TABLE customers (id number(9,0) primary key, name varchar2(50), city varchar2(50));
GRANT SELECT, INSERT, UPDATE, DELETE ON customers TO c##raa;
INSERT INTO customers VALUES (1001, 'Rick Grimes','Alexandria');
INSERT INTO customers VALUES (1002, 'Hubert Bonisseur de La Bath','Paris');
INSERT INTO customers VALUES (1003, 'Sterling Archer','New York');
COMMIT;
  15. Use ngrok to expose the database externally
ngrok tcp 1521
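The forwarding address that ngrok prints (host and port) is what the Debezium connector will later use as database.hostname and database.port. It can also be read from ngrok’s local API at http://127.0.0.1:4040/api/tunnels; a sketch of extracting it from a captured response (the sample JSON and its values are illustrative):

```shell
# Sample response from ngrok's local API (illustrative values).
cat > /tmp/tunnels.json <<'EOF'
{"tunnels": [{"proto": "tcp", "public_url": "tcp://6.tcp.eu.ngrok.io:16731"}]}
EOF
# Extract host and port for database.hostname / database.port.
python3 - <<'EOF'
import json
from urllib.parse import urlparse

tunnel = json.load(open("/tmp/tunnels.json"))["tunnels"][0]
u = urlparse(tunnel["public_url"])
print(u.hostname, u.port)
EOF
```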

MongoDB installation

For this installation, we will use the MongoDB operator offered by opstreelabs

  1. Follow these steps to install the MongoDB operator in the right namespace

  2. Create a secret to hold the admin password by running the following command
oc create secret generic my-mongodb-secret --from-literal=password=ilovemongo
  3. Create the following file and name it my-mongodb.yml. Don't forget to edit the storageClass with the one you use in your OpenShift cluster.
apiVersion: opstreelabs.in/v1alpha1
kind: MongoDB
metadata:
  name: my-mongodb
spec:
  kubernetesConfig:
    image: 'quay.io/opstree/mongo:v5.0.6'
    imagePullPolicy: IfNotPresent
  mongoDBSecurity:
    mongoDBAdminUser: admin
    secretRef:
      key: password
      name: my-mongodb-secret
  storage:
    accessModes:
      - ReadWriteOnce
    storageClass: lvms-vg1
    storageSize: 1Gi
  4. Apply the previous file to the namespace
oc apply -f my-mongodb.yml
  5. Create the following file and name it my-mongodb-service.yml. This will expose the MongoDB port to the outside world through NodePort 32017
apiVersion: v1
kind: Service
metadata:
  name: my-mongodb
  labels:
    app: mongodb-standalone
spec:
  type: NodePort
  ports:
    - port: 27017
      targetPort: 27017
      nodePort: 32017
      name: mongodb-port
  selector:
    app: mongodb-standalone
  6. Apply the previous file to the namespace
oc apply -f my-mongodb-service.yml
  7. Note the connection string:
  • To use within the same namespace: mongodb://admin:ilovemongo@<OCP_NODE_IP>:32017
  • To use and browse externally, e.g. in MongoDB Compass: mongodb://admin:ilovemongo@<EXTERNAL_HOST>:27017

AMQ Streams installation

  1. We will use the AMQ Streams Operator for this deployment. First, create the following file amq-streams-operator.yml and apply it to the namespace
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: amq-streams
spec:
  channel: stable
  installPlanApproval: Automatic
  name: amq-streams
  source: redhat-operators
  sourceNamespace: openshift-marketplace
oc apply -f amq-streams-operator.yml

Kafka Cluster

  1. Create a basic Kafka cluster by creating the following file amq-streams-cluster.yml and apply it to the namespace
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-amqstreams-cluster
spec:
  kafka:
    replicas: 3
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
    config:
      default.replication.factor: 3
    storage:
      type: jbod
      volumes:
      - id: 0
        type: persistent-claim
        size: 10Gi
        deleteClaim: true
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 5Gi
      deleteClaim: true
  entityOperator:
    topicOperator: {}
    userOperator: {}
oc apply -f amq-streams-cluster.yml
  2. Find and note the bootstrap servers URL:

my-amqstreams-cluster-kafka-bootstrap.cdc-demo.svc:9092
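This address follows the Strimzi/AMQ Streams service naming convention and can be composed from the cluster name and namespace:

```shell
# Strimzi exposes an in-cluster bootstrap service named
# <cluster-name>-kafka-bootstrap in the cluster's namespace.
CLUSTER=my-amqstreams-cluster
NAMESPACE=cdc-demo
echo "${CLUSTER}-kafka-bootstrap.${NAMESPACE}.svc:9092"
```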

Source - Kafka Connect for Debezium

  1. Create the debezium-oracle-kafka-connect imagestream
oc create imagestream debezium-oracle-kafka-connect
  2. Create the following Debezium KafkaConnect custom resource (CR), name it my-debezium-oracle-kafka-connect.yml, and apply it to the namespace
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnect
metadata:
  name: my-debezium-oracle-kafka-connect
  annotations:
    strimzi.io/use-connector-resources: "true"
spec:
  replicas: 1
  bootstrapServers: 'my-amqstreams-cluster-kafka-bootstrap.cdc-demo.svc:9092'
  build:
    output:
      type: imagestream
      image: debezium-oracle-kafka-connect:latest
    plugins:
     - name: debezium-connector-oracle
       artifacts:
         - type: zip
           url: 'https://maven.repository.redhat.com/ga/io/debezium/debezium-connector-oracle/2.1.4.Final-redhat-00001/debezium-connector-oracle-2.1.4.Final-redhat-00001-plugin.zip'
         - type: zip
           url: 'https://maven.repository.redhat.com/ga/io/debezium/debezium-scripting/2.1.4.Final-redhat-00001/debezium-scripting-2.1.4.Final-redhat-00001.zip'
         - type: jar
           url: 'https://repo1.maven.org/maven2/com/oracle/database/jdbc/ojdbc8/21.1.0.0/ojdbc8-21.1.0.0.jar'
  config:
    group.id: debezium-oracle-connect
    offset.storage.topic: connect-cluster-offsets
    config.storage.topic: connect-cluster-configs
    status.storage.topic: connect-cluster-status
    config.storage.replication.factor: -1
    offset.storage.replication.factor: -1
    status.storage.replication.factor: -1
oc apply -f my-debezium-oracle-kafka-connect.yml
  3. Create the following KafkaConnector CR, save it as oracle-source-connector-customers-table.yml, and apply it to the namespace
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnector
metadata:
  name: oracle-source-connector-customers-table
  labels:
    strimzi.io/cluster: my-debezium-oracle-kafka-connect
  annotations:
    strimzi.io/use-connector-resources: 'true'
spec:
  class: io.debezium.connector.oracle.OracleConnector
  tasksMax: 1
  config:
   schema.history.internal.kafka.bootstrap.servers: 'my-amqstreams-cluster-kafka-bootstrap.cdc-demo.svc:9092'
   schema.history.internal.kafka.topic: "schema-changes"
   database.hostname: 6.tcp.eu.ngrok.io
   database.port: 16731
   database.user: c##dbzuser
   database.password: dbz
   database.dbname: ORCLCDB
   topic.prefix: oracle-cdc
   table.include.list: C##RAA.CUSTOMERS
oc apply -f oracle-source-connector-customers-table.yml
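Debezium names change topics <topic.prefix>.<schema>.<table>, replacing characters that are not valid in Kafka topic names (such as '#') with underscores. That is why the captured table C##RAA.CUSTOMERS ends up on the topic oracle-cdc.C__RAA.CUSTOMERS, which the MongoDB sink subscribes to:

```shell
# Debezium topic name: <topic.prefix>.<schema>.<table>, with characters that
# are invalid in Kafka topic names (e.g. '#') replaced by '_'.
PREFIX='oracle-cdc'
SCHEMA='C##RAA'
TABLE='CUSTOMERS'
echo "${PREFIX}.${SCHEMA}.${TABLE}" | tr '#' '_'
```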

Sink option 1 - Kafka Connect for MongoDB

  1. Create the mongodb-kafka-connect imagestream
oc create imagestream mongodb-kafka-connect
  2. Create the following MongoDB KafkaConnect custom resource (CR), name it my-mongodb-kafka-connect-cluster.yml, and apply it to the namespace
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnect
metadata:
  name: my-mongodb-kafka-connect-cluster
  annotations:
    strimzi.io/use-connector-resources: "true"
spec:
  replicas: 1
  bootstrapServers: 'my-amqstreams-cluster-kafka-bootstrap.cdc-demo.svc:9092'
  build:
    output:
      type: imagestream
      image: mongodb-kafka-connect:latest
    plugins:
      - name: my-mongo-kafka-connect-plugin
        artifacts:
          - type: maven
            repository: 'https://repo1.maven.org/maven2'
            group: org.mongodb.kafka
            artifact: mongo-kafka-connect
            version: 1.11.0
  config:
    group.id: mongodb-connect-cluster
    key.converter: org.apache.kafka.connect.json.JsonConverter
    value.converter: org.apache.kafka.connect.json.JsonConverter
    key.converter.schemas.enable: false
    value.converter.schemas.enable: false
    offset.storage.topic: connect-offsets
    config.storage.topic: connect-configs
    status.storage.topic: connect-status
oc apply -f my-mongodb-kafka-connect-cluster.yml
  3. Create the following KafkaConnector CR, save it as mongodb-sink-connector.yml, and apply it to the namespace
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnector
metadata:
  labels:
    strimzi.io/cluster: my-mongodb-kafka-connect-cluster
  name: mongodb-sink-connector
spec:
  class: com.mongodb.kafka.connect.MongoSinkConnector
  config:
    collection: customers-connect
    connection.uri: 'mongodb://admin:ilovemongo@<MONGODB_HOST>:27017' # replace <MONGODB_HOST> with the MongoDB host reachable from the cluster
    database: sampledb
    key.converter: org.apache.kafka.connect.storage.StringConverter
    topics: oracle-cdc.C__RAA.CUSTOMERS
    value.converter: org.apache.kafka.connect.json.JsonConverter
    value.converter.schemas.enable: false
oc apply -f mongodb-sink-connector.yml
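The JsonConverter settings above control whether each record carries an embedded schema. With schemas.enable=false, the schema wrapper is dropped and only the payload is written to MongoDB; a sketch of the difference (sample record with illustrative values):

```shell
# With schemas.enable=true, a JsonConverter record is wrapped as
# {"schema": ..., "payload": ...}; with false, only the payload remains.
cat > /tmp/with-schema.json <<'EOF'
{"schema": {"type": "struct"}, "payload": {"ID": 1001, "CITY": "Alexandria"}}
EOF
# schemas.enable=false effectively leaves just the payload:
python3 -c 'import json; print(json.dumps(json.load(open("/tmp/with-schema.json"))["payload"]))'
```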

Sink option 2 - Kafka to MongoDB Apache Camel route

  1. Move to the camel-mongodb-sink folder, open the Quarkus project in your favorite IDE, then inspect the Routes class and the application.properties file
cd ./camel-mongodb-sink
  2. Build and deploy the project to OpenShift; make sure you are in the right namespace
./mvnw clean package -Dquarkus.kubernetes.deploy=true -DskipTests -Dopenshift
  • To delete the deployed project, you can run the following command
oc delete all -l app.kubernetes.io/name=camel-mongodb-sink

Demo script

  1. Open a new terminal, log into the sample customers database and run the following SQL commands:
INSERT INTO customers VALUES (1004, 'John Rambo','Arizona');
INSERT INTO customers VALUES (1005, 'Indiana Jones','Cairo');
INSERT INTO customers VALUES (1006, 'James Bond','London');
UPDATE customers SET city = 'Kabul' WHERE id = 1004;
COMMIT;
  2. Verify that the changes to the customers database have been replicated into MongoDB
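For the UPDATE in step 1, the stored change event should carry both row images. A hypothetical, simplified event illustrating what to look for in the customers collection (values are illustrative, not captured output):

```shell
# Hypothetical update event for "UPDATE customers SET city = 'Kabul' WHERE id = 1004":
# op "u" carries both the before and after row images.
cat > /tmp/update-event.json <<'EOF'
{"before": {"ID": 1004, "NAME": "John Rambo", "CITY": "Arizona"},
 "after":  {"ID": 1004, "NAME": "John Rambo", "CITY": "Kabul"},
 "op": "u"}
EOF
python3 -c 'import json; e = json.load(open("/tmp/update-event.json")); print(e["before"]["CITY"], "->", e["after"]["CITY"])'
```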
