---
description: Debezium Data Connector Documentation
hidden: true
---

# Debezium Data Connector

Debezium is an open-source platform for Change Data Capture (CDC): it records row-level changes in a source database and publishes them as events, typically to Kafka topics. Spice can connect to a Kafka topic managed by Debezium and apply those change events to keep a locally accelerated dataset up to date with the source data in real time.

```yaml
datasets:
  - from: debezium:my_kafka_topic_with_debezium_changes
    name: my_dataset
    params:
      debezium_transport: kafka # Optional. Only `kafka` is currently supported.
      debezium_message_format: json # Optional. Only `json` is currently supported.
      kafka_bootstrap_servers: broker1:9092,broker2:9092,broker3:9092 # Required. A comma-separated list of Kafka broker servers.
      kafka_security_protocol: SASL_SSL # Default is `SASL_SSL`. Valid values are `PLAINTEXT`, `SSL`, `SASL_PLAINTEXT`, `SASL_SSL`.
      kafka_sasl_mechanism: SCRAM-SHA-512 # Default is `SCRAM-SHA-512`. Valid values are `PLAIN`, `SCRAM-SHA-256`, `SCRAM-SHA-512`.
      kafka_sasl_username: kafka # Required if `kafka_security_protocol` is `SASL_PLAINTEXT` or `SASL_SSL`.
      kafka_sasl_password: ${secrets:kafka_sasl_password} # Required if `kafka_security_protocol` is `SASL_PLAINTEXT` or `SASL_SSL`.
      kafka_ssl_ca_location: ./certs/kafka_ca_cert.pem # Optional. Used to verify the SSL/TLS certificate of the Kafka broker.
      kafka_enable_ssl_certificate_verification: true # Default is `true`. Set to `false` to disable SSL/TLS certificate verification.
      kafka_ssl_endpoint_identification_algorithm: https # Default is `https`. Valid values are `none` and `https`.

    acceleration:
      enabled: true # Required. Acceleration must be enabled for the Debezium connector.
      engine: duckdb # Required. `duckdb`, `sqlite`, and `postgres` are the supported acceleration engines for Debezium.
      refresh_mode: changes # Optional. If specified, it must be set to `changes`; any other value is an error.
      mode: file # Optional. Persistence is recommended so the table does not need to be rebuilt each time Spice starts.
```

## Configuration

### `from`

The `from` field takes the form `debezium:kafka_topic`, where `kafka_topic` is the name of the Kafka topic to which Debezium publishes change events from the upstream source. In the example above, Spice listens to the `my_kafka_topic_with_debezium_changes` topic.
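The topic name is whatever Debezium writes to. By default, Debezium's PostgreSQL connector names each table's topic `<topicPrefix>.<schemaName>.<tableName>`. As a sketch, assuming a hypothetical Debezium deployment with topic prefix `dbserver1` that captures the `customers` table in the `public` schema, the `from` value would look like this (both the prefix and the table are illustrative, not taken from a real setup):

```yaml
datasets:
  # Hypothetical topic: Debezium topic prefix `dbserver1`, PostgreSQL schema `public`, table `customers`.
  - from: debezium:dbserver1.public.customers
    name: customers
    # `params` and `acceleration` are configured as in the full example above.
```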

### `name`

The dataset name. This is used as the table name within Spice. For example, the following configuration makes the data queryable as `cool_dataset`:

```yaml
datasets:
  - from: debezium:my_kafka_topic_with_debezium_changes
    name: cool_dataset
```

```sql
SELECT COUNT(*) FROM cool_dataset;
```

```
+----------+
| count(*) |
+----------+
| 6001215  |
+----------+
```

### `params`

| Parameter Name | Description |
| --- | --- |
| `debezium_transport` | Optional. The message broker transport to use. Default: `kafka`. Possible values: `kafka` (use Kafka as the message broker transport). Spice may support additional transports in the future. |
| `debezium_message_format` | Optional. The message format to use. Default: `json`. Possible values: `json` (use JSON as the message format). Spice is expected to support additional message formats, such as Avro, in the future. |
| `kafka_bootstrap_servers` | Required. A list of host/port pairs used to establish the initial connection to the Kafka cluster, formatted as `host1:port1,host2:port2,...`. The client discovers and uses all brokers in the cluster regardless of which servers are listed here; this list only determines the hosts contacted first. |
| `kafka_security_protocol` | Security protocol for Kafka connections. Default: `SASL_SSL`. Options: `PLAINTEXT`, `SSL`, `SASL_PLAINTEXT`, `SASL_SSL`. |
| `kafka_sasl_mechanism` | SASL (Simple Authentication and Security Layer) authentication mechanism. Default: `SCRAM-SHA-512`. Options: `PLAIN`, `SCRAM-SHA-256`, `SCRAM-SHA-512`. |
| `kafka_sasl_username` | SASL username. Required if `kafka_security_protocol` is `SASL_PLAINTEXT` or `SASL_SSL`. |
| `kafka_sasl_password` | SASL password. Required if `kafka_security_protocol` is `SASL_PLAINTEXT` or `SASL_SSL`. |
| `kafka_ssl_ca_location` | Optional. Path to the SSL/TLS CA certificate file used to verify the Kafka broker's certificate. |
| `kafka_enable_ssl_certificate_verification` | Enable SSL/TLS certificate verification. Default: `true`. |
| `kafka_ssl_endpoint_identification_algorithm` | SSL/TLS endpoint identification algorithm. Default: `https`. Options: `none`, `https`. |
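Because the default protocol is `SASL_SSL`, a local development broker without TLS or authentication needs the protocol set explicitly. The following is a sketch that assumes a hypothetical single broker at `localhost:9092` accepting `PLAINTEXT` connections; since the protocol is neither `SASL_PLAINTEXT` nor `SASL_SSL`, the SASL username and password are omitted. `PLAINTEXT` traffic is unencrypted and unauthenticated, so this is only suitable for local testing.

```yaml
datasets:
  - from: debezium:my_kafka_topic_with_debezium_changes
    name: my_dataset
    params:
      kafka_bootstrap_servers: localhost:9092 # Hypothetical local broker.
      kafka_security_protocol: PLAINTEXT # No TLS and no SASL; local development only.
      # kafka_sasl_username and kafka_sasl_password are not required with PLAINTEXT.
    acceleration:
      enabled: true
      engine: duckdb
```

To connect to a broker that uses TLS with a private certificate authority instead, set `kafka_security_protocol` to `SSL` and point `kafka_ssl_ca_location` at the CA certificate file.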

### Acceleration Settings

{% hint style="warning" %} Using the Debezium connector requires acceleration to be enabled. {% endhint %}

The following acceleration settings are supported:

| Parameter Name | Description |
| --- | --- |
| `enabled` | Required. Must be set to `true` to enable acceleration. |
| `engine` | Required. The acceleration engine to use. Possible values: `duckdb` (use DuckDB as the acceleration engine), `sqlite` (use SQLite as the acceleration engine), `postgres` (use PostgreSQL as the acceleration engine). |
| `refresh_mode` | Optional. The refresh mode to use. If specified, it must be set to `changes`; any other value is an error. |
| `mode` | Optional. The persistence mode to use. With the `duckdb` and `sqlite` engines, setting this to `file` is recommended so that data persists across restarts. Spice also persists metadata about the dataset, so it can resume from the last known state of the dataset instead of re-fetching the entire dataset. |
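As a minimal sketch of these settings, the following uses the SQLite engine with file persistence so that the accelerated table and the dataset metadata survive restarts. It reuses the topic, brokers, and secret name from the first example and relies on the documented defaults (`SASL_SSL` and `SCRAM-SHA-512`) for the security settings. `refresh_mode` is shown explicitly only for clarity and can be omitted.

```yaml
datasets:
  - from: debezium:my_kafka_topic_with_debezium_changes
    name: my_dataset
    params:
      kafka_bootstrap_servers: broker1:9092,broker2:9092,broker3:9092
      kafka_sasl_username: kafka # Required because the default protocol is SASL_SSL.
      kafka_sasl_password: ${secrets:kafka_sasl_password}
    acceleration:
      enabled: true # Required for the Debezium connector.
      engine: sqlite # One of `duckdb`, `sqlite`, or `postgres`.
      refresh_mode: changes # Optional; `changes` is the only accepted value.
      mode: file # Persist the table across restarts instead of rebuilding it.
```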