Kryptonite for Kafka: Apache Kafka Connect SMTs

Disclaimer: This is an UNOFFICIAL community project!

📚 Documentation

👉 For the latest module documentation please go to the official docs page for the Apache Kafka Connect SMTs 👈

The contents in this README are most likely outdated by now and won't be regularly maintained going forward. You should regard them as deprecated and expect removal any time without prior notice.

Kafka Connect Transformation (SMT)

Kryptonite for Kafka provides a turnkey Kafka Connect single message transformation (SMT) called CipherField. The simple examples below show how to install, configure and apply the SMT to encrypt and decrypt record fields.

Build and Deployment

You can either build this project from sources via Maven or download pre-built, self-contained packages of the latest artefacts. Starting with Kryptonite for Kafka 0.4.0, the pre-built Kafka Connect SMT can be downloaded directly from the release pages.

In order to deploy this custom SMT, put the root folder of the extracted archive into your 'connect plugin path' that is configured to be scanned during bootstrap of the Kafka Connect worker node(s).

Data Records without Schema

The following fictional data record value without schema - represented in JSON-encoded format - is used to illustrate a simple encrypt/decrypt scenario:

{
  "id": "1234567890",
  "myString": "some foo bla text",
  "myInt": 42,
  "myBoolean": true,
  "mySubDoc1": {"myString":"hello json"},
  "myArray1": ["str_1","str_2","...","str_N"],
  "mySubDoc2": {"k1":9,"k2":8,"k3":7}
}

Encryption of selected fields

Let's assume the fields "myString", "myArray1" and "mySubDoc2" of the above data record should get encrypted. The CipherField SMT can then be configured like so:

{
  //...
  "transforms":"cipher",
  "transforms.cipher.type":"com.github.hpgrahsl.kafka.connect.transforms.kryptonite.CipherField$Value",
  "transforms.cipher.cipher_mode": "ENCRYPT",
  "transforms.cipher.cipher_data_keys": "[{\"identifier\":\"my-demo-secret-key-123\",\"material\":{<TINK_KEYSET_SPEC_JSON_HERE>}}]", //key materials of utmost secrecy!
  "transforms.cipher.cipher_data_key_identifier": "my-demo-secret-key-123",
  "transforms.cipher.field_config": "[{\"name\":\"myString\"},{\"name\":\"myArray1\"},{\"name\":\"mySubDoc2\"}]",
  "transforms.cipher.field_mode": "OBJECT",
  //...
}

The result after applying this SMT is a record in which all the fields specified in the field_config parameter are encrypted using the keyset whose identifier is given by the cipher_data_key_identifier parameter. The keysets themselves are configured using the parameter cipher_data_keys, where the key material itself is specified according to a Tink keyset configuration in JSON format (here is a concrete example). Obviously, the configured key materials have to be treated with utmost secrecy, because leaking any of the keyset materials renders encryption useless. The recommended way of doing this for now is to either

- indirectly reference keyset materials by externalizing them into a separate properties file (find a few details here), or

- NOT store the keyset materials at the client-side in the first place, but instead resolve keysets at runtime from a cloud KMS; Azure Key Vault, AWS Secrets Manager, and GCP Secret Manager are all supported.

In general though, this can be considered a "chicken-and-egg" problem since the confidential settings in order to access a remote KMS also need to be stored somewhere somehow.

Since the configuration parameter field_mode is set to OBJECT, complex field types are processed as a whole instead of element-wise, the latter of which can be achieved by choosing ELEMENT mode.
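The difference between the two modes can be pictured with a small Python sketch. It is only an illustration under stated assumptions: `fake_encrypt` is a stand-in that Base64-encodes the JSON form of a value (this is NOT encryption), `apply_field_mode` is a hypothetical name, nested containers are not handled, and none of it reflects how the SMT is implemented internally.

```python
import base64
import json


def fake_encrypt(value):
    """Stand-in for the real cipher: Base64 of the JSON form (NOT encryption)."""
    return base64.b64encode(json.dumps(value).encode("utf-8")).decode("ascii")


def apply_field_mode(value, field_mode):
    """Illustrate OBJECT vs ELEMENT handling of a complex field value."""
    if field_mode == "OBJECT":
        # The whole list/map is turned into one opaque string.
        return fake_encrypt(value)
    if field_mode == "ELEMENT":
        # The container shape survives; each element is processed on its own.
        if isinstance(value, list):
            return [fake_encrypt(v) for v in value]
        if isinstance(value, dict):
            return {k: fake_encrypt(v) for k, v in value.items()}
        return fake_encrypt(value)
    raise ValueError(f"unknown field_mode: {field_mode}")


sub_doc = {"k1": 9, "k2": 8, "k3": 7}
as_object = apply_field_mode(sub_doc, "OBJECT")
as_elements = apply_field_mode(sub_doc, "ELEMENT")
print(type(as_object).__name__)  # one string for the whole map
print(sorted(as_elements))       # the map keys are kept, the values are replaced
```

In OBJECT mode a consumer needs to decrypt the field before it can see any of its structure, whereas in ELEMENT mode the outline of the array or map stays visible and only the individual values are hidden.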

Below is an exemplary JSON-encoded record after the encryption:

{
  "id": "1234567890",
  "myString": "M007MIScg8F0A/cAddWbayvUPObjxuGFxisu5MUckDhBss6fo3gMWSsR4xOLPEfs4toSDDCxa7E=",
  "myInt": 42,
  "myBoolean": true,
  "mySubDoc1": {"myString":"hello json"},
  "myArray1": "UuEKnrv91bLImQvKqXTET7RTP93XeLfNRhzJaXVc6OGA4E+mbvGFs/q6WEFCAFy9wklJE5EPXJ+P85nTBCiVrTkU+TR+kUWB9zNplmOL70sENwwwsWux",
  "mySubDoc2": "fLAnBod5U8eS+LVNEm3vDJ1m32/HM170ASgJLKdPF78qDxcsiWj+zOkvZBsk2g44ZWHiSDy3JrI1btmUQhJc4OTnmqIPB1qAADqKhJztvyfcffOfM+y0ISsNk4+V6k0XHBdaT1tJXqLTsyoQfWmSZsnwpM4WARo5/cQWdAwwsWux"
}

NOTE: Encrypted fields are always represented as Base64-encoded strings which contain both the ciphertext of the fields' original values and authenticated but unencrypted(!) meta-data. If you want to learn about a few more details look here.

Decryption of selected fields

Provided that the keyset used to encrypt the original data record is made available to a specific sink connector, the CipherField SMT can be configured to decrypt the data as follows:

{
  //...
  "transforms":"cipher",
  "transforms.cipher.type":"com.github.hpgrahsl.kafka.connect.transforms.kryptonite.CipherField$Value",
  "transforms.cipher.cipher_mode": "DECRYPT",
  "transforms.cipher.cipher_data_keys": "[{\"identifier\":\"my-demo-secret-key-123\",\"material\":{<TINK_KEYSET_SPEC_JSON_HERE>}}]", //key materials of utmost secrecy!
  "transforms.cipher.field_config": "[{\"name\":\"myString\"},{\"name\":\"myArray1\"},{\"name\":\"mySubDoc2\"}]",
  "transforms.cipher.field_mode": "OBJECT",
  //...
}

The result after applying this SMT is a record in which all the fields specified in the field_config parameter are decrypted using the keyset that was used to encrypt the original data. Naturally, this works only if that keyset is properly configured.

Below is an exemplary JSON-encoded record after the decryption, which is equal to the original record:

{
  "id": "1234567890",
  "myString": "some foo bla text",
  "myInt": 42,
  "myBoolean": true,
  "mySubDoc1": {"myString":"hello json"},
  "myArray1": ["str_1","str_2","...","str_N"],
  "mySubDoc2": {"k1":9,"k2":8,"k3":7}
}

Data Records with Schema

The following example is based on an Avro value record and used to illustrate a simple encrypt/decrypt scenario for data records with schema. The schema could be defined as follows (the record names MyRecord and MySubDoc1 are placeholders, added because Avro requires every record type to be named):

{
    "type": "record", "name": "MyRecord", "fields": [
        { "name": "id", "type": "string" },
        { "name": "myString", "type": "string" },
        { "name": "myInt", "type": "int" },
        { "name": "myBoolean", "type": "boolean" },
        { "name": "mySubDoc1", "type": {
            "type": "record", "name": "MySubDoc1",
            "fields": [
                { "name": "myString", "type": "string" }
            ]
        }},
        { "name": "myArray1", "type": { "type": "array", "items": "string"}},
        { "name": "mySubDoc2", "type": { "type": "map", "values": "int"}}
    ]
}

The data of one such fictional record - represented by its Struct.toString() output - might look as:

Struct{
  id=1234567890,
  myString=some foo bla text,
  myInt=42,
  myBoolean=true,
  mySubDoc1=Struct{myString=hello json},
  myArray1=[str_1, str_2, ..., str_N],
  mySubDoc2={k1=9, k2=8, k3=7}
}

Encryption of selected fields

Let's assume the fields "myString", "myArray1" and "mySubDoc2" of the above data record should get encrypted. The CipherField SMT can then be configured as follows:

{
  //...
  "transforms":"cipher",
  "transforms.cipher.type":"com.github.hpgrahsl.kafka.connect.transforms.kryptonite.CipherField$Value",
  "transforms.cipher.cipher_mode": "ENCRYPT",
  "transforms.cipher.cipher_data_keys": "[{\"identifier\":\"my-demo-secret-key-123\",\"material\":{<TINK_KEYSET_SPEC_JSON_HERE>}}]", //key materials of utmost secrecy!
  "transforms.cipher.cipher_data_key_identifier": "my-demo-secret-key-123",
  "transforms.cipher.field_config": "[{\"name\":\"myString\"},{\"name\":\"myArray1\"},{\"name\":\"mySubDoc2\"}]",
  "transforms.cipher.field_mode": "OBJECT",
  //...
}

As with schemaless records, the configured key materials have to be treated with utmost secrecy. The recommended way of doing this for now is to either

- indirectly reference keyset materials by externalizing them into a separate properties file (find a few details here), or

- NOT store the keyset materials at the client-side in the first place, but instead resolve keysets at runtime from a cloud KMS; Azure Key Vault, AWS Secrets Manager, and GCP Secret Manager are all supported.

In general though, this can be considered a "chicken-and-egg" problem since the confidential settings in order to access a remote KMS also need to be stored somewhere somehow.

Since the configuration parameter field_mode in the configuration above is set to 'OBJECT', complex field types are processed as a whole instead of element-wise, the latter of which can be achieved by choosing ELEMENT mode.

Below is an exemplary Struct.toString() output of the record after the encryption:

Struct{
  id=1234567890,
  myString=MwpKn9k5V4prVVGvAZdm6iOp8GnVUR7zyT+Ljb+bhcrFaGEx9xSNOpbZaJZ4YeBsJAj7DDCxa7E=,
  myInt=42,
  myBoolean=true,
  mySubDoc1=Struct{myString=hello json},
  myArray1=Ujlij/mbI48akEIZ08q363zOfV+OMJ+ZFewZEMBiaCnk7NuZZH+mfw6HGobtRzvxeavRhTL3lKI1jYPz0CYl7PqS7DJOJtJ1ccKDa5FLAgP0BQwwsWux,
  mySubDoc2=fJxvxo1LX1ceg2/Ba4+vq2NlgyJNiWGZhjWh6rkHQzuG+C7I8lNW8ECLxqJkNhuYuMMlZjK51gAZfID4HEWcMPz026HexzurptZdgkM1fqJMTMIryDKVlAicXc8phZ7gELZCepQWE0XKmQg0UBXr924V46x9I9QwaWUAdgwwsWux
}

NOTE 1: Encrypted fields are always represented as Base64-encoded strings which contain both the ciphertext of the fields' original values and authenticated meta-data (unencrypted!) about the field in question. If you want to learn about a few more details look here.

NOTE 2: In order to support this, the original schema of the data record is automatically adapted such that any encrypted fields can be stored as strings, even though the original data types of the fields in question were different.

Decryption of selected fields

Provided that the keyset used to encrypt the original data record is made available to a specific sink connector, the CipherField SMT can be configured to decrypt the data as follows:

{
  //...
  "transforms":"cipher",
  "transforms.cipher.type":"com.github.hpgrahsl.kafka.connect.transforms.kryptonite.CipherField$Value",
  "transforms.cipher.cipher_mode": "DECRYPT",
  "transforms.cipher.cipher_data_keys": "[{\"identifier\":\"my-demo-secret-key-123\",\"material\":{<TINK_KEYSET_SPEC_JSON_HERE>}}]", //key materials of utmost secrecy!
  "transforms.cipher.field_config": "[{\"name\":\"myString\",\"schema\": {\"type\": \"STRING\"}},{\"name\":\"myArray1\",\"schema\": {\"type\": \"ARRAY\",\"valueSchema\": {\"type\": \"STRING\"}}},{\"name\":\"mySubDoc2\",\"schema\": { \"type\": \"MAP\", \"keySchema\": { \"type\": \"STRING\" }, \"valueSchema\": { \"type\": \"INT32\"}}}]",
  "transforms.cipher.field_mode": "OBJECT",
  //...
}

Take notice of the extended field_config parameter settings. For decryption of schema-aware data, the SMT configuration expects the original schema information to be specified explicitly for each field to decrypt. This allows the SMT to adapt the encrypted record's schema to a compatible decrypted record's schema upfront, such that the resulting plaintext field values can be stored in accordance with their original data types.
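Hand-escaping a nested JSON string like the schema-aware field_config is error-prone. A minimal Python sketch of generating it instead is shown here; the variable names are made up for illustration, and only the field names and schema descriptors are taken from the example above.

```python
import json

# Original (pre-encryption) types of the fields, as in the example above.
field_config = [
    {"name": "myString", "schema": {"type": "STRING"}},
    {"name": "myArray1",
     "schema": {"type": "ARRAY", "valueSchema": {"type": "STRING"}}},
    {"name": "mySubDoc2",
     "schema": {"type": "MAP",
                "keySchema": {"type": "STRING"},
                "valueSchema": {"type": "INT32"}}},
]

# The parameter value is a string, so serialize the list once here ...
field_config_value = json.dumps(field_config, separators=(",", ":"))

# ... and let the JSON encoder escape the inner quotes when the connector
# configuration itself is written out as JSON.
connector_fragment = json.dumps(
    {"transforms.cipher.field_config": field_config_value}, indent=2
)
print(connector_fragment)
```

Parsing the printed fragment and then parsing the contained string again yields the original list, which is a quick way to check that no quote was missed.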

Below is the decrypted data - represented by its Struct.toString() output - which is equal to the original record:

Struct{
  id=1234567890,
  myString=some foo bla text,
  myInt=42,
  myBoolean=true,
  mySubDoc1=Struct{myString=hello json},
  myArray1=[str_1, str_2, ..., str_N],
  mySubDoc2={k1=9, k2=8, k3=7}
}

Format Preserving Encryption (FPE)

Starting with version 0.6.0, Kryptonite for Kafka supports Format Preserving Encryption (FPE) using the FF3-1 algorithm. Unlike the already supported standard AEAD encryption schemes (AES-GCM/AES-GCM-SIV), which produce variable-length ciphertext, FPE maintains the original format and length of the plaintext data.

Key Characteristics of FPE

- Format Preservation: Encrypted data maintains the same format and length as the original plaintext
- Character Set Preservation: The ciphertext uses the same character set (alphabet) as the plaintext
- Use Cases: Ideal for scenarios where encrypted data must conform to specific formats, such as:
  - Credit card numbers (CCN)
  - Social security numbers (SSN)
  - Phone numbers
  - Postal codes
  - Database columns with strict format constraints

FPE Configuration

To use FPE, configure the following parameters in your field_config:

- algorithm: Set to CUSTOM/MYSTO_FPE_FF3_1 (required for FPE)
- fpeAlphabetType: Specifies the character set for encryption (required for FPE)
- fpeTweak (optional): A tweak value for additional cryptographic variation (default: 0000000)
- fpeAlphabetCustom (optional): Required only when fpeAlphabetType=CUSTOM

Supported Alphabet Types

| Alphabet Type | Characters | Example Use Case |
| --- | --- | --- |
| `DIGITS` | `0123456789` | Credit card numbers, SSN, numeric IDs |
| `UPPERCASE` | `A-Z` | Uppercase text data |
| `LOWERCASE` | `a-z` | Lowercase text data |
| `ALPHANUMERIC` | `0-9A-Za-z` | Mixed alphanumeric codes |
| `ALPHANUMERIC_EXTENDED` | `0-9A-Za-z _,.!?@%$&§"'°^-+*/;:#(){}[]<>=~\|` | Text with special characters |
| `HEXADECIMAL` | `0-9A-F` | Hexadecimal strings |
| `CUSTOM` | User-defined via `fpeAlphabetCustom` | Custom character sets (e.g., binary: `01`) |
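Since the plaintext may only contain characters from the chosen alphabet, it can help to check sample data before picking an alphabet type. The following Python sketch does that for the character sets listed above. The dictionary and the function name are illustrative and not part of the Kryptonite API, and ALPHANUMERIC_EXTENDED is left out for brevity.

```python
import string

# Character sets as listed in the table above (illustrative, not library code).
ALPHABETS = {
    "DIGITS": string.digits,
    "UPPERCASE": string.ascii_uppercase,
    "LOWERCASE": string.ascii_lowercase,
    "ALPHANUMERIC": string.digits + string.ascii_uppercase + string.ascii_lowercase,
    "HEXADECIMAL": string.digits + "ABCDEF",
}


def chars_outside_alphabet(plaintext, alphabet_type, custom_alphabet=None):
    """Return the sorted characters of plaintext that the alphabet does not cover."""
    if alphabet_type == "CUSTOM":
        if not custom_alphabet:
            raise ValueError("CUSTOM requires a custom alphabet")
        alphabet = custom_alphabet
    else:
        alphabet = ALPHABETS[alphabet_type]
    return sorted(set(plaintext) - set(alphabet))


print(chars_outside_alphabet("4455202014528870", "DIGITS"))  # every character is covered
print(chars_outside_alphabet("CUST-12345", "ALPHANUMERIC"))  # the hyphen is not covered
print(chars_outside_alphabet("0110", "CUSTOM", "01"))
```

A value such as `CUST-12345` would therefore need either ALPHANUMERIC_EXTENDED or a custom alphabet that includes the hyphen.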

FPE Example: Encrypting Sensitive Data

Let's assume you have a record with sensitive fields that must maintain their format:

{
  "customerId": "CUST-12345",
  "creditCardNumber": "4455202014528870",
  "ssn": "230564998",
  "email": "customer@example.com"
}

To encrypt the credit card number and SSN using FPE while preserving their numeric format:

{
  //...
  "transforms":"cipher",
  "transforms.cipher.type":"com.github.hpgrahsl.kafka.connect.transforms.kryptonite.CipherField$Value",
  "transforms.cipher.cipher_mode": "ENCRYPT",
  "transforms.cipher.cipher_data_keys": "[{\"identifier\":\"my-fpe-key-456\",\"material\":{<FPE_TINK_KEYSET_SPEC_JSON_HERE>}}]",
  "transforms.cipher.cipher_data_key_identifier": "my-fpe-key-456",
  "transforms.cipher.field_config": "[{\"name\":\"creditCardNumber\",\"algorithm\":\"CUSTOM/MYSTO_FPE_FF3_1\",\"fpeAlphabetType\":\"DIGITS\"},{\"name\":\"ssn\",\"algorithm\":\"CUSTOM/MYSTO_FPE_FF3_1\",\"fpeAlphabetType\":\"DIGITS\"}]",
  "transforms.cipher.field_mode": "OBJECT",
  //...
}

After encryption, the record maintains the numeric format:

{
  "customerId": "CUST-12345",
  "creditCardNumber": "7823956140762231",  // Still 16 digits!
  "ssn": "845721369",  // Still 9 digits!
  "email": "customer@example.com"
}

FPE Decryption

Decryption requires the exact same FPE configuration, i.e. the same algorithm, alphabet type, and (optional) tweak:

{
  //...
  "transforms":"cipher",
  "transforms.cipher.type":"com.github.hpgrahsl.kafka.connect.transforms.kryptonite.CipherField$Value",
  "transforms.cipher.cipher_mode": "DECRYPT",
  "transforms.cipher.cipher_data_keys": "[{\"identifier\":\"my-fpe-key-456\",\"material\":{<FPE_TINK_KEYSET_SPEC_JSON_HERE>}}]",
  "transforms.cipher.field_config": "[{\"name\":\"creditCardNumber\",\"schema\":{\"type\":\"STRING\"},\"algorithm\":\"CUSTOM/MYSTO_FPE_FF3_1\",\"fpeAlphabetType\":\"DIGITS\"},{\"name\":\"ssn\",\"schema\":{\"type\":\"STRING\"},\"algorithm\":\"CUSTOM/MYSTO_FPE_FF3_1\",\"fpeAlphabetType\":\"DIGITS\"}]",
  "transforms.cipher.field_mode": "OBJECT",
  //...
}
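One way to keep the two configurations from drifting apart is to derive the decryption field_config from the encryption one. The Python sketch below assumes the connector configurations are generated by a script. The variable names are hypothetical, while the field names and FPE settings are those of the examples above.

```python
import json

FPE = {"algorithm": "CUSTOM/MYSTO_FPE_FF3_1", "fpeAlphabetType": "DIGITS"}

# Single source of truth for the FPE settings of each field.
encrypt_fields = [
    {"name": "creditCardNumber", **FPE},
    {"name": "ssn", **FPE},
]

# Decryption reuses every setting and only adds the schema, as in the example above.
decrypt_fields = [{**field, "schema": {"type": "STRING"}} for field in encrypt_fields]

encrypt_value = json.dumps(encrypt_fields, separators=(",", ":"))
decrypt_value = json.dumps(decrypt_fields, separators=(",", ":"))
print(encrypt_value)
print(decrypt_value)
```

If a tweak or a custom alphabet is added to `FPE` later, both generated values pick it up automatically.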

FPE Keyset Configuration

FPE requires special keyset material with a custom type URL. Here's an example keyset for FPE:

{
  "identifier": "my-fpe-key-456",
  "material": {
    "primaryKeyId": 2000001,
    "key": [
      {
        "keyData": {
          "typeUrl": "io.github.hpgrahsl.kryptonite/crypto.custom.mysto.fpe.FpeKey",
          "value": "<BASE64_ENCODED_FPE_KEY_HERE>",
          "keyMaterialType": "SYMMETRIC"
        },
        "status": "ENABLED",
        "keyId": 2000001,
        "outputPrefixType": "RAW"
      }
    ]
  }
}

Key differences from standard AEAD keysets:

- typeUrl: Must be io.github.hpgrahsl.kryptonite/crypto.custom.mysto.fpe.FpeKey (not type.googleapis.com/google.crypto.tink.AesGcmKey)
- outputPrefixType: Should be RAW (not TINK)

FPE Considerations

- Minimum Length: FPE requires input data to meet minimum length requirements based on the alphabet size. Ensure your data is long enough for the chosen alphabet. In case it's not, you'll get an exception hinting at the length violation at runtime.
- Consistent Configuration: The exact same fpeAlphabetType, fpeAlphabetCustom, and fpeTweak must be used for both encryption and decryption in order for FPE to work correctly.
- Security vs Format: While FPE preserves format, it offers different - usually weaker - security properties compared to standard AEAD encryption based on AES ciphers.
- Tweak Parameter: The optional fpeTweak adds cryptographic variation, resulting in different ciphertext for the same plaintext when different tweaks are used.
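To get a feeling for the minimum length consideration, the Python sketch below estimates the shortest admissible input per alphabet size. It rests on an assumption that is not stated in this README: that the bound follows the FF3-1 domain-size requirement from the NIST SP 800-38G Rev. 1 draft, i.e. alphabet size raised to the input length must be at least 1,000,000. The limit actually enforced at runtime may differ, so treat the numbers as an estimate.

```python
# Assumed FF3-1 domain-size requirement: alphabet_size ** length >= 1_000_000.
ASSUMED_MIN_DOMAIN = 1_000_000


def assumed_min_length(alphabet_size):
    """Smallest length n with alphabet_size ** n >= ASSUMED_MIN_DOMAIN."""
    if alphabet_size < 2:
        raise ValueError("an alphabet needs at least 2 unique characters")
    length, domain = 1, alphabet_size
    while domain < ASSUMED_MIN_DOMAIN:
        domain *= alphabet_size
        length += 1
    return length


for name, size in [("DIGITS", 10), ("HEXADECIMAL", 16),
                   ("ALPHANUMERIC", 62), ("binary CUSTOM", 2)]:
    print(name, assumed_min_length(size))
```

Under this assumption, smaller alphabets need longer inputs: a binary custom alphabet requires far more characters than a 62-character alphanumeric one.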

Configuration Parameters

| Name | Description | Type | Default | Valid Values | Importance |
| --- | --- | --- | --- | --- | --- |
| cipher_data_key_identifier | keyset identifier to be used as default data encryption keyset for all fields which don't refer to a specific keyset identifier in their `field_config` | string | !no default! | non-empty string if cipher_mode=ENCRYPT; empty string if cipher_mode=DECRYPT | high |
| cipher_data_keys | JSON array with plain or encrypted data key objects specifying the key identifiers together with key sets for encryption / decryption which are defined in Tink's key specification format. The contained keyset objects are mandatory if `kms_type=NONE` but the array may be left empty when using e.g. `kms_type=AZ_KV_SECRETS`, `kms_type=AWS_SM_SECRETS`, or `kms_type=GCP_SM_SECRETS` in order to resolve keysets from a remote KMS. NOTE: Irrespective of their origin, all plain or encrypted keysets (see the example values in the right column) are expected to be valid tink keyset descriptions in JSON format. | string | [] | JSON array either empty or holding N data key config objects each of which refers to a tink keyset in JSON format (see "material" field). Plain data key config example: `[ { "identifier": "my-demo-secret-key-123", "material": { "primaryKeyId": 123456789, "key": [ { "keyData": { "typeUrl": "type.googleapis.com/google.crypto.tink.AesGcmKey", "value": "<BASE64_ENCODED_KEY_HERE>", "keyMaterialType": "SYMMETRIC" }, "status": "ENABLED", "keyId": 123456789, "outputPrefixType": "TINK" } ] } } ]` Encrypted data key config example: `[ { "identifier": "my-demo-secret-key-123", "material": { "encryptedKeyset": "<ENCRYPTED_AND_BASE64_ENCODED_KEYSET_HERE>", "keysetInfo": { "primaryKeyId": 123456789, "keyInfo": [ { "typeUrl": "type.googleapis.com/google.crypto.tink.AesSivKey", "status": "ENABLED", "keyId": 123456789, "outputPrefixType": "TINK" } ] } } } ]` | high |
| cipher_mode | defines whether the data should get encrypted or decrypted | string | !no default! | ENCRYPT, DECRYPT | high |
| field_config | JSON array with field config objects specifying which fields together with their settings should get either encrypted / decrypted (nested field names are expected to be separated by `.` per default, or by a custom `path_delimiter`) | string | | JSON array holding at least one valid field config object, e.g. `[ { "name": "my-field-abc" }, { "name": "my-nested.field-xyz" } ]` | high |
| key_source | defines the nature and origin of the keysets: plain data keysets in `cipher_data_keys (key_source=CONFIG)`; encrypted data keysets in `cipher_data_keys (key_source=CONFIG_ENCRYPTED)`; plain data keysets residing in a cloud/remote key management system `(key_source=KMS)`; encrypted data keysets residing in a cloud/remote key management system `(key_source=KMS_ENCRYPTED)`. When using the KMS options refer to the `kms_type` and `kms_config` settings. When using encrypted data keysets refer to the `kek_type`, `kek_config` and `kek_uri` settings as well. | string | CONFIG | CONFIG, CONFIG_ENCRYPTED, KMS, KMS_ENCRYPTED | high |
| kms_type | defines if data keysets are read from the config directly (`key_source=CONFIG \| CONFIG_ENCRYPTED`) or resolved from a remote/cloud key management system, i.e. Azure Key Vault, AWS Secrets Manager, or GCP Secret Manager (`key_source=KMS \| KMS_ENCRYPTED`) | string | NONE | NONE, AZ_KV_SECRETS, AWS_SM_SECRETS, GCP_SM_SECRETS | medium |
| kms_config | JSON object specifying KMS-specific client authentication settings for the chosen `kms_type` | string | {} | JSON object defining the KMS-specific client authentication settings. For Azure Key Vault (`kms_type=AZ_KV_SECRETS`): `{ "clientId": "...", "tenantId": "...", "clientSecret": "...", "keyVaultUrl": "https://<vault-name>.vault.azure.net" }` For AWS Secrets Manager (`kms_type=AWS_SM_SECRETS`): `{ "accessKey": "AKIA...", "secretKey": "...", "region": "eu-central-1" }` For GCP Secret Manager (`kms_type=GCP_SM_SECRETS`): `{ "credentials": "<GCP service account JSON contents>", "projectId": "my-gcp-project" }` | medium |
| kek_type | defines which cloud KMS is used to perform key encryption, hence where the Key Encryption Key (KEK) to protect data keysets at rest resides. Must be specified when using `key_source=CONFIG_ENCRYPTED \| KMS_ENCRYPTED` | string | NONE | NONE, GCP, AWS, AZURE | medium |
| kek_config | JSON object specifying KMS-specific client authentication settings for the chosen `kek_type` | string | {} | JSON object specifying the KMS-specific client authentication settings. For GCP Cloud KMS (`kek_type=GCP`): `{ "credentials": "<GCP service account JSON contents>", "projectId": "my-gcp-project" }` For AWS KMS (`kek_type=AWS`): `{ "accessKey": "AKIA...", "secretKey": "..." }` For Azure Key Vault (`kek_type=AZURE`): `{ "clientId": "...", "tenantId": "...", "clientSecret": "...", "keyVaultUrl": "https://<vault-name>.vault.azure.net" }` | medium |
| kek_uri | URI referring to the key encryption key stored in the respective remote/cloud KMS | string | !no default! | a valid key encryption key URI for the chosen `kek_type`. GCP Cloud KMS (`kek_type=GCP`): `gcp-kms://projects/<project>/locations/<location>/keyRings/<keyring>/cryptoKeys/<key>` AWS KMS (`kek_type=AWS`): `aws-kms://arn:aws:kms:<region>:<account-id>:key/<key-id>` Azure Key Vault (`kek_type=AZURE`): `azure-kv://<vault-name>.vault.azure.net/keys/<key-name>` | medium |
| field_mode | defines how to process complex field types (maps, lists, structs), either as full objects or element-wise | string | ELEMENT | ELEMENT, OBJECT | medium |
| cipher_algorithm | default cipher algorithm used for data encryption if not specified for a field in its `field_config` | string | TINK/AES_GCM | TINK/AES_GCM, TINK/AES_GCM_SIV, CUSTOM/MYSTO_FPE_FF3_1 | medium |
| cipher_fpe_tweak | default tweak value for Format Preserving Encryption (FPE) if not specified for a field in its `field_config`. The tweak provides additional cryptographic variation - different tweaks produce different ciphertexts for the same plaintext. | string | 0000000 | any string value (typically 7 characters) | medium |
| cipher_fpe_alphabet_type | default alphabet type for Format Preserving Encryption (FPE) if not specified for a field in its `field_config`. Defines the character set to be used. Note that the plaintext may only contain characters from this set. As a result, the ciphertext after FPE encryption will be composed of the same set of characters. | string | ALPHANUMERIC | DIGITS, UPPERCASE, LOWERCASE, ALPHANUMERIC, ALPHANUMERIC_EXTENDED, HEXADECIMAL, CUSTOM | medium |
| cipher_fpe_alphabet_custom | custom alphabet for Format Preserving Encryption (FPE) when `cipher_fpe_alphabet_type=CUSTOM`. Specifies the exact set of characters to use for encryption (e.g., "01" for binary, "0123456789ABCDEF" for hexadecimal). | string | | any non-empty string defining a custom character set (minimum 2 unique characters) | medium |
| cipher_text_encoding | defines the encoding of the resulting ciphertext bytes. | string | BASE64 | BASE64, RAW_BYTES | low |
| path_delimiter | path delimiter used as field name separator when referring to nested fields in the input record | string | `.` | non-empty string | low |
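Some of the dependencies between these parameters can be checked before a connector is deployed. The following Python sketch is one possible pre-flight check; the function name is hypothetical, and the rules are only a reading of the table above (including the assumption that `kek_uri` is needed whenever encrypted keysets are used), not the validation the SMT itself performs.

```python
def lint_cipher_config(cfg):
    """Return a list of problems; an empty list means nothing was flagged."""
    problems = []

    mode = cfg.get("cipher_mode")
    if mode not in ("ENCRYPT", "DECRYPT"):
        problems.append("cipher_mode must be ENCRYPT or DECRYPT")
    if mode == "ENCRYPT" and not cfg.get("cipher_data_key_identifier"):
        problems.append("cipher_data_key_identifier must be non-empty for ENCRYPT")

    key_source = cfg.get("key_source", "CONFIG")  # documented default
    if key_source in ("KMS", "KMS_ENCRYPTED") and cfg.get("kms_type", "NONE") == "NONE":
        problems.append("key_source=" + key_source + " needs a kms_type other than NONE")
    if key_source in ("CONFIG_ENCRYPTED", "KMS_ENCRYPTED"):
        if cfg.get("kek_type", "NONE") == "NONE":
            problems.append("key_source=" + key_source + " needs a kek_type other than NONE")
        if not cfg.get("kek_uri"):
            problems.append("key_source=" + key_source + " needs a kek_uri")
    return problems


ok = {"cipher_mode": "ENCRYPT", "cipher_data_key_identifier": "my-demo-secret-key-123"}
bad = {"cipher_mode": "ENCRYPT", "key_source": "KMS_ENCRYPTED"}
print(lint_cipher_config(ok))
for problem in lint_cipher_config(bad):
    print(problem)
```

The dictionary keys are the bare parameter names; in a real connector configuration they carry the `transforms.cipher.` prefix shown in the examples.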

Externalize configuration parameters

The problem with directly specifying configuration parameters which contain sensitive data, such as keyset materials, is that they are exposed via Kafka Connect's REST API. For Connect clusters that are shared among teams this means the configured keyset materials would leak, which is unacceptable. The way to deal with this for now is to indirectly reference such configuration parameters from external property files.

This approach can be used to configure any kind of sensitive data such as keyset materials themselves or KMS-specific client authentication settings, in case the keysets aren't sourced from the config directly but rather retrieved from a cloud KMS such as Azure Key Vault, AWS Secrets Manager, or GCP Secret Manager.

Below is a quick example of what such a configuration looks like:

Before you can make use of configuration parameters from external sources you have to customize your Kafka Connect worker configuration by adding the following two settings:

connect.config.providers=file
connect.config.providers.file.class=org.apache.kafka.common.config.provider.FileConfigProvider

Then you create the external properties file e.g. classified.properties which contains the keyset materials. This file needs to be available on all your Kafka Connect workers which you want to run Kryptonite on. Let's pretend the file is located at path /secrets/kryptonite/classified.properties on your worker nodes:

cipher_data_keys=[{"identifier":"my-demo-secret-key-123","material":{<TINK_KEYSET_SPEC_JSON_HERE>}}]

Finally, you simply reference this file and the corresponding key of the property therein, from your SMT configuration like so:

{
  //...
  "transforms":"cipher",
  "transforms.cipher.type":"com.github.hpgrahsl.kafka.connect.transforms.kryptonite.CipherField$Value",
  "transforms.cipher.cipher_mode": "ENCRYPT",
  "transforms.cipher.cipher_data_keys": "${file:/secrets/kryptonite/classified.properties:cipher_data_keys}",
  "transforms.cipher.cipher_data_key_identifier": "my-demo-secret-key-123",
  "transforms.cipher.field_config": "[{\"name\":\"myString\"},{\"name\":\"myArray1\"},{\"name\":\"mySubDoc2\"}]",
  "transforms.cipher.field_mode": "OBJECT",
  //...
}
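The two pieces involved here, the single-line property and the placeholder that points at it, can also be produced by a script. The Python sketch below is a minimal illustration; the helper names are made up, the key value is the same placeholder used throughout this README, and the file is written to a temporary directory rather than to the worker path.

```python
import json
import tempfile
from pathlib import Path

data_keys = [{
    "identifier": "my-demo-secret-key-123",
    "material": {
        "primaryKeyId": 1234567890,
        "key": [{
            "keyData": {
                "typeUrl": "type.googleapis.com/google.crypto.tink.AesGcmKey",
                "value": "<BASE64_ENCODED_KEY_HERE>",
                "keyMaterialType": "SYMMETRIC",
            },
            "status": "ENABLED",
            "keyId": 1234567890,
            "outputPrefixType": "TINK",
        }],
    },
}]


def properties_line(key, value):
    """Render key=<single-line JSON>, the shape shown for classified.properties."""
    return key + "=" + json.dumps(value, separators=(",", ":"))


def file_reference(path, key):
    """Build the ${file:<path>:<key>} placeholder used in the SMT configuration."""
    return "${file:" + path + ":" + key + "}"


# Written to a temporary directory here; on a worker the file would live at
# the path referenced below.
target = Path(tempfile.mkdtemp()) / "classified.properties"
target.write_text(properties_line("cipher_data_keys", data_keys) + "\n", encoding="utf-8")

reference = file_reference("/secrets/kryptonite/classified.properties", "cipher_data_keys")
print(reference)
```

The printed placeholder is the value to put into `transforms.cipher.cipher_data_keys`, so the key material itself never appears in the connector configuration.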

In case you want to learn more about configuration parameter externalization there is e.g. this nice blog post from the Debezium team showing how to externalize username and password settings using a docker-compose example.

Tink Keysets

Key material is configured in the cipher_data_keys property of the CipherField SMT which takes an array of JSON objects. The material field in one such JSON object represents a keyset and might look as follows:

{
  "primaryKeyId": 1234567890,
  "key": [
    {
      "keyData": {
        "typeUrl": "type.googleapis.com/google.crypto.tink.AesGcmKey",
        "value": "<BASE64_ENCODED_KEY_HERE>",
        "keyMaterialType": "SYMMETRIC"
      },
      "status": "ENABLED",
      "keyId": 1234567890,
      "outputPrefixType": "TINK"
    }
  ]
}

Note that the JSON snippet above needs to be specified either:

- as a single-line JSON object in an external config file (.properties)

... "material": { "primaryKeyId": 1234567890, "key": [ { "keyData": { "typeUrl": "type.googleapis.com/google.crypto.tink.AesGcmKey", "value": "<BASE64_ENCODED_KEY_HERE>", "keyMaterialType": "SYMMETRIC" }, "status": "ENABLED", "keyId": 1234567890, "outputPrefixType": "TINK" } ] } ...

- as a single-line escaped/quoted JSON string if included directly within a connector's JSON configuration

"... \"material\": { \"primaryKeyId\": 1234567890, \"key\": [ { \"keyData\": { \"typeUrl\": \"type.googleapis.com/google.crypto.tink.AesGcmKey\", \"value\": \"<BASE64_ENCODED_KEY_HERE>\", \"keyMaterialType\": \"SYMMETRIC\" }, \"status\": \"ENABLED\", \"keyId\": 1234567890, \"outputPrefixType\": \"TINK\" } ] } ..."
