Skip to content

cmek split 3/8: AWS and GCP KMS clients#4572

Open
tenfyzhong wants to merge 8 commits intomasterfrom
split/pr3955-03-kms-client
Open

cmek split 3/8: AWS and GCP KMS clients#4572
tenfyzhong wants to merge 8 commits intomasterfrom
split/pr3955-03-kms-client

Conversation

@tenfyzhong
Copy link
Copy Markdown
Collaborator

@tenfyzhong tenfyzhong commented Mar 23, 2026

What problem does this PR solve?

Issue Number: ref #3943

This is split PR 3/8 from #3955.

What is changed and how it works?

This PR adds KMS integration and related configuration:

  • Add AWS KMS and GCP KMS decryptor implementations.
  • Add KMS client factory with provider-specific configuration resolution and client reuse.
  • Add CMEK-related debug config fields for AWS/GCP overrides.
  • Update module dependencies required by KMS providers.
  • Add unit tests for KMS client behavior.

This PR prepares master-key decryption capability used by encryption metadata manager in later splits.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Questions

Will it cause performance regression or break compatibility?

No direct behavior change for EventStore/SchemaStore in this split.

Do you need to update user documentation, design documentation or monitoring documentation?

No.

Release note

None

Summary by CodeRabbit

  • New Features

    • Added encryption-at-rest support via AWS KMS or GCP KMS with configurable enablement, refresh/cache intervals, and degrade-on-error behavior.
    • Flexible credential and endpoint configuration for cloud KMS providers.
    • JSON byte-array input now accepts base64 strings, integer arrays, or null for keys.
    • In-process mock KMS client for local encrypt/decrypt testing.
  • Tests

    • Expanded tests for config decoding, validation, KMS flows, caching, and error cases.
  • Chores

    • Updated cloud SDK and API dependencies; promoted relevant Google API to a direct dependency.

Signed-off-by: tenfyzhong <tenfy@tenfy.cn>
Signed-off-by: tenfyzhong <tenfy@tenfy.cn>
Signed-off-by: tenfyzhong <tenfy@tenfy.cn>
@ti-chi-bot ti-chi-bot bot added the do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. label Mar 23, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 23, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e173b582-9d6f-415a-8d79-7b387d50b04f

📥 Commits

Reviewing files that changed from the base of the PR and between a474f93 and 323af60.

📒 Files selected for processing (2)
  • pkg/encryption/kms/client.go
  • pkg/encryption/kms/client_test.go
✅ Files skipped from review due to trivial changes (1)
  • pkg/encryption/kms/client_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/encryption/kms/client.go

📝 Walkthrough

Walkthrough

Adds CMEK support: new encryption configuration, TOML/json parsing, a pluggable KMS client with AWS and GCP decryptors (and caching), JSON byte-array parsing, a mock KMS client for tests, and dependency updates for cloud KMS SDKs.

Changes

Cohort / File(s) Summary
Module deps
go.mod
Added cloud.google.com/go/kms v1.15.8, promoted google.golang.org/api v0.170.0 to direct, upgraded github.com/aws/aws-sdk-go-v2 and added .../service/kms; bumped several indirect deps.
Config types & server wiring
pkg/config/encryption.go, pkg/config/server.go, pkg/config/server_config_test.go
Introduced EncryptionConfig, KMSConfig, AWSKMSConfig, GCPKMSConfig, NewDefaultEncryptionConfig(), wired Encryption into ServerConfig, and added TOML decoding and ValidateAndAdjust tests.
JSON utilities
pkg/encryption/json_types.go
Added encryption.ByteArray with custom UnmarshalJSON supporting null, base64 string, or integer-array inputs and validating byte ranges.
KMS client core & tests
pkg/encryption/kms/client.go, pkg/encryption/kms/client_test.go
Added KMS client abstraction (vendor normalization, validation, thread-safe decryptor caching, injectable factories, DecryptMasterKey, Close) and unit tests covering selection, caching, validation, and logging.
AWS KMS implementation
pkg/encryption/kms/aws_kms.go
New AWS decryptor using AWS SDK v2; supports region/profile/static creds, custom endpoint normalization, and Decrypt implementation.
GCP KMS implementation
pkg/encryption/kms/gcp_kms.go
New GCP decryptor using cloudkms KeyManagementClient; supports endpoint/credentials-file/JSON, Decrypt and Close, plus endpoint normalization.
Mock KMS client (test helper)
pkg/encryption/kms/mock_client.go
Added KMSClient interface and MockKMSClient that encrypts/decrypts master keys using an in-memory AES-CTR key for tests.

Sequence Diagram(s)

sequenceDiagram
    participant Config as EncryptionConfig
    participant Client as KMS Client
    participant Cache as Decryptor Cache
    participant AWS as AWS KMS
    participant GCP as GCP KMS

    Config->>Client: NewClient(cfg)
    Client->>Client: validate cfg & init cache

    Client->>Client: DecryptMasterKey(vendor, keyID, ciphertext, region, endpoint)
    Client->>Client: normalize vendor & resolve cfg
    Client->>Cache: lookup decryptor (region/endpoint key)

    alt cached
        Cache-->>Client: decryptor
    else not cached
        alt vendor == aws
            Client->>AWS: newAWSDecryptor(resolved cfg)
            AWS-->>Client: awsKMSDecryptor
        else vendor == gcp
            Client->>GCP: newGCPDecryptor(resolved cfg)
            GCP-->>Client: gcpKMSDecryptor
        end
        Client->>Cache: store decryptor
    end

    Client->>Client: decryptor.Decrypt(ctx, keyID, ciphertext)
    Client-->>Config: return plaintext / error
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • asddongmen
  • flowbehappy

Poem

🐰 I hopped through configs, keys aglow,
AWS and GCP in a tidy row.
TOML whispered secrets, caches held tight,
Tests danced softly through the night,
A rabbit cheered — encryption takes flight!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 8.70% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'cmek split 3/8: AWS and GCP KMS clients' clearly and concisely summarizes the primary change—adding AWS and GCP KMS client implementations as part of a larger split PR series.
Description check ✅ Passed The description addresses all critical template sections: references issue #3943, explains this is split PR 3/8, describes the changes and how they work, marks unit tests as included, confirms no performance regressions or compatibility breaks, states no documentation updates needed, and provides a release note (None).

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch split/pr3955-03-kms-client

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@ti-chi-bot ti-chi-bot bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Mar 23, 2026
@gemini-code-assist
Copy link
Copy Markdown

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request lays the groundwork for supporting Customer-Managed Encryption Keys (CMEK) at rest within TiCDC. By integrating dedicated clients for AWS and GCP Key Management Services, it enables the system to securely decrypt master keys using cloud-native encryption solutions. This enhancement significantly boosts data security by allowing users to manage their own encryption keys in their preferred cloud environment, ensuring compliance and control over sensitive data.

Highlights

  • AWS KMS Client Integration: Introduced a new AWS KMS client and decryptor, enabling decryption of master keys using AWS Key Management Service with configurable regions, endpoints, profiles, and static credentials.
  • GCP KMS Client Integration: Implemented a new GCP KMS client and decryptor, allowing master key decryption via Google Cloud KMS, with support for custom endpoints and service account credentials (file or JSON content).
  • Encryption Configuration: Added a comprehensive EncryptionConfig to the debug and server configurations, including settings for enabling encryption, metadata refresh intervals, cache TTL, error degradation, and specific KMS client overrides for AWS and GCP.
  • Dependency Updates: Updated go.mod and go.sum to include necessary AWS SDK v2 and Google Cloud KMS dependencies, along with other minor dependency version adjustments.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces AWS and GCP KMS client integrations for CMEK encryption at rest. It includes new configuration structures, client implementations, and corresponding tests. The changes to go.mod and go.sum reflect the addition of new cloud provider SDK dependencies and some version updates. Overall, the new functionality seems well-structured, but there are a few areas that could be improved for robustness and clarity.

go.mod Outdated
module github.com/pingcap/ticdc

go 1.25.8
go 1.25.5
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The Go version is being downgraded from 1.25.8 to 1.25.5. Please provide a clear justification for this downgrade. Downgrading the Go version can sometimes lead to compatibility issues or loss of performance/security improvements available in newer patch releases.

cloud.google.com/go/iam v1.1.7 // indirect
cloud.google.com/go/kms v1.15.8 // indirect
filippo.io/edwards25519 v1.1.1 // indirect
filippo.io/edwards25519 v1.1.0 // indirect
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The filippo.io/edwards25519 dependency is being downgraded from v1.1.1 to v1.1.0. Patch version downgrades can sometimes reintroduce bugs or security vulnerabilities that were fixed in the newer version. Please ensure this downgrade is intentional and does not negatively impact the project's stability or security.

Comment on lines +71 to +73
if err == nil && parsed.Host != "" {
return parsed.Host
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

In normalizeGCPEndpoint, the error returned by url.Parse(endpoint) is ignored. If the endpoint string is malformed, this could lead to an invalid host being returned or unexpected behavior later when the endpoint is used. It's safer to handle this error explicitly, perhaps by logging it and returning an empty string or an error.

        parsed, err := url.Parse(endpoint)
        if err != nil {
                // Log the error or handle it appropriately
                return ""
        }
        if parsed.Host != "" {
                return parsed.Host
        }

@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Mar 23, 2026
@tenfyzhong tenfyzhong force-pushed the split/pr3955-02-tikv-meta-client branch from 16993f5 to cf27855 Compare March 23, 2026 09:53
Base automatically changed from split/pr3955-02-tikv-meta-client to master March 31, 2026 14:36
@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Apr 1, 2026
}

// EncryptionConfig represents config for CMEK encryption at rest
type EncryptionConfig struct {
Copy link
Copy Markdown
Collaborator

@lidezhu lidezhu Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be more appropriate to place EncryptionConfig not under the debug section? Given that this is a configuration users may need to modify themselves, it should make more sense to have encryption as a root-level configuration item.

Signed-off-by: tenfyzhong <tenfy@tenfy.cn>
@tenfyzhong
Copy link
Copy Markdown
Collaborator Author

The Encryption configuration has been relocated to the root level. PTAL @lidezhu

Signed-off-by: tenfyzhong <tenfy@tenfy.cn>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (10)
pkg/encryption/kms/gcp_kms.go (2)

30-39: Wrap GCP client errors for consistent stack traces.

Per coding guidelines, errors from library/third-party calls should be wrapped immediately with errors.Trace(err) or errors.WrapError(...) to attach a stack trace.

♻️ Proposed fix
+	"github.com/pingcap/errors"
 )

 func (d *gcpKMSDecryptor) Decrypt(ctx context.Context, keyID string, ciphertext []byte) ([]byte, error) {
 	resp, err := d.client.Decrypt(ctx, &kmspb.DecryptRequest{
 		Name:       keyID,
 		Ciphertext: ciphertext,
 	})
 	if err != nil {
-		return nil, err
+		return nil, errors.Trace(err)
 	}
 	return resp.Plaintext, nil
 }

As per coding guidelines: "When an error comes from a third-party or library call in Go, wrap it immediately with errors.Trace(err) or errors.WrapError(...) to attach a stack trace."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/encryption/kms/gcp_kms.go` around lines 30 - 39, The Decrypt
implementation in gcpKMSDecryptor returns third-party errors directly from
d.client.Decrypt; per guidelines wrap those errors to attach a stack trace.
Modify gcpKMSDecryptor.Decrypt to wrap the returned err (from d.client.Decrypt)
using errors.Trace(err) or errors.WrapError(...) before returning (while still
returning nil for the plaintext on error), so callers get a wrapped error with
stack info.

45-62: Wrap error from NewKeyManagementClient.

The error returned from the GCP SDK should be wrapped for consistent stack traces.

♻️ Proposed fix
 	client, err := cloudkms.NewKeyManagementClient(ctx, opts...)
 	if err != nil {
-		return nil, err
+		return nil, errors.Trace(err)
 	}
 	return &gcpKMSDecryptor{client: client}, nil

As per coding guidelines: "When an error comes from a third-party or library call in Go, wrap it immediately with errors.Trace(err)."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/encryption/kms/gcp_kms.go` around lines 45 - 62, The call to
cloudkms.NewKeyManagementClient in newGCPDecryptor returns an unwrapped error;
update the error handling to wrap the returned err using errors.Trace(err)
before returning so stack traces are preserved (i.e., replace the direct return
of err after NewKeyManagementClient with a wrapped error via errors.Trace(err)
and keep returning the gcpKMSDecryptor on success).
pkg/encryption/kms/aws_kms.go (3)

71-77: Wrap error from LoadDefaultConfig.

The error from AWS config loading should be wrapped for consistent stack traces.

♻️ Proposed fix
 	awsCfg, err := awsconfig.LoadDefaultConfig(ctx, opts...)
 	if err != nil {
-		return nil, err
+		return nil, errors.Trace(err)
 	}

 	return &awsKMSDecryptor{client: awskms.NewFromConfig(awsCfg)}, nil

As per coding guidelines: "When an error comes from a third-party or library call in Go, wrap it immediately with errors.Trace(err)."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/encryption/kms/aws_kms.go` around lines 71 - 77, The error returned by
awsconfig.LoadDefaultConfig in the aws_kms.go initializer is not wrapped; update
the error handling in the function that constructs awsKMSDecryptor so that the
returned error is wrapped with errors.Trace(err) before returning (i.e., replace
returning err with returning errors.Trace(err)), keeping the rest of the logic
that creates &awsKMSDecryptor{client: awskms.NewFromConfig(awsCfg)} unchanged.

30-42: Wrap AWS client errors for consistent stack traces.

Per coding guidelines, errors from library calls should be wrapped to attach stack traces.

♻️ Proposed fix
+	"github.com/pingcap/errors"
 )

 func (d *awsKMSDecryptor) Decrypt(ctx context.Context, keyID string, ciphertext []byte) ([]byte, error) {
 	input := &awskms.DecryptInput{
 		CiphertextBlob: ciphertext,
 	}
 	if keyID != "" {
 		input.KeyId = aws.String(keyID)
 	}
 	out, err := d.client.Decrypt(ctx, input)
 	if err != nil {
-		return nil, err
+		return nil, errors.Trace(err)
 	}
 	return out.Plaintext, nil
 }

As per coding guidelines: "When an error comes from a third-party or library call in Go, wrap it immediately with errors.Trace(err)."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/encryption/kms/aws_kms.go` around lines 30 - 42, The call in
awsKMSDecryptor.Decrypt returns raw library errors (from d.client.Decrypt);
update the error handling to wrap the returned err with errors.Trace(err) before
returning so stack traces are attached (i.e., replace direct return of err with
return nil, errors.Trace(err) in the Decrypt method after calling
d.client.Decrypt). Ensure you import and use the errors.Trace function and keep
existing behavior for successful returns (out.Plaintext).

56-69: Consider updating to EndpointResolverV2 for better compatibility with future AWS SDK versions.

The aws.EndpointResolverWithOptions approach is deprecated in AWS SDK for Go v2 in favor of service-specific endpoint resolution. The deprecated pattern can prevent access to newer endpoint-related features. Consider updating to either BaseEndpoint for simple URL overrides or the service-specific EndpointResolverV2 interface for custom endpoint logic.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/encryption/kms/aws_kms.go` around lines 56 - 69, The code currently uses
aws.EndpointResolverWithOptionsFunc in the block guarded by cfg.Endpoint (see
cfg.Endpoint, normalizeAWSEndpoint) which is deprecated; replace this custom
resolver with the newer aws.EndpointResolverV2 implementation (or use
service-specific BaseEndpoint override) and register it via
awsconfig.WithEndpointResolver to ensure compatibility: implement an
aws.EndpointResolverV2 that returns an aws.Endpoint with URL and SigningRegion
for awskms.ServiceID and use awsconfig.WithEndpointResolver to append that
resolver where opts is built (replace aws.EndpointResolverWithOptionsFunc and
awsconfig.WithEndpointResolverWithOptions usage).
pkg/config/encryption.go (3)

43-56: Consider marking sensitive fields for log redaction.

SecretAccessKey and SessionToken contain sensitive credentials. While the config isn't directly logged, ensure these fields are redacted if the config is ever serialized for debugging or error messages. A common pattern is adding a String() method that masks sensitive fields.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/config/encryption.go` around lines 43 - 56, The AWSKMSConfig struct
contains sensitive fields (SecretAccessKey and SessionToken) that should be
masked when serialized or logged; add a String() method on AWSKMSConfig that
returns a safe representation by redacting or replacing SecretAccessKey and
SessionToken with a placeholder (e.g., "***REDACTED***") while leaving
non-sensitive fields like Region, Endpoint, Profile, and AccessKey intact, and
ensure any existing debug or error paths use this String() method when printing
the config.

23-27: Document that MetaRefreshInterval and MetaCacheTTL are reserved for future use.

Based on the context snippets from pkg/encryption/kms/client.go, these fields are declared and initialized but never consumed by the KMS client or any other component in this PR. This could confuse operators who set these values expecting them to have an effect.

Consider either:

  1. Removing these fields until they're implemented
  2. Adding comments clarifying they are reserved for future encryption metadata manager integration
+	// MetaRefreshInterval is the interval for refreshing encryption metadata.
+	// NOTE: Reserved for future use by the encryption metadata manager.
 	MetaRefreshInterval TomlDuration `toml:"meta-refresh-interval" json:"meta_refresh_interval"`

+	// MetaCacheTTL is the TTL for caching encryption metadata.
+	// NOTE: Reserved for future use by the encryption metadata manager.
 	MetaCacheTTL TomlDuration `toml:"meta-cache-ttl" json:"meta_cache_ttl"`
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/config/encryption.go` around lines 23 - 27, The MetaRefreshInterval and
MetaCacheTTL fields (TomlDuration types) are declared but unused; update their
comments to state clearly that they are reserved for future use by the
encryption metadata manager and currently have no effect, so operators won't be
misled, or remove the fields entirely if you prefer not to expose unused config;
locate the struct containing MetaRefreshInterval and MetaCacheTTL and either
change the comment above each field to: "Reserved for future use by the
encryption metadata manager; currently not consumed" (keeping the toml/json
tags) or delete both fields and any initialization/defaulting tied to them.

58-66: Consider marking CredentialsJSON for log redaction.

Similar to AWS credentials, CredentialsJSON contains the full service account key which is highly sensitive.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/config/encryption.go` around lines 58 - 66, The GCPKMSConfig struct's
CredentialsJSON field contains sensitive service account keys and should be
marked for log redaction; locate the GCPKMSConfig type and add a
redaction/logging tag to the CredentialsJSON field (e.g., include a tag like
`log:"sensitive"` or the project's equivalent redaction tag) so automatic
logging/redaction mechanisms will hide its value; ensure the existing toml/json
tags remain intact and only the additional redaction tag is appended to the
CredentialsJSON struct tag.
pkg/encryption/kms/mock_client.go (1)

27-31: Consider relocating the KMSClient interface to the client file.

The KMSClient interface is defined in this mock file, but based on context snippet 1 (pkg/encryption/kms/client.go:92-107), the NewClient function returns KMSClient. Defining the interface in the mock file can create import cycles or confusion. Typically, interfaces should be defined alongside their primary implementation or in a dedicated types file.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/encryption/kms/mock_client.go` around lines 27 - 31, The KMSClient
interface is currently declared in mock_client.go but NewClient (in client.go)
returns KMSClient; move the KMSClient interface to the primary implementation
file (client.go) or a new shared types file so the interface lives with its main
implementation; update mock_client.go to import/use that moved interface
(ensuring the mock still implements KMSClient) and adjust package-level
visibility if needed so NewClient and mocks reference the same KMSClient symbol
without causing import cycles.
pkg/config/server.go (1)

286-288: Consider adding validation for the Encryption configuration.

The ValidateAndAdjust method initializes Encryption from defaults when nil, but unlike KVClient (lines 282-284) and Debug (lines 293-295), it doesn't call any validation method on the encryption config. If encryption is enabled with invalid KMS settings, the error would surface at runtime rather than during config validation.

 	if c.Encryption == nil {
 		c.Encryption = defaultCfg.Encryption
 	}
+	// Consider adding encryption config validation here when a
+	// ValidateAndAdjust method is added to EncryptionConfig
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/config/server.go` around lines 286 - 288, The ValidateAndAdjust function
currently copies defaultCfg.Encryption into c.Encryption when nil but never
validates it; update ValidateAndAdjust to run the encryption config's validation
routine (e.g., call c.Encryption.Validate or c.Encryption.ValidateAndAdjust,
matching the pattern used for KVClient and Debug) after assigning
defaultCfg.Encryption and whenever c.Encryption is non-nil, and return any
validation error so invalid KMS/encryption settings fail during config
validation rather than at runtime.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/encryption/kms/client.go`:
- Around line 282-299: The awsFactory result (cli) is not nil-checked before
caching, so a custom factory returning (nil, nil) gets stored and later causes
panics on methods like Decrypt/Close; update the factory handling in the
function that calls c.awsFactory so you validate that cli != nil and return a
non-nil error if it is (do not cache nil). Specifically, after calling
c.awsFactory(ctx, cfg) and before acquiring c.awsMu or assigning
c.awsClients[key] = cli, return an initialization error when cli == nil; apply
the same nil-check/hardening to the analogous factory call handled in the other
block referenced (the 317-334 block).
- Around line 172-179: Add a closed state and coordinate with in-flight
decrypts: add a bool (e.g., closed) plus a sync.Mutex/sync.Cond or a
sync.WaitGroup on the client struct and use gcpMu (or a new mu) to guard it;
modify getGCPDecryptor (and any AWS decryptor acquisition paths such as
awsDecryptors access) to check the closed flag under the same lock and
refuse/return an error if closed, and increment an in-flight counter (or
wg.Add(1)) before returning a decryptor; on decrypt completion decrement the
counter (or wg.Done()). In Close(), acquire the lock, set closed=true to stop
new acquisitions, wait for in-flight count to reach zero (or wg.Wait()), then
iterate gcpClients and aws decryptors to call Close() and clear the maps (while
holding the lock or under gcpMu), ensuring no decrypt can race to use a client
being closed.

In `@pkg/encryption/kms/gcp_kms.go`:
- Around line 64-76: Add a concise comment above the normalizeGCPEndpoint
function explaining why the function strips URL schemes: note that we use
cloudkms.NewKeyManagementClient() which uses gRPC and expects endpoints in
host:port form (e.g., cloudkms.googleapis.com:443), so we remove schemes/paths;
also mention that REST HTTP clients would require full HTTPS URLs to avoid
confusion for future readers modifying normalizeGCPEndpoint.

---

Nitpick comments:
In `@pkg/config/encryption.go`:
- Around line 43-56: The AWSKMSConfig struct contains sensitive fields
(SecretAccessKey and SessionToken) that should be masked when serialized or
logged; add a String() method on AWSKMSConfig that returns a safe representation
by redacting or replacing SecretAccessKey and SessionToken with a placeholder
(e.g., "***REDACTED***") while leaving non-sensitive fields like Region,
Endpoint, Profile, and AccessKey intact, and ensure any existing debug or error
paths use this String() method when printing the config.
- Around line 23-27: The MetaRefreshInterval and MetaCacheTTL fields
(TomlDuration types) are declared but unused; update their comments to state
clearly that they are reserved for future use by the encryption metadata manager
and currently have no effect, so operators won't be misled, or remove the fields
entirely if you prefer not to expose unused config; locate the struct containing
MetaRefreshInterval and MetaCacheTTL and either change the comment above each
field to: "Reserved for future use by the encryption metadata manager; currently
not consumed" (keeping the toml/json tags) or delete both fields and any
initialization/defaulting tied to them.
- Around line 58-66: The GCPKMSConfig struct's CredentialsJSON field contains
sensitive service account keys and should be marked for log redaction; locate
the GCPKMSConfig type and add a redaction/logging tag to the CredentialsJSON
field (e.g., include a tag like `log:"sensitive"` or the project's equivalent
redaction tag) so automatic logging/redaction mechanisms will hide its value;
ensure the existing toml/json tags remain intact and only the additional
redaction tag is appended to the CredentialsJSON struct tag.

In `@pkg/config/server.go`:
- Around line 286-288: The ValidateAndAdjust function currently copies
defaultCfg.Encryption into c.Encryption when nil but never validates it; update
ValidateAndAdjust to run the encryption config's validation routine (e.g., call
c.Encryption.Validate or c.Encryption.ValidateAndAdjust, matching the pattern
used for KVClient and Debug) after assigning defaultCfg.Encryption and whenever
c.Encryption is non-nil, and return any validation error so invalid
KMS/encryption settings fail during config validation rather than at runtime.

In `@pkg/encryption/kms/aws_kms.go`:
- Around line 71-77: The error returned by awsconfig.LoadDefaultConfig in the
aws_kms.go initializer is not wrapped; update the error handling in the function
that constructs awsKMSDecryptor so that the returned error is wrapped with
errors.Trace(err) before returning (i.e., replace returning err with returning
errors.Trace(err)), keeping the rest of the logic that creates
&awsKMSDecryptor{client: awskms.NewFromConfig(awsCfg)} unchanged.
- Around line 30-42: The call in awsKMSDecryptor.Decrypt returns raw library
errors (from d.client.Decrypt); update the error handling to wrap the returned
err with errors.Trace(err) before returning so stack traces are attached (i.e.,
replace direct return of err with return nil, errors.Trace(err) in the Decrypt
method after calling d.client.Decrypt). Ensure you import and use the
errors.Trace function and keep existing behavior for successful returns
(out.Plaintext).
- Around line 56-69: The code currently uses aws.EndpointResolverWithOptionsFunc
in the block guarded by cfg.Endpoint (see cfg.Endpoint, normalizeAWSEndpoint)
which is deprecated; replace this custom resolver with the newer
aws.EndpointResolverV2 implementation (or use service-specific BaseEndpoint
override) and register it via awsconfig.WithEndpointResolver to ensure
compatibility: implement an aws.EndpointResolverV2 that returns an aws.Endpoint
with URL and SigningRegion for awskms.ServiceID and use
awsconfig.WithEndpointResolver to append that resolver where opts is built
(replace aws.EndpointResolverWithOptionsFunc and
awsconfig.WithEndpointResolverWithOptions usage).

In `@pkg/encryption/kms/gcp_kms.go`:
- Around line 30-39: The Decrypt implementation in gcpKMSDecryptor returns
third-party errors directly from d.client.Decrypt; per guidelines wrap those
errors to attach a stack trace. Modify gcpKMSDecryptor.Decrypt to wrap the
returned err (from d.client.Decrypt) using errors.Trace(err) or
errors.WrapError(...) before returning (while still returning nil for the
plaintext on error), so callers get a wrapped error with stack info.
- Around line 45-62: The call to cloudkms.NewKeyManagementClient in
newGCPDecryptor returns an unwrapped error; update the error handling to wrap
the returned err using errors.Trace(err) before returning so stack traces are
preserved (i.e., replace the direct return of err after NewKeyManagementClient
with a wrapped error via errors.Trace(err) and keep returning the
gcpKMSDecryptor on success).

In `@pkg/encryption/kms/mock_client.go`:
- Around line 27-31: The KMSClient interface is currently declared in
mock_client.go but NewClient (in client.go) returns KMSClient; move the
KMSClient interface to the primary implementation file (client.go) or a new
shared types file so the interface lives with its main implementation; update
mock_client.go to import/use that moved interface (ensuring the mock still
implements KMSClient) and adjust package-level visibility if needed so NewClient
and mocks reference the same KMSClient symbol without causing import cycles.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ffd3fd74-88c8-4187-b1ab-4c3927c53801

📥 Commits

Reviewing files that changed from the base of the PR and between 7500c69 and 8d45672.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (10)
  • go.mod
  • pkg/config/encryption.go
  • pkg/config/server.go
  • pkg/config/server_config_test.go
  • pkg/encryption/json_types.go
  • pkg/encryption/kms/aws_kms.go
  • pkg/encryption/kms/client.go
  • pkg/encryption/kms/client_test.go
  • pkg/encryption/kms/gcp_kms.go
  • pkg/encryption/kms/mock_client.go

Comment on lines +172 to +179
func (c *client) Close() {
c.gcpMu.Lock()
defer c.gcpMu.Unlock()

for _, cli := range c.gcpClients {
_ = cli.Close()
}
clear(c.gcpClients)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Close() can race with decrypts and still leaves the client reusable.

This only clears cached GCP clients. Cached AWS decryptors remain reachable after shutdown, getGCPDecryptor can recreate new clients immediately afterward, and a concurrent GCP decrypt can grab a cached client right before it is closed here. Please add a closed state and coordinate Close() with in-flight decrypts.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/encryption/kms/client.go` around lines 172 - 179, Add a closed state and
coordinate with in-flight decrypts: add a bool (e.g., closed) plus a
sync.Mutex/sync.Cond or a sync.WaitGroup on the client struct and use gcpMu (or
a new mu) to guard it; modify getGCPDecryptor (and any AWS decryptor acquisition
paths such as awsDecryptors access) to check the closed flag under the same lock
and refuse/return an error if closed, and increment an in-flight counter (or
wg.Add(1)) before returning a decryptor; on decrypt completion decrement the
counter (or wg.Done()). In Close(), acquire the lock, set closed=true to stop
new acquisitions, wait for in-flight count to reach zero (or wg.Wait()), then
iterate gcpClients and aws decryptors to call Close() and clear the maps (while
holding the lock or under gcpMu), ensuring no decrypt can race to use a client
being closed.

Comment on lines +282 to +299
cli, err := c.awsFactory(ctx, cfg)
if err != nil {
log.Warn("failed to initialize aws kms decryptor",
zap.String("region", cfg.Region),
zap.String("endpoint", cfg.Endpoint),
zap.Bool("hasProfile", cfg.Profile != ""),
zap.Bool("hasStaticCredentials", cfg.AccessKey != ""),
zap.Error(err))
return nil, err
}

c.awsMu.Lock()
defer c.awsMu.Unlock()
if existing, ok := c.awsClients[key]; ok {
return existing, nil
}
c.awsClients[key] = cli
return cli, nil
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Guard against nil decryptors from custom factories.

A faulty injected factory can return (nil, nil) here; that value gets cached and later panics on Decrypt or Close. Failing fast turns that into a regular initialization error.

🛡️ Proposed hardening
 	cli, err := c.awsFactory(ctx, cfg)
 	if err != nil {
 		log.Warn("failed to initialize aws kms decryptor",
 			zap.String("region", cfg.Region),
 			zap.String("endpoint", cfg.Endpoint),
 			zap.Bool("hasProfile", cfg.Profile != ""),
 			zap.Bool("hasStaticCredentials", cfg.AccessKey != ""),
 			zap.Error(err))
 		return nil, err
 	}
+	if cli == nil {
+		return nil, cerrors.ErrEncryptionFailed.GenWithStackByArgs("aws kms decryptor factory returned nil")
+	}
@@
 	cli, err := c.gcpFactory(ctx, cfg)
 	if err != nil {
 		log.Warn("failed to initialize gcp kms decryptor",
 			zap.String("endpoint", cfg.Endpoint),
 			zap.Bool("hasCredentialsFile", cfg.CredentialsFile != ""),
 			zap.Bool("hasCredentialsJSON", cfg.CredentialsJSON != ""),
 			zap.Error(err))
 		return nil, err
 	}
+	if cli == nil {
+		return nil, cerrors.ErrEncryptionFailed.GenWithStackByArgs("gcp kms decryptor factory returned nil")
+	}

Also applies to: 317-334

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/encryption/kms/client.go` around lines 282 - 299, The awsFactory result
(cli) is not nil-checked before caching, so a custom factory returning (nil,
nil) gets stored and later causes panics on methods like Decrypt/Close; update
the factory handling in the function that calls c.awsFactory so you validate
that cli != nil and return a non-nil error if it is (do not cache nil).
Specifically, after calling c.awsFactory(ctx, cfg) and before acquiring c.awsMu
or assigning c.awsClients[key] = cli, return an initialization error when cli ==
nil; apply the same nil-check/hardening to the analogous factory call handled in
the other block referenced (the 317-334 block).

Comment on lines +64 to +76
func normalizeGCPEndpoint(endpoint string) string {
endpoint = strings.TrimSpace(endpoint)
if endpoint == "" {
return ""
}
if strings.Contains(endpoint, "://") {
parsed, err := url.Parse(endpoint)
if err == nil && parsed.Host != "" {
return parsed.Host
}
}
return endpoint
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

What format does Google Cloud Go SDK option.WithEndpoint expect for the endpoint parameter?

💡 Result:

option.WithEndpoint expects a service base endpoint string, and the expected format depends on the transport:

  • gRPC clients: use host:port (no scheme). Example: us-central1-speech.googleapis.com:443. [1]
  • REST/HTTP clients: use a full HTTPS URL (scheme included). Example: https://speech.googleapis.com (port typically omitted). [1]

By default, Google APIs generally accept HTTPS endpoints. [1]

Sources: [1]


🏁 Script executed:

# First, find where normalizeGCPEndpoint is called
rg "normalizeGCPEndpoint" pkg/encryption/kms/

# Also check the full GCP KMS file to understand the context
wc -l pkg/encryption/kms/gcp_kms.go

Repository: pingcap/ticdc

Length of output: 280


🏁 Script executed:

# Check how the endpoint is used with GCP client
rg "WithEndpoint" pkg/encryption/kms/gcp_kms.go -A 2 -B 2

Repository: pingcap/ticdc

Length of output: 227


🏁 Script executed:

# Compare with AWS endpoint normalization for context
rg "normalizeAWSEndpoint" pkg/encryption/kms/ -A 5 -B 2

Repository: pingcap/ticdc

Length of output: 1030


🏁 Script executed:

# Get the broader context of the GCP KMS setup function
sed -n '40,80p' pkg/encryption/kms/gcp_kms.go

Repository: pingcap/ticdc

Length of output: 1015


🏁 Script executed:

# Check the import statements to see which GCP client library is used
head -40 pkg/encryption/kms/gcp_kms.go

Repository: pingcap/ticdc

Length of output: 1099


🌐 Web query:

Google Cloud Go KMS apiv1 KeyManagementClient gRPC REST transport

💡 Result:

In cloud.google.com/go/kms/apiv1, you choose the transport by which constructor you call:

  • gRPC transport (default): kms.NewKeyManagementClient(ctx, opts...) — “creates a new … client based on gRPC.” [1]
  • REST / HTTP+JSON transport: kms.NewKeyManagementRESTClient(ctx, opts...) — “creates a new … rest client.” [2]

Both constructors accept standard Google client options (...option.ClientOption), including endpoint override via option.WithEndpoint(...) (documented for all Cloud Go clients). [3]

Example:

import (
  "context"
  kms "cloud.google.com/go/kms/apiv1"
  "google.golang.org/api/option"
)

func grpcClient(ctx context.Context) (*kms.KeyManagementClient, error) {
  return kms.NewKeyManagementClient(ctx, option.WithEndpoint("cloudkms.googleapis.com:443"))
}

func restClient(ctx context.Context) (*kms.KeyManagementClient, error) {
  return kms.NewKeyManagementRESTClient(ctx, option.WithEndpoint("https://cloudkms.googleapis.com"))
}

Sources: gRPC constructor docs [1], REST constructor docs [2], Cloud Go “Client Options / Endpoint Override” docs [3].

[1] (docs.cloud.google.com)
[2] (docs.cloud.google.com)
[3] (cloud.google.com)

Citations:


Add a comment explaining the scheme stripping for gRPC transport.

The normalizeGCPEndpoint function correctly strips the scheme because the code uses cloudkms.NewKeyManagementClient(), which is the gRPC transport and expects endpoints in host:port format (e.g., cloudkms.googleapis.com:443). However, this design choice should be documented in a comment to prevent confusion, especially since REST clients would require the full HTTPS URL.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/encryption/kms/gcp_kms.go` around lines 64 - 76, Add a concise comment
above the normalizeGCPEndpoint function explaining why the function strips URL
schemes: note that we use cloudkms.NewKeyManagementClient() which uses gRPC and
expects endpoints in host:port form (e.g., cloudkms.googleapis.com:443), so we
remove schemes/paths; also mention that REST HTTP clients would require full
HTTPS URLs to avoid confusion for future readers modifying normalizeGCPEndpoint.

Signed-off-by: tenfyzhong <tenfy@tenfy.cn>
@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Apr 8, 2026
@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot bot commented Apr 8, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-04-01 08:31:49.434038853 +0000 UTC m=+340314.639398910: ☑️ agreed by asddongmen.
  • 2026-04-08 15:39:32.235658942 +0000 UTC m=+970777.441018999: ☑️ agreed by lidezhu.

@tenfyzhong
Copy link
Copy Markdown
Collaborator Author

/tide

@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot bot commented Apr 9, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: asddongmen, flowbehappy, lidezhu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the approved label Apr 9, 2026
@tenfyzhong
Copy link
Copy Markdown
Collaborator Author

/retest

@tenfyzhong
Copy link
Copy Markdown
Collaborator Author

/test pull-cdc-mysql-integration-light

@tenfyzhong
Copy link
Copy Markdown
Collaborator Author

/retest-required

@tenfyzhong
Copy link
Copy Markdown
Collaborator Author

/retest-required

1 similar comment
@tenfyzhong
Copy link
Copy Markdown
Collaborator Author

/retest-required

@tenfyzhong
Copy link
Copy Markdown
Collaborator Author

/retest

3 similar comments
@tenfyzhong
Copy link
Copy Markdown
Collaborator Author

/retest

@tenfyzhong
Copy link
Copy Markdown
Collaborator Author

/retest

@tenfyzhong
Copy link
Copy Markdown
Collaborator Author

/retest

@tenfyzhong
Copy link
Copy Markdown
Collaborator Author

/retest-required

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved lgtm release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants