Contributor

@ishikaa-p ishikaa-p commented Dec 2, 2025

Description

Problem:
With deployment_strategy: rolling in multi-cluster Kafka Connect setups, each host attempted to deploy connectors to all clusters, causing failures when accessing files on remote cluster nodes (e.g., [Errno 2] No such file or directory).

Fixes # (issue)

Added filtering logic so each host only deploys connectors for its own cluster during rolling/serial deployments.
Fixed the subgroups difference filter to use a list instead of a string.

Changes:

  • Moved the task that removes temporary certificates to the Kafka Connect playbook.
  • Filter subgroups for rolling deployment (lines 82-85): When deployment_strategy is rolling or serial, filter subgroups to only include the current host's cluster group (parent_kafka_connect_cluster_group).
  • Updated multi-cluster connector deployment (line 97): Use filtered_subgroups instead of subgroups, with a fallback to subgroups if filtered_subgroups is not defined.
  • Fixed subgroups difference filter (line 4): Changed from string to list format for proper filtering.
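The list-versus-string point in the last bullet can be illustrated with a small standalone play. This is a minimal sketch, not the PR's actual line 4: the group names and the `kafka_connect_groups` variable are hypothetical stand-ins for however the role collects the Kafka Connect parent group and its child cluster groups.

```yaml
# Minimal standalone sketch: ansible-playbook difference_demo.yml
# kafka_connect_groups is a hypothetical variable used only for illustration.
- name: Show the difference filter with a list argument
  hosts: localhost
  gather_facts: false
  vars:
    kafka_connect_groups: ['kafka_connect', 'cluster1', 'cluster2']
  tasks:
    - name: Drop the parent group and keep only the cluster subgroups
      ansible.builtin.debug:
        # difference() is a set-theory filter that compares list elements,
        # so the parent group name is wrapped in a one-element list instead
        # of being passed as a bare string.
        msg: "{{ kafka_connect_groups | difference(['kafka_connect']) }}"
```

With the list form, only an element exactly equal to `kafka_connect` is excluded, so the message should contain only `cluster1` and `cluster2`.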

Behavior:

  1. Multi-cluster setup (with child groups):

Rolling/Serial: Each host deploys connectors only for its own cluster

  • kafka-connect-0 (cluster1) → deploys on cluster1 only
  • kafka-connect-1 (cluster1) → deploys on cluster1 only
  • kafka-connect-2 (cluster2) → deploys on cluster2 only
  • kafka-connect-3 (cluster2) → deploys on cluster2 only

Parallel deployment:
All hosts run simultaneously, but each host only processes connectors for its own cluster
With run_once: true, connectors are deployed once per cluster (not per host)

  • kafka-connect-0 (cluster1) → only cluster1 connectors are processed, deployed once
  • kafka-connect-0, delegating to kafka-connect-2 (cluster2) → only cluster2 connectors are processed, deployed once
  2. Single cluster setup (no child groups):
    Works as before; connectors deploy via the single-cluster task
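The per-cluster behavior above comes from looping over the (filtered) subgroups and delegating each iteration to the first host of that cluster group. The sketch below reproduces only that control flow, with `ansible.builtin.debug` standing in for the real connector deployment step; the `run_once` line reflects the parallel case described above and is an assumption about how the real task is configured.

```yaml
# Sketch only: debug stands in for the real connector deployment step.
- name: Deploy connectors for each Kafka Connect cluster (illustrative)
  ansible.builtin.debug:
    msg: "Deploying connectors for {{ item }} on {{ groups[item][0] }}"
  # Rolling/serial: filtered_subgroups holds only this host's own cluster.
  # Parallel: filtered_subgroups is set to every cluster subgroup.
  loop: "{{ filtered_subgroups | default(subgroups) }}"
  # Skip clusters whose first host defines no connectors.
  when: hostvars[groups[item][0]].kafka_connect_connectors is defined
  # Run each iteration against the first host of that cluster group.
  delegate_to: "{{ groups[item][0] }}"
  # Assumption: run_once runs the task once per play batch, so in parallel
  # mode each cluster is handled once rather than once per host.
  run_once: true
```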

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

Checklist:

  • Any variable/code changes have been validated to be backwards compatible (doesn't break upgrade)
  • I have added tests that prove my fix is effective or that my feature works
  • If required, I have ensured the changes can be discovered by cp-ansible discovery codebase
  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

@ishikaa-p ishikaa-p requested a review from a team as a code owner December 2, 2025 11:37
Contributor Author

ishikaa-p commented Dec 2, 2025

@rrbadiani
Member

Parallel: All hosts deploy connectors for all clusters (unchanged)

This doesn't seem correct

Contributor Author

ishikaa-p commented Dec 2, 2025

Parallel: All hosts deploy connectors for all clusters (unchanged)

This doesn't seem correct

I've updated the description

- "{{kafka_connect_cert_path}}"
- "{{kafka_connect_key_path}}"
when: (ssl_provided_keystore_and_truststore | bool)
tags: deploy_connectors
Contributor

Why is this tag included here?

Contributor Author

This would be a helpful feature: it lets customers skip deploying connectors during their upgrades if they want to.

Member

Does this work?
When we pass --tags deploy_connectors, does execution reach the kafka_connect role's tasks/main.yml?
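For reference on the question above: with `--tags deploy_connectors`, Ansible runs only tasks that carry that tag, directly or by inheritance. Tags on tasks inside a statically imported role (`roles:` or `import_role`) are honored, but a dynamic `include_role` must itself be tagged and use `apply` to pass the tag down to the tasks it includes. A minimal sketch of the dynamic case, assuming the role is named `kafka_connect`:

```yaml
- name: Kafka Connect
  hosts: kafka_connect
  gather_facts: false
  tags: kafka_connect
  tasks:
    # The include task needs the tag so that --tags deploy_connectors
    # selects it; `apply` pushes the same tag onto every task inside
    # the included role.
    - name: Include the kafka_connect role
      ansible.builtin.include_role:
        name: kafka_connect
        apply:
          tags: deploy_connectors
      tags: deploy_connectors
```

Two caveats on this design: `apply` tags every task in the role, so `--skip-tags deploy_connectors` would then skip the whole role rather than only connector deployment; and with `--tags deploy_connectors`, any fact-setting tasks the tagged task depends on are skipped unless they also carry the tag (or the special `always` tag).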

hosts: kafka_connect
gather_facts: false
tags: kafka_connect
environment: "{{ proxy_env }}"
Contributor

Why do we pass proxy_env here?

Contributor Author

Customers can set proxy environment variables, so I added it here as well.
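For context, the play-level `environment` keyword sets environment variables for every task in the play. A sketch of what a user-supplied `proxy_env` might look like; the file location and proxy URL are placeholders, and whether cp-ansible already ships a default for `proxy_env` is not shown in this thread.

```yaml
# Hypothetical group_vars/all.yml entry; replace the placeholder proxy URL.
proxy_env:
  http_proxy: http://proxy.example.com:3128
  https_proxy: http://proxy.example.com:3128
  no_proxy: localhost,127.0.0.1
```

If `proxy_env` might be left unset, `environment: "{{ proxy_env | default({}) }}"` keeps the play from failing on an undefined variable; an empty mapping adds no variables.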

- name: Filter subgroups for rolling deployment strategy
  set_fact:
    filtered_subgroups: "{{ [parent_kafka_connect_cluster_group] if ((kafka_connect_deployment_strategy == 'rolling' or kafka_connect_deployment_strategy == 'serial') and parent_kafka_connect_cluster_group is defined and parent_kafka_connect_cluster_group in subgroups) else subgroups }}"
  when: subgroups is defined
Member

Is it possible for subgroups to be undefined?

Contributor Author

Yes. For a parallel run with no child-group structure, there are no subgroups.
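The condition in the set_fact above could be written more compactly with an `in` test. This is a suggested rewrite intended to behave the same way, not the code in the PR:

```yaml
- name: Filter subgroups for rolling deployment strategy
  ansible.builtin.set_fact:
    # Rolling/serial: keep only this host's own cluster group.
    # Otherwise, or if the host's group is unknown, keep every subgroup.
    filtered_subgroups: >-
      {{ [parent_kafka_connect_cluster_group]
         if (kafka_connect_deployment_strategy in ['rolling', 'serial']
             and parent_kafka_connect_cluster_group is defined
             and parent_kafka_connect_cluster_group in subgroups)
         else subgroups }}
  # Inventories without child groups have no subgroups, so skip the task.
  when: subgroups is defined
```

The `>-` folded scalar only splits the expression across lines for readability; Jinja ignores the extra whitespace inside `{{ }}`.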

when: hostvars[groups[item][0]].kafka_connect_connectors is defined
delegate_to: "{{ groups[item][0] }}"
loop: "{{subgroups}}"  # before this PR
loop: "{{ filtered_subgroups | default(subgroups) }}"  # after this PR
Member

Is it possible for filtered_subgroups to be undefined?
