Active-active domain support - Part 1/N #6799

Merged (14 commits) on Apr 16, 2025

Conversation

taylanisikdemir
Member

@taylanisikdemir taylanisikdemir commented Apr 10, 2025

What changed?
The prototype implementation in #6724 seems to work, so I am breaking it apart and sending smaller PRs.

Changes:

  • Added design document
  • Created active cluster manager component with a fake implementation
  • Added region field to cluster group config (example configs will be added in follow-up PRs)

Misc change:

  • Cluster metadata constructor signature simplification

RegionInformation struct {
	// InitialFailoverVersion is the identifier of each region.
	// It is used for active-active domains to determine the region of workflows that don't have an external entity mapping (origin stickiness).
	InitialFailoverVersion int64 `yaml:"initialFailoverVersion"`
Member

What is its relationship to the initialFailoverVersion in ClusterInformation? If they're completely unrelated, can we use a different name?

Member Author

You can check an example config in docs/design/active-active/active-active.md. Semantically it means the same thing, but per region.

Notice that regions also have a failover version now. This will be used to determine the active cluster of a workflow based on the following lookups:
- Workflow maps to an entity (or directly to a region). This is static and cannot change over time.
- Entity maps to a region. This is dynamic and can change over time.
- Region maps to a cluster. This is static and cannot change over time. Note that there can be more than one cluster in a region, but an active-active domain can only have one active cluster per region.
Member

Can we add this constraint to the list? This should be more explicit.

Member

Where is this information stored?

Member Author

The constraint is already in the list: there can be only one active cluster per region for an active-active domain.
ActiveClusters will be a new field of the domain record in the DB.
Cluster/region information is available in the static config YAML.


The workflow start request determines which cluster selection strategy is used.

| Has active-region.lookup-key | Has active-region.origin | Strategy |
Member

I don't understand why the 3rd case is type 2.

Member Author

I mixed up the conditions. Fixed it now.
Type 1 -> No lookup key is specified
Type 2 -> A lookup key is specified
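
As a rough illustration of the corrected rule, the strategy choice depends only on whether the start request carries a lookup key. The sketch below assumes a hypothetical `StartAttributes` struct holding the two values named in the table header (`active-region.lookup-key` and `active-region.origin`); the type and function names are not taken from the PR.

```go
package main

// StartAttributes is a hypothetical view of the two optional
// start-request values named in the table header.
type StartAttributes struct {
	LookupKey string // active-region.lookup-key; empty when absent
	Origin    string // active-region.origin; empty when absent
}

// Strategy identifies the cluster selection strategy.
type Strategy int

const (
	// StrategyType1 applies when no lookup key is specified.
	StrategyType1 Strategy = iota + 1
	// StrategyType2 applies when a lookup key is specified.
	StrategyType2
)

// SelectStrategy applies the rule from the reply above: only the
// presence of a lookup key decides between Type 1 and Type 2.
func SelectStrategy(a StartAttributes) Strategy {
	if a.LookupKey != "" {
		return StrategyType2
	}
	return StrategyType1
}
```

Under this reading, a request that sets only the origin still falls under Type 1, and a request that sets both falls under Type 2.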


| EntityType | EntityKey | Region | Failover Version | LastUpdated |
|------------|-----------|--------|------------------|-------------|
| user-location | seattle | us-west | 1 | |
Member

When and how are these records inserted?

Member Author

It's briefly explained in the bullet points above. For each such third-party entity type, a watcher needs to be implemented that populates this table. For example, the user-location entity type would be managed by a new UserLocationWatcher, which runs in the primary cluster as a global singleton.

@taylanisikdemir taylanisikdemir enabled auto-merge (squash) April 16, 2025 00:45
@taylanisikdemir taylanisikdemir merged commit 9242115 into cadence-workflow:master Apr 16, 2025
23 checks passed
@taylanisikdemir taylanisikdemir deleted the taylan/aa_1 branch April 16, 2025 04:06