
feat: eager "pre-resolution" of shard homes #32881

Draft

leviramsey wants to merge 7 commits into akka:main from leviramsey:sharding-eager-discover

Conversation

@leviramsey
Contributor

During a rolling restart, shard regions are typically stopped, culminating in (ideally) the shard coordinator stopping and handing off to the next-oldest node as new shard regions start. While the new shard regions do get the homes of the existing shards en masse from the coordinator at registration, shards that are stopped later are only recreated on demand (assuming remember-entities isn't in use, which is not always advisable).

Assuming that the demand to recreate shards arises organically, waiting for the shard coordinator to successfully write to ddata introduces extra latency into the hot path that services that demand.

Applications can of course synthesize demand, but in the typical case (where the entity-to-shard mapping is based on the hash code of the entity ID), finding an entity ID for each shard adds a bit of complexity, especially if the goal is for the entity behind that ID not to attempt rehydration from persistence.
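
To make that concrete, here is a minimal sketch of the workaround, assuming the common hash-of-entity-ID extraction scheme (math.abs(entityId.hashCode % numberOfShards)); the shard count and ID prefix are arbitrary choices, not anything from this PR:

val numberOfShards = 100 // must match the extractor's configuration

def shardOf(entityId: String): String =
  math.abs(entityId.hashCode % numberOfShards).toString

// Brute-force some entity ID that maps to the target shard; the "warmup-" prefix
// is arbitrary and should ideally never collide with a real, persisted entity.
def entityIdForShard(shard: String): String =
  Iterator.from(0).map(i => s"warmup-$i").find(id => shardOf(id) == shard).get

// Sending any message to entityIdForShard(s), for each shard s, would force the
// shard home to be resolved, at the cost of spinning up a throwaway entity.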

This change instead allows cluster nodes to signal their local shard region that a shard's home should be located (and, if necessary, allocated) without sending a message to any entity.
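
For illustration, a rough sketch of driving the proposed classic API from a node (e.g. shortly after it joins during the restart); only preResolveShard comes from this PR's diff, while the "Counter" type name, shard count, and system value are assumptions:

import akka.actor.ActorSystem
import akka.cluster.sharding.ClusterSharding

val system: ActorSystem = ??? // assumed to already exist in the application

// Ask the local shard region to resolve (and, if needed, allocate) every shard
// home up front, without sending anything to the entities themselves.
val sharding = ClusterSharding(system)
(0 until 100).foreach(shardId => sharding.preResolveShard("Counter", shardId.toString))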

@patriknw
Contributor

Looks like a nice feature.

* otherwise it will request the home of the shard, which may result in the shard being allocated on
* some node in the cluster. No message will be sent to any entity within the shard.
*/
def preResolveShard(typeName: String, shard: ShardRegion.ShardId): Unit = {

For the classic API we should follow the existing convention and expose this as a public message that is sent to the shardRegion, instead of adding this method. See for example GracefulShutdown.
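
A sketch of what that message-based variant might look like, modeled on ShardRegion.GracefulShutdown; the PreResolveShard message is hypothetical and not in the current diff:

import akka.actor.ActorSystem
import akka.cluster.sharding.{ ClusterSharding, ShardRegion }

val system: ActorSystem = ??? // assumed to already exist

// Hypothetical: a public message sent to the shard region, instead of a method call.
val region = ClusterSharding(system).shardRegion("Counter")
region ! ShardRegion.PreResolveShard("42") // name and shape are illustrative only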

* and cache it. This may result in the shard being allocated on some node in the cluster. No message will
* be sent to any entity within the shard.
*/
def preResolveShard[M, E](entity: Entity[M, E], shard: String): Unit
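
For reference, a sketch of the proposed typed API in use; only preResolveShard comes from this PR's diff, while the Counter entity, its Command type, and the shard ids are assumptions:

import akka.actor.typed.{ ActorSystem, Behavior }
import akka.cluster.sharding.typed.scaladsl.{ ClusterSharding, Entity, EntityTypeKey }

sealed trait Command
def counter(entityId: String): Behavior[Command] = ??? // assumed entity behavior

val system: ActorSystem[Nothing] = ??? // assumed to already exist
val TypeKey = EntityTypeKey[Command]("Counter")
val entity = Entity(TypeKey)(ctx => counter(ctx.entityId))
ClusterSharding(system).init(entity)

// Resolve every shard home eagerly, without touching any entity:
(0 until 100).foreach(shard => ClusterSharding(system).preResolveShard(entity, shard.toString))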

Perhaps we should follow the existing convention here too and use actor messages. We have that for ClusterShardingQuery and ShardCommand/Passivate. I think we can expand ShardCommand with this PreResolveShard message.
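
For illustration only, the typed message could mirror the existing Passivate; this shape is hypothetical, not part of the diff, and how it would be routed to the shard region is left open:

import akka.cluster.sharding.typed.scaladsl.ClusterSharding

// Hypothetical library addition alongside ClusterSharding.Passivate:
final case class PreResolveShard(shard: String) extends ClusterSharding.ShardCommand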

* Direct the [[ShardRegion]] actor responsible for the named entity type to resolve the location
* of the given shard and cache it. If the `ShardRegion` already knows the location, it will not do anything,
* otherwise it will request the home of the shard, which may result in the shard being allocated on
* some node in the cluster. No message will be sent to any entity within the shard.

Would it even be possible to automate this (if the feature is enabled in config)? For remember entities we already have unallocatedShards in the Coordinator State. Could we make use of that and automatically pre-allocate shards from the coordinator? Once a shard has been in use, it would then always be allocated again as soon as possible (best effort).
