feat: eager "pre-resolution" of shard homes #32881

leviramsey wants to merge 7 commits into akka:main
Conversation
patriknw left a comment
Looks like a nice feature.
```scala
   * otherwise it will request the home of the shard, which may result in the shard being allocated on
   * some node in the cluster. No message will be sent to any entity within the shard.
   */
  def preResolveShard(typeName: String, shard: ShardRegion.ShardId): Unit = {
```
For the classic API we should follow the existing convention and have this as a public message that is sent to the ShardRegion, instead of via this additional method. See for example GracefulShutdown.
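For reference, a sketch of the convention being cited: GracefulShutdown is the real, existing message, while PreResolveShard below is hypothetical, just illustrating the shape the review suggests.

```scala
import akka.actor.{ ActorRef, ActorSystem }
import akka.cluster.sharding.{ ClusterSharding, ShardRegion }

def illustrate(system: ActorSystem): Unit = {
  // the ShardRegion actor for a started entity type
  val region: ActorRef = ClusterSharding(system).shardRegion("Counter")

  // existing convention: control operations are public messages sent to the region
  region ! ShardRegion.GracefulShutdown

  // the suggested shape for this feature (PreResolveShard is hypothetical):
  // region ! ShardRegion.PreResolveShard(shardId)
}
```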
```scala
   * and cache it. This may result in the shard being allocated on some node in the cluster. No message will
   * be sent to any entity within the shard.
   */
  def preResolveShard[M, E](entity: Entity[M, E], shard: String): Unit
```
Perhaps we should follow the existing convention here too, that these are actor messages. We have that for ClusterShardingQuery and ShardCommand/Passivate. I think we can expand ShardCommand with this PreResolveShard message.
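As a sketch of the existing typed convention referenced here: Passivate is already a ShardCommand message sent to the shard from inside an entity, and a PreResolveShard message (hypothetical below) could join that same protocol.

```scala
import akka.actor.typed.scaladsl.Behaviors
import akka.cluster.sharding.typed.scaladsl.{ ClusterSharding, Entity, EntityTypeKey }

sealed trait Command
case object Stop extends Command

val TypeKey = EntityTypeKey[Command]("Counter")

// entityContext.shard is an ActorRef[ClusterSharding.ShardCommand]
val entity = Entity(TypeKey) { entityContext =>
  Behaviors.receive[Command] { (context, message) =>
    message match {
      case Stop =>
        // the existing ShardCommand convention in action:
        entityContext.shard ! ClusterSharding.Passivate(context.self)
        Behaviors.same
      // a PreResolveShard ShardCommand (hypothetical) would be handled by the
      // shard machinery itself, never reaching any entity
    }
  }
}
```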
```scala
   * Direct the [[ShardRegion]] actor responsible for the named entity type to resolve the location
   * of the given shard and cache it. If the `ShardRegion` already knows the location, it will not do anything,
   * otherwise it will request the home of the shard, which may result in the shard being allocated on
   * some node in the cluster. No message will be sent to any entity within the shard.
```
Would it even be possible to automate this (if the feature is enabled in config)? For remember entities we already have unallocatedShards in the Coordinator State. Could we make use of that and automatically pre-allocate shards from the coordinator? Once a shard has been in use, it would then always be allocated again as soon as possible (best effort).
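If such automation were added, it would presumably be opt-in via configuration. A purely hypothetical sketch of what that could look like; only remember-entities is a real setting today:

```hocon
akka.cluster.sharding {
  # real, existing setting: track started entities so they are restarted
  # after rebalance or restart
  remember-entities = on

  # hypothetical setting (defined neither by Akka nor by this PR): have the
  # coordinator eagerly re-allocate shards it knows about from its state
  # pre-resolve-unallocated-shards = on
}
```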
During a rolling restart, shard regions are typically stopped, culminating in (ideally) the shard coordinator stopping and handing off to the next oldest node, as new shard regions start. While the new shard regions do get the homes of the existing shards en masse from the coordinator at registration, shards which get stopped later only get recreated on demand (assuming remember entities isn't in use, which is not always advisable).
Assuming that the demand to recreate shards arises organically, the process of waiting for the shard coordinator to successfully write to ddata introduces some extra latency into the hot path to service that demand.
Applications can of course synthesize demand, but in the typical case (where the entity-to-shard mapping is based on the hash code of the entity ID) finding an entity ID for each shard adds a bit of complexity, especially if the goal is that the entity associated with that ID should not trigger an attempt to rehydrate it from persistence.
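For concreteness, a sketch of the typical hash-based mapping referred to (EntityEnvelope and numberOfShards are illustrative names, not from this PR): the shard ID is a function of the entity ID, so inverting it to find an entity ID per shard requires searching.

```scala
import akka.cluster.sharding.ShardRegion

final case class EntityEnvelope(entityId: String, payload: Any)

val numberOfShards = 100

val extractShardId: ShardRegion.ExtractShardId = {
  case EntityEnvelope(entityId, _) =>
    // the shard is derived from the entity id; there is no cheap inverse
    (math.abs(entityId.hashCode) % numberOfShards).toString
}
```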
This change instead allows cluster nodes to signal their local shard region that a shard home should be located (and if necessary allocated) without sending a message to any entity.
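A sketch of how the proposed classic API (as it appears in the diff above) might be driven, e.g. after a rolling restart. This assumes preResolveShard is exposed on the classic ClusterSharding extension, and "Counter" and numberOfShards are placeholders that must match the application's entity type name and shard extractor:

```scala
import akka.actor.ActorSystem
import akka.cluster.sharding.ClusterSharding

def preResolveAllShards(system: ActorSystem, numberOfShards: Int): Unit = {
  val sharding = ClusterSharding(system)
  // warm the shard-home cache for every shard, allocating where necessary,
  // without sending anything to the entities themselves
  (0 until numberOfShards).foreach { shard =>
    sharding.preResolveShard("Counter", shard.toString)
  }
}
```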