refactor: clean up ProcessInternalUpdate to use standard callback system #521
base: main
Conversation
allenss-amazon left a comment
Overall, this looks OK to me.
We will want to get this reviewed by the original author of the metadata handling system (Jacob @ GCP) when it's all done.
Force-pushed from 7d64a1b to 2e0c761
src/commands/ft_internal_update.cc
Outdated
```cpp
if (argc != kFTInternalUpdateArgCount) {
  return absl::InvalidArgumentError(
      "ERR wrong number of arguments for FT_INTERNAL_UPDATE");
}
```
In theory, if you've registered the command with 4 args, this can't happen. Maybe a CHECK is the right thing here?
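For illustration, a minimal sketch of that suggestion, assuming an Abseil/glog-style CHECK macro is available in this codebase:

```cpp
// Arity is already enforced at command registration time, so a mismatch
// here is a programming error rather than bad user input: crash loudly
// instead of returning a recoverable error.
CHECK_EQ(argc, kFTInternalUpdateArgCount)
    << "FT._INTERNAL_UPDATE registered with unexpected arity";
```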
src/commands/ft_internal_update.cc
Outdated
```cpp
const char *header_data = ValkeyModule_StringPtrLen(argv[3], &header_len);
coordinator::GlobalMetadataVersionHeader version_header;
if (!version_header.ParseFromArray(header_data, header_len)) {
  return HandleParseFailure(ctx, "GlobalMetadataVersionHeader", header_len,
```
same here.
Force-pushed from 8343573 to 0cf9994
src/coordinator/metadata_manager.cc
Outdated
```cpp
metadata.mutable_version_header()->set_top_level_min_version(
    top_level_min_version);

metadata_ = metadata;
```
isn't this redundant with line 213?
src/coordinator/metadata_manager.cc
Outdated
```cpp
}

std::vector<vmsdk::cluster_map::NodeInfo> MetadataManager::GetPrimaryNodes(
    ValkeyModuleCtx *ctx) {
```
ctx isn't used, right? I think it should be removed from the signature. Also, I'm still unclear on why you need to be doing a refresh of the cluster map here. If this is to work around an initialization problem, let's solve that in initialization.
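A sketch of the suggested signature change, assuming nothing else in the body touches ctx:

```cpp
// ctx dropped: the function only reads the locally cached cluster map.
std::vector<vmsdk::cluster_map::NodeInfo> MetadataManager::GetPrimaryNodes();
```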
murphyjacob4 left a comment
Overall this makes a lot of sense. Converging these all into an internal command and then serializing that to the replication stream solves most of the problems. I mostly want to double-check that we are properly guarding against BroadcastMetadata on replicas.
We may want to start out with broadcasting to all nodes, and only accept the broadcast if we are a primary. This way we can guard against propagation delays.
It is okay to accept a metadata broadcast iff you are a primary locally. The remote view of whether that node is a primary won't matter (it could also be out of sync). If we reconcile metadata locally while being demoted, this is okay, but the node may then need to full-sync from its new primary (our replication offset will be ahead of the new primary's, which we can't otherwise recover from).
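For illustration, a sketch of the "accept only if locally primary" guard described above. ValkeyModule_GetContextFlags and the MASTER flag come from the Valkey module API; the handler name and surrounding wiring are hypothetical:

```cpp
// Hypothetical receiver for the metadata broadcast cluster message.
// A replica ignores the broadcast and instead waits for
// FT._INTERNAL_UPDATE to arrive over the replication stream.
void OnMetadataBroadcast(ValkeyModuleCtx *ctx, const char *payload,
                         uint32_t len) {
  if (!(ValkeyModule_GetContextFlags(ctx) & VALKEYMODULE_CTX_FLAGS_MASTER)) {
    return;  // Locally a replica: drop the broadcast.
  }
  // ... parse payload and reconcile metadata as a primary ...
}
```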
src/coordinator/metadata_manager.cc
Outdated
```cpp
// Call FT.INTERNAL_UPDATE for coordinator to ensure unified AOF replication
CallFTInternalUpdate(new_entry, metadata.version_header(), encoded_id,
                     "CreateEntry");
```
nit, maybe we can also rename the function? I know right now it is just create, but if we later support alter, it would make sense to just support upsert.
| "CreateEntry"); | |
| "UpsertEntry"); |
src/coordinator/metadata_manager.cc
Outdated
```cpp
    kMetadataBroadcastClusterMessageReceiverId,
    payload.c_str(), payload.size());

auto primary_nodes = GetPrimaryNodes(ctx);
```
If the local node's view is stale, I guess we could send it to a primary that is now demoted. I think we need to double check our primaryship when we receive a metadata message as well.
done
```cpp
  (*mutable_entries)[id].set_fingerprint(fingerprint);
  (*mutable_entries)[id].set_encoding_version(
      rt_it->second.encoding_version);
}
```
Should we check we are a primary when we get broadcasted metadata? As a replica, we should never even request the metadata on a conflict since we are going to wait for our primary
src/coordinator/metadata_manager.cc
Outdated
```cpp
if (!vmsdk::IsMainThread()) {
  thread_safe_ctx =
      vmsdk::MakeUniqueValkeyDetachedThreadSafeContext(detached_ctx_.get());
  ValkeyModule_ThreadSafeContextLock(thread_safe_ctx.get());
  safe_context = thread_safe_ctx.get();
}
```
IIRC we made it a constraint that metadata would only get processed by the main thread for simplicity. My mental model may be slightly out of date though. Is that no longer true?
Right, resolved it.
Also, have we thought through upgrade/downgrade? We will enqueue these metadata changes into the replication buffer, which may cause problems on down-level versions.
Force-pushed from 634a341 to e0e3ad5
Force-pushed from 86e108e to 65ce359
Created ft_internal_update with an admin flag. This command will be used to replicate create/drop with the global version plus the index metadata version and fingerprint. Customers will not be allowed to call it. gRPC broadcasts only go to primaries; replicas only receive information via the replication stream (ft_internal_update). For CMD clusters nothing changes: we still replicate create/drop as-is. Signed-off-by: Elias Tamraz <[email protected]>
Force-pushed from 65ce359 to 856b5c6
src/commands/ft_internal_update.cc
Outdated
```cpp
// Helper function to handle parse failures with poison pill recovery
absl::Status HandleInternalUpdateFailure(
    ValkeyModuleCtx *ctx, bool success, const std::string &operation_type,
```
Is success redundant with status? It seems like we always set success == status.ok(). Maybe we should just use status?
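A sketch of the simplified signature, with the remaining parameters assumed from the snippet above:

```cpp
// The absl::Status already encodes success/failure, so the separate bool
// is redundant and can be dropped.
absl::Status HandleInternalUpdateFailure(ValkeyModuleCtx *ctx,
                                         const std::string &operation_type,
                                         const absl::Status &status) {
  if (status.ok()) return status;  // Nothing to handle.
  // ... poison-pill recovery path for the failed operation ...
  return status;
}
```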
src/coordinator/metadata_manager.cc
Outdated
```cpp
VMSDK_LOG(WARNING, ctx)
    << "Failed during CreateEntryOnReplica callback for type " << type_name
    << ", id " << id << " from " << "CreateEntryOnReplica";
Metrics::GetStats().process_internal_update_callback_failures_cnt++;
```
Does incrementing the stat/logging really matter if we just crash next line?
| "during loading"; | ||
|
|
||
| auto obj_name = ObjName::Decode(id); | ||
| auto callback_result = TriggerCallbacks(type_name, obj_name, *metadata_entry); |
Should we just return this status if it isn't OK? The only other exit path is returning status OK, and we can handle the error in the calling function.
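For illustration, the early return being suggested (names taken from the snippet above):

```cpp
auto callback_result = TriggerCallbacks(type_name, obj_name, *metadata_entry);
if (!callback_result.ok()) {
  return callback_result;  // Let the caller decide how to handle the failure.
}
return absl::OkStatus();
```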
src/coordinator/metadata_manager.cc
Outdated
```cpp
entry.SerializeToString(&metadata_binary);
header.SerializeToString(&header_binary);

ValkeyModuleCallReply *reply = ValkeyModule_Call(
```
So this is only executed on primaries, right? Via FT.CREATE and FT.DROP. And FT._INTERNAL_UPDATE just replicates on primaries with no effect, right?
Should we replace this call with a simple call to VM_Replicate?
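A sketch of the VM_Replicate alternative. ValkeyModule_Replicate appends a command to the replication stream without executing it locally; "c" passes a C string and "b" a binary buffer with an explicit length. Argument names follow the snippet above, and the exact argument layout of FT._INTERNAL_UPDATE is assumed:

```cpp
// Propagate FT._INTERNAL_UPDATE to replicas/AOF without re-running it on
// this primary: one C string plus two length-delimited binary buffers
// (the serialized metadata entry and the version header).
ValkeyModule_Replicate(ctx, "FT._INTERNAL_UPDATE", "cbb",
                       encoded_id.c_str(),
                       metadata_binary.data(), metadata_binary.size(),
                       header_binary.data(), header_binary.size());
```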
```cpp
  return absl::OkStatus();
}

absl::Status MetadataManager::CallFTInternalUpdateForReconciliation(
```
I feel like this code is very similar to the code below. Can we unify to one function? It seems like reconciliation is basically identical to CallFTInternalUpdate(proposed_entry, metadata_.Get().version_header()). Am I missing something?
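For illustration, the unification might reduce the reconciliation variant to a thin wrapper (a sketch; parameter types and the metadata_ accessor are taken from the review text and otherwise assumed):

```cpp
// Reconciliation reuses the normal replication path; the only difference is
// that the version header comes from the current in-memory metadata.
absl::Status MetadataManager::CallFTInternalUpdateForReconciliation(
    const GlobalMetadataEntry &proposed_entry, const std::string &encoded_id) {
  return CallFTInternalUpdate(proposed_entry,
                              metadata_.Get().version_header(), encoded_id,
                              "Reconciliation");
}
```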
Force-pushed from cb114c3 to b726d63
- Remove redundant success parameter from HandleInternalUpdateFailure
- Remove unnecessary logging before crashes
- Return error status instead of crashing
- Replace ValkeyModule_Call with ValkeyModule_Replicate
- Rename CallFTInternalUpdate to ReplicateFTInternalUpdate

Signed-off-by: Elias Tamraz <[email protected]>
Force-pushed from b726d63 to 343960e
@allenss-amazon This is a draft PR; I will raise a revision with unit tests and integration tests as well.
Changes:
Failure-mode behaviour
Once a node joins back the cluster (or a failover is performed), it will get the index via gRPC and then send it to its replicas via the replication stream.
A replica will get the index from its primary later, when it joins the cluster.
Added a config that lets an operator skip that command, to avoid a poison-pill situation.
Testing
Note: if you want to test the code, you also need the fix for the replica crash during drop index:
Fix replica crash by using legacy ACL path during replication #510
CME
echo "=== CME (Cluster Mode) Test ===" && echo "" && echo "Cluster Structure:" && /tmp/valkey/src/valkey-cli -c -p 7000 CLUSTER NODESecho "Initial Index Status:" && echo "Node 7000 (primary):" && /tmp/valkey/src/valkey-cli -c -p 7000 FT._LIST && echo "Node 7001 (primary):" && /tmp/valkey/src/valkey-cli -c -p 7001 FT._LIST && echo "Node 7002 (replica):" && /tmp/valkey/src/valkey-cli -c -p 7002 FT._LIST && echo "Node 7003 (replica):" && /tmp/valkey/src/valkey-cli -c -p 7003 FT._LIST/tmp/valkey/src/valkey-cli -c -p 7000 FT.CREATE cluster_demo SCHEMA field VECTOR HNSW 6 TYPE FLOAT32 DIM 128 DISTANCE_METRIC L2OK
/tmp/valkey/src/valkey-cli -c -p 7001 FT.DROPINDEX cluster_demo && echo "After dropping index:" && echo "Node 7000 (primary):" && /tmp/valkey/src/valkey-cli -c -p 7000 FT._LIST && echo "Node 7001 (primary):" && /tmp/valkey/src/valkey-cli -c -p 7001 FT._LIST && echo "Node 7002 (replica):" && /tmp/valkey/src/valkey-cli -c -p 7002 FT._LIST && echo "Node 7003 (replica):" && /tmp/valkey/src/valkey-cli -c -p 7003 FT._LIST7.AOF files for reference:
Note:
Node 7000 AOF:
Node 7001 AOF:
CMD test
echo "=== CMD (Primary-Replica) Test ===" && echo "Creating cluster_demo on primary:" && /tmp/valkey/src/valkey-cli -p 8888 FT.CREATE cluster_demo SCHEMA field VECTOR HNSW 6 TYPE FLOAT32 DIM 128 DISTANCE_METRIC L2 && echo "Listing indexes:" && echo "Primary (8888):" && /tmp/valkey/src/valkey-cli -p 8888 FT._LIST && echo "Replica (8889):" && /tmp/valkey/src/valkey-cli -p 8889 FT._LIST/tmp/valkey/src/valkey-cli -p 8888 FT.DROPINDEX cluster_demo && echo "Primary (8888):" && /tmp/valkey/src/valkey-cli -p 8888 FT._LIST && echo "Replica (8889):" && /tmp/valkey/src/valkey-cli -p 8889 FT._LISTAOF file: (same bevahiour)