Create a custom thrift protocol to ensure client/server compatibility #5398

DomGarguilo · 2025-03-11T20:06:46Z

Addresses #5390

DomGarguilo · 2025-03-11T20:08:26Z

Created this in draft since, for now, it only addresses the task outlined in the ticket:

create a custom protocol with the versioning information stored in a header and checks on the server side to validate it

I'm not sure if any of the other proposed tasks should be added here or as follow-on PRs.

keith-turner · 2025-03-11T23:29:52Z

core/src/main/java/org/apache/accumulo/core/rpc/AccumuloProtocolFactory.java

+  public static class AccumuloProtocol extends TCompactProtocol {
+
+    private static final int MAGIC_NUMBER = 0x41434355; // "ACCU" in ASCII
+    private static final byte PROTOCOL_VERSION = 1;


Should probably add a comment to AccumuloDataVersion.CURRENT_VERSION that references this and mentions that changing CURRENT_VERSION may also require changing this.

I was wondering about that... whether the protocol version should be independent, so we can version the serialization format of the protocol header.

We will probably want to include the AccumuloDataVersion.CURRENT_VERSION in the protocol header directly, though. We could also include the actual Accumulo release version in here, too (major.minor.patch; e.g. 4.0.0). To verify, we'd probably want to require the same major and the same minor version, if we want to be even more restrictive than the data version. We really shouldn't have Accumulo servers or clients/servers talking across major.minor versions (different patch releases are okay, and that is important for rolling upgrades).

I left it separate for now just in case we wanted to have multiple protocol versions able to talk to each other but am not sure about that.

The data version and protocol version definitely seem related but I'm not sure how closely tied we want to have those versions.

I think the protocol version can be independently numbered, not connected to the data version. I think it's sufficient to check the Accumulo version (Constants.VERSION) major.minor. We don't need to use the data version in here at all. The protocol version is the version of the RPC protocol... the data version is the version of the data persisted in ZK and HDFS... and the Accumulo version is the overall version of the software. Only the RPC protocol version and the Accumulo version are relevant here.
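To make the three-way distinction above concrete, here is a minimal standalone sketch of a header that carries a magic number, an independently numbered protocol version, and the Accumulo release version, with the server accepting only peers on the same major.minor. It is not the code from this PR: the real class extends TCompactProtocol, while this sketch uses plain java.io streams, and the names HeaderSketch, writeHeader, validateHeader and majorMinor are hypothetical. Only MAGIC_NUMBER and PROTOCOL_VERSION are taken from the diff above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Standalone illustration of a versioned RPC header; not the PR's actual class. */
final class HeaderSketch {
  static final int MAGIC_NUMBER = 0x41434355; // "ACCU" in ASCII, as in the PR
  static final byte PROTOCOL_VERSION = 1; // versions the header format itself

  /** Client side: magic number, protocol version, then the full release version. */
  static byte[] writeHeader(String accumuloVersion) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      out.writeInt(MAGIC_NUMBER);
      out.writeByte(PROTOCOL_VERSION);
      out.writeUTF(accumuloVersion);
      out.flush();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Server side: throws unless the header comes from a compatible peer. */
  static void validateHeader(byte[] header, String serverVersion) {
    try {
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(header));
      int magic = in.readInt();
      if (magic != MAGIC_NUMBER) {
        throw new IllegalStateException(
            "Unexpected magic number 0x" + Integer.toHexString(magic));
      }
      byte protocolVersion = in.readByte();
      if (protocolVersion != PROTOCOL_VERSION) {
        throw new IllegalStateException(
            "Unsupported protocol version " + protocolVersion);
      }
      String clientVersion = in.readUTF();
      // Patch releases may differ (rolling upgrades); major.minor must match.
      if (!majorMinor(clientVersion).equals(majorMinor(serverVersion))) {
        throw new IllegalStateException("Incompatible versions: client "
            + clientVersion + ", server " + serverVersion);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Keeps only the first two dot-separated components of a version string. */
  static String majorMinor(String version) {
    String[] parts = version.split("\\.");
    if (parts.length < 2) {
      throw new IllegalArgumentException("Not a major.minor version: " + version);
    }
    return parts[0] + "." + parts[1];
  }
}
```

In this sketch a 4.0.0 client can talk to a 4.0.3 server but not to a 4.1.0 server, and the data version never appears in the header.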

core/src/main/java/org/apache/accumulo/core/rpc/AccumuloProtocolFactory.java

ctubbsii

I'm not sure if we actually need to distinguish between client and server for the spans.

core/src/main/java/org/apache/accumulo/core/rpc/AccumuloProtocolFactory.java

DomGarguilo · 2025-03-17T20:57:30Z

Marking this as ready for review. I tried to address the following two suggestions from the ticket:

Primarily, create a custom protocol with the versioning information stored in a header and checks on the server side to validate it, then

Secondarily, simplify the thrift APIs by adding tracing info to the protocol header

There are a lot of changed files in this PR since I had to update the thrift code. There are also a lot of places where the only change is the removal of the TInfo object as the first param in a method call which was a trivial change. Hopefully that helps skip/skim over portions of the diff while reviewing.
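For readers skimming the diff, here is a rough sketch of the idea behind moving tracing info out of the TInfo parameter and into the protocol header: the trace context becomes a small string map that is written ahead of each message and read back on the server. This is an illustration under assumptions, not the PR's code. The class name TraceHeaderSketch, the count-then-pairs wire layout and the MAX_ENTRIES bound are all made up for the example, and it uses java.io streams rather than Thrift.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that moves a trace-context map through a message header. */
final class TraceHeaderSketch {
  static final int MAX_ENTRIES = 64; // arbitrary sanity bound for this sketch

  /** Writes the entry count followed by each key/value pair. */
  static byte[] serialize(Map<String,String> headers) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      out.writeInt(headers.size());
      for (Map.Entry<String,String> entry : headers.entrySet()) {
        out.writeUTF(entry.getKey());
        out.writeUTF(entry.getValue());
      }
      out.flush();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Reads the map back, rejecting counts that cannot be valid. */
  static Map<String,String> deserialize(byte[] data) {
    try {
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
      int count = in.readInt();
      if (count < 0 || count > MAX_ENTRIES) {
        throw new IllegalStateException("Bad header entry count: " + count);
      }
      Map<String,String> headers = new LinkedHashMap<>();
      for (int i = 0; i < count; i++) {
        String key = in.readUTF();
        headers.put(key, in.readUTF());
      }
      return headers;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

With the context carried this way, the RPC method signatures no longer need a leading tracing argument, which is why so many call sites in the diff change by only one parameter.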

core/src/main/java/org/apache/accumulo/core/rpc/AccumuloProtocolFactory.java

core/src/main/java/org/apache/accumulo/core/rpc/ThriftUtil.java

DomGarguilo · 2025-03-28T18:43:55Z

@kevinrr888 your suggestions should all be addressed as of 29c1234

in that commit I...

refactored the protocol version compatibility check method to throw an exception directly instead of returning a boolean. I also updated the wording of that message to make it clearer
addressed the IDE warnings regarding nullness in various places
added javadoc to the static client and server factory methods in ThriftUtil to make it clearer which should be used

core/src/main/java/org/apache/accumulo/core/rpc/AccumuloProtocolFactory.java

keith-turner · 2025-04-04T14:28:37Z

@DomGarguilo did not realize merging the ClientTabletCache changes would conflict w/ this. Makes sense though w/ all the TInfo changes. Would probably be good to get this merged in, as it will continue to have conflicts, the testing that was done sounds really good.

DomGarguilo · 2025-04-04T16:33:02Z

@DomGarguilo did not realize merging the ClientTabletCache changes would conflict w/ this. Makes sense though w/ all the TInfo changes. Would probably be good to get this merged in, as it will continue to have conflicts, the testing that was done sounds really good.

I'm fine with merging it in or leaving it open. The merge conflicts are pretty easy to resolve here since it's just a param change in most files. Also, @ctubbsii left a comment about wanting to review soon.

core/src/main/java/org/apache/accumulo/core/rpc/AccumuloProtocolFactory.java

core/src/main/java/org/apache/accumulo/core/trace/TraceUtil.java

ctubbsii · 2025-04-04T21:15:29Z

core/src/main/java/org/apache/accumulo/core/trace/TraceUtil.java

+      Span span = Span.current(); // should be set by protocol
+      try {


So, I think that what the old code was doing when it called startServerRpcSpan was always creating a new "inner span" inside the outer span that the user might have created in the client. That was useful because it helped us distinguish a child or inner span that was associated only with the RPC call, rather than having it grouped with all the other things the client code might have been doing around the RPC call.

This code, however, seems to just grab the current span the user created, and attach an exception to it if an RPC exception occurred. I think we should do the other thing. If there's an exception, it should be attached to the inner/child span for the RPC, not the outer span the user created.

Okay I think I fixed this in 8d742ea. Please take a look.

I think I may have misunderstood what you were doing before. I think you might have had it correct before. I didn't realize you were creating the server span in the protocol, prior to this invocation handler being called. The expectation, I think, was that the invocation handler just needed to get a reference to the span started by the protocol, for the purposes of attaching any exceptions to it. I think I will need to look at this more closely, but I am now thinking I misled you by my earlier comment, and you were doing it correctly before.
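The span relationship discussed in this thread can be shown with a toy model. This is deliberately not the OpenTelemetry API and not the PR's TraceUtil code: ToySpan, withChildSpan and invoke are hypothetical names, and everything runs in one process. The point is only that the protocol layer starts the inner RPC span, the invocation handler merely looks up whatever span is current, and an exception therefore lands on the RPC span rather than on the user's outer span.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy single-process model of outer (user) and inner (RPC) spans. */
final class SpanSketch {
  static final class ToySpan {
    final String name;
    final ToySpan parent;
    final List<Throwable> exceptions = new ArrayList<>();

    ToySpan(String name, ToySpan parent) {
      this.name = name;
      this.parent = parent;
    }
  }

  private static final ThreadLocal<ToySpan> CURRENT = new ThreadLocal<>();

  static ToySpan current() {
    return CURRENT.get();
  }

  /** Runs body inside a new child of the current span, then restores the previous one. */
  static void withChildSpan(String name, Runnable body) {
    ToySpan previous = CURRENT.get();
    CURRENT.set(new ToySpan(name, previous));
    try {
      body.run();
    } finally {
      CURRENT.set(previous);
    }
  }

  /** Stand-in for the invocation handler: it does not start a span itself. */
  static void invoke(Runnable rpcMethod) {
    ToySpan span = current(); // expected to be the RPC span set by the protocol layer
    try {
      rpcMethod.run();
    } catch (RuntimeException e) {
      if (span != null) {
        span.exceptions.add(e); // recorded on the inner span only
      }
      throw e;
    }
  }
}
```

If the protocol layer wraps each call in withChildSpan("rpc", ...) before invoke runs, the handler can keep using the current span and the failure still ends up on the child.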

ctubbsii · 2025-04-08T19:50:50Z

One additional thought I had was that clients should send the instanceId, and the server should verify it, so that we don't get connections from clients for a different instance.

DomGarguilo · 2025-04-11T19:05:16Z

One additional thought I had was that clients should send the instanceId, and the server should verify it, so that we don't get connections from clients for a different instance.

I was trying to work this in but am not sure the best place to pull in the instanceId from. I tried to pass it in as a parameter but had to reach pretty high up the chain which changed a lot of code. Was hoping you had a better idea for where I can add this in from.

core/src/test/java/org/apache/accumulo/core/rpc/AccumuloProtocolTest.java

ctubbsii · 2025-04-16T00:56:04Z

One additional thought I had was that clients should send the instanceId, and the server should verify it, so that we don't get connections from clients for a different instance.

I was trying to work this in but am not sure the best place to pull in the instanceId from. I tried to pass it in as a parameter but had to reach pretty high up the chain which changed a lot of code. Was hoping you had a better idea for where I can add this in from.

The ServerContext and/or ClientContext should be able to provide the InstanceId, and the protocol factory could be constructed with it. I'm not sure how much code that touches. I can try to take a look if you hit a wall here.

core/src/main/java/org/apache/accumulo/core/trace/TraceUtil.java

DomGarguilo · 2025-04-16T19:48:53Z

One additional thought I had was that clients should send the instanceId, and the server should verify it, so that we don't get connections from clients for a different instance.

I was trying to work this in but am not sure the best place to pull in the instanceId from. I tried to pass it in as a parameter but had to reach pretty high up the chain which changed a lot of code. Was hoping you had a better idea for where I can add this in from.

The ServerContext and/or ClientContext should be able to provide the InstanceId, and the protocol factory could be constructed with it. I'm not sure how much code that touches. I can try to take a look if you hit a wall here.

Okay I added the instance ID validation in 769d38f. Had to reach pretty far up the chain to get the instanceId so there were quite a few files changed but all of those are just parameter changes.
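As a rough picture of the approach suggested above (construct the protocol factory with the instance ID taken from the context, send it from the client, verify it on the server), here is a small sketch. It is an assumption-laden illustration, not the code in 769d38f: the class and method names are hypothetical, the ID is modeled as a java.util.UUID, and the header is reduced to a single string field.

```java
import java.util.Objects;
import java.util.UUID;

/** Hypothetical sketch of instance ID validation in a protocol header. */
final class InstanceIdCheckSketch {

  /** Built once with the instance ID that the client or server context provides. */
  static final class ProtocolFactory {
    private final UUID instanceId;

    ProtocolFactory(UUID instanceId) {
      this.instanceId = Objects.requireNonNull(instanceId, "instanceId");
    }

    /** Client side: the value placed in the header. */
    String headerField() {
      return instanceId.toString();
    }

    /** Server side: throws if the client belongs to a different instance. */
    void validate(String receivedInstanceId) {
      if (!instanceId.toString().equals(receivedInstanceId)) {
        throw new IllegalStateException("Client instance ID " + receivedInstanceId
            + " does not match server instance ID " + instanceId);
      }
    }
  }
}
```

Passing the ID in through the constructor is what forces the parameter changes further up the call chain: every place that builds a factory needs access to a context that knows the instance.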

ctubbsii · 2025-04-23T20:02:29Z

Can you provide an example of the opentelemetry tracing output (screenshot or log sequences or something)? I'd be curious what they look like when a client performs a traced operation over the new RPC protocol.

DomGarguilo · 2025-04-23T22:53:01Z

Can you provide an example of the opentelemetry tracing output (screenshot or log sequences or something)? I'd be curious what they look like when a client performs a traced operation over the new RPC protocol.

I'm not too sure how much this shows but here is a screenshot from the "graph view" in Jaeger capturing traces from a scan from the shell

Create a custom thrift protocol to ensure client/server compatibility

2ddf5ae

DomGarguilo added this to the 4.0.0 milestone Mar 11, 2025

DomGarguilo requested a review from ctubbsii March 11, 2025 20:06

DomGarguilo self-assigned this Mar 11, 2025

Fix checkstyle violation

d2083db

keith-turner reviewed Mar 11, 2025

ctubbsii reviewed Mar 12, 2025

core/src/main/java/org/apache/accumulo/core/rpc/AccumuloProtocolFactory.java

core/src/main/java/org/apache/accumulo/core/rpc/AccumuloProtocolFactory.java

DomGarguilo added 8 commits March 12, 2025 14:03

Merge remote-tracking branch 'upstream/main' into accumuloProtocol

46e2bac

Address feedback

1ea49cc

Merge remote-tracking branch 'upstream/main' into accumuloProtocol

00f74e0

add exception recording to the span

5dfab38

remove TInfo and use headers instead

5f01ee0

Merge remote-tracking branch 'upstream/main' into accumuloProtocol

3c89fef

make header map final

8022a49

remove missed TInfo use

17b418c

DomGarguilo marked this pull request as ready for review March 17, 2025 20:57

Merge remote-tracking branch 'upstream/main' into accumuloProtocol

5e4c62f

keith-turner reviewed Mar 19, 2025

core/src/main/java/org/apache/accumulo/core/rpc/AccumuloProtocolFactory.java

DomGarguilo added 4 commits March 19, 2025 16:59

Add unit test

4b8460f

Use int instead of short

59a55ae

Merge remote-tracking branch 'upstream/main' into accumuloProtocol

7f1d05f

Merge remote-tracking branch 'upstream/main' into accumuloProtocol

a52ebd5

DomGarguilo linked an issue Mar 20, 2025 that may be closed by this pull request

Create a custom thrift protocol to ensure client/server compatibility #5390

Open

kevinrr888 reviewed Mar 26, 2025

Address feedback

29c1234

kevinrr888 reviewed Mar 28, 2025

core/src/main/java/org/apache/accumulo/core/rpc/AccumuloProtocolFactory.java

Use TriftUtil helper method for exception tracing

473426d

Merge remote-tracking branch 'upstream/main' into accumuloProtocol

9ca0efe

ctubbsii requested changes Apr 4, 2025

DomGarguilo and others added 8 commits April 9, 2025 14:06

Merge remote-tracking branch 'upstream/main' into accumuloProtocol

7c6cd0f

Rename to writeCLientHeader(), set scope & span null, remove fixed boolean const

eba746e

Co-authored-by: Christopher Tubbs <[email protected]>

Move serialize and deserialize to TraceUtil

5580e8d

Rename trace from handleMessage to handleRpcMessage

6e749e2

Co-authored-by: Christopher Tubbs <[email protected]>

Validate against accumulo version

713226c

Merge remote-tracking branch 'upstream/main' into accumuloProtocol

5219b5f

Improve error message and tests

387a38c

Fix unapproved char

0b8b3b7

DomGarguilo added 2 commits April 15, 2025 15:40

Merge remote-tracking branch 'upstream/main' into accumuloProtocol

f1dff97

Create inner child span for each RPC

8d742ea

ctubbsii reviewed Apr 16, 2025

core/src/test/java/org/apache/accumulo/core/rpc/AccumuloProtocolTest.java

ctubbsii reviewed Apr 16, 2025

core/src/main/java/org/apache/accumulo/core/trace/TraceUtil.java

DomGarguilo and others added 5 commits April 16, 2025 11:03

Update core/src/test/java/org/apache/accumulo/core/rpc/AccumuloProtocolTest.java

adc8e26

Co-authored-by: Christopher Tubbs <[email protected]>

Remove unneeded null check

aa5c8f5

Co-authored-by: Christopher Tubbs <[email protected]>

Add instance ID validation to header

769d38f

use getInstanceId() method

cee2e7f

fix checkstyle

c93077b

DomGarguilo requested a review from ctubbsii April 16, 2025 19:49

Merge remote-tracking branch 'upstream/main' into accumuloProtocol

f3c0fab

Merge remote-tracking branch 'upstream/main' into accumuloProtocol

4bbdf9d
