WIP add config for max total thrift messages sizes #5550
base: 2.1
Conversation
I like this approach to managing the thrift queue. Is it reasonable to say that different processes may need a different limit?
Could definitely see different processes having different settings. Need a way to get information at runtime on this atomic long buried in the thrift code; that would be useful for tuning.
Opened apache/thrift#3135 to provide some runtime insight into the memory limiting feature that thrift has.
Some observations: based on the referenced Thrift code, it appears that if the incoming frame is smaller than the max frame size, but reading the frame would use more memory than the total limit, it returns without reading the frame. The TCP stack has configuration knobs for setting the buffer size on the sender and receiver side, and for how much data can be in flight. I think the other question here is what happens on the Thrift sender side when the total max memory is reached on the receiver side. Does it just sit there and wait indefinitely? If the sender waits forever and does not get a timeout or other exception, then maybe there is nothing to do here.
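The admit-or-defer behavior described above can be sketched roughly like this. This is a simplified illustration, not the actual Thrift code; the class and method names are hypothetical, and the real logic lives in Thrift's nonblocking server frame-buffer handling.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the read-gating logic: a per-frame limit plus an
// atomic long tracking total bytes buffered across all connections.
// (Not thread-safe as written; the real code is more careful.)
class FrameGate {
    final long maxFrameSize;       // limit on a single frame
    final long maxTotalReadBuffer; // limit on total buffered bytes
    final AtomicLong allocated = new AtomicLong();

    FrameGate(long maxFrameSize, long maxTotalReadBuffer) {
        this.maxFrameSize = maxFrameSize;
        this.maxTotalReadBuffer = maxTotalReadBuffer;
    }

    /** Returns true if the frame may be read now, false to defer reading. */
    boolean tryAdmit(long frameSize) {
        if (frameSize > maxFrameSize) {
            throw new IllegalArgumentException("frame exceeds max frame size");
        }
        // If admitting this frame would push the total over the limit, defer:
        // leave the bytes on the socket and retry later, as observed in the IT.
        if (allocated.get() + frameSize > maxTotalReadBuffer) {
            return false;
        }
        allocated.addAndGet(frameSize);
        return true;
    }

    /** Called once a buffered frame has been processed. */
    void release(long frameSize) {
        allocated.addAndGet(-frameSize);
    }
}
```

This also makes the hang scenario visible: a deferred frame is retried only after `release` frees enough memory, so if nothing is ever released the sender just waits.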
I added an IT to test what happens in this case. The IT starts a Coordinator derivative that sleeps when handling incoming messages. The first three Thrift client calls together stay under RPC_MAX_MESSAGE_SIZE, so they are allowed into the Coordinator. The fourth call crosses the total message size boundary, so it sits there and waits. You can see the 4 TCP connections from the test to the Coordinator, with 3 of them having no bytes in the receive and send queues, and 1 of them having almost 1 million bytes on the receive queue. You can also see in the coordinator log that 3 msgs are received. The IT will sit there in this state for 10 minutes, the default timeout.
yeah that is what I suspected it was doing. It seems to read 4 bytes off the network for the frame size and then only does further reads if the frame size passes the two checks. I was thinking that these props could be set with, say, a 1M max frame size and a 100M max total buffer amount. If the max frame size was set to 100M and the max total amount to buffer was set to 1M, then it could create a situation that allows the code to hang forever, since a single frame larger than the total limit can never be admitted. Probably need to check the two property values to make sure that the max frame size is less than or equal to the max total buffer amount.
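The cross-check suggested above could look something like the following. This is a hypothetical sketch, not code from the PR; the class name and error message are assumptions.

```java
// Hypothetical validation: reject configurations where a single frame could
// exceed the total buffer limit, which would let a sender hang forever.
class ThriftLimitCheck {
    static void validate(long maxFrameSize, long maxTotalBuffer) {
        if (maxFrameSize > maxTotalBuffer) {
            throw new IllegalArgumentException("max frame size (" + maxFrameSize
                + ") must be <= max total buffer size (" + maxTotalBuffer
                + "), otherwise an oversized frame can never be read");
        }
    }
}
```

For example, a 1M frame limit with a 100M total limit passes, while the reversed values are rejected at startup instead of hanging at runtime.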
Thrift supports two configurations related to incoming framed message sizes: a maximum size for a single frame, and a maximum total amount of memory used to buffer incoming frames. The thrift code implementing these two configurations can be seen here.
Accumulo configures both thrift settings to the same value. Experimented in this PR with adding a separate setting for the total across all connections. Also added a ton of logging and a test to convince myself this is actually working, and it is working: in the logs you can see thrift deferring reading from a connection when there is too much already in memory. Can not think of a way to actually test this feature, because it will eventually work, there is just an indeterminate delay.
One uncertainty I have about adding this new property is the way it changes behavior. The existing accumulo property used to configure two thrift settings in code. After this change the existing accumulo property only configures one thrift setting, and the new accumulo property configures the other. This change seems tricky for the case where the existing property is already set. That uncertainty about the new property, plus not having a way to test it, are the reasons I am opening this as a draft for now. The test that was added was used to cause activity that let me look in the logs and verify this is working; it does not actually test that the new property is working.
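One way to soften the behavior change for users who already set the existing property would be to fall back to it when the new property is unset. The sketch below is purely illustrative; both string property keys and the default value are hypothetical, not the actual Accumulo property names.

```java
import java.util.Map;

// Hypothetical fallback: if the new total-size property is unset, reuse the
// existing property's value, matching the old behavior where one property
// configured both thrift settings.
class RpcSizeConfig {
    static long resolveMaxTotalSize(Map<String, String> conf) {
        String v = conf.get("rpc.message.size.total.max"); // hypothetical new key
        if (v != null) {
            return Long.parseLong(v);
        }
        // hypothetical existing key, with a made-up 100M default
        return Long.parseLong(conf.getOrDefault("rpc.message.size.max", "104857600"));
    }
}
```

With this kind of fallback, an existing deployment that only sets the old property would see the same effective limits as before the change.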
Another reason this is a draft is that the new property is only supported in the default thrift server type; I did not look at the other thrift server types to see whether the new property would be applicable there.