-
Notifications
You must be signed in to change notification settings - Fork 21
tls/ossl: Add concurrency protections around output #245
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: v25.3.x
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds concurrency protections around the TLS session's output operations to prevent rare race conditions during writes. The changes address a scenario where data could be written to the output BIO during get() operations (e.g., during renegotiation) outside the protection of the _out_sem semaphore, potentially causing assertion failures or nullptr dereferences.
Key Changes:
- Introduced
_output_in_progressflag to track when output operations are actively being processed - Removed friend declarations for BIO methods to enforce proper encapsulation
- Added public accessor methods (
output_pending(),output_available(),assign_output_pending(), etc.) for controlled access to output state
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
5099fbf to
a2a5da3
Compare
rockwotj
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think this makes sense. I wish there was a simpler way to encode this but I'm struggling to come up with something.
PS it'd be nice to coroutinize some of this code now.
a2a5da3 to
33cf225
Compare
|
Force push:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
33cf225 to
c30d237
Compare
|
Force push:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/net/ossl.cc
Outdated
| _output_in_progress = false; | ||
| }); |
Copilot
AI
Nov 24, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Directly setting _output_in_progress = false bypasses the encapsulation provided by assign_output_error(). Consider using a dedicated method like clear_output_in_progress() to maintain consistency with other output state management functions and improve maintainability.
| semaphore _in_sem; | ||
| semaphore _out_sem; | ||
| tls_options _options; | ||
|
|
Copilot
AI
Nov 24, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _output_in_progress flag is a critical concurrency control mechanism but lacks documentation. Add a comment explaining its purpose: tracking when output operations are actively being processed, distinct from when futures are pending, to prevent concurrent writes during renegotiation.
| // Tracks when output operations are actively being processed. | |
| // This is distinct from when futures are pending (_output_pending). | |
| // Used as a concurrency control mechanism to prevent concurrent writes | |
| // during renegotiation. |
|
All failing the DNS unit test: happening in both OpenSSL & GnuTLS so probably not what I did.... |
src/net/ossl.cc
Outdated
| SEASTAR_ASSERT(!_output_in_progress); | ||
| SEASTAR_ASSERT(_output_pending.available()); | ||
| _output_in_progress = true; | ||
| _output_pending = std::move(f); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we just tag the reset for output in progress here
_output_in_progress = true;
_output_pending = std::move(f).finally([this]{this->_output_in_progress = false;});There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good thought... that won't conflict with error handling in wait_for_output right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we move the error handling to here too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We really only need wait_for_output to side channel the future from the synchronous BIO_* stuff in ossl right? Feels like consolidating everythere here is better.
| tls_log.debug("{} bio_write_ex: system error occurred: {}", *session, e.what()); | ||
| ERR_raise_data(ERR_LIB_SYS, e.code().value(), e.what()); | ||
| session->_output_pending = make_exception_future<>(std::current_exception()); | ||
| session->assign_output_error(std::current_exception()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
q:
so if i'm reading this correctly the source of errors is our put onto the network layer
which just gets stored in the future as an exception future,
and then all subsequent calls to bio_write_ex will just continually grab, rethrow, and re-store the exception back into the future?
Who is the ultimate upstream error handler?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caller to put() will handle the error. These are considered fatal so then the session should be destructed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does get_exception extract the exception?
If thats the case we could have
put -> bio_write -> future emplaced -> exception -> put detects and throws
followed by an out of band write or another put
Wondering if we should check that the future remains errored for all subsequent callers as a signal of "this session is borked"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So what happens is
- PUT
- BIO write (error occurs and error is put into
_output_pending wait_for_output()gets called and attemps to resolve_output_pending- Within
wait_for_ouptut()handle_exceptionis triggered and that emplaces the error into_error - Any future calls to
put()check_errorand if that is set, returns_error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PUT
BIO write -> error stored in _output_pending
wait_for_output -> moves the exception out?, regardless throws
read fiber performs an out of band write, will it see the exception?
| // _output_pending is not enough to ensure that the output has completed. | ||
| // The purpose of the output_in_progress flag is to await that that | ||
| // future has resolved before permitting another write to be enqueued. | ||
| return !_output_in_progress && _output_pending.available(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should theoretically be sufficient to just check _output_in_progress, ya?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah... I guess I'm just overdoing it a little bit
c30d237 to
027a751
Compare
|
DNS tests will be fixed here: #246 |
| auto n = std::min(dlen, session->_input.size()); | ||
| memcpy(data, session->_input.get(), n); | ||
| session->_input.trim_front(n); | ||
| auto n = std::min(dlen, session->input().size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
the usages of input are either const (get size/empty)
or an extraction of a chunk.
the ref handing function breaks encapsulation, maybe just two member functions for get_input_size and extract_input(max_size)
This commit makes the following changes to add better concurrency protection around the output path: * No longer treats the BIO methods as friend functions * Adds an _output_in_progress flag It was originally understood that data being written to the output BIO would only occur while the TLS session's _out_sem semaphore was held. Assertions ensuring there were no concurrent output operations occuring provided guarantees that concurrent writes would not occur. Anytime data was written to the output BIO of the TLS session, it was followed by a call to wait_for_output() which would resolve only after the write completed. However, we have come to learn that data may be written outside of the protections afforded by the _out_sem semaphore during the get() method. This would, rarely, result in assertions firing during put(). This additionally could cause, in rare circumstances, a nullptr dereference as this extraneous write would overwrite the packet of a write in progress. The change to use the _output_in_progress flag ensures that the output operation has fully completed before attempting another write to the socket. Additionally, removing the friend relationship between the BIO methods and the TLS session ensures that future developers use the public functions of the session rather than directly modifying the members of the class. Signed-off-by: Michael Boquard <[email protected]>
027a751 to
b374848
Compare
|
Force push:
|
rockwotj
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the status here? Do we feel good about this PR?
This commit makes the following changes to add better concurrency protection around the output path:
It was originally understood that data being written to the output BIO would only occur while the TLS session's _out_sem semaphore was held. Assertions ensuring there were no concurrent output operations occuring provided guarantees that concurrent writes would not occur. Anytime data was written to the output BIO of the TLS session, it was followed by a call to wait_for_output() which would resolve only after the write completed.
However, we have come to learn that data may be written outside of the protections afforded by the _out_sem semaphore during the get() method. This would, rarely, result in assertions firing during put().
This additionally could cause, in rare circumstances, a nullptr dereference as this extraneous write would overwrite the packet of a write in progress.
The change to use the _output_in_progress flag ensures that the output operation has fully completed before attempting another write to the socket. Additionally, removing the friend relationship between the BIO methods and the TLS session ensures that future developers use the public functions of the session rather than directly modifying the members of the class.