Skip to content

special case single TLS provider a bit#3

Closed
travisdowns wants to merge 7 commits into
masterfrom
td-tls-single-provider
Closed

special case single TLS provider a bit#3
travisdowns wants to merge 7 commits into
masterfrom
td-tls-single-provider

Conversation

@travisdowns

Copy link
Copy Markdown
Owner

The thrust is to make the single TLS provider mode work a bit more like it did before, i.e., not rely on runtime population of provider + error codes: if you compile in a single provider, take advantage of that.

This fixes some issue on Redpanda side in the 26.2.x rebase and avoids footguns.

A debug-only variant of SEASTAR_ASSERT that compiles to nothing in
non-debug build modes (Release, Dev, etc.) but still references its
argument via (void)sizeof so unused-variable warnings stay quiet.

For asserts that catch internal invariants too expensive to keep on in
release builds. The author should ensure the assert condition is
side-effect-free since it will not be evaluated in non-debug modes.

This is an alternative to `assert` from `<cassert>` as that has less
clear enablement semantics as end-users may adjust NDEBUG.
Replace OpenSSL's auto-detect-and-enable with explicit opt-in. The new
configure.py flag --tls-mode={gnutls,openssl,both} (default gnutls)
drives Seastar_GNUTLS and Seastar_OPENSSL together. Direct
-DSeastar_GNUTLS / -DSeastar_OPENSSL cache overrides still work.

This is less magic than auto-detect: users will want to pick what
backend they are using, rather than have cmake decide for them depending
on installed libraries which is fragile in the face of external changes
(e.g install some random library that happens to bring in to OpenSSL
on openssl which suddenly changes your seastar build mode).

When both backends are enabled, SEASTAR_TLS_DUAL_BACKEND is added as a
PUBLIC compile definition so the public TLS header and downstream code
can distinguish single- vs. dual-backend builds.
The seastar::tls::ERROR_* globals (e.g. ERROR_UNKNOWN_CIPHER_SUITE) were
mutable ints, zero-initialized at static-init time and filled in at
reactor startup by the active backend's init_error_codes() method.
Any access before reactor init (static initializers, unit tests that
don't spin up a reactor) silently read as 0, locking in the wrong value
with no diagnostic. This bit Redpanda unit tests that compare against
these constants without starting a reactor.

In single-backend builds (SEASTAR_TLS_DUAL_BACKEND not defined), the
active backend is fixed at compile time, so the values can be hard
coded. Use a new SEASTAR_TLS_ERROR_QUALIFIERS macro that expands to
'extern' in dual-backend builds and 'extern const' in single-backend
builds; define the globals as const with the backend's constants in
tls_<backend>.cc. Dual-backend builds still go through the dynamic
init_error_codes() path with no behavior change.
The opening comment described GnuTLS as the only backend with OpenSSL
replacement framed as hypothetical. Both backends are supported today,
optionally at the same time with the active one selected at reactor
startup via --crypto-provider.

Also add a "When backend-dependent state is valid" section that
documents the single-backend vs. dual-backend lifetime rules for
error_category(), backend_name(), the ERROR_* globals, and any function
that internally creates a TLS session, credentials, or DH params (all
of which route through internal::crypto::provider() in dual-backend
builds and require it to be installed by smp::configure() first). Trim
the per-symbol blurbs on error_category(), backend_name(), and the
ERROR_* block to point back at the shared section.
@travisdowns travisdowns requested review from StephanDollberg and removed request for StephanDollberg May 11, 2026 21:20

@dotnwat dotnwat left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR looks great. One question: for the case in which an application wants to access the error codes from global/static context and they are in dual-mode, then this is just a case where the application needs to adapt by avoiding accesses prior to reactor initialization?

@travisdowns

Copy link
Copy Markdown
Owner Author

PR looks great. One question: for the case in which an application wants to access the error codes from global/static context and they are in dual-mode, then this is just a case where the application needs to adapt by avoiding accesses prior to reactor initialization?

Yes. I don't really see a better way around it: since they are "just ints" we can't really hook into their access. I considered making them an object with implicit version to int, then we can hook the conversion to give a good error (instead of silently returning 0), but that's not a transparent change (object with conversion to int behaves differently to int in many contexts).

…uilds

In single-backend builds (only one of GnuTLS / OpenSSL compiled in)
there is no runtime choice to make: the active provider is fixed at
compile time. Replace the unique_ptr-installed-from-smp::configure()
scheme with a function-local static in provider(), constructed lazily
on first call.

In dual-backend builds, the runtime-installed provider is now paired
with an explicit reset: add internal::crypto::reset_provider() and
call it from smp::cleanup() so a subsequent app::run() in the same
process (which calls smp::configure() -> set_provider() again) starts
from a clean slate. set_provider() previously silently overwrote any
prior install, which obscured cross-app lifecycle bugs; the explicit
set/reset cycle makes the invariant follow-up commits will assert
("set is called exactly once per cycle") observable.

As a result:
* internal::crypto::set_provider() and reset_provider() are not
  compiled at all in single-backend builds, and the corresponding call
  sites in smp::configure() / smp::cleanup() are conditional on
  SEASTAR_TLS_DUAL_BACKEND.
* provider() is valid at any time in single-backend builds, including
  from static initializers and before reactor startup, mirroring the
  static-init guarantee the ERROR_* globals just got.

The --crypto-provider CLI flag and the reactor_options::crypto_provider
field stay unconditionally for compatibility; in single-backend builds
the option only offers the compiled-in backend (its value is unused
since there is nothing to install).
Dual-backend builds rely on smp::configure() calling set_provider()
exactly once before any provider() consumer runs. Catch violations
explicitly:

* SEASTAR_ASSERT in set_provider() that the_provider is null, so a
  double install (which would silently drop the previous provider and
  re-run init_error_codes()) fires loudly in all builds.
* SEASTAR_DEBUG_ASSERT in provider() that the_provider is set, so
  too-early access is caught in debug/sanitize/fuzz builds without
  paying the branch cost in release.
The configure.py default flipped from "auto-detect both backends" to
"--tls-mode=gnutls" (single-backend GnuTLS), which means the existing
matrix now exercises only the single-backend GnuTLS code path. Add two
standalone jobs to cover the other configurations:

* build_with_dual_tls (--tls-mode=both): keeps the dual-backend
  init_error_codes() + set_provider() path covered.
* build_with_openssl_tls (--tls-mode=openssl): exercises the
  single-backend OpenSSL static-init path which would otherwise be
  uncovered.

clang++ / C++23 / release matches the other dedicated-feature jobs
(DPDK, C++ modules) for consistency.
@travisdowns travisdowns force-pushed the td-tls-single-provider branch from 2a4eeda to 93f1650 Compare May 12, 2026 16:20
@travisdowns

Copy link
Copy Markdown
Owner Author

closing this, the real PR is upstream scylladb#3400 and also on our v26.2.x fork

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants