Proposed v26.2.x: 68 RP commits on scylladb/master c302102b7#277
travisdowns wants to merge 73 commits into
The Seastar HTTP server implementation supports multiple listeners. Handler logic may need to know which listener a connection came from. Added a listener_idx field to `httpd::request` so that a handler can recognize the listener. Signed-off-by: Michal Maslanka <michal@vectorized.io>
Since an exception carries some text for the response body text, the raising site might like to specify the content type if it's e.g. json. Signed-off-by: John Spray <jcs@vectorized.io>
This enables throwing a base_exception from a json request handler with a json payload inside it. Signed-off-by: John Spray <jcs@vectorized.io>
Signed-off-by: John Spray <jcs@vectorized.io>
Prior to this patch seastar only exposes one global metrics::impl::impl object which holds all metric related data for one application. This patch changes the implementation details such that multiple metrics::impl::impl objects can exist for any given application. Said objects are stored into a map on each shard and created dynamically whenever requested. A metrics::impl::impl is identified by an integer handle that acts as the key for the storage map. Implementation note: in order to avoid issues caused by the ordering of static thread_local objects I had to declare the storage in reactor.cc. (cherry picked from commit 585a8af)
This patch extends the metrics internal apis to use a specific metrics::impl::impl object identified by its integer handle. (cherry picked from commit 6ee4af7)
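The handle-keyed storage described in the two commits above can be pictured with a minimal standalone sketch. It does not use Seastar: `metrics_impl`, `impl_storage`, and this `get_local_impl` signature are hypothetical stand-ins, and a `thread_local` map plays the role of the per-shard map.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for metrics::impl::impl: the metrics of one "namespace".
struct metrics_impl {
    std::map<std::string, double> values;
};

// One map per thread, standing in for one map per shard.
// The key is the integer handle that identifies an implementation.
inline std::map<int, std::shared_ptr<metrics_impl>>& impl_storage() {
    // A function-local thread_local sidesteps initialization-order issues in
    // this sketch; the commit instead declares its storage in reactor.cc.
    thread_local std::map<int, std::shared_ptr<metrics_impl>> storage;
    return storage;
}

// Return the implementation for `handle`, creating it on first request.
inline std::shared_ptr<metrics_impl> get_local_impl(int handle = 0) {
    auto& slot = impl_storage()[handle];
    if (!slot) {
        slot = std::make_shared<metrics_impl>();
    }
    return slot;
}
```

Two different handles yield two isolated objects, while repeated lookups with the same handle return the same object.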
Add a public method to metric_groups_impl that exposes the handle of the internal implementation it is using. This is required in order for the metric_groups class to be able to reset itself to the configured implementation handle.
This patch extends the metrics user facing apis to use a specific metrics::impl::impl object identified by its integer handle. Note that the constructor of 'metric_groups' is marked explicit in this patch and updates two call sites where the constructor was used implicitly.
This patch removes two subsequent calls to `get_local_impl` and reuses the returned handle in that scope.
This patch extends the user facing prometheus apis allowing the user to specify the internal metrics implementation to be used through a handle. Additionally, 'add_prometheus_routes' now takes an argument that specifies the route on which to advertise the metrics. This enables different metrics "namespaces" to be served by different endpoints in isolation. (cherry picked from commit 6189522)
This patch extends the scollectd apis with the ability to select the internal metrics implementation to be used by providing a handle. (cherry picked from commit d4331d1)
This patch adds a 'get_skip_when_empy' getter to the 'registered_metric' class. It is used by follow-up patches in order to replicate metrics.
This patch adds private methods to the 'metrics::impl' class that deal with the creation of replicated metrics. They will be used to build the public api in future commits.
This patch adds private helpers to 'metrics::impl' that deal with the removal of replicated metric families from their destination implementation. These methods will be used in subsequent commits to manage the lifetime of replicated metrics.
This patch adds a public method to the 'metrics::impl' class: 'set_metric_families_to_replicate'. When this method is called the families that match any of the specifications will be replicated on the specified destinations.
This patch extends the metric registration and unregistration processes to make them aware of metric replication. In the case of metric registration, if the new metric belongs to a family that matches one of the replication specs, then a replicated metric is created accordingly. For unregistration of a metric, the replicated metric is unregistered too if one exists.
This patch exposes a method in the public interface of the metrics
module ('replicate_metric_families'), which enables metric replication
internally for the requested metric families.
Extends the metrics api to allow changing the aggregation labels of a metrics family. Otherwise one had to un-register every single metric instance in a metric family and then re-register with the changed aggregation labels. For metric families with thousands of instances (e.g.: histograms with lots of different labels) this is quite expensive. With this change we avoid the full reconstruction of the metrics family and all its metrics. Only the work associated with marking the metrics `dirty()` is needed then.
This commit extends the public interface of scheduling_group to expose usage statistics (e.g. runtime).
Redpanda uses this logger name for its http client and we choose to change the logger name in Seastar to avoid duplicate logger registration exceptions.
We sort of inadvertently picked up the change to io_uring when rebasing our seastar fork, but for the coming release we'd like to keep using aio to reduce risk and give sufficient time to do performance tests on io_uring. This effectively rolls back the upstream commit eedca15. We simply put aio and epoll above io_uring in the available list; the default backend is the first one in that list. Issue redpanda-data/redpanda#10105.
This change is to allow for the timer code in the stall detectors to be used in a profiler implementation. The hope is to be able to reuse a lot of the stall_detector codebase in the profiler without complicating the existing stall_detector implementation. The new posix_timer constructor sets sigev_notify_thread_id via the macro that musl libc / FreeBSD / Linux define for that field, mirroring upstream commit bbe1af3 (which patched the cpu_stall_detector_posix_timer constructor but predated and so missed the new posix_timer site). see https://sourceware.org/bugzilla/show_bug.cgi?id=27417
Moves some functions and classes defined in the stall_detector test to a separate header file so they can be reused in other tests.
In libgcc there is a critical section where the stack is being modified so execution can return to an exception's landing pad. However, this partially modified state isn't valid or specified by the DWARF info in the eh_frames. So a segfault tends to occur when `backtrace()` tries to unwind through this partially modified stack and follows an invalid pointer in it.
in the configuration of the io_group. The io_groups can be used to get the original values which are needed to implement throttling in Redpanda.
EAGAIN is expected here when "Insufficient resources are available to queue any iocbs" (see io_submit(2)). Abort on any other error, as those indicate an internal error on our side. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit 9fff5f3)
The `aio_general_context` had the implicit assumption that in a single tick we would never queue more than `--max-networking-io-control-blocks` events/iocbs. This however ignores situations such as queuing multiple iocbs per socket per tick, having leftover iocbs in the queue from previous ticks via the new EAGAIN handling, or simply having many more sockets in use, which isn't prevented anywhere else. If this condition was hit (`last > end`) the reactor would just assert out and crash.
To avoid this situation this patch introduces a backlog into which elements are enqueued when the original array is full and which can grow unbounded. This mirrors how the `aio_storage_context` works, which uses the `io_sink` for the same purpose.
To avoid oversized allocations after startup the split into two separate data structures is needed (instead of just regrowing the array). Further, the data structure from which the iocbs are passed into `io_submit` needs to be in contiguous memory (and also provide an API to use it, which most containers don't). `std::deque` is used in the backlog to avoid oversized allocations in the backlog itself. The existing array solution for `iocbs` is kept to fulfill the contiguous memory requirement.
Further we slightly change how EAGAIN is handled. Instead of backshifting the array we keep the array as is and just track the `begin` of the array across `flush` calls. This is possible now as the backlog handling is in place. This introduces "batching" and prevents degenerate cases where only a single element is being submitted, which would then result in repeated shifting of the whole array on each `flush` call. Given we use a chunked data structure like `std::deque`, erasing from the front of the backlog is relatively cheap and does not require shifting all the elements in the backlog. Hence, the per-iocb overhead is amortized constant.
Note that in general we try to submit as many iocbs as possible per `io_submit` call. Given the new behaviour of not backshifting the iocb array and immediately backfilling from the backlog we might introduce `io_submit` calls that don't try to submit the max amount of iocbs. However we assume that if we ran into EAGAIN then either:
- We are still behind the next time around: it's unlikely we would succeed in submitting all the iocbs anyway
- We have now caught up: we have introduced a single additional `io_submit` call which only submits `max_poll()/2` iocbs on average. The backlog will be drained at full `max_poll` per `io_submit`.
(cherry picked from commit d9175fc)
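The array-plus-backlog scheme in this commit message can be modelled with a small standalone sketch. It is a simplified illustration under stated assumptions, not the Seastar code: `submit_queue` is a hypothetical name, plain `int`s stand in for iocb pointers, and the `submit` callback models `io_submit` by returning how many of the offered items it accepted.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <vector>

// Hypothetical, simplified model of a fixed contiguous array backed by an
// unbounded chunked backlog.
class submit_queue {
    std::vector<int> _array;    // contiguous storage handed to the submit call
    std::size_t _begin = 0;     // first element not yet submitted
    std::size_t _end = 0;       // one past the last queued element
    std::deque<int> _backlog;   // overflow; grows in chunks, cheap pop_front
public:
    explicit submit_queue(std::size_t capacity) : _array(capacity) {}

    void queue(int item) {
        // Keep FIFO order: once the backlog is in use, new items go behind it.
        if (_backlog.empty() && _end < _array.size()) {
            _array[_end++] = item;
        } else {
            _backlog.push_back(item);
        }
    }

    // submit(ptr, n) returns how many of the n items were accepted; fewer
    // than n models a partial submission such as EAGAIN.
    std::size_t flush(const std::function<std::size_t(const int*, std::size_t)>& submit) {
        std::size_t total = 0;
        while (_begin < _end) {
            const std::size_t n = _end - _begin;
            const std::size_t accepted = submit(_array.data() + _begin, n);
            assert(accepted <= n);
            _begin += accepted;  // track progress instead of shifting the array
            total += accepted;
            if (accepted < n) {
                return total;    // retry from _begin on the next flush
            }
            // Array fully drained: reuse it and refill from the backlog.
            _begin = _end = 0;
            while (!_backlog.empty() && _end < _array.size()) {
                _array[_end++] = _backlog.front();
                _backlog.pop_front();
            }
        }
        return total;
    }

    std::size_t pending() const { return (_end - _begin) + _backlog.size(); }
};
```

In this model a partial submission only advances `_begin`, so no element is moved within the array, and the backlog is drained one full array at a time once the array empties.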
http::request::content is deprecated upstream, with the idea that you set the server into streaming mode and use the input_stream<> in the request directly. This is not a totally trivial change, so for now we want to just keep using content as this functionality is the same as always, so we remove the deprecation from our fork for now. See also CORE-15051.
…back mode When using scoped_system_alloc_fallback, large allocations are expected and intentional. Reduce the log level from warn to debug to avoid spamming logs in this case. Also change the message text to avoid triggering the BLL check and skip the backtrace since it's not useful for expected allocations.
Add struct cert_info (serial number + expiry) to the public API and implement get_cert_info() and get_trust_list_info() on certificate_credentials, backed by virtual methods on credentials_impl. Both GnuTLS and OpenSSL backends extract serial and expiry from loaded certificates and trust lists. Port of the following commits from v26.2.x-pre onto the upstream crypto provider architecture: b8438b3 net/tls: Introduce cert_info and accessors e4c696a net/tls(ossl): Introduce cert_info and accessors 76b82e3 net/tls: Adjust type for cert_info.serial 06eaf07 net/tls: Replace cert_info::bytes with vector<byte>
Add a new reload_callback_with_creds callback type that receives the
reloaded certificate_credentials and trust file blob, in addition to the
changed files and exception_ptr. This allows callers to inspect the new
credentials (e.g., via get_cert_info()) at reload time without having to
rebuild them.
Add credentials_builder::get_trust_file_blob() to retrieve the loaded
trust file contents. Add build_reloadable_{certificate,server}_credentials
overloads accepting reload_callback_with_creds.
Add test_reload_certificates_with_creds test case.
Port of the following commits from v26.2.x-pre onto the upstream
crypto provider architecture:
f716e6a net/tls: Add reload_callback_with_creds
232567e tls: Include trust file contents with reload callback
3f86a53 tls_test: Add tests for new reload callback and cert_info accessors
Add enum class dn_format { legacy, rfc2253 } and an overload of
get_dn_information() that accepts a format parameter. The OpenSSL
backend switches X509_NAME_print_ex flags based on the format:
XN_FLAG_RFC2253 for rfc2253, and the legacy seastar flags for legacy.
GnuTLS ignores the parameter as it does not provide a mechanism to
change the DN output format.
Port of the following commit from v26.2.x-pre onto the upstream
crypto provider architecture:
291dc51 tls: Added support for fetching DN in RFC2253 format
OpenSSL's API contract is so loose and the impact so wide (e.g. low-priority HTTPS traffic could crash the whole process) that it makes sense to terminate here only in debug builds. Restores RP-specific behavior on top of upstream PR scylladb#3369 (which removed the assert entirely in favor of just logging). Uses plain assert() rather than SEASTAR_ASSERT to fire only in debug builds. Port of v26.2.x-pre commit 4152d2f onto upstream's verify_clean_error_queue function (different file: tls_openssl.cc vs the original ossl.cc).
pgellert left a comment
I took a look at the TLS changes, and they look good to me
8d712a2 to 097536f
@StephanDollberg wrote:
Both have been dropped. The latter needs changes on RP side, which have been pushed as redpanda-data/redpanda#30395.
A debug-only variant of SEASTAR_ASSERT that compiles to nothing in non-debug build modes (Release, Dev, etc.) but still references its argument via (void)sizeof so unused-variable warnings stay quiet. For asserts that catch internal invariants too expensive to keep on in release builds. The author should ensure the assert condition is side-effect-free since it will not be evaluated in non-debug modes. This is an alternative to `assert` from `<cassert>` as that has less clear enablement semantics as end-users may adjust NDEBUG.
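The `(void)sizeof` trick mentioned above can be shown in a minimal standalone sketch. The macro and build-flag names here (`DEMO_DEBUG_ASSERT`, `DEMO_DEBUG_BUILD`) are hypothetical and are not the names or definition used in Seastar; the point is only that the operand of `sizeof` is type-checked but never evaluated.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// DEMO_DEBUG_BUILD is a hypothetical stand-in for the real build-mode check.
#if defined(DEMO_DEBUG_BUILD)
#define DEMO_DEBUG_ASSERT(cond)                                      \
    do {                                                             \
        if (!(cond)) {                                               \
            std::fprintf(stderr, "assertion failed: %s\n", #cond);   \
            std::abort();                                            \
        }                                                            \
    } while (0)
#else
// The operand of sizeof is unevaluated: the condition must still compile and
// its variables count as used, but no code is generated for it.
#define DEMO_DEBUG_ASSERT(cond) ((void)sizeof(cond))
#endif

inline int checked_increment(int& counter) {
    const int before = counter;  // referenced in both build modes, so no unused warning
    ++counter;
    DEMO_DEBUG_ASSERT(counter == before + 1);
    return counter;
}

// Helper with a visible side effect, used to show that the condition is not
// evaluated outside debug builds.
inline bool note_call(int& calls) {
    ++calls;
    return true;
}

inline int evaluations_in_assert() {
    int calls = 0;
    DEMO_DEBUG_ASSERT(note_call(calls));
    return calls;  // 0 when DEMO_DEBUG_BUILD is not defined, 1 when it is
}
```

The second helper is the reason the commit asks for side-effect-free conditions: whatever the condition would have done simply does not happen outside debug builds.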
Replace OpenSSL's auto-detect-and-enable with explicit opt-in. The new
configure.py flag --tls-mode={gnutls,openssl,both} (default gnutls)
drives Seastar_GNUTLS and Seastar_OPENSSL together. Direct
-DSeastar_GNUTLS / -DSeastar_OPENSSL cache overrides still work.
This is less magic than auto-detect: users will want to pick which
backend they are using, rather than have cmake decide for them depending
on installed libraries, which is fragile in the face of external changes
(e.g. installing some unrelated library that happens to bring in a
dependency on OpenSSL, which suddenly changes your seastar build mode).
When both backends are enabled, SEASTAR_TLS_DUAL_BACKEND is added as a
PUBLIC compile definition so the public TLS header and downstream code
can distinguish single- vs. dual-backend builds.
The seastar::tls::ERROR_* globals (e.g. ERROR_UNKNOWN_CIPHER_SUITE) were mutable ints, zero-initialized at static-init time and filled in at reactor startup by the active backend's init_error_codes() method. Any access before reactor init (static initializers, unit tests that don't spin up a reactor) silently read as 0, locking in the wrong value with no diagnostic. This bit Redpanda unit tests that compare against these constants without starting a reactor. In single-backend builds (SEASTAR_TLS_DUAL_BACKEND not defined), the active backend is fixed at compile time, so the values can be hard coded. Use a new SEASTAR_TLS_ERROR_QUALIFIERS macro that expands to 'extern' in dual-backend builds and 'extern const' in single-backend builds; define the globals as const with the backend's constants in tls_<backend>.cc. Dual-backend builds still go through the dynamic init_error_codes() path with no behavior change.
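A qualifier macro of this kind can be sketched in a single standalone file as follows. All `DEMO_`/`demo_tls` names are hypothetical, the header and source parts are collapsed into one translation unit, and the value `-21` is a placeholder rather than a claim about any backend's constant.

```cpp
#include <cassert>

// The build system would define this for dual-backend builds; it is left
// undefined here, so the sketch takes the single-backend path.
// #define DEMO_TLS_DUAL_BACKEND

#ifdef DEMO_TLS_DUAL_BACKEND
#define DEMO_TLS_ERROR_QUALIFIERS extern
#else
#define DEMO_TLS_ERROR_QUALIFIERS extern const
#endif

namespace demo_tls {

// "Header" part: one declaration serves both build flavours.
DEMO_TLS_ERROR_QUALIFIERS int ERROR_UNKNOWN_CIPHER_SUITE;

// "Source" part, standing in for a per-backend .cc file.
#ifdef DEMO_TLS_DUAL_BACKEND
int ERROR_UNKNOWN_CIPHER_SUITE = 0;  // filled in later at startup
inline void init_error_codes(int backend_value) {
    ERROR_UNKNOWN_CIPHER_SUITE = backend_value;
}
#else
const int ERROR_UNKNOWN_CIPHER_SUITE = -21;  // placeholder compile-time value
#endif

} // namespace demo_tls

// A namespace-scope initializer, like the unit tests mentioned above, now sees
// the real value in single-backend builds instead of a zero.
const bool seen_nonzero_at_static_init = (demo_tls::ERROR_UNKNOWN_CIPHER_SUITE != 0);
```

Because the single-backend definition is a constant, readers that run before any startup code no longer observe a zero-initialized placeholder.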
The opening comment described GnuTLS as the only backend with OpenSSL replacement framed as hypothetical. Both backends are supported today, optionally at the same time with the active one selected at reactor startup via --crypto-provider. Also add a "When backend-dependent state is valid" section that documents the single-backend vs. dual-backend lifetime rules for error_category(), backend_name(), the ERROR_* globals, and any function that internally creates a TLS session, credentials, or DH params (all of which route through internal::crypto::provider() in dual-backend builds and require it to be installed by smp::configure() first). Trim the per-symbol blurbs on error_category(), backend_name(), and the ERROR_* block to point back at the shared section.
…uilds
In single-backend builds (only one of GnuTLS / OpenSSL compiled in)
there is no runtime choice to make: the active provider is fixed at
compile time. Replace the unique_ptr-installed-from-smp::configure()
scheme with a function-local static in provider(), constructed lazily
on first call.
In dual-backend builds, the runtime-installed provider is now paired
with an explicit reset: add internal::crypto::reset_provider() and
call it from smp::cleanup() so a subsequent app::run() in the same
process (which calls smp::configure() -> set_provider() again) starts
from a clean slate. set_provider() previously silently overwrote any
prior install, which obscured cross-app lifecycle bugs; the explicit
set/reset cycle makes the invariant follow-up commits will assert
("set is called exactly once per cycle") observable.
As a result:
* internal::crypto::set_provider() and reset_provider() are not
compiled at all in single-backend builds, and the corresponding call
sites in smp::configure() / smp::cleanup() are conditional on
SEASTAR_TLS_DUAL_BACKEND.
* provider() is valid at any time in single-backend builds, including
from static initializers and before reactor startup, mirroring the
static-init guarantee the ERROR_* globals just got.
The --crypto-provider CLI flag and the reactor_options::crypto_provider
field stay unconditionally for compatibility; in single-backend builds
the option only offers the compiled-in backend (its value is unused
since there is nothing to install).
Dual-backend builds rely on smp::configure() calling set_provider() exactly once before any provider() consumer runs. Catch violations explicitly: * SEASTAR_ASSERT in set_provider() that the_provider is null, so a double install (which would silently drop the previous provider and re-run init_error_codes()) fires loudly in all builds. * SEASTAR_DEBUG_ASSERT in provider() that the_provider is set, so too-early access is caught in debug/sanitize/fuzz builds without paying the branch cost in release.
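The two provider lifetimes described in the commits above can be contrasted in a standalone sketch. Every type and function name below is hypothetical, and the double install is reported through a `bool` rather than an assertion so that the behaviour can be exercised without aborting.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical provider interface and two stand-in backends.
struct provider_iface {
    virtual ~provider_iface() = default;
    virtual std::string name() const = 0;
};
struct gnutls_like : provider_iface {
    std::string name() const override { return "gnutls"; }
};
struct openssl_like : provider_iface {
    std::string name() const override { return "openssl"; }
};

// Single-backend flavour: a function-local static, constructed lazily on
// first call, so it is usable at any time, including from static initializers.
inline provider_iface& single_backend_provider() {
    static gnutls_like instance;
    return instance;
}

// Dual-backend flavour: an explicit set/reset cycle.
class dual_backend_slot {
    std::unique_ptr<provider_iface> _provider;
public:
    // Refuses a second install instead of silently overwriting the first.
    // The commit uses an assertion here; a bool keeps this sketch testable.
    bool set(std::unique_ptr<provider_iface> p) {
        if (_provider) {
            return false;
        }
        _provider = std::move(p);
        return true;
    }
    // Called at cleanup so that a later configure step starts from a clean slate.
    void reset() { _provider.reset(); }
    // Returns nullptr before set(); the commit asserts on this in debug builds.
    provider_iface* get() const { return _provider.get(); }
};
```

The set, reset, set sequence mirrors running one application after another in the same process: the second install succeeds only because the first cycle was explicitly closed.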
The configure.py default flipped from "auto-detect both backends" to "--tls-mode=gnutls" (single-backend GnuTLS), which means the existing matrix now exercises only the single-backend GnuTLS code path. Add two standalone jobs to cover the other configurations: * build_with_dual_tls (--tls-mode=both): keeps the dual-backend init_error_codes() + set_provider() path covered. * build_with_openssl_tls (--tls-mode=openssl): exercises the single-backend OpenSSL static-init path which would otherwise be uncovered. clang++ / C++23 / release matches the other dedicated-feature jobs (DPDK, C++ modules) for consistency.
Updated 2026-05-12
Force-pushed
What changed
Dropped two commits during code review as suggested by @StephanDollberg:
Cherry-picked 7 new commits from the
Build status
Updated artifacts
Point at the rebased v26.2.x branch (redpanda-data/seastar#277). Replaces the prior v26.2.x-pre snapshot at a0b4f2a6. Picks up the TLS fixes — see redpanda-data/seastar#277 (comment).
Closing as this was for review only: we have since completed the rebase and redpanda is running on the rebased version.
This PR shows the proposed contents of the v26.2.x branch after rebase, for review. It contains 73 redpanda-specific commits on top of scylladb/master at c302102b7. I don't expect anyone to review these line by line, as they were already largely reviewed when originally checked in; rather, look at the overall approach and maybe do spot checks.
@dotnwat put you on here especially for the OpenSSL stuff.
Reference
proposed-v26.2.x-merge-base): c302102b7c3ac10a02723167dcb155be908b135c, the scylladb/master tip as of the rebase point. This is a frozen reference for the diff.
travisdowns:proposed-v26.2.x): 0dd1a13c11d8573481180deab69eb7ca0b345b74, the proposed v26.2.x with all RP commits replayed.
What's in here
73 commits, broken down as:
td-tls-single-provider branch that adds explicit TLS-backend selection (--tls-mode={gnutls,openssl,both}) and tightens up single-backend builds
A further commits from v26.2.x-pre were dropped or upstreamed during this work.
Reading the diff
The TLS surface in particular saw the biggest changes because upstream merged a pluggable crypto provider rewrite (Noah Watkins, scylladb#3360 series); our RP-specific TLS features (cert_info, reload_callback_with_creds, dn_format) were rewritten against that new architecture rather than carried forward as-is.
Additional details
Two comments on this PR have additional context:
v26.2.x-pre commit with its disposition in this PR (clean / edits / ported / upstreamed / dropped).