@ygoldfeld ygoldfeld commented Jan 15, 2026

fixes #84
fixes #66

fixes major part of #76
fixes good chunk of #82

  • Added new API class templates flow::util::Thread_local_state_registry<T>, Polled_shared_state<S> for advanced thread-local-state patterns.
  • flow::util::Basic_blob<Allocator> support when Allocator is stateful.
  • flow::util::Basic_blob et al can optionally (via new overloads) alloc-and-zero as opposed to just-alloc, bringing additional performance (also convenience).
  • Added noexcept to certain APIs for flow::util::Basic_blob et al, resolving the stealth-flaw wherein a container thereof (e.g., vector<Blob>) would invoke blob copying instead of moving (#76). Also slightly decreased sizeof(Basic_blob) et al.
  • flow::util::Linked_hash_map and Linked_hash_set have new API overloads to support move-semantics at insertion. Also internally improved perf and reduced memory use. Lastly removed APIs [pop_]front(), [pop_]back().
  • flow::util added niceties including a couple new or improved arithmetic operations.
  • flow::log::Log_context::set_logger() added (not thread-safe).
  • New class flow::log::Log_context_mt which is thread-safe w/r/t set_logger() et al.
  • Misc internal impl clarity/brevity improvements.
  • Added unit-tests for several existing flow::util items (Basic_blob, Blob_with_log_context, Linked_hash_map, Linked_hash_set, and miscellany including ostream_op_string(), Scoped_setter) and for flow::log::Log_context (#82).
  • Added unit-tests for aforementioned new and improved APIs.
  • Small GitHub CI pipeline tweaks.
  • Small build script tweaks.
  • Style guide update and associated refactor: nullptr over 0, {} over () for constructor invocation (#66).
  • Comment and/or doc changes.
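
To illustrate the noexcept item above: when std::vector reallocates, it moves existing elements only if their move constructor is noexcept (or they cannot be copied), and otherwise falls back to copying. The following is a minimal sketch of that standard-library behavior, not Flow code; `Buf`, `Counters`, and `grow_vector()` are hypothetical names made up for this illustration.

```cpp
#include <type_traits>
#include <vector>

// Hypothetical counters; each Buf bumps these when copied or moved.
struct Counters { int copies = 0; int moves = 0; };

// Hypothetical stand-in for a blob-like type (NOT a Flow class).
// NOEXCEPT_MOVE toggles whether the move ctor is declared noexcept.
template<bool NOEXCEPT_MOVE>
struct Buf
{
  Counters* m_counters;

  explicit Buf(Counters* counters) : m_counters(counters) {}
  Buf(const Buf& src) : m_counters(src.m_counters) { ++m_counters->copies; }
  Buf(Buf&& src) noexcept(NOEXCEPT_MOVE) : m_counters(src.m_counters) { ++m_counters->moves; }
};

static_assert(std::is_nothrow_move_constructible<Buf<true>>::value,
              "Move ctor declared noexcept.");
static_assert(!std::is_nothrow_move_constructible<Buf<false>>::value,
              "Move ctor may throw as far as the compiler knows.");

// Puts 1 element in a vector, forces 1 reallocation, and reports how
// the existing element was relocated.
template<bool NOEXCEPT_MOVE>
Counters grow_vector()
{
  Counters counters;
  std::vector<Buf<NOEXCEPT_MOVE>> vec;
  vec.reserve(1);
  vec.emplace_back(&counters);     // Constructed in place: no copy, no move.
  vec.reserve(vec.capacity() + 1); // Exceeds capacity: element must relocate.
  return counters;
}
```

With common standard-library implementations, `grow_vector<true>()` reports one move and no copies, while `grow_vector<false>()` reports one copy and no moves, which is the silent cost the fix removes.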

API notes

  • New APIs:
    • In flow::util:
      • As pertains to thread-local state:
        • New class template Thread_local_state_registry<T> maintains a thread-local user object of type T, created via new T on-demand and deleted either when its thread is joined (exits), or the ~Thread_local_state_registry() destructor executes -- whichever happens first. You may also, from any thread, use Thread_local_state_registry::state_per_thread() to access all extant threads' existing thread-local Ts.
          • At its core this utility is similar to boost::thread_specific_ptr, which many users took advantage of before built-in thread_local storage existed in C++.
          • However, thread_specific_ptr<T> is intentionally simple; it resolutely refuses to provide any registry-of-extant-thread-local-data functionality, including the specific would-be feature wherein ~thread_specific_ptr() dtor would clean up other threads' extant Ts.
          • Thread_local_state_registry thus works on top of thread_specific_ptr to provide such advanced functionality. Use that functionality with care. The doc headers for the utility give advice on this.
        • Also new is Polled_shared_state<Shared_state>, an optional companion to Thread_local_state_registry that enables a particular high-performance pattern, wherein one may "arm" an atomic flag per extant TLS-owning thread, and in each such thread quickly check this "armed-or-not" flag.
          • If armed (generally assumed to be a rare occurrence), checking the flag disarms it, and having observed this occurring one can perform some action the flag is meant to trigger.
          • In the fast-path the atomic flag-check yields a negative result and uses very few processor cycles.
          • Shared_state template parameter can be used to communicate progress/state about whichever cross-thread operation the "armed-or-not" flags are meant to facilitate. One can use an empty struct {}, if there is no need for such state, and the flag-set is sufficient.
      • As pertains to class Basic_blob and sub-class Blob_with_log_context templates:
        • Basic_blob<Allocator> now supports stateful Allocators (such as boost::interprocess::allocator), whereas before this would not compile.
          • It was meant to be supported already; but this was not tested until now. Unit-tests (see below) revealed the deficiency at compile-time.
        • New Basic_blob and Blob_with_log_context overloads that do zero-initialize memory during buffer allocation: reserve(), resize(), and resize()ing ctor. These overloads take tag-typed value CLEAR_ON_ALLOC.
          • Perf point: This is not mere syntactic sugar: It can be significantly faster to allocate-and-zero in one op (a-la calloc()), versus allocating (a-la malloc()) followed by zeroing (a-la memset(0)).
      • As pertains to the Linked_hash_map and Linked_hash_set container templates:
        • Linked_hash_map::insert() and operator[]() overloads with move-semantics for keys and, where applicable, mapped-values.
        • Linked_hash_set::insert() overload with move-semantics for keys.
        • Type alias members such as Key and Hash.
      • New constexpr function template round_to_multiple(a, b) (e.g., if b = 1024, then a = 0 => 1024, 1 => 1024, 1024 => 1024, 1025 => 2048).
      • Existing function template ceil_div() is now constexpr and slightly more flexible.
      • Added aliases for std::span-like ops: Span, DYNAMIC_EXTENT.
    • In flow::log:
      • Log_context::set_logger(): ability to (non-thread-safely) change the deriver's Logger*.
      • Log_context_mt: variant of Log_context with the ability to thread-safely (via mutex) change the deriver's Logger*. The mutex also protects assignment and swap().
        • Usage/perf point: As this adds a mutex-lock in the potentially frequently-invoked get_logger() function, this is intended for use when performance is irrelevant, or else if the perf-critical fast-paths zealously avoid log calls. Keep in mind this applies even to log-call-sites that do not pass the verbosity check, including every FLOW_LOG_TRACE() call for example. Use with caution.
  • Breaking changes:
    • Removed flow::util::Linked_hash_{map|set}::front() and back() as well as Linked_hash_set::pop_{front|back}().
      • These are sequence ops and don't really belong in these associative containers.
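
The arm-and-poll pattern described above for Polled_shared_state can be sketched with plain std::atomic. This is only an illustration of the idea under simplifying assumptions, not the Flow API: `Per_thread_state`, `arm_all()`, and `poll_armed()` are hypothetical names, and the registry of per-thread objects is reduced to a vector of pointers supplied by the caller.

```cpp
#include <atomic>
#include <vector>

// Hypothetical per-thread object (NOT a Flow type). In the real utility each
// thread would own one of these via thread-local storage.
struct Per_thread_state
{
  std::atomic<bool> m_armed{false};
};

// Called from any thread: arm the flag of every extant per-thread object.
// Assumption: the caller already holds a stable list of those objects.
inline void arm_all(const std::vector<Per_thread_state*>& states)
{
  for (Per_thread_state* state : states)
  {
    state->m_armed.store(true, std::memory_order_release);
  }
}

// Called frequently by the owning thread. Fast path: a single relaxed load
// that returns false. Rare path: atomically disarm and report "was armed."
inline bool poll_armed(Per_thread_state& state)
{
  if (!state.m_armed.load(std::memory_order_relaxed))
  {
    return false;
  }
  return state.m_armed.exchange(false, std::memory_order_acquire);
}
```

A thread would call `poll_armed()` on its own state in its hot loop and run the triggered action only when it returns true; after that the flag stays disarmed until the next `arm_all()`.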

Impl notes

  • flow::util::Linked_hash_map and Linked_hash_set perf and memory-use upgrade: When inserting a new key, it is no longer stored in two locations internally and thus requires no copying. When using the new move-semantics overloads for insert-ops, no key copyability is required or used.
    • Internally this required some subtle data structures and template-wrangling; the 2 public class templates' impls use new common detail/ items: Linked_hash_key[_{hash|pred|set}]. The associative container used internally, for both public templates, is an unordered_set instead of unordered_map.
  • Internally to flow::util::Linked_hash_map and Linked_hash_set constructors, when n_buckets = -1 is specified, use a less dubious and faster technique to wrangle unordered_* to use a default bucket count.
  • flow::util::Basic_blob and Blob_with_log_context significant stealth-flaw fixed: The lack of noexcept on certain ops caused containers of blobs (e.g., vector<Blob>) to silently use Basic_blob/Blob_... copying instead of moving when needing to change their memory locations (e.g., when vector<Blob> must realloc upon exceeding the vector's .capacity()).
  • flow::util::Basic_blob and Blob_with_log_context use less space by internally employing the Empty Base Optimization (EBO) (#84).
  • flow::util::ostream_op_[to_]string() and feed_args_to_ostream() impls are much shorter (use fold-expressions instead of compile-time recursion).
  • GitHub CI pipeline: Tweak to get gcc-9 builds to work again (GitHub changed preinstalled packages).
  • GitHub CI pipeline: Now that it is needed, adding machinery for LSAN (leak-sanitizer) suppression where needed for harmless/false positives.
  • GitHub CI pipeline: Minor tweak for consistency in the event the build fails before the NetFlow echo test/demo executes.
  • In blob and log-context impls, initializing members even when not strictly necessary; this eliminates some warnings in some compiler versions at a tiny possible perf cost.
  • Some Basic_blob internal comments contained falsehoods, while others were overly verbose and confusing; they have been corrected or removed.
  • Build script(s): Do not skip any gcc/clang "required by" per-function-template-call-frame lines in error output. Sometimes these contain key information about the source of a given compile error.
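
For readers unfamiliar with the fold-expression technique mentioned above for ostream_op_[to_]string() and feed_args_to_ostream(), here is a minimal C++17 sketch of the general idea. It is not the Flow implementation; `feed_all_sketch()` and `to_string_sketch()` are hypothetical names used only for illustration.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: writes every argument to `os` in order.
// The binary left fold expands to ((os << a1) << a2) << ... with no recursion.
template<typename... Args>
void feed_all_sketch(std::ostream& os, const Args&... args)
{
  (os << ... << args);
}

// Hypothetical helper: same, but returns the result as a string.
template<typename... Args>
std::string to_string_sketch(const Args&... args)
{
  std::ostringstream os;
  feed_all_sketch(os, args...);
  return os.str();
}
```

For example, `to_string_sketch("a=", 1, ", b=", 2.5)` yields the string `a=1, b=2.5`. Before C++17 the same effect typically needed a recursive pair of overloads (one base case, one peeling off the first argument), which is the compile-time recursion this kind of change removes.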

Code review notes

@echan-dev reviewed this exact change elsewhere.

… in *all* cases relevant to) by that Flow-IPC perf deep-dive. Detailed notes forthcoming in PR.
… so setting the manual-install flag for that compiler/version.
…lated problem with `optional<>` and internally used `boost::thread_specific_ptr`. / (cont) Eliminate unit-test compile warning in clang: self-assignment of `auto` object.
…t, swap; found in code review by echan; for some reason my brain glitched and told me I need not lock a const thing... which is embarrassing).
@ygoldfeld ygoldfeld self-assigned this Jan 15, 2026
@ygoldfeld ygoldfeld merged commit 82a10f3 into main Jan 15, 2026
96 checks passed