|
| 1 | +# The Frame Allocator Is Not the Environment Allocator |
| 2 | + |
| 3 | +Dietmar, |
| 4 | + |
| 5 | +This builds on the previous document (p3552-tls-fix.md). The TLS |
| 6 | +mechanism described there is correct, but one design choice was |
| 7 | +wrong: storing the frame allocator from `get_allocator(get_env(receiver))` |
| 8 | +at connect time. The frame allocator should never come from the |
| 9 | +environment. |
| 10 | + |
| 11 | +--- |
| 12 | + |
| 13 | +## 1. Two Allocators, Two Purposes |
| 14 | + |
| 15 | +The P2300 environment's `get_allocator` is a general-purpose |
| 16 | +allocator. Any coroutine can query it: |
| 17 | + |
| 18 | +```cpp |
| 19 | +auto alloc = co_await ex::read_env(ex::get_allocator); |
| 20 | +alloc.allocate(sizeof(MyObject)); // used for anything |
| 21 | +``` |
| 22 | +
|
| 23 | +This is the right design for general-purpose allocation. Users |
| 24 | +put an allocator in the environment and any code in the chain can |
| 25 | +use it for containers, strings, connection pools, whatever the |
| 26 | +application needs. |
| 27 | +
|
| 28 | +A frame allocator is a different thing. Frame allocators exploit |
| 29 | +a narrow pattern: coroutine frame sizes repeat, lifetimes nest, |
| 30 | +and deallocation order mirrors allocation order. A recycling |
| 31 | +frame allocator caches recently freed frames for immediate reuse. |
| 32 | +P4003R0 Section 2.3 shows a 3.1x speedup on MSVC over |
| 33 | +`std::allocator`, and a 1.28x speedup over mimalloc. |
| 34 | +
|
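To make the recycling idea concrete, here is a minimal sketch of a resource that caches freed blocks by exact size and hands them back on the next request of the same size. The name `recycling_resource_sketch` and everything inside it are illustrative assumptions, not the P4003R0 allocator: it is single-threaded, never trims its cache, and assumes no request needs more than `alignof(std::max_align_t)`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: recycles blocks of a previously seen size.
// Not thread-safe; the cache is unbounded.
class recycling_resource_sketch : public std::pmr::memory_resource {
public:
    explicit recycling_resource_sketch(
        std::pmr::memory_resource* upstream = std::pmr::new_delete_resource())
        : upstream_(upstream) {}

    ~recycling_resource_sketch() override {
        // Return every cached block to the upstream resource.
        for (auto& [bytes, blocks] : cache_)
            for (void* p : blocks)
                upstream_->deallocate(p, bytes, alignof(std::max_align_t));
    }

    // Number of allocations served from the cache instead of upstream.
    std::size_t reuse_count() const noexcept { return reuse_count_; }

private:
    void* do_allocate(std::size_t bytes, std::size_t alignment) override {
        // Assumption of this sketch: default alignment is always enough.
        assert(alignment <= alignof(std::max_align_t));
        auto it = cache_.find(bytes);
        if (it != cache_.end() && !it->second.empty()) {
            void* p = it->second.back();   // most recently freed block first
            it->second.pop_back();
            ++reuse_count_;
            return p;
        }
        return upstream_->allocate(bytes, alignof(std::max_align_t));
    }

    void do_deallocate(void* p, std::size_t bytes, std::size_t) override {
        cache_[bytes].push_back(p);        // keep the block for reuse
    }

    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }

    std::pmr::memory_resource* upstream_;
    std::unordered_map<std::size_t, std::vector<void*>> cache_;
    std::size_t reuse_count_ = 0;
};
```

With repeating frame sizes almost every allocation after warm-up is a cache hit. A caller that requests many distinct or very large sizes only fills the cache with blocks nobody asks for again.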
| 35 | +These performance properties depend on the allocation pattern |
| 36 | +being narrow. If arbitrary code can grab the frame allocator and |
| 37 | +use it for a `std::vector` that grows to 10 MB, or a database |
| 38 | +connection pool, or a JSON parser's scratch buffer, the pattern |
| 39 | +is destroyed. The recycler's size classes no longer match. The |
| 40 | +LIFO assumption breaks. The bounded pool overflows. |
| 41 | +
|
| 42 | +If the frame allocator is in the environment, this is exactly |
| 43 | +what happens. Any coroutine in the chain can |
| 44 | +`co_await read_env(get_allocator)` and use it for anything. The |
| 45 | +environment does not distinguish between "allocator for frames" |
| 46 | +and "allocator for everything else." There is one query and one |
| 47 | +answer. |
| 48 | +
|
| 49 | +The frame allocator must be a separate channel: |
| 50 | +
|
| 51 | +- Not queryable from user code |
| 52 | +- Not in the environment |
| 53 | +- Read only by `promise_type::operator new` |
| 54 | +- Propagated through TLS, restored at resume points |
| 55 | +
|
| 56 | +The environment's `get_allocator` remains what it is: a |
| 57 | +general-purpose allocator for application use. |
| 58 | +
|
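As a rough illustration of the separate channel, the sketch below keeps the frame resource in a thread-local slot inside `detail`. `set_frame_allocator` is the name used in this letter; `get_frame_allocator`, the variable name, and the fallback to the default resource are assumptions made for the example.

```cpp
#include <cassert>
#include <memory_resource>

namespace detail {

// One slot per thread. Nothing in the environment refers to it.
inline thread_local std::pmr::memory_resource* frame_resource_slot = nullptr;

inline void set_frame_allocator(std::pmr::memory_resource* mr) noexcept {
    frame_resource_slot = mr;
}

// Hypothetical accessor, intended to be called only from
// promise_type::operator new. Falls back to the default resource
// when no launch site has set the slot (an assumption of this sketch).
inline std::pmr::memory_resource* get_frame_allocator() noexcept {
    return frame_resource_slot != nullptr ? frame_resource_slot
                                          : std::pmr::get_default_resource();
}

} // namespace detail
```

Because the slot lives outside the environment, `read_env(get_allocator)` has no path to it. Whether the accessors stay in `detail` or become public is a separate decision.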
| 59 | +--- |
| 60 | +
|
| 61 | +## 2. `with_frame_allocator` |
| 62 | +
|
| 63 | +The previous document's Section 3.3 stored the frame allocator |
| 64 | +from the environment at connect time. Replace that. The frame |
| 65 | +allocator is established at the launch site through TLS and |
| 66 | +recovered from the frame itself, never from the environment. |
| 67 | +
|
| 68 | +### 2.1 The Algorithm |
| 69 | +
|
| 70 | +```cpp |
| 71 | +auto with_frame_allocator(std::pmr::memory_resource* mr) { |
| 72 | +    detail::set_frame_allocator(mr); |
| 73 | +    return [](auto task) { |
| 74 | +        detail::set_frame_allocator(nullptr); |
| 75 | +        return std::move(task); |
| 76 | +    }; |
| 77 | +} |
| 78 | +``` |
| 79 | + |
| 80 | +Usage: |
| 81 | + |
| 82 | +```cpp |
| 83 | +with_frame_allocator(&pool)(my_server(sock)) |
| 84 | +``` |
| 85 | +
|
| 86 | +### 2.2 Why This Is Safe |
| 87 | +
|
| 88 | +C++17 [expr.call] p5: "The postfix-expression is sequenced before |
| 89 | +each expression in the expression-list and any default argument." |
| 90 | +
|
| 91 | +In `with_frame_allocator(&pool)(my_server(sock))`: |
| 92 | +
|
| 93 | +1. `with_frame_allocator(&pool)` executes first - sets TLS, |
| 94 | + returns a callable |
| 95 | +2. `my_server(sock)` executes second - `operator new` reads |
| 96 | + TLS, frame allocated with the pool |
| 97 | +3. The callable is invoked with the task - clears TLS, returns |
| 98 | + the task as-is |
| 99 | +
|
| 100 | +Used this way, TLS is set and cleared within a single |
| 101 | +full-expression. There is no guard object to misuse or forget. |
| 102 | +
|
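The three steps above can be checked without any coroutine machinery. In this sketch `fake_task` and `make_fake_task` are hypothetical stand-ins for a task and its `operator new`: the factory simply records which resource was visible in TLS at the moment it ran. It assumes C++17 or later, since the sequencing guarantee does not exist before that.

```cpp
#include <cassert>
#include <memory_resource>
#include <utility>

namespace detail {
inline thread_local std::pmr::memory_resource* slot = nullptr;
inline void set_frame_allocator(std::pmr::memory_resource* mr) noexcept { slot = mr; }
inline std::pmr::memory_resource* get_frame_allocator() noexcept { return slot; }
} // namespace detail

// Same shape as with_frame_allocator in Section 2.1.
inline auto with_frame_allocator(std::pmr::memory_resource* mr) {
    detail::set_frame_allocator(mr);               // step 1: set TLS
    return [](auto task) {
        detail::set_frame_allocator(nullptr);      // step 3: clear TLS
        return std::move(task);
    };
}

// Hypothetical stand-in for a task: remembers what TLS held when it
// was created, the way operator new would read it for a real frame.
struct fake_task {
    std::pmr::memory_resource* seen_at_creation;
};

inline fake_task make_fake_task() {
    return fake_task{detail::get_frame_allocator()};  // step 2: read TLS
}

// Runs the one-expression form and reports what the task observed.
inline std::pmr::memory_resource* launch_and_observe(std::pmr::memory_resource* mr) {
    fake_task t = with_frame_allocator(mr)(make_fake_task());
    return t.seen_at_creation;
}
```

One caveat in this sketch: if the argument expression throws, the returned callable is never invoked and the slot stays set until something else overwrites it. The same holds if the callable is stored and not called.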
| 103 | +### 2.3 The Change from the Previous Document |
| 104 | +
|
| 105 | +The previous document's Section 3.3 said: at connect time, read |
| 106 | +`get_allocator(get_env(receiver))` and store it in the promise. |
| 107 | +This is replaced. |
| 108 | +
|
| 109 | +The promise recovers the frame allocator from its own frame. The |
| 110 | +`operator new` in Section 3.2 of the previous document already |
| 111 | +stashes the `memory_resource*` at the end of the frame. The |
| 112 | +promise reads it back: |
| 113 | +
|
| 114 | +```cpp |
| 115 | +std::pmr::memory_resource* recover_frame_allocator() noexcept { |
| 116 | +    auto* self = std::coroutine_handle<promise_type>::from_promise(*this) |
| 117 | +        .address(); |
| 118 | +    std::pmr::memory_resource* mr; |
| 119 | +    std::memcpy(&mr, |
| 120 | +        static_cast<char*>(self) + frame_size_, |
| 121 | +        sizeof(mr)); |
| 122 | +    return mr; |
| 123 | +} |
| 124 | +``` |
| 125 | + |
| 126 | +This is the value that the resume-point restoration (Section 3.4 |
| 127 | +of the previous document) writes to TLS. It comes from the frame, |
| 128 | +not from the environment. |
| 129 | + |
| 130 | +The environment is not involved. `get_allocator` in the |
| 131 | +environment is untouched. The two allocators are independent. |
| 132 | + |
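For reference, the stash-and-recover round trip can be sketched with plain functions. The three function names below are hypothetical, and the layout (pointer stored immediately after `frame_size` bytes) is taken from the `recover_frame_allocator` code above; the previous document's `operator new` may differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory_resource>

// Allocate frame_size bytes plus room for one trailing pointer and
// store mr in that trailing slot.
inline void* allocate_frame_sketch(std::pmr::memory_resource* mr,
                                   std::size_t frame_size) {
    assert(mr != nullptr);
    void* p = mr->allocate(frame_size + sizeof(mr));
    // memcpy because the trailing slot need not be pointer-aligned.
    std::memcpy(static_cast<char*>(p) + frame_size, &mr, sizeof(mr));
    return p;
}

// Read the stashed pointer back out of the frame.
inline std::pmr::memory_resource* recover_frame_resource_sketch(
    void* frame, std::size_t frame_size) noexcept {
    std::pmr::memory_resource* mr;
    std::memcpy(&mr, static_cast<char*>(frame) + frame_size, sizeof(mr));
    return mr;
}

// Free the frame through the resource that allocated it.
inline void deallocate_frame_sketch(void* frame, std::size_t frame_size) {
    std::pmr::memory_resource* mr = recover_frame_resource_sketch(frame, frame_size);
    mr->deallocate(frame, frame_size + sizeof(mr));
}
```

The frame carries its own resource, so both deallocation and resume-point restoration work without consulting TLS or the environment.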
| 133 | +--- |
| 134 | + |
| 135 | +## 3. What the User Sees |
| 136 | + |
| 137 | +Frame allocator only: |
| 138 | + |
| 139 | +```cpp |
| 140 | +std::pmr::monotonic_buffer_resource pool; |
| 141 | + |
| 142 | +ex::sync_wait( |
| 143 | +    with_frame_allocator(&pool)(my_server(sock))); |
| 144 | +``` |
| 145 | +
|
| 146 | +Frame allocator plus general-purpose environment allocator: |
| 147 | +
|
| 148 | +```cpp |
| 149 | +std::pmr::monotonic_buffer_resource frame_pool; |
| 150 | +std::pmr::polymorphic_allocator general_alloc(&app_pool); // app_pool: application resource, assumed defined elsewhere |
| 151 | +
|
| 152 | +ex::sync_wait( |
| 153 | +    ex::write_env( |
| 154 | +        with_frame_allocator(&frame_pool)(my_server(sock)), |
| 155 | +        ex::env{ex::prop{ex::get_allocator, general_alloc}})); |
| 156 | +``` |
| 157 | + |
| 158 | +Two allocators. Two channels. Independent. The coroutine chain |
| 159 | +is unaware of both: |
| 160 | + |
| 161 | +```cpp |
| 162 | +ex::task<void> my_server(socket& sock) { |
| 163 | +    for (;;) { |
| 164 | +        auto conn = co_await accept(sock); |
| 165 | +        co_await handle_connection(conn); |
| 166 | +    } |
| 167 | +} |
| 168 | + |
| 169 | +ex::task<void> handle_connection(connection& conn) { |
| 170 | +    auto req = co_await read_request(conn); |
| 171 | +    auto resp = process(req); |
| 172 | +    co_await write_response(conn, resp); |
| 173 | +} |
| 174 | +``` |
| 175 | +
|
| 176 | +Every frame in the chain uses the frame pool. If any coroutine |
| 177 | +needs the general-purpose allocator for application logic, it |
| 178 | +queries the environment: |
| 179 | +
|
| 180 | +```cpp |
| 181 | +auto alloc = co_await ex::read_env(ex::get_allocator); |
| 182 | +std::pmr::vector<char> buf(alloc); |
| 183 | +``` |
| 184 | + |
| 185 | +The frame allocator is not reachable from this query. It cannot |
| 186 | +be misused. |
| 187 | + |
| 188 | +--- |
| 189 | + |
| 190 | +## 4. Swappable Implementations |
| 191 | + |
| 192 | +Because `with_frame_allocator` takes a `memory_resource*`, the |
| 193 | +user is never locked into a particular frame allocator |
| 194 | +implementation. A recycling allocator with size-class buckets is |
| 195 | +well suited to coroutine frames - Capy ships one: |
| 196 | + |
| 197 | +https://github.com/cppalliance/capy/blob/18c30d2197ac8804f9426005576d4f9b80a76135/include/boost/capy/ex/recycling_memory_resource.hpp |
| 198 | + |
| 199 | +It uses power-of-two size classes (64 to 2048 bytes), a |
| 200 | +thread-local pool for lock-free fast-path allocation, and a |
| 201 | +global pool with a mutex for cross-thread block sharing. |
| 202 | +Allocations above 2048 bytes bypass the pools. This is the kind |
| 203 | +of allocator that exploits the narrow frame allocation pattern - |
| 204 | +and the kind that breaks if arbitrary non-frame allocations |
| 205 | +pollute it. |
| 206 | + |
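To show what power-of-two size classes from 64 to 2048 bytes mean in practice, here is a small sketch of the size-to-class mapping. It is written from the description above, not copied from Capy; the names `size_class_index` and `class_size` and the choice to round small requests up to 64 are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

constexpr std::size_t min_class = 64;    // smallest pooled block
constexpr std::size_t max_class = 2048;  // largest pooled block

// Index of the smallest power-of-two class that fits `bytes`
// (0 is 64, 1 is 128, ... 5 is 2048), or nullopt when the request
// is too large and would bypass the pools.
constexpr std::optional<std::size_t> size_class_index(std::size_t bytes) noexcept {
    if (bytes > max_class) return std::nullopt;
    std::size_t index = 0;
    for (std::size_t cls = min_class; cls < bytes; cls <<= 1) ++index;
    return index;
}

// Block size, in bytes, of the class with the given index.
constexpr std::size_t class_size(std::size_t index) noexcept {
    assert((min_class << index) <= max_class);
    return min_class << index;
}
```

Six buckets cover every pooled request. A 10 MB vector buffer never lands in one, and a stream of odd-sized scratch buffers would round up and occupy blocks that frames could otherwise reuse.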
| 207 | +But the user can swap in any `memory_resource*` they want: |
| 208 | + |
| 209 | +```cpp |
| 210 | +// recycling allocator (production) |
| 211 | +with_frame_allocator(get_recycling_memory_resource())(my_server(sock)) |
| 212 | + |
| 213 | +// monotonic buffer (testing, bounded memory) |
| 214 | +std::pmr::monotonic_buffer_resource buf; |
| 215 | +with_frame_allocator(&buf)(my_server(sock)) |
| 216 | + |
| 217 | +// tracking allocator (debugging) |
| 218 | +tracking_memory_resource tracker; |
| 219 | +with_frame_allocator(&tracker)(my_server(sock)) |
| 220 | +``` |
| 221 | +
|
| 222 | +One call site, any strategy. The coroutine chain does not change. |
| 223 | +
|
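The `tracking_memory_resource` in the listing above is not a standard type. A minimal sketch of what such a resource might look like follows; the counters and accessor names are assumptions, and the class is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>

// Hypothetical debugging resource: forwards to an upstream resource
// and counts what passes through.
class tracking_memory_resource : public std::pmr::memory_resource {
public:
    explicit tracking_memory_resource(
        std::pmr::memory_resource* upstream = std::pmr::get_default_resource())
        : upstream_(upstream) {
        assert(upstream_ != nullptr);
    }

    std::size_t allocations() const noexcept { return allocations_; }
    std::size_t deallocations() const noexcept { return deallocations_; }
    std::size_t bytes_in_use() const noexcept { return bytes_in_use_; }

private:
    void* do_allocate(std::size_t bytes, std::size_t alignment) override {
        void* p = upstream_->allocate(bytes, alignment);
        ++allocations_;
        bytes_in_use_ += bytes;
        return p;
    }

    void do_deallocate(void* p, std::size_t bytes, std::size_t alignment) override {
        upstream_->deallocate(p, bytes, alignment);
        ++deallocations_;
        bytes_in_use_ -= bytes;
    }

    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }

    std::pmr::memory_resource* upstream_;
    std::size_t allocations_ = 0;
    std::size_t deallocations_ = 0;
    std::size_t bytes_in_use_ = 0;
};
```

Passing `&tracker` to `with_frame_allocator` would then report how many frames a chain allocates and whether all of them were released.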
| 224 | +--- |
| 225 | +
|
| 226 | +## 5. Extensibility |
| 227 | +
|
| 228 | +How third-party task authors participate in frame allocator |
| 229 | +propagation is your design space. The mechanism is in |
| 230 | +`promise_type::operator new` - any task type that reads TLS in |
| 231 | +its `operator new` and restores TLS at resume points |
| 232 | +participates. You can expose the TLS accessors, define a |
| 233 | +concept, or leave it as implementation detail. |
| 234 | +
|
| 235 | +P4003R0 Section 5 shows one approach. It is not the only one. |