Commit 773960b

Add p3552r3 task allocator timing fixes
1 parent 99d47e6 commit 773960b

2 files changed

+588
-0
lines changed

source/p2300r10-alloc-fix.md

Lines changed: 235 additions & 0 deletions
@@ -0,0 +1,235 @@
# The Frame Allocator Is Not the Environment Allocator

Dietmar,

This builds on the previous document (p3552-tls-fix.md). The TLS
mechanism described there is correct but one design choice was
wrong: storing the frame allocator from `get_allocator(get_env(receiver))`
at connect time. The frame allocator should never come from the
environment.

---

## 1. Two Allocators, Two Purposes

The P2300 environment's `get_allocator` is a general-purpose
allocator. Any coroutine can query it:

```cpp
auto alloc = co_await ex::read_env(ex::get_allocator);
auto* p = alloc.allocate(sizeof(MyObject)); // usable for anything
```

This is the right design for general-purpose allocation. Users
put an allocator in the environment and any code in the chain can
use it for containers, strings, connection pools, whatever the
application needs.

A frame allocator is a different thing. Frame allocators exploit
a narrow pattern: coroutine frame sizes repeat, lifetimes nest,
and deallocation order mirrors allocation order. A recycling
frame allocator caches recently freed frames for immediate reuse.
P4003R0 Section 2.3 shows a 3.1x speedup on MSVC over
`std::allocator`, and 1.28x over mimalloc.

These performance properties depend on the allocation pattern
being narrow. If arbitrary code can grab the frame allocator and
use it for a `std::vector` that grows to 10 MB, or a database
connection pool, or a JSON parser's scratch buffer, the pattern
is destroyed. The recycler's size classes no longer match. The
LIFO assumption breaks. The bounded pool overflows.

If the frame allocator is in the environment, this is exactly
what happens. Any coroutine in the chain can
`co_await read_env(get_allocator)` and use it for anything. The
environment does not distinguish between "allocator for frames"
and "allocator for everything else." There is one query and one
answer.

The frame allocator must be a separate channel:

- Not queryable from user code
- Not in the environment
- Read only by `promise_type::operator new`
- Propagated through TLS, restored at resume points
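
A minimal sketch of what such a channel could look like, under stated assumptions. Only `detail::set_frame_allocator` is named in this document; the storage variable `current_frame_mr`, the reader `get_frame_allocator`, and the helper `resume_with` are hypothetical names chosen for illustration, and the previous document's actual mechanism may differ.

```cpp
#include <cassert>
#include <memory_resource>

namespace detail {

// Hypothetical storage: one pointer per thread, not reachable through
// any environment query.
inline thread_local std::pmr::memory_resource* current_frame_mr = nullptr;

inline void set_frame_allocator(std::pmr::memory_resource* mr) noexcept {
    current_frame_mr = mr;
}

// Hypothetical reader, intended to be called only from
// promise_type::operator new.
inline std::pmr::memory_resource* get_frame_allocator() noexcept {
    return current_frame_mr;
}

// Hypothetical resume-point helper: install the allocator recovered from
// the frame being resumed, run the resumption, then put back whatever
// was installed before, even if the resumption throws.
template <class Resume>
void resume_with(std::pmr::memory_resource* frame_mr, Resume&& resume) {
    struct restore {
        std::pmr::memory_resource* previous;
        ~restore() { set_frame_allocator(previous); }
    } guard{get_frame_allocator()};
    set_frame_allocator(frame_mr);
    resume();
}

} // namespace detail
```

The guard here is internal to the task machinery; user code never sees it, which keeps the channel out of reach of application logic.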

The environment's `get_allocator` remains what it is: a
general-purpose allocator for application use.

---

## 2. `with_frame_allocator`

The previous document's Section 3.3 stored the frame allocator
from the environment at connect time. Replace that. The frame
allocator is established at the launch site through TLS and
recovered from the frame itself, never from the environment.

### 2.1 The Algorithm

```cpp
auto with_frame_allocator(std::pmr::memory_resource* mr) {
    detail::set_frame_allocator(mr);           // publish to TLS before the task is created
    return [](auto task) {
        detail::set_frame_allocator(nullptr);  // clear TLS once the frame exists
        return task;                           // by-value parameter is moved implicitly
    };
}
```

Usage:

```cpp
with_frame_allocator(&pool)(my_server(sock))
```

### 2.2 Why This Is Safe

C++17 [expr.call] p5: "The postfix-expression is sequenced
before each expression in the expression-list." This guarantee
is new in C++17; earlier standards left the order unspecified,
so the technique requires C++17 or later.

In `with_frame_allocator(&pool)(my_server(sock))`:

1. `with_frame_allocator(&pool)` executes first - sets TLS,
   returns a callable
2. `my_server(sock)` executes second - `operator new` reads
   TLS, frame allocated with the pool
3. The callable is invoked with the task - clears TLS, returns
   the task as-is

TLS is set and cleared within a single expression. No guard to
misuse. No way to forget to clear it. One caveat: if
`my_server(sock)` throws, the callable never runs and TLS stays
set until something overwrites it.
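
The sequencing argument can be checked without any coroutine machinery. The sketch below is an illustration only: `fake_task` and `make_fake_task` are hypothetical stand-ins for a coroutine whose `operator new` reads TLS, and `detail::frame_mr` is an assumed name for the TLS slot. It must be compiled as C++17 or later for the ordering guarantee to hold.

```cpp
#include <cassert>
#include <memory_resource>

namespace detail {
// Assumed TLS slot; the real one lives in the task implementation.
inline thread_local std::pmr::memory_resource* frame_mr = nullptr;
inline void set_frame_allocator(std::pmr::memory_resource* mr) noexcept {
    frame_mr = mr;
}
} // namespace detail

// Stand-in for a coroutine: remembers what its operator new would have
// read from TLS at the moment the "frame" was created.
struct fake_task {
    std::pmr::memory_resource* seen_at_creation;
};

inline fake_task make_fake_task() {
    return fake_task{detail::frame_mr};
}

inline auto with_frame_allocator(std::pmr::memory_resource* mr) {
    detail::set_frame_allocator(mr);           // step 1: before the argument
    return [](auto task) {
        detail::set_frame_allocator(nullptr);  // step 3: after the argument
        return task;
    };
}

// Returns true when the task saw the pool and TLS is clear afterwards.
inline bool ordering_holds(std::pmr::memory_resource* pool) {
    fake_task t = with_frame_allocator(pool)(make_fake_task());  // step 2 in the middle
    return t.seen_at_creation == pool && detail::frame_mr == nullptr;
}
```

Calling `ordering_holds(&pool)` with any `memory_resource` exercises all three steps in a single expression, mirroring the usage in Section 2.1.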

### 2.3 The Change from the Previous Document

The previous document's Section 3.3 said: at connect time, read
`get_allocator(get_env(receiver))` and store it in the promise.
This is replaced.

The promise recovers the frame allocator from its own frame. The
`operator new` in Section 3.2 of the previous document already
stashes the `memory_resource*` at the end of the frame. The
promise reads it back:

```cpp
std::pmr::memory_resource* recover_frame_allocator() noexcept {
    // Address of the coroutine frame that owns this promise.
    auto* self = std::coroutine_handle<promise_type>::from_promise(*this)
                     .address();
    // The pointer sits at offset frame_size_; memcpy reads it without
    // requiring that slot to be aligned.
    std::pmr::memory_resource* mr;
    std::memcpy(&mr,
                static_cast<char*>(self) + frame_size_,
                sizeof(mr));
    return mr;
}
```

This is the value that the resume-point restoration (Section 3.4
of the previous document) writes to TLS. It comes from the frame,
not from the environment.

The environment is not involved. `get_allocator` in the
environment is untouched. The two allocators are independent.
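
The stash-and-recover round trip can be sketched outside a promise type. The free functions `allocate_frame`, `recover_frame_allocator`, and `deallocate_frame` below are hypothetical; they only mirror the layout described above (pointer stored at offset `frame_size`) and are not the previous document's `operator new`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory_resource>

// Allocate a block with room for the frame plus one trailing pointer,
// and record which resource produced it.
inline void* allocate_frame(std::size_t frame_size,
                            std::pmr::memory_resource* mr) {
    assert(mr != nullptr);
    void* block = mr->allocate(frame_size + sizeof(mr));
    // memcpy places the pointer without any alignment requirement.
    std::memcpy(static_cast<char*>(block) + frame_size, &mr, sizeof(mr));
    return block;
}

// Read the resource pointer back from the end of the frame.
inline std::pmr::memory_resource* recover_frame_allocator(
    void* block, std::size_t frame_size) noexcept {
    std::pmr::memory_resource* mr;
    std::memcpy(&mr, static_cast<char*>(block) + frame_size, sizeof(mr));
    return mr;
}

// Free the block through the same resource that allocated it.
inline void deallocate_frame(void* block, std::size_t frame_size) {
    std::pmr::memory_resource* mr = recover_frame_allocator(block, frame_size);
    mr->deallocate(block, frame_size + sizeof(mr));
}
```

Because the pointer travels with the block, deallocation and resume-point restoration need no lookup anywhere else, which is what lets the environment stay out of the picture.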

---

## 3. What the User Sees

Frame allocator only:

```cpp
std::pmr::monotonic_buffer_resource pool;

ex::sync_wait(
    with_frame_allocator(&pool)(my_server(sock)));
```

Frame allocator plus general-purpose environment allocator:

```cpp
std::pmr::monotonic_buffer_resource frame_pool;
std::pmr::monotonic_buffer_resource app_pool;  // assumed: any memory_resource for application data
std::pmr::polymorphic_allocator general_alloc(&app_pool);

ex::sync_wait(
    ex::write_env(
        with_frame_allocator(&frame_pool)(my_server(sock)),
        ex::env{ex::prop{ex::get_allocator, general_alloc}}));
```

Two allocators. Two channels. Independent. The coroutine chain
is unaware of both:

```cpp
// Declared up front so my_server can call it.
ex::task<void> handle_connection(connection& conn);

ex::task<void> my_server(socket& sock) {
    for (;;) {
        auto conn = co_await accept(sock);
        co_await handle_connection(conn);
    }
}

ex::task<void> handle_connection(connection& conn) {
    auto req = co_await read_request(conn);
    auto resp = process(req);
    co_await write_response(conn, resp);
}
```

Every frame in the chain uses the frame pool. If any coroutine
needs the general-purpose allocator for application logic, it
queries the environment:

```cpp
auto alloc = co_await ex::read_env(ex::get_allocator);
std::pmr::vector<char> buf(alloc);
```

The frame allocator is not reachable from this query. It cannot
be misused.

---

## 4. Swappable Implementations

Because `with_frame_allocator` takes a `memory_resource*`, the
user is never locked into a particular frame allocator
implementation. A recycling allocator with size-class buckets is
optimal for coroutine frames - Capy ships one:

https://github.com/cppalliance/capy/blob/18c30d2197ac8804f9426005576d4f9b80a76135/include/boost/capy/ex/recycling_memory_resource.hpp

It uses power-of-two size classes (64 to 2048 bytes), a
thread-local pool for lock-free fast-path allocation, and a
global pool with a mutex for cross-thread block sharing.
Allocations above 2048 bytes bypass the pools. This is the kind
of allocator that exploits the narrow frame allocation pattern -
and the kind that breaks if arbitrary non-frame allocations
pollute it.
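
To make the size-class idea concrete, here is a deliberately simplified sketch. It is not Capy's implementation: it is single-threaded, has no thread-local or global pool, and the class name `simple_recycling_resource` is hypothetical. It keeps only the parts described above: power-of-two classes from 64 to 2048 bytes, LIFO reuse, and a bypass for larger requests.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <new>

class simple_recycling_resource : public std::pmr::memory_resource {
public:
    explicit simple_recycling_resource(
        std::pmr::memory_resource* upstream = std::pmr::new_delete_resource())
        : upstream_(upstream) { assert(upstream_ != nullptr); }

    simple_recycling_resource(const simple_recycling_resource&) = delete;
    simple_recycling_resource& operator=(const simple_recycling_resource&) = delete;

    ~simple_recycling_resource() override {
        // Hand every cached block back to the upstream resource.
        for (std::size_t i = 0; i < num_classes; ++i) {
            while (node* n = free_[i]) {
                free_[i] = n->next;
                upstream_->deallocate(n, min_class << i, block_align);
            }
        }
    }

private:
    struct node { node* next; };
    static constexpr std::size_t min_class = 64;   // smallest size class
    static constexpr std::size_t num_classes = 6;  // 64, 128, ..., 2048
    static constexpr std::size_t block_align = alignof(std::max_align_t);

    // Index of the smallest class that fits `bytes`, or num_classes if
    // the request is larger than the biggest class.
    static std::size_t class_index(std::size_t bytes) noexcept {
        std::size_t size = min_class, index = 0;
        while (size < bytes && index < num_classes) { size *= 2; ++index; }
        return index;
    }

    void* do_allocate(std::size_t bytes, std::size_t alignment) override {
        const std::size_t index = class_index(bytes);
        if (index == num_classes || alignment > block_align)
            return upstream_->allocate(bytes, alignment);  // bypass the pools
        if (node* n = free_[index]) {                      // reuse most recent block
            free_[index] = n->next;
            return n;
        }
        return upstream_->allocate(min_class << index, block_align);
    }

    void do_deallocate(void* p, std::size_t bytes, std::size_t alignment) override {
        const std::size_t index = class_index(bytes);
        if (index == num_classes || alignment > block_align) {
            upstream_->deallocate(p, bytes, alignment);
            return;
        }
        free_[index] = ::new (p) node{free_[index]};       // push on the free list
    }

    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }

    std::pmr::memory_resource* upstream_;
    std::array<node*, num_classes> free_{};
};
```

A frame of 100 bytes and a later frame of 120 bytes both land in the 128-byte class, so the second allocation gets the first one's block back without touching the upstream resource. A 10 MB vector would simply bypass the cache, but many odd-sized application allocations would fill the free lists with blocks no frame ever asks for.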

But the user can swap in any `memory_resource*` they want:

```cpp
// recycling allocator (production)
with_frame_allocator(get_recycling_memory_resource())(my_server(sock));

// monotonic buffer (testing, bounded memory)
std::pmr::monotonic_buffer_resource buf;
with_frame_allocator(&buf)(my_server(sock));

// tracking allocator (debugging)
tracking_memory_resource tracker;
with_frame_allocator(&tracker)(my_server(sock));
```

One call site, any strategy. The coroutine chain does not change.
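
The `tracking_memory_resource` used above is not defined in this document. One possible shape, offered as an assumption rather than as the intended type, is a thin counting wrapper over an upstream resource. The accessor names are hypothetical, and the counters are not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>

class tracking_memory_resource : public std::pmr::memory_resource {
public:
    explicit tracking_memory_resource(
        std::pmr::memory_resource* upstream = std::pmr::new_delete_resource())
        : upstream_(upstream) { assert(upstream_ != nullptr); }

    std::size_t allocations() const noexcept { return allocations_; }
    std::size_t deallocations() const noexcept { return deallocations_; }
    std::size_t bytes_in_use() const noexcept { return bytes_in_use_; }

private:
    void* do_allocate(std::size_t bytes, std::size_t alignment) override {
        void* p = upstream_->allocate(bytes, alignment);  // may throw; count only on success
        ++allocations_;
        bytes_in_use_ += bytes;
        return p;
    }

    void do_deallocate(void* p, std::size_t bytes, std::size_t alignment) override {
        upstream_->deallocate(p, bytes, alignment);
        ++deallocations_;
        bytes_in_use_ -= bytes;
    }

    bool do_is_equal(const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }

    std::pmr::memory_resource* upstream_;
    std::size_t allocations_ = 0;
    std::size_t deallocations_ = 0;
    std::size_t bytes_in_use_ = 0;
};
```

After a run, `allocations() == deallocations()` and `bytes_in_use() == 0` would indicate that every frame was returned to the resource it came from.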

---

## 5. Extensibility

How third-party task authors participate in frame allocator
propagation is your design space. The mechanism is in
`promise_type::operator new` - any task type that reads TLS in
its `operator new` and restores TLS at resume points
participates. You can expose the TLS accessors, define a
concept, or leave it as implementation detail.

P4003R0 Section 5 shows one approach. It is not the only one.
