---
title: "Why Span Is Not Enough"
document: D4036R0
date: 2026-06-15
reply-to:
  - "Vinnie Falco <vinnie.falco@gmail.com>"
audience: LEWG
---

## Abstract

C++ has bytes. A contiguous region of bytes needs a type. A sequence of such regions needs another. This paper examines the types that predictably come to mind, and their consequences.

---

## Revision History

### R0: June 2026

- Initial version.

---

## Disclosure

The author maintains [Boost.Beast](https://github.com/boostorg/beast)<sup>[7]</sup>, a published HTTP and WebSocket library built on [Boost.Asio](https://www.boost.org/doc/libs/release/doc/html/boost_asio.html)<sup>[1]</sup>'s buffer model, and develops [Capy, Corosio, Http, Beast2, and Burl](https://github.com/cppalliance)<sup>[10]</sup> - libraries that define or consume buffer abstractions. The author published [P4003R0](https://wg21.link/p4003r0)<sup>[8]</sup>. The author holds a neutral position on the Networking TS (changed from positive). This body of work creates a bias toward dedicated buffer types. Such types have costs: one more vocabulary type to learn, and interoperability friction with code that uses raw `span<byte>`.

## Credit Where Due

`std::span` is a well-established vocabulary type. It turns a pointer and a size into a single thing. Perfectly. The vocabulary need is profound and this paper does not propose to diminish it.

The question is whether `span` is also the right vocabulary for I/O buffer descriptors.

## 1. `span<byte>`

Question. How do we represent a contiguous region of bytes?

Answer. `span<byte>`. A pointer and a size. That works.

However...

Platform scatter/gather I/O takes an array of regions, not a single region:

| Platform | Descriptor             | Used By                                            |
| -------- | ---------------------- | -------------------------------------------------- |
| POSIX    | `struct iovec`         | `readv()` / `writev()`                             |
| POSIX    | `struct msghdr`        | `sendmsg()` / `recvmsg()`                          |
| Windows  | `WSABUF`               | `WSARecv()` / `WSASend()`                          |
| Windows  | `FILE_SEGMENT_ELEMENT` | `ReadFileScatter()` / `WriteFileGather()`          |
| Linux    | `struct iovec`         | `io_uring_prep_readv()` / `io_uring_prep_writev()` |

`span<byte>` can describe one region. Wrap it in a one-element array and `readv()` accepts it. But I/O rarely involves a single contiguous region. A message has a header and a body. A protocol has framing and payload. Sending two regions with `write()` means two syscalls, or a copy into one temporary buffer first. Sending them with `writev()` means one syscall and no copy - this is scatter/gather I/O. `span<byte>` describes one region; it cannot represent an array of them.

## 2. `span<span<byte>>`

Question. How do we represent several such regions?

Answer. A span of spans. `span<span<byte>>`.

A single buffer is a view of someone's data. The bytes exist somewhere - in a `vector`, in a memory-mapped page, in a stack array. The buffer borrows them. Non-owning is natural. The data has a natural owner elsewhere.

A buffer sequence is different. Nobody "naturally" has an array of `span<byte>` objects lying around. The sequence is an assembled grouping - a data structure constructed to collect regions together. Making it non-owning means the grouping itself cannot be stored, returned, or passed across an asynchronous boundary.

```cpp
#include <cstddef>
#include <span>
using std::byte, std::span;

// Hypothetical asynchronous socket, declared only to complete the sketch.
struct socket {
    template<class Handler>
    void async_send(span<span<byte>> buffers, Handler handler);
};
void callback(); // hypothetical completion handler

class message {
    span<span<byte>> buffers_; // borrows... what?
};

span<span<byte>> prepare_message(span<byte> hdr, span<byte> body) {
    span<byte> bufs[] = { hdr, body };
    return { bufs }; // dangling: bufs dies when the function returns
}

void start_send(socket& s, span<byte> hdr, span<byte> body) {
    span<byte> bufs[] = { hdr, body };
    s.async_send(span<span<byte>>(bufs), callback);
    // returns immediately; bufs destroyed; dangling
}
```

## 3. `range<span<byte>>`

Question. How do we own a collection of byte regions?

Answer. Use a range. `vector<span<byte>>`, `array<span<byte>, N>`, any range whose value type is `span<byte>`.

Ranges solve the ownership problem: a `vector` owns its elements.

Ranges create a byte consumption problem. Consider a JSON stream arriving in two chunks:

```cpp
// chunk 1 [100 bytes]: {"name":"Alice","age":30}{"name":"B
// chunk 2 [100 bytes]: ob","age":25}...

range<span<byte>> input = { chunk1, chunk2 }; // shorthand for any owning range of span<byte>

// parser finds first complete object: {"name":"Alice","age":30}
// that is 25 bytes - consume them
//
// views::drop(input, 1) drops all of chunk1 (100 bytes) - too much
// views::drop(input, 0) drops nothing - too little
// no standard range operation removes exactly 25 bytes
```

The parse boundary (25 bytes) does not align with the buffer boundary (100 bytes). Consuming 25 bytes means advancing chunk 1 by 25 bytes - 75 remain - without touching chunk 2. No range adaptor does this. [`std::ranges`](https://eel.is/c++draft/ranges)<sup>[5]</sup> operates on elements. Parsing operates on bytes, not elements.

Incremental parsers with this need - JSON, XML, CSV, protobuf - go unserved.

## 4. `byte`

Question. What if we add byte-level algorithms to a range of `span<byte>`?

Answer. The range is fine for ownership and iteration. The element type is not.

`span<byte>` already serves too many needs: serialization, cryptography, hashing, memory-mapped regions. If buffer sequences also use `span<byte>`, the type system cannot distinguish a buffer from any other byte span. A concept, an overload, or a constraint that separates "buffer in a sequence" from "hash input" or "encryption key" cannot be written, because all three have the same type.

### Boost.Asio

A separate type enables run-time safety checks:

| Capability                         | Asio `mutable_buffer`<sup>[1]</sup>  | `span<byte>` |
| ---------------------------------- | ------------------------------------ | ------------ |
| Implementation-defined members     | Possible                             | Closed       |
| Detect dangling after reallocation | Possible                             | No           |
| Future diagnostic aids             | Possible                             | No           |
| Conditional debug callback         | `BOOST_ASIO_ENABLE_BUFFER_DEBUGGING` | No           |

Each function signature that takes `span<byte>` gives up these capabilities.

## 5. Six Ecosystems Already Arrived Here

Six I/O ecosystems, designed independently, all arrived at similar solutions:

| Ecosystem | Buffer Type                       | Layout                                                                  |
| --------- | --------------------------------- | ----------------------------------------------------------------------- |
| POSIX     | `iovec`                           | `void*` + `size_t`                                                      |
| Windows   | `WSABUF`                          | `ULONG` + `char*`                                                       |
| Asio      | `const_buffer` / `mutable_buffer` | `void const*` + `size_t`, with range concepts<sup>[1]</sup>             |
| libuv     | `uv_buf_t`                        | `char*` + `size_t`<sup>[2]</sup>                                        |
| Go        | `net.Buffers`                     | scatter/gather over `[][]byte`<sup>[3]</sup>                            |
| .NET      | `ReadOnlySequence<T>`             | linked list of discontiguous `ReadOnlyMemory<T>` segments<sup>[4]</sup> |

All six converged on custom types.

## 6. The Final Straw

The committee already endorsed this principle.

[P0298R3](https://wg21.link/p0298r3)<sup>[6]</sup> introduced `std::byte` because `unsigned char` performed triple duty. Neil MacIntosh wrote:

> "these types perform a 'triple duty'. Not only are they used for byte addressing, but also as arithmetic types, and as character types. This multiplicity of roles opens the door for programmer error"<sup>[6]</sup>

> "The key motivation here is to make byte a distinct type - to improve program safety by leveraging the type system."<sup>[6]</sup>

`unsigned char` had the right size and alignment. The committee added `std::byte` anyway - same size, same alignment, but no arithmetic, no implicit conversions. The generic type's operations did not match the domain. The committee restricted the interface.

`span<byte>` performs double duty - general-purpose byte view and I/O buffer descriptor. A bespoke type restricts the interface to `data()` and `size()`. Same principle, one level of abstraction higher.

**The precise fit is bespoke.**

### Almost There

`std::byte` kept the shift operators despite the stated goal of removing arithmetic. The principle was right. The execution left a gap.

## 7. Finally Correct

New buffer types give us the principled option. Only what we need: `data()` and `size()`.

### `void*`, Not `byte*`

`void*` is maximally accepting and minimally permissive. Any object pointer converts to it implicitly. The user must perform an explicit cast to go back. The asymmetry is by design.

| Risk                        | `void*` | `byte*` | Cost                            |
| --------------------------- | ------- | ------- | ------------------------------- |
| Requires `reinterpret_cast` | No      | Yes     | Invites superfluous casts       |
| Dereferenceable             | No      | Yes     | Invites accidental access       |
| Pointer arithmetic          | No      | Yes     | Invites accidental arithmetic   |
| Assignable to `span<byte>`  | No      | Yes     | Invites full span API misuse    |
| Promises byte-level meaning | No      | Yes     | Invites false type assertions   |
| Contradicts type erasure    | No      | Yes     | Invites type erasure violations |
| C++17 only                  | No      | Yes     | Disinvites C users              |

### A Buffer Sequence Is Distinct

Buffer sequences are not served by existing concepts. They are a new concept.

### What the Standard Needs

- A read-only byte region type (`void const*` + `size_t`)
- A writable byte region type (`void*` + `size_t`)
- Concepts for sequences of read-only and writable byte regions
- Algorithms: total byte count, byte-granular slicing, copy between buffer sequences

The types already exist:

```cpp
#include <cstddef>

class mutable_buffer {
    unsigned char* p_ = nullptr;
    std::size_t n_ = 0;
public:
    mutable_buffer() = default;
    mutable_buffer(mutable_buffer const&) = default;
    mutable_buffer& operator=(mutable_buffer const&) = default;
    constexpr mutable_buffer(void* data, std::size_t size) noexcept
        : p_(static_cast<unsigned char*>(data)), n_(size) { }
    constexpr void* data() const noexcept { return p_; }
    constexpr std::size_t size() const noexcept { return n_; }
};

class const_buffer {
    unsigned char const* p_ = nullptr;
    std::size_t n_ = 0;
public:
    const_buffer() = default;
    const_buffer(const_buffer const&) = default;
    const_buffer& operator=(const_buffer const&) = default;
    constexpr const_buffer(void const* data, std::size_t size) noexcept
        : p_(static_cast<unsigned char const*>(data)), n_(size) { }
    // A writable region can always be viewed as a read-only one.
    constexpr const_buffer(mutable_buffer const& b) noexcept
        : p_(static_cast<unsigned char const*>(b.data())), n_(b.size()) { }
    constexpr void const* data() const noexcept { return p_; }
    constexpr std::size_t size() const noexcept { return n_; }
};
```

These are, in essence, the [Networking TS](https://wg21.link/n4771)<sup>[9]</sup> types. The TS additionally specifies `operator+=` for advancing a buffer by a number of bytes.

## 8. Side by Side

| Task                            | `span<byte>`                                         | `mutable_buffer`                     |
| ------------------------------- | ---------------------------------------------------- | ------------------------------------ |
| Construct from `vector<char> v` | `span<byte>{reinterpret_cast<byte*>(v.data()), ...}` | `mutable_buffer{v.data(), v.size()}` |
| Consume N bytes                 | `buf = span<byte>{buf.data() + n, buf.size() - n}`   | `buf += n`                           |
| Detect dangling                 | Requires ABI break                                   | *see below*                          |

A dedicated type can carry a debug check:

```cpp
#include <cstddef>

class mutable_buffer {
    void* p_ = nullptr;
    std::size_t n_ = 0;
    void (*check_)() = nullptr; // optional debug hook; null when checking is off
public:
    // The hook runs before the pointer is handed out, so a stale buffer
    // can be diagnosed at the point of use.
    void* data() const { if (check_) check_(); return p_; }
    std::size_t size() const noexcept { return n_; }
};
```

Smaller to write, safer to use, open to diagnostics.
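The `buf += n` entry in the table relies on an operator that the Networking TS specifies for both buffer types. A minimal sketch, assuming the clamping behavior the TS describes (advance by the smaller of `n` and `size()`); the class is a pared-down stand-in, not the full type.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Pared-down stand-in for mutable_buffer with the consuming operator added.
class mutable_buffer {
    unsigned char* p_ = nullptr;
    std::size_t n_ = 0;
public:
    mutable_buffer() = default;
    mutable_buffer(void* data, std::size_t size) noexcept
        : p_(static_cast<unsigned char*>(data)), n_(size) {}
    void* data() const noexcept { return p_; }
    std::size_t size() const noexcept { return n_; }

    // Advance the start of the region by n bytes, clamped to size(),
    // so over-consuming yields an empty buffer rather than an overrun.
    mutable_buffer& operator+=(std::size_t n) noexcept
    {
        std::size_t const k = std::min(n, n_);
        p_ += k;
        n_ -= k;
        return *this;
    }
};

inline void example()
{
    char storage[10]{};
    mutable_buffer buf(storage, sizeof storage);

    buf += 4;                       // consume part of the region
    assert(buf.data() == storage + 4);
    assert(buf.size() == 6);

    buf += 100;                     // over-consume: clamped, not undefined
    assert(buf.size() == 0);
}
```

The pointer is stored as `unsigned char*` internally so the arithmetic is well-formed, while the public interface still exposes only `void*`.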

## 9. But

### But this is standardizing Asio's types

Yes. They earn their keep.

### But `vector<span<byte>>` is enough

Users opt out of types that do not let them opt out of allocations.

### But `mdspan` is enough

Buffer sequences only need one dimension. [`mdspan`](https://eel.is/c++draft/mdspan.overview)<sup>[5]</sup> provides several.

### But `span<void>` is enough

Even if `span<void>` were possible, what remains after removing the impossible is `data()` and `size()`. That is just a less-capable `mutable_buffer`.

### But `span<byte>` is enough

`span<byte>` is also a less-capable `mutable_buffer`. It is `span<void>` with added harm.

## Suggested Straw Poll

> LEWG agrees that a contiguous byte region descriptor for I/O should be a dedicated type, not `span<byte>`.

---

## Acknowledgements

The buffer model described here draws on twenty years of Asio's buffer sequence abstractions, due to Chris Kohlhoff.

---

## References

1. [Boost.Asio](https://www.boost.org/doc/libs/release/doc/html/boost_asio.html) - Buffer types and buffer sequence requirements (Chris Kohlhoff). https://www.boost.org/doc/libs/release/doc/html/boost_asio.html
2. [libuv](https://docs.libuv.org/en/v1.x/) - `uv_buf_t` buffer type. https://docs.libuv.org/en/v1.x/
3. [Go standard library](https://pkg.go.dev/) - `net.Buffers` (https://pkg.go.dev/net#Buffers). https://pkg.go.dev/
4. [.NET System.IO.Pipelines](https://learn.microsoft.com/en-us/dotnet/api/system.io.pipelines) - `ReadOnlySequence<T>`. https://learn.microsoft.com/en-us/dotnet/api/system.io.pipelines
5. [C++ Working Draft](https://eel.is/c++draft/) - `span` ([span.overview](https://eel.is/c++draft/span.overview)), `mdspan` ([mdspan.overview](https://eel.is/c++draft/mdspan.overview)), ranges ([ranges](https://eel.is/c++draft/ranges)). https://eel.is/c++draft/
6. [P0298R3](https://wg21.link/p0298r3) - A byte type definition (Neil MacIntosh). https://wg21.link/p0298r3
7. [Boost.Beast](https://github.com/boostorg/beast) - HTTP and WebSocket built on Boost.Asio (Vinnie Falco). https://github.com/boostorg/beast
8. [P4003R0](https://wg21.link/p4003r0) - (Vinnie Falco). https://wg21.link/p4003r0
9. [N4771](https://wg21.link/n4771) - Working Draft, C++ Extensions for Networking. https://wg21.link/n4771
10. [C++ Alliance](https://github.com/cppalliance) - Capy, Corosio, Http, Beast2, Burl (Vinnie Falco). https://github.com/cppalliance