| 1 | +.. SPDX-License-Identifier: MIT OR Apache-2.0 |
| 2 | +.. SPDX-FileCopyrightText: The Coding Guidelines Subcommittee Contributors |
| 3 | + |
| 4 | +.. default-domain:: coding-guidelines |
| 5 | + |
| 6 | +.. guideline:: Do not read from union fields that may contain uninitialized bytes |
| 7 | + :id: gui_6JSM7YE7a1KR |
| 8 | + :category: required |
| 9 | + :status: draft |
| 10 | + :release: 1.85.0 |
| 11 | + :fls: fls_6lg0oaaopc26 |
| 12 | + :decidability: undecidable |
| 13 | + :scope: expression |
| 14 | + :tags: unions, initialization, undefined-behavior |
| 15 | + |
| 16 | + Do not read from a union field unless all bytes of that field have been explicitly initialized. |
| 17 | + Partial initialization of a union's composite field leaves some bytes in an uninitialized state, |
| 18 | + and reading those bytes is undefined behavior. |
| 19 | + |
| 20 | + When working with unions: |
| 21 | + |
| 22 | + * Initialize all bytes of a field before reading from it |
| 23 | + * Do not assume that initializing one variant preserves the initialized state of another |
| 24 | + * Do not rely on initialization performed before a union was reassigned |
| 25 | + * Use ``MaybeUninit`` with proper initialization patterns rather than custom unions for |
| 26 | + managing uninitialized memory |
| 27 | + |
| 28 | + You can read a union field other than the one most recently written, provided that: |
| 29 | + |
| 30 | + - Every byte of the accessed field has been initialized, for example through another field. |
| 31 | + - The resulting bit pattern is a valid value of the accessed type |
| 32 | + (e.g., no invalid enum discriminants, no invalid pointer values, etc.). |
| 33 | + |
| 34 | + For example, reading a ``u32`` field whose bytes were written through a ``[u8; 4]`` field is allowed; |
| 35 | + reading a ``bool`` field whose byte holds ``255`` is disallowed because not all bit patterns are valid for ``bool``. |
| 36 | + |
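One way to respect the validity requirement is to inspect the bytes through a field whose type accepts every bit pattern before reading the stricter field. The following is a minimal sketch, not part of the guideline itself; the union ``Flag`` and the helper ``read_flag`` are hypothetical names introduced only for illustration, and the sketch assumes the union was constructed with an initialized byte.

```rust
// Hypothetical union for illustration: one byte viewed as `u8` or `bool`.
union Flag {
    raw: u8,
    b: bool,
}

/// Returns `Some(bool)` only when the stored byte is a valid `bool` (0 or 1).
fn read_flag(f: &Flag) -> Option<bool> {
    // Every bit pattern is a valid `u8`, so this read is fine
    // as long as the byte is initialized (assumed by this sketch).
    let raw = unsafe { f.raw };
    match raw {
        // The byte is 0 or 1, so reading it as `bool` yields a valid value.
        0 | 1 => Some(unsafe { f.b }),
        // Any other byte would be an invalid `bool`; do not read `b`.
        _ => None,
    }
}
```

With this helper, ``read_flag(&Flag { raw: 255 })`` returns ``None`` instead of producing an invalid ``bool``.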
| 37 | + .. rationale:: |
| 38 | + :id: rat_fhrmX0yFIL0L |
| 39 | + :status: draft |
| 40 | + |
| 41 | + Unions in Rust allow multiple fields to share the same memory. |
| 42 | + When a union field is a composite type (tuple, struct, array), |
| 43 | + writing to only some components leaves the remaining bytes in an indeterminate state. |
| 44 | + Reading these uninitialized bytes is undefined behavior :cite:`gui_6JSM7YE7a1KR:RUST-REF-UB`. |
| 45 | + |
| 46 | + This issue is particularly insidious because: |
| 47 | + |
| 48 | + * **Silent data corruption**: The program may appear to work, reading stale or |
| 49 | + garbage values that happen to be *reasonable* in testing. |
| 50 | + |
| 51 | + * **Optimization interactions**: The compiler may merge, inline, or deduplicate |
| 52 | + functions in ways that change which code paths execute. |
| 53 | + A function that fully initializes a union may be merged with one that partially initializes it, |
| 54 | + causing UB to appear in previously-safe code paths :cite:`gui_6JSM7YE7a1KR:LLVM-MERGE`. |
| 55 | + |
| 56 | + * **Function pointer comparisons**: Relying on function pointer equality to |
| 57 | + select code paths is unreliable. |
| 58 | + Combined with partial initialization, |
| 59 | + this can lead to UB being introduced through seemingly unrelated optimizations. |
| 60 | + |
| 61 | + * **Reassignment resets initialization**: Assigning a new value to a union |
| 62 | + (e.g., ``*u = MyUnion { uninit: () }``) does not preserve the initialized state of other fields. |
| 63 | + All fields must be considered uninitialized after such an assignment. |
| 64 | + |
| 65 | + * **Nested partial initialization**: When a union variant contains a |
| 66 | + ``struct``, initializing only one field of that ``struct`` leaves the remaining |
| 67 | + fields uninitialized. |
| 68 | + The compiler does not warn about the uninitialized fields within the nested ``struct``. |
| 69 | + |
| 70 | + Fields of a struct can be individually accessed using a raw pointer. |
| 71 | + Reading the entire struct, or forming a reference to that struct, |
| 72 | + requires that all fields be initialized before a typed read occurs. |
| 73 | + |
| 74 | + The exception is that unions work like C unions: |
| 75 | + any union field may be read, even if a different field was the one written :cite:`gui_6JSM7YE7a1KR:RUST-REF-UNION`. |
| 76 | + The bytes that are read must, however, be initialized and form a valid representation for the field's type, |
| 77 | + which is not guaranteed if the union contains arbitrary data. |
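The raw-pointer approach described above can be sketched as follows. This is an illustrative sketch rather than one of the guideline's own examples; the names ``Pair``, ``Slot``, and ``init_pair`` are hypothetical. It writes each field of a nested ``struct`` through a raw pointer, without forming a reference to the partially initialized ``struct``, and performs the typed read only after every field has been written.

```rust
use std::ptr::addr_of_mut;

// Hypothetical types for illustration. Both fields are `u16`,
// so `Pair` has no padding bytes to reason about.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Pair {
    a: u16,
    b: u16,
}

union Slot {
    uninit: (),
    pair: Pair,
}

fn init_pair() -> Pair {
    // Start with no bytes of `pair` initialized.
    let mut slot = Slot { uninit: () };
    unsafe {
        // Take a raw pointer to the struct field; no reference is created.
        let p: *mut Pair = addr_of_mut!(slot.pair);
        // Write each field individually through raw pointers.
        addr_of_mut!((*p).a).write(1);
        addr_of_mut!((*p).b).write(2);
        // Every field is now initialized, so a typed read of the whole struct is fine.
        slot.pair
    }
}
```

Omitting either ``write`` call would leave part of ``pair`` uninitialized, and the final read would then be undefined behavior.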
| 78 | + |
| 79 | + .. non_compliant_example:: |
| 80 | + :id: non_compl_ex_kJEoz8oh6Fig |
| 81 | + :status: draft |
| 82 | + |
| 83 | + This noncompliant example partially initializes a tuple field, leaving the second element uninitialized. |
| 84 | + |
| 85 | + .. rust-example:: |
| 86 | + :miri: expect_ub |
| 87 | + :warn: allow |
| 88 | + |
| 89 | + union MyMaybeUninit { |
| 90 | + uninit: (), |
| 91 | + init: (u8, u8), |
| 92 | + } |
| 93 | + |
| 94 | + fn foo() { |
| 95 | + let mut a = MyMaybeUninit { uninit: () }; |
| 96 | + a.init.0 = 1; // Only initializes the first byte |
| 97 | + |
| 98 | + // Undefined behavior reading uninitialized value |
| 99 | + println!("{}", unsafe { a.init.1 }); // noncompliant |
| 100 | + } |
| 101 | + |
| 102 | + fn main() { |
| 103 | + foo(); |
| 104 | + } |
| 105 | + |
| 106 | + .. non_compliant_example:: |
| 107 | + :id: non_compl_ex_gE095eyVJizR |
| 108 | + :status: draft |
| 109 | + |
| 110 | + This noncompliant example assumes prior initialization is preserved after reassignment. |
| 111 | + |
| 112 | + .. rust-example:: |
| 113 | + :miri: expect_ub |
| 114 | + |
| 115 | + union Data { |
| 116 | + uninit: (), |
| 117 | + init: (u8, u8), |
| 118 | + } |
| 119 | + |
| 120 | + fn reassign(d: &mut Data) { |
| 121 | + // Reassignment invalidates all prior initialization |
| 122 | + *d = Data { uninit: () }; |
| 123 | + } |
| 124 | + |
| 125 | + fn foo() { |
| 126 | + let mut d = Data { init: (0, 0) }; |
| 127 | + reassign(&mut d); |
| 128 | + |
| 129 | + // 'init' is uninitialized after reassignment |
| 130 | + println!("{}", unsafe { d.init.1 }); // noncompliant |
| 131 | + } |
| 132 | + |
| 133 | + fn main() { |
| 134 | + foo(); |
| 135 | + } |
| 136 | + |
| 137 | + .. non_compliant_example:: |
| 138 | + :id: non_compl_ex_BAHKbKIgDFnY |
| 139 | + :status: draft |
| 140 | + |
| 141 | + This noncompliant example combines function pointer comparison with partial initialization, |
| 142 | + creating subtle undefined behavior that may only manifest after optimization. |
| 143 | + |
| 144 | + Note: this example relies on optimizer behavior (function merging can make |
| 145 | + pointer equality succeed). Miri runs without those optimizations, so the |
| 146 | + UB path is not reliably triggered there. |
| 147 | + |
| 148 | + .. rust-example:: |
| 149 | + :miri: skip |
| 150 | + |
| 151 | + union MyMaybeUninit { |
| 152 | + uninit: (), |
| 153 | + init: (u8, u8), |
| 154 | + } |
| 155 | + |
| 156 | + fn write_first(a: &mut MyMaybeUninit) { |
| 157 | + *a = MyMaybeUninit { uninit: () }; |
| 158 | + a.init.0 = 1; |
| 159 | + } |
| 160 | + |
| 161 | + fn write_both(a: &mut MyMaybeUninit) { |
| 162 | + *a = MyMaybeUninit { uninit: () }; |
| 163 | + a.init.0 = 1; |
| 164 | + a.init.1 = 2; |
| 165 | + } |
| 166 | + |
| 167 | + fn main() { |
| 168 | + let mut a = MyMaybeUninit { init: (0, 0) }; |
| 169 | + |
| 170 | + if write_first as usize == write_both as usize { |
| 171 | + write_first(&mut a); |
| 172 | + } |
| 173 | + |
| 174 | + // UB if the branch was taken (functions may be merged by optimizer) |
| 175 | + println!("{}", unsafe { a.init.1 }); // noncompliant |
| 176 | + } |
| 177 | + |
| 178 | + .. compliant_example:: |
| 179 | + :id: compl_ex_JAR0OI9S07kf |
| 180 | + :status: draft |
| 181 | + |
| 182 | + This compliant example initializes all bytes of the field before reading. |
| 183 | + |
| 184 | + .. rust-example:: |
| 185 | + :miri: |
| 186 | + |
| 187 | + union MyMaybeUninit { |
| 188 | + uninit: (), |
| 189 | + init: (u8, u8), |
| 190 | + } |
| 191 | + |
| 192 | + fn write_both(a: &mut MyMaybeUninit) { |
| 193 | + *a = MyMaybeUninit { uninit: () }; |
| 194 | + a.init.0 = 1; |
| 195 | + a.init.1 = 2; // Initialize all bytes |
| 196 | + } |
| 197 | + |
| 198 | + fn main() { |
| 199 | + let mut a = MyMaybeUninit { init: (0, 0) }; |
| 200 | + write_both(&mut a); |
| 201 | + |
| 202 | + // Both bytes are initialized |
| 203 | + println!("{}", unsafe { a.init.1 }); // compliant |
| 204 | + } |
| 205 | + |
| 206 | + .. compliant_example:: |
| 207 | + :id: compl_ex_ko80pT9aS8Ge |
| 208 | + :status: draft |
| 209 | + |
| 210 | + This compliant example uses ``MaybeUninit`` with proper initialization patterns. |
| 211 | + |
| 212 | + .. rust-example:: |
| 213 | + :miri: |
| 214 | + |
| 215 | + use std::mem::MaybeUninit; |
| 216 | + |
| 217 | + fn init_tuple() -> (u8, u8) { |
| 218 | + let mut data: MaybeUninit<(u8, u8)> = MaybeUninit::uninit(); |
| 219 | + |
| 220 | + unsafe { |
| 221 | + let ptr = data.as_mut_ptr(); |
| 222 | + (*ptr).0 = 1; |
| 223 | + (*ptr).1 = 2; // Initialize all fields |
| 224 | + // data is fully initialized before call to 'assume_init' |
| 225 | + data.assume_init() |
| 226 | + } |
| 227 | + } |
| 228 | + |
| 229 | + fn main() { |
| 230 | + let result = init_tuple(); |
| 231 | + println!("{}, {}", result.0, result.1); // compliant |
| 232 | + } |
| 233 | + |
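When the whole value is available at once, the field-by-field pointer writes can be avoided. The sketch below is an alternative under that assumption, not a replacement for the example above; the function name ``init_tuple_by_value`` is hypothetical. It uses ``MaybeUninit::write`` from the standard library to initialize every byte in a single step.

```rust
use std::mem::MaybeUninit;

fn init_tuple_by_value() -> (u8, u8) {
    let mut data: MaybeUninit<(u8, u8)> = MaybeUninit::uninit();
    // `write` stores the complete tuple, so no byte is left uninitialized.
    data.write((1, 2));
    // The value was fully written above, so `assume_init` is sound here.
    unsafe { data.assume_init() }
}
```

Writing the value in one step leaves no window in which only part of the tuple is initialized.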
| 234 | + .. compliant_example:: |
| 235 | + :id: compl_ex_xnanwe9eU5p5 |
| 236 | + :status: draft |
| 237 | + |
| 238 | + This compliant example initializes through the composite field directly. |
| 239 | + |
| 240 | + .. rust-example:: |
| 241 | + :miri: |
| 242 | + |
| 243 | + union Data { |
| 244 | + raw: [u8; 4], |
| 245 | + value: u32, |
| 246 | + } |
| 247 | + |
| 248 | + fn full_init(d: &mut Data) { |
| 249 | + // Initialize entire field at once |
| 250 | + *d = Data { raw: [0xAB, 0xCD, 0xEF, 0x12] }; |
| 251 | + } |
| 252 | + |
| 253 | + fn main() { |
| 254 | + let mut d = Data { value: 0 }; |
| 255 | + full_init(&mut d); |
| 256 | + |
| 257 | + // All bytes in 'd' are initialized |
| 258 | + println!("{:?}", unsafe { d.raw }); // compliant |
| 259 | + } |
| 260 | + |
| 261 | + .. compliant_example:: |
| 262 | + :id: compl_ex_gdh48eGNdS7e |
| 263 | + :status: draft |
| 264 | + |
| 265 | + This compliant example avoids relying on function pointer comparisons. |
| 266 | + |
| 267 | + .. rust-example:: |
| 268 | + :miri: |
| 269 | + |
| 270 | + union MyMaybeUninit { |
| 271 | + uninit: (), |
| 272 | + init: (u8, u8), |
| 273 | + } |
| 274 | + |
| 275 | + #[allow(dead_code)] |
| 276 | + enum InitLevel { |
| 277 | + Partial, |
| 278 | + Full, |
| 279 | + } |
| 280 | + |
| 281 | + fn write_first(a: &mut MyMaybeUninit) { |
| 282 | + *a = MyMaybeUninit { uninit: () }; |
| 283 | + a.init.0 = 1; |
| 284 | + } |
| 285 | + |
| 286 | + fn write_both(a: &mut MyMaybeUninit) { |
| 287 | + *a = MyMaybeUninit { uninit: () }; |
| 288 | + a.init.0 = 1; |
| 289 | + a.init.1 = 2; |
| 290 | + } |
| 291 | + |
| 292 | + fn main() { |
| 293 | + let mut a = MyMaybeUninit { init: (0, 0) }; |
| 294 | + let level = InitLevel::Full; // Explicit tracking, not pointer comparison |
| 295 | + |
| 296 | + match level { |
| 297 | + InitLevel::Full => { |
| 298 | + write_both(&mut a); |
| 299 | + // Compliant: safe to read both fields |
| 300 | + println!("{}", unsafe { a.init.1 }); |
| 301 | + } |
| 302 | + InitLevel::Partial => { |
| 303 | + write_first(&mut a); |
| 304 | + // Only read the initialized field |
| 305 | + println!("{}", unsafe { a.init.0 }); |
| 306 | + } |
| 307 | + } |
| 308 | + } |
| 309 | + |
| 310 | + .. compliant_example:: |
| 311 | + :id: compl_ex_EU7kO0DtkJxs |
| 312 | + :status: draft |
| 313 | + |
| 314 | + Types such as ``u8``, ``u16``, ``u32``, and ``i128`` allow all possible bit patterns. |
| 315 | + Provided the memory is initialized, there is no undefined behavior. |
| 316 | + |
| 317 | + .. rust-example:: |
| 318 | + :miri: |
| 319 | + |
| 320 | + union U { |
| 321 | + n: u32, |
| 322 | + bytes: [u8; 4], |
| 323 | + } |
| 324 | + |
| 325 | + fn main() { |
| 326 | + let u = U { bytes: [0xFF, 0xEE, 0xDD, 0xCC] }; |
| 327 | + println!("{}", unsafe { u.n }); // OK — all bit patterns valid for u32 |
| 328 | + } |
| 329 | + |
| 330 | + .. compliant_example:: |
| 331 | + :id: compl_ex_V73XRTccrWky |
| 332 | + :status: draft |
| 333 | + |
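| 334 | + This compliant example reads a field other than the one that was written. All four bytes are initialized through ``x``, and every bit pattern is a valid ``f32``. |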
| 335 | + |
| 336 | + .. rust-example:: |
| 337 | + :miri: |
| 338 | + |
| 339 | + union U { |
| 340 | + x: u32, |
| 341 | + y: f32, |
| 342 | + } |
| 343 | + |
| 344 | + fn main() { |
| 345 | + let u = U { x: 123 }; // write to one field |
| 346 | + println!("{}", unsafe { u.y }); // reading the other field is allowed |
| 347 | + } |
| 348 | + |
| 349 | + .. non_compliant_example:: |
| 350 | + :id: non_compl_ex_PMmuoYeT7HsG |
| 351 | + :status: draft |
| 352 | + |
| 353 | + Even though unions allow reads of any field, not all bit patterns are valid for a ``bool``. |
| 354 | + Unions do not relax type validity requirements. |
| 355 | + Only the read itself is allowed; |
| 356 | + the resulting bytes must still be a valid ``bool``. |
| 357 | + |
| 358 | + .. rust-example:: |
| 359 | + :miri: expect_ub |
| 360 | + |
| 361 | + union U { |
| 362 | + b: bool, |
| 363 | + x: u8, |
| 364 | + } |
| 365 | + |
| 366 | + fn main() { |
| 367 | + let u = U { x: 255 }; // 255 is not a valid bool representation |
| 368 | + println!("{}", unsafe { u.b }); // UB — invalid bool |
| 369 | + } |
| 370 | + |
| 371 | + .. bibliography:: |
| 372 | + :id: bib_GDGiC7wRBAYB |
| 373 | + :status: draft |
| 374 | + |
| 375 | + .. list-table:: |
| 376 | + :header-rows: 0 |
| 377 | + :widths: auto |
| 378 | + :class: bibliography-table |
| 379 | + |
| 380 | + * - :bibentry:`gui_6JSM7YE7a1KR:RUST-REF-UB` |
| 381 | + - The Rust Project Developers. "Behavior Considered Undefined." *The Rust Reference*, n.d. https://doc.rust-lang.org/reference/behavior-considered-undefined.html. |
| 382 | + |
| 383 | + * - :bibentry:`gui_6JSM7YE7a1KR:RUST-REF-UNION` |
| 384 | + - The Rust Project Developers. "Unions." *The Rust Reference*, n.d. https://doc.rust-lang.org/reference/items/unions.html. |
| 385 | + |
| 386 | + * - :bibentry:`gui_6JSM7YE7a1KR:LLVM-MERGE` |
| 387 | + - LLVM Project. "MergeFunctions Pass." *LLVM Documentation*, n.d. https://llvm.org/docs/MergeFunctions.html. |
| 388 | + |
| 389 | + * - :bibentry:`gui_6JSM7YE7a1KR:UCG-VALIDITY` |
| 390 | + - The Rust Project Developers. "Validity and Safety Invariant." *Rust Unsafe Code Guidelines*, n.d. https://rust-lang.github.io/unsafe-code-guidelines/glossary.html#validity-and-safety-invariant. |