Commit f1352df

[Coding Rule]: Do not read from uninitialized union fields (rustfoundation#302)
* [Coding Rule]: Do not read from union fields that may contain uninitialized bytes gui_UnionPartialInit Do not read from union fields that may contain uninitialized bytes
* Update index.rst add new guideline
* Update guidelines on struct field initialization Clarify rules for accessing struct fields and typed reads.
* fix(guidelines): normalize union partial init Standardize IDs and bibliography entries, update example miri/warning handling, and add missing tag definitions for unions/initialization.
* Refactor union field initialization guidelines
* Refactor main function examples in gui_6JSM7YE7a1KR.rst.inc
* Update src/coding-guidelines/types-and-traits/gui_6JSM7YE7a1KR.rst.inc
* Update src/coding-guidelines/types-and-traits/gui_6JSM7YE7a1KR.rst.inc
* Update src/coding-guidelines/types-and-traits/gui_6JSM7YE7a1KR.rst.inc
* Update src/coding-guidelines/types-and-traits/gui_6JSM7YE7a1KR.rst.inc

---------

Co-authored-by: Pete LeVasseur <plevasseur@gmail.com>
1 parent 24ab3ef commit f1352df

2 files changed

+392
-0
lines changed

Lines changed: 390 additions & 0 deletions
@@ -0,0 +1,390 @@
1+
.. SPDX-License-Identifier: MIT OR Apache-2.0
2+
.. SPDX-FileCopyrightText: The Coding Guidelines Subcommittee Contributors
3+
4+
.. default-domain:: coding-guidelines
5+
6+
.. guideline:: Do not read from union fields that may contain uninitialized bytes
7+
:id: gui_6JSM7YE7a1KR
8+
:category: required
9+
:status: draft
10+
:release: 1.85.0
11+
:fls: fls_6lg0oaaopc26
12+
:decidability: undecidable
13+
:scope: expression
14+
:tags: unions, initialization, undefined-behavior
15+
16+
Do not read from a union field unless all bytes of that field have been explicitly initialized.
17+
Partial initialization of a union's composite field leaves some bytes in an uninitialized state,
18+
and reading those bytes is undefined behavior.
19+
20+
When working with unions:
21+
22+
* Initialize all bytes of a field before reading from it
23+
* Do not assume that initializing one variant preserves the initialized state of another
24+
* Do not rely on prior initialization of a union before reassignment
25+
* Use ``MaybeUninit`` with proper initialization patterns rather than custom unions for
26+
managing uninitialized memory
27+
28+
A union field that was never written directly can still be read when its backing bytes were initialized through another field, provided that:
29+
30+
- The resulting value has an unspecified but well-defined bit pattern.
31+
- Interpreting that value must still comply with the requirements of the accessed type
32+
(e.g., no invalid enum discriminants, no invalid pointer values, etc.).
33+
34+
For example, reading a ``u32`` field whose bytes were all initialized through another field is allowed;
35+
reading a ``bool`` field is disallowed unless its byte holds 0 or 1, because not all bit patterns are valid for ``bool``.
36+
37+
.. rationale::
38+
:id: rat_fhrmX0yFIL0L
39+
:status: draft
40+
41+
Unions in Rust allow multiple fields to share the same memory.
42+
When a union field is a composite type (tuple, struct, array),
43+
writing to only some components leaves the remaining bytes in an indeterminate state.
44+
Reading these uninitialized bytes is undefined behavior :cite:`gui_6JSM7YE7a1KR:RUST-REF-UB`.
45+
46+
This issue is particularly insidious because:
47+
48+
* **Silent data corruption**: The program may appear to work, reading stale or
49+
garbage values that happen to be *reasonable* in testing.
50+
51+
* **Optimization interactions**: The compiler may merge, inline, or deduplicate
52+
functions in ways that change which code paths execute.
53+
A function that fully initializes a union may be merged with one that partially initializes it,
54+
causing UB to appear in previously-safe code paths :cite:`gui_6JSM7YE7a1KR:LLVM-MERGE`.
55+
56+
* **Function pointer comparisons**: Relying on function pointer equality to
57+
select code paths is unreliable.
58+
Combined with partial initialization,
59+
this can lead to UB being introduced through seemingly unrelated optimizations.
60+
61+
* **Reassignment resets initialization**: Assigning a new value to a union
62+
(e.g., ``*u = MyUnion { uninit: () }``) does not preserve the initialized state of other fields.
63+
All fields must be considered uninitialized after such an assignment.
64+
65+
* **Nested partial initialization**: When a union variant contains a
66+
``struct``, initializing only one field of that ``struct`` leaves the remaining
67+
fields uninitialized.
68+
The compiler does not warn about the uninitialized fields within the nested ``struct``.
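The nested case can be avoided by writing the whole ``struct`` in one assignment rather than field by field. A minimal sketch, assuming hypothetical ``Pair`` and ``Slot`` types and a hypothetical ``fill`` helper that are not part of the guideline:

```rust
// Hypothetical types for illustration; they do not appear in the guideline.
#[derive(Clone, Copy)]
struct Pair {
    a: u8,
    b: u8,
}

union Slot {
    empty: (),
    pair: Pair,
}

fn fill(slot: &mut Slot) {
    // Writing the whole struct initializes every byte of the `pair` field,
    // so no nested field is left uninitialized.
    *slot = Slot { pair: Pair { a: 1, b: 2 } };
}

fn main() {
    let mut slot = Slot { empty: () };
    fill(&mut slot);
    // Both nested fields were written by `fill`, so this read is defined.
    let pair = unsafe { slot.pair };
    println!("{} {}", pair.a, pair.b);
}
```

Because the struct literal names every field, the compiler rejects the code if a field is forgotten, which is a check that field-by-field assignment through the union does not provide.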
69+
70+
Fields of a struct can be individually accessed using a raw pointer.
71+
Reading the entire struct, or forming a reference to that struct,
72+
requires that all fields be initialized before a typed read occurs.
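The raw-pointer access described here might look like the following sketch. It assumes a hypothetical ``Config`` struct and ``build`` function, and uses ``std::ptr::addr_of_mut!`` so that no reference to the partially initialized struct is ever formed:

```rust
use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;

// Hypothetical struct for illustration; it does not appear in the guideline.
struct Config {
    id: u32,
    enabled: bool,
}

fn build() -> Config {
    let mut slot: MaybeUninit<Config> = MaybeUninit::uninit();
    let ptr = slot.as_mut_ptr();
    unsafe {
        // `addr_of_mut!` yields a raw pointer to each field without creating
        // a reference to the not-yet-initialized struct.
        addr_of_mut!((*ptr).id).write(7);
        addr_of_mut!((*ptr).enabled).write(true);
        // Every field has been written, so the typed read of the whole
        // struct performed by `assume_init` is defined.
        slot.assume_init()
    }
}

fn main() {
    let cfg = build();
    println!("{} {}", cfg.id, cfg.enabled);
}
```

Omitting either ``write`` call would make ``assume_init`` undefined behavior, which is the typed-read requirement stated above.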
73+
74+
The sole exception is that unions work like C unions:
75+
any union field may be read, even if it was never written.
76+
The resulting bytes must, however, form a valid representation for the field's type,
77+
which is not guaranteed if the union contains arbitrary data.
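One way to respect the validity requirement is to read through a field whose type accepts every bit pattern and to validate the value before treating it as a stricter type. A minimal sketch, assuming a hypothetical ``Flag`` union and ``checked_flag`` helper that are not part of the guideline:

```rust
union Flag {
    b: bool,
    x: u8,
}

// Hypothetical helper: returns the flag only if the stored byte is a valid `bool`.
fn checked_flag(f: &Flag) -> Option<bool> {
    // Every initialized byte is a valid `u8`, so this read is defined
    // as long as the byte has been written.
    match unsafe { f.x } {
        0 => Some(false),
        1 => Some(true),
        _ => None,
    }
}

fn main() {
    println!("{:?}", checked_flag(&Flag { x: 1 }));
    println!("{:?}", checked_flag(&Flag { x: 255 }));
}
```

The ``b`` field is never read here, so an out-of-range byte is reported as ``None`` instead of producing an invalid ``bool``.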
78+
79+
.. non_compliant_example::
80+
:id: non_compl_ex_kJEoz8oh6Fig
81+
:status: draft
82+
83+
This noncompliant example partially initializes a tuple field, leaving the second element uninitialized.
84+
85+
.. rust-example::
86+
:miri: expect_ub
87+
:warn: allow
88+
89+
union MyMaybeUninit {
90+
uninit: (),
91+
init: (u8, u8),
92+
}
93+
94+
fn foo() {
95+
let mut a = MyMaybeUninit { uninit: () };
96+
a.init.0 = 1; // Only initializes the first byte
97+
98+
// Undefined behavior reading uninitialized value
99+
println!("{}", unsafe { a.init.1 }); // noncompliant
100+
}
101+
102+
fn main() {
103+
foo();
104+
}
105+
106+
.. non_compliant_example::
107+
:id: non_compl_ex_gE095eyVJizR
108+
:status: draft
109+
110+
This noncompliant example assumes prior initialization is preserved after reassignment.
111+
112+
.. rust-example::
113+
:miri: expect_ub
114+
115+
union Data {
116+
uninit: (),
117+
init: (u8, u8),
118+
}
119+
120+
fn reassign(d: &mut Data) {
121+
// Reassignment invalidates all prior initialization
122+
*d = Data { uninit: () };
123+
}
124+
125+
fn foo() {
126+
let mut d = Data { init: (0, 0) };
127+
reassign(&mut d);
128+
129+
// 'init' is uninitialized after reassignment
130+
println!("{}", unsafe { d.init.1 }); // noncompliant
131+
}
132+
133+
fn main() {
134+
foo();
135+
}
136+
137+
.. non_compliant_example::
138+
:id: non_compl_ex_BAHKbKIgDFnY
139+
:status: draft
140+
141+
This noncompliant example combines function pointer comparison with partial initialization,
142+
creating subtle undefined behavior that may only manifest after optimization.
143+
144+
Note: this example relies on optimizer behavior (function merging can make
145+
pointer equality succeed). Miri runs without those optimizations, so the
146+
UB path is not deterministic there.
147+
148+
.. rust-example::
149+
:miri: skip
150+
151+
union MyMaybeUninit {
152+
uninit: (),
153+
init: (u8, u8),
154+
}
155+
156+
fn write_first(a: &mut MyMaybeUninit) {
157+
*a = MyMaybeUninit { uninit: () };
158+
a.init.0 = 1;
159+
}
160+
161+
fn write_both(a: &mut MyMaybeUninit) {
162+
*a = MyMaybeUninit { uninit: () };
163+
a.init.0 = 1;
164+
a.init.1 = 2;
165+
}
166+
167+
fn main() {
168+
let mut a = MyMaybeUninit { init: (0, 0) };
169+
170+
if write_first as usize == write_both as usize {
171+
write_first(&mut a);
172+
}
173+
174+
// UB if the branch was taken (functions may be merged by optimizer)
175+
println!("{}", unsafe { a.init.1 }); // noncompliant
176+
}
177+
178+
.. compliant_example::
179+
:id: compl_ex_JAR0OI9S07kf
180+
:status: draft
181+
182+
This compliant example initializes all bytes of the field before reading.
183+
184+
.. rust-example::
185+
:miri:
186+
187+
union MyMaybeUninit {
188+
uninit: (),
189+
init: (u8, u8),
190+
}
191+
192+
fn write_both(a: &mut MyMaybeUninit) {
193+
*a = MyMaybeUninit { uninit: () };
194+
a.init.0 = 1;
195+
a.init.1 = 2; // Initialize all bytes
196+
}
197+
198+
fn main() {
199+
let mut a = MyMaybeUninit { init: (0, 0) };
200+
write_both(&mut a);
201+
202+
// Both bytes are initialized
203+
println!("{}", unsafe { a.init.1 }); // compliant
204+
}
205+
206+
.. compliant_example::
207+
:id: compl_ex_ko80pT9aS8Ge
208+
:status: draft
209+
210+
This compliant example uses ``MaybeUninit`` with proper initialization patterns.
211+
212+
.. rust-example::
213+
:miri:
214+
215+
use std::mem::MaybeUninit;
216+
217+
fn init_tuple() -> (u8, u8) {
218+
let mut data: MaybeUninit<(u8, u8)> = MaybeUninit::uninit();
219+
220+
unsafe {
221+
let ptr = data.as_mut_ptr();
222+
(*ptr).0 = 1;
223+
(*ptr).1 = 2; // Initialize all fields
224+
// data is fully initialized before call to 'assume_init'
225+
data.assume_init()
226+
}
227+
}
228+
229+
fn main() {
230+
let result = init_tuple();
231+
println!("{}, {}", result.0, result.1); // compliant
232+
}
233+
234+
.. compliant_example::
235+
:id: compl_ex_xnanwe9eU5p5
236+
:status: draft
237+
238+
This compliant example initializes through the composite field directly.
239+
240+
.. rust-example::
241+
:miri:
242+
243+
union Data {
244+
raw: [u8; 4],
245+
value: u32,
246+
}
247+
248+
fn full_init(d: &mut Data) {
249+
// Initialize entire field at once
250+
*d = Data { raw: [0xAB, 0xCD, 0xEF, 0x12] };
251+
}
252+
253+
fn main() {
254+
let mut d = Data { value: 0 };
255+
full_init(&mut d);
256+
257+
// All bytes in 'd' are initialized
258+
println!("{:?}", unsafe { d.raw }); // compliant
259+
}
260+
261+
.. compliant_example::
262+
:id: compl_ex_gdh48eGNdS7e
263+
:status: draft
264+
265+
This compliant example avoids relying on function pointer comparisons.
266+
267+
.. rust-example::
268+
:miri:
269+
270+
union MyMaybeUninit {
271+
uninit: (),
272+
init: (u8, u8),
273+
}
274+
275+
#[allow(dead_code)]
276+
enum InitLevel {
277+
Partial,
278+
Full,
279+
}
280+
281+
fn write_first(a: &mut MyMaybeUninit) {
282+
*a = MyMaybeUninit { uninit: () };
283+
a.init.0 = 1;
284+
}
285+
286+
fn write_both(a: &mut MyMaybeUninit) {
287+
*a = MyMaybeUninit { uninit: () };
288+
a.init.0 = 1;
289+
a.init.1 = 2;
290+
}
291+
292+
fn main() {
293+
let mut a = MyMaybeUninit { init: (0, 0) };
294+
let level = InitLevel::Full; // Explicit tracking, not pointer comparison
295+
296+
match level {
297+
InitLevel::Full => {
298+
write_both(&mut a);
299+
// Compliant: safe to read both fields
300+
println!("{}", unsafe { a.init.1 });
301+
}
302+
InitLevel::Partial => {
303+
write_first(&mut a);
304+
// Only read the initialized field
305+
println!("{}", unsafe { a.init.0 });
306+
}
307+
}
308+
}
309+
310+
.. compliant_example::
311+
:id: compl_ex_EU7kO0DtkJxs
312+
:status: draft
313+
314+
Types such as ``u8``, ``u16``, ``u32``, and ``i128`` allow all possible bit patterns.
315+
Provided the memory is initialized, there is no undefined behavior.
316+
317+
.. rust-example::
318+
:miri:
319+
320+
union U {
321+
n: u32,
322+
bytes: [u8; 4],
323+
}
324+
325+
fn main() {
326+
let u = U { bytes: [0xFF, 0xEE, 0xDD, 0xCC] };
327+
println!("{}", unsafe { u.n }); // OK — all bit patterns valid for u32
328+
}
329+
330+
.. compliant_example::
331+
:id: compl_ex_V73XRTccrWky
332+
:status: draft
333+
334+
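This compliant example writes one field and reads another. Both fields are four bytes wide and every bit pattern is valid for ``f32``, so the read is defined.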
335+
336+
.. rust-example::
337+
:miri:
338+
339+
union U {
340+
x: u32,
341+
y: f32,
342+
}
343+
344+
fn main() {
345+
let u = U { x: 123 }; // write to one field
346+
println!("{}", unsafe { u.y }); // reading the other field is allowed
347+
}
348+
349+
.. non_compliant_example::
350+
:id: non_compl_ex_PMmuoYeT7HsG
351+
:status: draft
352+
353+
Even though unions allow reads of any field, not all bit patterns are valid for a ``bool``.
354+
Unions do not relax type validity requirements.
355+
Only the read itself is allowed;
356+
the resulting bytes must still be a valid bool.
357+
358+
.. rust-example::
359+
:miri: expect_ub
360+
361+
union U {
362+
b: bool,
363+
x: u8,
364+
}
365+
366+
fn main() {
367+
let u = U { x: 255 }; // 255 is not a valid bool representation
368+
println!("{}", unsafe { u.b }); // UB — invalid bool
369+
}
370+
371+
.. bibliography::
372+
:id: bib_GDGiC7wRBAYB
373+
:status: draft
374+
375+
.. list-table::
376+
:header-rows: 0
377+
:widths: auto
378+
:class: bibliography-table
379+
380+
* - :bibentry:`gui_6JSM7YE7a1KR:RUST-REF-UB`
381+
- The Rust Project Developers. "Behavior Considered Undefined." *The Rust Reference*, n.d. https://doc.rust-lang.org/reference/behavior-considered-undefined.html.
382+
383+
* - :bibentry:`gui_6JSM7YE7a1KR:RUST-REF-UNION`
384+
- The Rust Project Developers. "Unions." *The Rust Reference*, n.d. https://doc.rust-lang.org/reference/items/unions.html.
385+
386+
* - :bibentry:`gui_6JSM7YE7a1KR:LLVM-MERGE`
387+
- LLVM Project. "MergeFunctions Pass." *LLVM Documentation*, n.d. https://llvm.org/docs/MergeFunctions.html.
388+
389+
* - :bibentry:`gui_6JSM7YE7a1KR:UCG-VALIDITY`
390+
- Rust Unsafe Code Guidelines. "Validity and Safety Invariant." https://rust-lang.github.io/unsafe-code-guidelines/glossary.html#validity-and-safety-invariant.

src/conf.py

Lines changed: 2 additions & 0 deletions
@@ -117,6 +117,8 @@
117117
dict(name="numerics", description="Numerics-related guideline"),
118118
dict(name="undefined-behavior", description="Guideline related to Undefined Behavior"),
119119
dict(name="stack-overflow", description="Guideline related to Stack Overflow"),
120+
dict(name="unions", description="Guideline related to union types and field access"),
121+
dict(name="initialization", description="Guideline related to initialization requirements and uninitialized data"),
120122

121123
dict(name="maintainability", description="How effectively and efficiently a product or system can be modified. This includes improvements, fault corrections, and adaptations to changes in the environment or requirements. It is considered a crucial software quality characteristic."),
122124
dict(name="portability", description="The degree to which a system, product, or component can be effectively and efficiently transferred from one hardware, software, or other operational or usage environment to another."),
