document preservation of padding in operations on pointers#695
document preservation of padding in operations on pointers#695tshepang wants to merge 5 commits into
Conversation
| The representation of the value of a :t:`constant initializer` or :t:`static initializer` must only contain bytes with :t:`provenance` where all bytes of some original :t:`pointer` are in the correct order. | ||
|
|
||
| :dp:`fls_nwgIMLkvD2Ol` | ||
| :dt:`Provenance` is the memory that a :t:`pointer` has permission to access, the timespan during which it can acesss that memory, and if it can access the memory for writes. |
There was a problem hiding this comment.
Where did you get this definition from? To me, "provenance" is a property (of something, TBD). According to Merriam-Webster, "provenance" is the origin or source, or the history of ownership.
There was a problem hiding this comment.
https://doc.rust-lang.org/stable/core/ptr/index.html#provenance
maybe I should just call it pointer provenance
There was a problem hiding this comment.
I think the bigger risk here is not just where this definition came from, but that it commits FLS to a general provenance model that the upstream Reference text did not define in these PRs. The local wording reads more like a paraphrase of pointer access permissions, while the neighboring rules are about bytes carrying provenance and fragments of an original pointer value. Would it be safer to avoid defining provenance locally unless we have upstream wording to anchor it, or else narrow the term explicitly to the byte-level notion needed by this rule?
There was a problem hiding this comment.
I did a quick search across the nightly Reference, and I am not seeing a concrete definition of provenance. Given this, I agree with Pete, if the Reference does not define something, the FLS should not define it either.
Orthogonal to this, this rule should have gone into 4.7. Indirection Types 😁 .
This comment is still relevant.
There was a problem hiding this comment.
I think the concrete fix here is to delete this broad local definition rather than rename it to pointer provenance.
I see the core::ptr provenance documentation you linked, and I think that is exactly where the broader pointer-provenance discussion should stay, given the memory model is not yet settled for Rust.
The Reference handles this by noting the memory model recognizes initialized bytes as carrying optional provenance, and other Reference text points readers to std::ptr / core::ptr for provenance details rather than defining a complete provenance model inline.
Given that, I still think defining Provenance locally this way is too broad for this PR. The rules being ported from reference#2091 and the currently proposed reference#2138 wording talk narrowly about values or bytes carrying provenance, pointer fragments, and original pointer values; they do not define provenance as accessible memory plus access duration and write permission.
Again, the Reference also says the Rust memory model is incomplete and not fully decided, so I think the FLS should avoid introducing a general provenance definition here. Feels premature.
Suggested source-level outcome:
Delete paragraph fls_nwgIMLkvD2Ol entirely.
If FLS needs terms to make the new rules readable, I would replace uses that depend on this local provenance definition with the narrow byte-level terms used by the initializer and const-context provenance rules, such as provenance-carrying byte and whole-pointer group. That lets the PR state the required restrictions using the Reference's byte-level provenance concept, while linking to the standard library core::ptr documentation for the broader pointer-provenance discussion instead of defining general provenance semantics locally.
If definitions are introduced, I agree with @kirtchev-adacore that they should not live under Constants; pointer/provenance terminology likely belongs closer to Indirection Types, or beside the shared const-eval provenance rules if those rules are kept together.
Co-authored-by: Hristian Kirtchev <60669983+kirtchev-adacore@users.noreply.github.com>
Co-authored-by: Hristian Kirtchev <60669983+kirtchev-adacore@users.noreply.github.com>
| .. rubric:: Undefined Behavior | ||
|
|
||
| :dp:`fls_hOIImCr1c6IF` | ||
| It is undefined behavior to convert a :t:`pointer` that has :t:`provenance` into a non-:t:`pointer type` in a :t:`constant context`. |
There was a problem hiding this comment.
I think this sentence captures only one consequence of the upstream rule. In reference#2091, the normative const-eval restriction is broader: integer-like values must not carry provenance, and pointer-like values must either carry no provenance or consist of fragments of the same original pointer value in the correct order. Would it make sense to spec that broader rule, and then treat the ptr-to-non-pointer case here as an implication or example? If the main rule ends up living with shared const/static-initializer material, this sentence could then point back to that broader restriction.
There was a problem hiding this comment.
To make this more concrete, I think we should explore porting the broader const-context validity rule from reference#2091, rather than only the more narrow ptr-to-non-pointer implication currently stated here.
One potential shape would be to replace this paragraph with a rule along these lines, preferably in the Constant Expressions undefined-behavior material or a shared const-eval subsection rather than as a standalone special case here:
:dp:`...`
In a :t:`constant context`, it is a :t:`validity invariant` that a
:t:`value` of an :t:`integer type`, a :t:`floating-point type`,
:c:`bool`, :c:`char`, a :t:`discriminant`, or slice metadata does not
carry provenance.
:dp:`...`
In a :t:`constant context`, it is a :t:`validity invariant` that a
:t:`value` of a :t:`reference type`, :t:`raw pointer type`,
:t:`function pointer type`, or ``dyn Trait`` metadata either carries no
provenance, or has all of its :t:`[provenance-carrying byte]s` form a
single :t:`whole-pointer group`.
:dp:`...`
Transmuting or otherwise reinterpreting a :t:`pointer` with provenance as a
non-:t:`pointer type` in a :t:`constant context` is undefined behavior when the
resulting :t:`value` holds pure integer data, because that :t:`value` carries
provenance in violation of its :t:`validity invariant`.This would pair pretty naturally with narrow definitions for provenance-carrying byte and whole-pointer group near the initializer/provenance rules, rather than a broad definition of provenance itself. Those terms mirror the Reference's byte-level wording without committing FLS to a general provenance model.
The important cases from the Reference that need to be represented are:
- slice metadata is grouped with the integer-data restriction;
dyn Traitmetadata is grouped with the pointer-data restriction.
Without the broader validity-invariant rule, the PR seems to miss the wrong-order pointer-byte case and recursive/nested validity cases that reference#2091 was specifically written to cover, such as provenance-carrying integer data nested inside a field.
| The representation of the value of a :t:`constant initializer` or :t:`static initializer` must only contain bytes with :t:`provenance` where all bytes of some original :t:`pointer` are in the correct order. | ||
|
|
||
| :dp:`fls_nwgIMLkvD2Ol` | ||
| :dt:`Provenance` is the memory that a :t:`pointer` has permission to access, the timespan during which it can acesss that memory, and if it can access the memory for writes. |
There was a problem hiding this comment.
I think the bigger risk here is not just where this definition came from, but that it commits FLS to a general provenance model that the upstream Reference text did not define in these PRs. The local wording reads more like a paraphrase of pointer access permissions, while the neighboring rules are about bytes carrying provenance and fragments of an original pointer value. Would it be safer to avoid defining provenance locally unless we have upstream wording to anchor it, or else narrow the term explicitly to the byte-level notion needed by this rule?
There was a problem hiding this comment.
Thanks for continuing to work through this. I think the remaining issue is that the PR still seems to document only parts of the provenance-related const-eval restrictions.
The main asks in the inline comments are:
- port the full merged rust-lang/reference#2091 const-context provenance validity rule, including slice metadata and
dyn Traitmetadata; - align the final constant/static initializer rule with the current rust-lang/reference#2138 draft shape if this PR is intended to cover the merged rust-lang/rust#148967 behavior before that Reference PR lands;
- delete the broad local
Provenancedefinition and use narrow byte-level terms instead, while leaving broader pointer-provenance discussion tocore::ptr.
This is still a changes-requested from me, but the hope is that the comments are more actionable and a concrete path to aligning the FLS wording without committing FLS to a general provenance model.
| :t:`constant initializer`. | ||
|
|
||
| :dp:`fls_LmPbrh0Cba8g` | ||
| The :t:`type representation` of the :t:`value` of a :t:`constant initializer` or :t:`static initializer` must only contain bytes with :t:`provenance` where all bytes of some original :t:`pointer` are in the correct order. |
There was a problem hiding this comment.
This looks to me to be looser than the wording currently proposed in reference#2138, which this PR cites as an input, and less precise about the restriction implemented by merged rust-lang/rust#148967.
There's potential philosophical difference here we should discuss of:
- The Reference documenting the language.
- The FLS documenting the compiler.
Now, the above distinction is not perfect, but I think it gestures to the idea and asks the question of: "should we document compiler behavior, for which there is not yet settled Reference documentation?" Feels like to me the answer is yes.
I think that since this PR is intended to land before rust-lang/reference#2138, I think we should either align with that draft's current shape or explicitly decide what FLS wording we want to use for the same merged rustc behavior.
I think the cleanest FLS shape is to define the narrow terms that this rule needs in shared const/static-initializer material, then state the rule in those terms. For example:
:dp:`...`
:dt:`Provenance-carrying byte` is an initialized byte in the representation of
a :t:`value` that contains a :c:`u8` value and provenance associated with a
:t:`pointer`. General pointer provenance is discussed by :std:`core::ptr`.
:dp:`...`
:dt:`Whole-pointer group` is an adjacent sequence of
:t:`[provenance-carrying byte]s` that consists exactly of all bytes of one
original :t:`pointer` value in their original order.
:dp:`...`
The representation of the final :t:`value` produced by evaluating a
:t:`constant initializer` or :t:`static initializer` shall only contain
:t:`[provenance-carrying byte]s` that are part of a :t:`whole-pointer group`.This helps enable four things:
- scopes the rule to the final value of a constant or static initializer, matching the Reference's top-level initializer restriction;
- defines
whole-pointer groupas the adjacent, complete, same-original-pointer, correctly ordered group required by the Reference; - uses the Reference memory model's byte-level provenance concept while leaving the broader pointer-provenance discussion to
core::ptr, like the Reference does; - uses FLS static-error wording for a provenance-carrying byte that is only a partial or misordered pointer fragment.
I'd also suggest we move this out of the Constants subsection and into shared const/static-initializer material in Values, since this rule applies equally to constants and statics.
There was a problem hiding this comment.
General pointer provenance is discussed by :std:
core::ptr.
You cannot point the reader/programmer/auditor to an unstable external document. After all, it is the job of the FLS to define the stable semantics of provenance.
Let's step back a bit. We need to define what "provenance" is. This should be done as a new subsection to 4.7. Indirection Types. The suggestion below is based on core::ptr provenance.
4.7. Indirection Types
Legality Rules
An :t:`indirection value` is a :t:`value` of an :t:`indirection type`.
An :t:`original indirection value` is an :t:`indirection value` created via allocation.
A :t:`derived indirection value` is an :t:`indirection value` obtained by performing address
or pointer arithmetic on another :t:`indirection value`.
4.7.4. Provenance
:t:`Provenance` is an optional property of :t:`[indirection value]s which grants the following
permissions:
- :t:`Provenance` restricts the addresses an :t:`indirection value` may point to within a
block of memory.
- :t:`Provenance` restricts the timespan during which an :t:`indirection value` may point to
a block of memory.
- :t:`Provenance` restricts the read/write access an :t:`indirection value` has from/to a
block of memory.
An :t:`original indirection value` always carries :t:`provenance`.
A :t:`derived indirection value` inherits the :t:`provenance` of the :t:`indirection value` it
was created from, if any.
A :t:`well-formed indirection value` is an :t:`indirection value` with :t:`provenance`, where
all bytes that comprise the indirection value are part of the same group.
Undefined Behavior
It is undefined behavior to access memory through an :t:`indirection value` which
does not have :t:`provenance` over that memory.
It is undefined behavior to offset an :t:`indirection value` across the bounds of the
memory block it has :t:`provenance` over.With this in place, we can now tackle the proposed rules:
:dp:`...`
If the final :t:`value` produced by evaluating a :t:`constant initializer` or :t:`static
initializer` denotes an :t:`indirection value`, then the final :t:`value` shall be a
:t:`well-formed indirection value`.| .. rubric:: Undefined Behavior | ||
|
|
||
| :dp:`fls_hOIImCr1c6IF` | ||
| It is undefined behavior to convert a :t:`pointer` that has :t:`provenance` into a non-:t:`pointer type` in a :t:`constant context`. |
There was a problem hiding this comment.
To make this more concrete, I think we should explore porting the broader const-context validity rule from reference#2091, rather than only the more narrow ptr-to-non-pointer implication currently stated here.
One potential shape would be to replace this paragraph with a rule along these lines, preferably in the Constant Expressions undefined-behavior material or a shared const-eval subsection rather than as a standalone special case here:
:dp:`...`
In a :t:`constant context`, it is a :t:`validity invariant` that a
:t:`value` of an :t:`integer type`, a :t:`floating-point type`,
:c:`bool`, :c:`char`, a :t:`discriminant`, or slice metadata does not
carry provenance.
:dp:`...`
In a :t:`constant context`, it is a :t:`validity invariant` that a
:t:`value` of a :t:`reference type`, :t:`raw pointer type`,
:t:`function pointer type`, or ``dyn Trait`` metadata either carries no
provenance, or has all of its :t:`[provenance-carrying byte]s` form a
single :t:`whole-pointer group`.
:dp:`...`
Transmuting or otherwise reinterpreting a :t:`pointer` with provenance as a
non-:t:`pointer type` in a :t:`constant context` is undefined behavior when the
resulting :t:`value` holds pure integer data, because that :t:`value` carries
provenance in violation of its :t:`validity invariant`.This would pair pretty naturally with narrow definitions for provenance-carrying byte and whole-pointer group near the initializer/provenance rules, rather than a broad definition of provenance itself. Those terms mirror the Reference's byte-level wording without committing FLS to a general provenance model.
The important cases from the Reference that need to be represented are:
- slice metadata is grouped with the integer-data restriction;
dyn Traitmetadata is grouped with the pointer-data restriction.
Without the broader validity-invariant rule, the PR seems to miss the wrong-order pointer-byte case and recursive/nested validity cases that reference#2091 was specifically written to cover, such as provenance-carrying integer data nested inside a field.
| The representation of the value of a :t:`constant initializer` or :t:`static initializer` must only contain bytes with :t:`provenance` where all bytes of some original :t:`pointer` are in the correct order. | ||
|
|
||
| :dp:`fls_nwgIMLkvD2Ol` | ||
| :dt:`Provenance` is the memory that a :t:`pointer` has permission to access, the timespan during which it can acesss that memory, and if it can access the memory for writes. |
There was a problem hiding this comment.
I think the concrete fix here is to delete this broad local definition rather than rename it to pointer provenance.
I see the core::ptr provenance documentation you linked, and I think that is exactly where the broader pointer-provenance discussion should stay, given the memory model is not yet settled for Rust.
The Reference handles this by noting the memory model recognizes initialized bytes as carrying optional provenance, and other Reference text points readers to std::ptr / core::ptr for provenance details rather than defining a complete provenance model inline.
Given that, I still think defining Provenance locally this way is too broad for this PR. The rules being ported from reference#2091 and the currently proposed reference#2138 wording talk narrowly about values or bytes carrying provenance, pointer fragments, and original pointer values; they do not define provenance as accessible memory plus access duration and write permission.
Again, the Reference also says the Rust memory model is incomplete and not fully decided, so I think the FLS should avoid introducing a general provenance definition here. Feels premature.
Suggested source-level outcome:
Delete paragraph fls_nwgIMLkvD2Ol entirely.
If FLS needs terms to make the new rules readable, I would replace uses that depend on this local provenance definition with the narrow byte-level terms used by the initializer and const-context provenance rules, such as provenance-carrying byte and whole-pointer group. That lets the PR state the required restrictions using the Reference's byte-level provenance concept, while linking to the standard library core::ptr documentation for the broader pointer-provenance discussion instead of defining general provenance semantics locally.
If definitions are introduced, I agree with @kirtchev-adacore that they should not live under Constants; pointer/provenance terminology likely belongs closer to Indirection Types, or beside the shared const-eval provenance rules if those rules are kept together.
I am still confused how a non-indirection value has provenance... I know that the Reference PRs mention this, but I suspect this might be lax language.
How about In a :t:`constant context`, it is a :t:`validity invariant` that an :t:`indirection value` or
``dyn Trait`` metadata either carries no :t:`provenance`, or is a :t:`well-formed
indirection value`.On a side note, I have no idea what "dyn Trait metadata" is. This implies that
This sounds very tutorial-y. How about It is undefined behavior to reinterpret a :t:`indirection value` into a non-:t:`indirection
value` in a :t:`constant context` when the resulting :t:`value` holds pure integer data. |
This is as far as I "understand" these things, and is based on these 2 PRs