Get rid of `EscapeDebugInner`. #138237
// Note: It’s possible to manually encode the EscapeDebugInner inside of
// EscapeIterInner (e.g. with alive=254..255 indicating that data[0..4] holds
// a char) which would likely result in a more optimised code. For now we use
// the option easier to implement.
I think this is the comment you reference, so a perf run seems to be in order.
Yeah. It should definitely be an improvement in terms of size; speed remains to be seen.
library/core/src/escape.rs
Outdated
/// The caller must ensure that `self` contains an escape sequence.
#[inline]
pub(crate) fn as_ascii(&self) -> &[ascii::Char] {
// SAFETY: `self.alive` is guaranteed to be a valid range for indexing `self.data`.
unsafe fn as_ascii(&self) -> &[ascii::Char] {
// SAFETY: `self.data.escaped` contains an escape sequence, as is guaranteed
// by the caller, and `self.alive` is guaranteed to be a valid range for
It seems to me that the top-level safety comment should state the more detailed requirement that callers are expected to uphold, and then the inner unsafe use can say that it relies on that.
Tangentially, I'm wondering if `escaped` would not be better as `escape_seq` or `escape`.
The second part about `self.alive` is guaranteed by the invariant of the type itself, so the existence of `self` is enough to satisfy it.
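To make the split discussed above concrete, here is a minimal sketch of an `unsafe fn` whose `# Safety` section states what the caller must uphold, while the inner `SAFETY` comment separates the caller's part from the part covered by the type invariant. `Buffer`, its fields, and its methods are hypothetical stand-ins and not the code under review; it uses plain `u8` bytes instead of `ascii::Char`.

```rust
/// Hypothetical stand-in for the type under review; not the real `core` code.
pub struct Buffer {
    bytes: [u8; 8],
    /// Type invariant: `len <= bytes.len()`.
    len: usize,
    /// Whether `bytes[..len]` was filled from ASCII text.
    ascii: bool,
}

impl Buffer {
    /// Copies `text` into a buffer if it is ASCII and fits.
    pub fn from_ascii(text: &str) -> Option<Self> {
        if !text.is_ascii() || text.len() > 8 {
            return None;
        }
        let mut bytes = [0u8; 8];
        bytes[..text.len()].copy_from_slice(text.as_bytes());
        Some(Self { bytes, len: text.len(), ascii: true })
    }

    /// Stores raw bytes with no claim about their encoding.
    pub fn from_raw(raw: [u8; 8]) -> Self {
        Self { bytes: raw, len: 8, ascii: false }
    }

    /// Reports whether the buffer was built from ASCII text.
    pub fn is_ascii(&self) -> bool {
        self.ascii
    }

    /// Returns the stored text.
    ///
    /// # Safety
    ///
    /// The caller must ensure that `self.is_ascii()` returns `true`.
    pub unsafe fn as_str(&self) -> &str {
        // SAFETY: `len <= bytes.len()` is a type invariant, so the existence
        // of `self` is enough for the slice to be in bounds. The bytes are
        // ASCII, hence valid UTF-8, by the caller's contract stated above.
        unsafe { core::str::from_utf8_unchecked(self.bytes.get_unchecked(..self.len)) }
    }
}
```

The caller-facing requirement lives in the doc comment, and the inner comment only has to say which part it takes from the caller and which part it takes from the invariant.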
library/core/src/escape.rs
Outdated
// This is the only way to create an `EscapeIterInner` with an unescaped `char`, which
// means the `AlwaysEscaped` marker guarantees that `self` contains an escape sequence.
If this is the only way to create an EscapeIterInner<N, MaybeEscaped> with a literal char in its union member, then I don't see how that guarantees anything about the contents of EscapeIterInner<N, AlwaysEscaped>, besides that the union in an EscapeIterInner<N, AlwaysEscaped> contains a [ascii::Char; N] which may or may not be a valid escape sequence...
This is the only way to create any `EscapeIterInner<N, T>` that contains a literal `char`. `EscapeIterInner` can only contain either an escape sequence or a literal `char`. So it follows that if it doesn't contain a literal `char`, it contains an escape sequence. So `EscapeIterInner<N, AlwaysEscaped>`, and any `EscapeIterInner<N, T>` where `T != MaybeEscaped` for that matter, is guaranteed to contain an escape sequence.
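The argument above can be sketched with a marker type parameter: if the only constructor that stores a literal `char` lives in the `MaybeEscaped` impl, no value with another marker can hold one. This is a simplified illustration, not the PR's code: it assumes a safe `enum` in place of the union, a two-byte buffer, and hypothetical names (`Inner`, `Payload`, `backslash`, `literal`, `holds_literal`).

```rust
use core::marker::PhantomData;

/// Marker types named after the ones in the discussion.
pub struct AlwaysEscaped;
pub struct MaybeEscaped;

/// Assumption: a safe enum stands in for the union used in the PR.
enum Payload {
    Escape([u8; 2]),
    Literal(char),
}

/// Hypothetical, simplified stand-in for `EscapeIterInner<N, T>`.
pub struct Inner<Marker> {
    payload: Payload,
    _marker: PhantomData<Marker>,
}

impl<Marker> Inner<Marker> {
    /// Available for every marker: stores a backslash followed by `c`.
    pub fn backslash(c: u8) -> Self {
        Self { payload: Payload::Escape([b'\\', c]), _marker: PhantomData }
    }

    /// Reports which variant is stored.
    pub fn holds_literal(&self) -> bool {
        matches!(self.payload, Payload::Literal(_))
    }

    /// Returns the escape bytes, if that variant is stored.
    pub fn escape_bytes(&self) -> Option<[u8; 2]> {
        match self.payload {
            Payload::Escape(bytes) => Some(bytes),
            Payload::Literal(_) => None,
        }
    }
}

impl Inner<MaybeEscaped> {
    /// The only constructor that stores a literal `char`.
    pub fn literal(c: char) -> Self {
        Self { payload: Payload::Literal(c), _marker: PhantomData }
    }
}
```

Note that this only establishes which variant is stored. Whether the stored bytes form a valid escape sequence depends on what the other constructors write, which the marker alone does not show.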
But nothing here guarantees that what you're calling an escape sequence (but is really just a `[ascii::Char; N]`) actually contains a valid escape sequence. You don't seem to distinguish between the escape-sequence variant of the union and whether that non-`char` variant actually holds an escape sequence, which makes this confusing. Perhaps I've misunderstood your claim here, and you're only claiming that the union is not the literal `char` variant? If so, I think it would help if you wrote that more explicitly.
Yeah, so it is only a valid escape sequence when paired with a range. But technically the `[ascii::Char; N]` array still contains a valid escape sequence, even if you don't know its range.
Yes, I suppose I've been silently assuming that we are talking about the `alive` subslice of the `[ascii::Char; N]` as the part which must be a valid escape sequence.
But of all the possible values that the `[ascii::Char; N]` can have and all possible subslices as indicated by `alive`, most do not contain valid escape sequences, right? It is not even guaranteed that `alive` is a valid range for the `[ascii::Char; N]`. At least not by the reasoning provided here. For such guarantees we would have to look at the code that does construct such objects. This code is for constructing objects with the literal `char` union variant, so it can't tell us anything about the possible contents of the code that does construct objects with an `unescape_seq` union variant. Unless you're not claiming any validity of such an `unescape_seq` union variant, by which I mean that its `alive` range is valid for indexing and that the subslice so indexed is a valid escape sequence that starts with a backslash and continues with the rest of a valid escape sequence.
I think we need to take a step back and redefine the invariant, since we lost the relation between `[ascii::Char; N]` and `alive` somewhere along the way.
Also, `alive` does not necessarily need to be the range of the escape sequence; it can actually be a sub-range as well while iterating.
I'm thinking:
// Invariant:
//
// If `alive.end <= 128`, `data.escape_seq` must be valid and
// contain printable ASCII characters in the `alive` range.
// If `alive.end > 128`, `data.literal` must be valid and
// the `alive` range must have a length of at most `1`.
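As a rough illustration of how such an invariant could drive the code, here is a minimal sketch in which `alive.end` decides which union field may be read. It is not the PR's implementation: it assumes `u8` bytes instead of `ascii::Char`, a fixed four-byte buffer, and hypothetical names (`Inner`, `Data`, `backslash`, `literal`, `is_literal`); only the field names `escape_seq`, `literal`, and `alive` and the `128` threshold come from the comment above.

```rust
use core::ops::Range;

/// Hypothetical layout; `u8` stands in for `ascii::Char`.
union Data {
    escape_seq: [u8; 4],
    literal: char,
}

pub struct Inner {
    data: Data,
    // Invariant (as proposed above):
    // If `alive.end <= 128`, `data.escape_seq` is valid and holds printable
    // ASCII in the `alive` range.
    // If `alive.end > 128`, `data.literal` is valid and `alive` has a length
    // of at most 1.
    alive: Range<u8>,
}

impl Inner {
    /// Stores a backslash followed by `c`; `alive` covers both bytes.
    pub fn backslash(c: u8) -> Self {
        assert!(c.is_ascii_graphic());
        Self { data: Data { escape_seq: [b'\\', c, 0, 0] }, alive: 0..2 }
    }

    /// Stores a literal `char`; `alive.end > 128` marks the variant.
    pub fn literal(c: char) -> Self {
        Self { data: Data { literal: c }, alive: 128..129 }
    }

    /// Reports which union field the invariant says is valid.
    pub fn is_literal(&self) -> bool {
        self.alive.end > 128
    }
}

impl Iterator for Inner {
    type Item = char;

    fn next(&mut self) -> Option<char> {
        if self.alive.start >= self.alive.end {
            return None;
        }
        let i = self.alive.start;
        // `alive` shrinks to a sub-range while iterating; `alive.end` is
        // untouched, so the variant check below stays correct.
        self.alive.start += 1;
        if self.is_literal() {
            // SAFETY: `alive.end > 128`, so `data.literal` is the valid field.
            Some(unsafe { self.data.literal })
        } else {
            // SAFETY: `alive.end <= 128`, so `data.escape_seq` is the valid field.
            let seq = unsafe { self.data.escape_seq };
            // Checked indexing keeps this sketch free of further unsafe code.
            Some(char::from(seq[usize::from(i)]))
        }
    }
}
```

Because only `alive.start` advances, an exhausted literal ends up with `alive == 129..129`, which still satisfies the second clause of the invariant.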
library/core/src/escape.rs
Outdated
if self.is_unescaped() {
// SAFETY: We just checked that `self` contains an unescaped `char`.
return Some(unsafe { self.as_char() });
Maybe just inline `as_char`?
The job Click to see the possible cause of the failure (guessed by this bot)
I'm sure you know this, but if you highlight these lines, you can see that the trailing space is the problem.
I read the note on `EscapeDebugInner` and thought I'd give it a try.