Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get rid of EscapeDebugInner. #138237

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

reitermarkus
Copy link
Contributor

I read the note on EscapeDebugInner and thought I'd give it a try.

@rustbot
Copy link
Collaborator

rustbot commented Mar 8, 2025

r? @thomcc

rustbot has assigned @thomcc.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Mar 8, 2025
@reitermarkus reitermarkus force-pushed the remove-escape-debug-inner branch 2 times, most recently from e1c078d to 57c0a80 Compare March 8, 2025 20:35
@reitermarkus reitermarkus force-pushed the remove-escape-debug-inner branch from 57c0a80 to 0854482 Compare March 8, 2025 20:52
Comment on lines -309 to -312
// Note: It’s possible to manually encode the EscapeDebugInner inside of
// EscapeIterInner (e.g. with alive=254..255 indicating that data[0..4] holds
// a char) which would likely result in a more optimised code. For now we use
// the option easier to implement.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the comment you reference, so perfrun seems to be in order.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. Should definitely be optimized in terms of size, speed remains to be seen.

Comment on lines 220 to 224
/// The caller must ensure that `self` contains an escape sequence.
#[inline]
pub(crate) fn as_ascii(&self) -> &[ascii::Char] {
// SAFETY: `self.alive` is guaranteed to be a valid range for indexing `self.data`.
unsafe fn as_ascii(&self) -> &[ascii::Char] {
// SAFETY: `self.data.escaped` contains an escape sequence, as is guaranteed
// by the caller, and `self.alive` is guaranteed to be a valid range for
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that the top level safety comment should state the more detailed requirement that callers are expected to uphold, and then the inner unsafe use can say that it relies on that.

Tangentially, I'm wondering if escaped would not be better as escape_seq or escape.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second part about self.alive is guaranteed by the invariant of the type itself, so the existence of self is enough to satisfy it.

Comment on lines 278 to 279
// This is the only way to create an `EscapeIterInner` with an unescaped `char`, which
// means the `AlwaysEscaped` marker guarantees that `self` contains an escape sequence.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is the only way to create an EscapeIterInner<N, MaybeEscaped> with a literal char in its union member, then I don't see how that guarantees anything about the contents of EscapeIterInner<N, AlwaysEscaped>, besides that the union in an EscapeIterInner<N, AlwaysEscaped> contains a [ascii::Char; N] which may or may not be a valid escape sequence...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only way to create any EscapeIterInner<N, T> that contains a literal char. EscapeIterInner can only contain either an escape sequence or a literal char. So it follows that if it doesn't contain a literal char, it contains an escape sequence. So EscapeIterInner<N, AlwaysEscaped>, and any EscapeIterInner<N, T> where T != MaybeEscaped for that matter, is guaranteed to contain an escape sequence.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But nothing here guarantees that what you're calling an escape sequence (but is really just a [ascii::Char; N]), actually contains a valid escape sequence. You don't seem to distinguish between the variant of the union escape sequence and whether the non-char union variant is actually an escape sequence, which makes this confusing. Perhaps I've misunderstood your claim here, and you're only claiming that the union is not the literal char variant? If so I think it would help if you wrote that more explicitly

Copy link
Contributor Author

@reitermarkus reitermarkus Mar 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, so it only is a valid escape sequence paired with a range. But technically the [ascii::Char; N] array still contains a valid escape sequence, even if you don't know it's range.

Copy link
Member

@hkBst hkBst Mar 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I suppose I've been silently assuming that we are talking about the alive subslice of the [ascii::Char; N] as the part which must be a valid escape sequence.
But of all the possible values that the [ascii::Char; N] can have and all possible subslices as indicated by alive, most do not contain valid escape sequences, right? It is not even guaranteed that alive is a valid range for the [ascii::Char; N]. At least not by the reasoning provided here. For such guarantees we would have to look at the code that does construct such objects. This code is for constructing objects with literal char union variant, so it can't tell us anything about the possible contents of the code that does construct objects with an unescape_seq union variant. Unless you're not claiming any validity of such an unescape_seq union variant, by which I mean that its alive range is valid for indexing and that the subslice so indexed is a valid escape sequence that starts with a backslash and continues with the rest of a valid escape sequence.

Copy link
Contributor Author

@reitermarkus reitermarkus Mar 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to take a step back and redefine the invariant, since we lost the relation between [ascii::Char; N] and alive somewhere along the way.

Also, alive does not necessarily need to be the range of the escape sequence, it can actually be a sub-range as well while iterating.

I'm thinking:

// Invariant:
// 
// If `alive.end <= 128`, `data.escape_seq` must be valid and
// contain printable ASCII characters in the `alive` range.
// If `alive.end > 128`, `data.literal` must be valid and
// the `alive` range must have a length of at most `1`.


if self.is_unescaped() {
// SAFETY: We just checked that `self` contains an unescaped `char`.
return Some(unsafe { self.as_char() });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe just inline as_char?

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@reitermarkus reitermarkus requested a review from hkBst March 10, 2025 17:43
@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer
Copy link
Collaborator

The job mingw-check-tidy failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)
info: removing rustup binaries
info: rustup is uninstalled
##[group]Image checksum input
mingw-check-tidy
# We use the ghcr base image because ghcr doesn't have a rate limit
# and the mingw-check-tidy job doesn't cache docker images in CI.
FROM ghcr.io/rust-lang/ubuntu:22.04

ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends \
  g++ \
  make \
---

COPY host-x86_64/mingw-check/validate-toolstate.sh /scripts/
COPY host-x86_64/mingw-check/validate-error-codes.sh /scripts/

# NOTE: intentionally uses python2 for x.py so we can test it still works.
# validate-toolstate only runs in our CI, so it's ok for it to only support python3.
ENV SCRIPT TIDY_PRINT_DIFF=1 python2.7 ../x.py test \
           --stage 0 src/tools/tidy tidyselftest --extra-checks=py,cpp
#
# This file is autogenerated by pip-compile with Python 3.10
# by the following command:
#
#    pip-compile --allow-unsafe --generate-hashes reuse-requirements.in
---
#12 2.821 Building wheels for collected packages: reuse
#12 2.823   Building wheel for reuse (pyproject.toml): started
#12 3.038   Building wheel for reuse (pyproject.toml): finished with status 'done'
#12 3.039   Created wheel for reuse: filename=reuse-4.0.3-cp310-cp310-manylinux_2_35_x86_64.whl size=132720 sha256=0c2fd2aaf7b0bf8d6e131220aff14712a774c2ca462f3204d25460cbcf610b63
#12 3.039   Stored in directory: /tmp/pip-ephem-wheel-cache-9v_h8e51/wheels/3d/8d/0a/e0fc6aba4494b28a967ab5eaf951c121d9c677958714e34532
#12 3.041 Successfully built reuse
#12 3.042 Installing collected packages: boolean-py, binaryornot, tomlkit, reuse, python-debian, markupsafe, license-expression, jinja2, chardet, attrs
#12 3.450 Successfully installed attrs-23.2.0 binaryornot-0.4.4 boolean-py-4.0 chardet-5.2.0 jinja2-3.1.4 license-expression-30.3.0 markupsafe-2.1.5 python-debian-0.1.49 reuse-4.0.3 tomlkit-0.13.0
#12 3.450 WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
#12 3.990 Collecting virtualenv
#12 4.028   Downloading virtualenv-20.29.3-py3-none-any.whl (4.3 MB)
#12 4.090      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.3/4.3 MB 72.0 MB/s eta 0:00:00
#12 4.144 Collecting platformdirs<5,>=3.9.1
#12 4.147   Downloading platformdirs-4.3.6-py3-none-any.whl (18 kB)
#12 4.167 Collecting distlib<1,>=0.3.7
#12 4.170   Downloading distlib-0.3.9-py2.py3-none-any.whl (468 kB)
#12 4.178      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 469.0/469.0 KB 83.8 MB/s eta 0:00:00
#12 4.213 Collecting filelock<4,>=3.12.2
#12 4.217   Downloading filelock-3.17.0-py3-none-any.whl (16 kB)
#12 4.299 Installing collected packages: distlib, platformdirs, filelock, virtualenv
#12 4.485 Successfully installed distlib-0.3.9 filelock-3.17.0 platformdirs-4.3.6 virtualenv-20.29.3
#12 4.485 WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
#12 DONE 4.6s

#13 [7/8] COPY host-x86_64/mingw-check/validate-toolstate.sh /scripts/
#13 DONE 0.0s
---
DirectMap4k:      122816 kB
DirectMap2M:     7217152 kB
DirectMap1G:    11534336 kB
##[endgroup]
Executing TIDY_PRINT_DIFF=1 python2.7 ../x.py test            --stage 0 src/tools/tidy tidyselftest --extra-checks=py,cpp
+ TIDY_PRINT_DIFF=1 python2.7 ../x.py test --stage 0 src/tools/tidy tidyselftest --extra-checks=py,cpp
##[group]Building bootstrap
    Finished `dev` profile [unoptimized] target(s) in 0.05s
##[endgroup]
WARN: currently no CI rustc builds have rustc debug assertions enabled. Please either set `rust.debug-assertions` to `false` if you want to use download CI rustc or set `rust.download-rustc` to `false`.
downloading https://static.rust-lang.org/dist/2025-02-18/rustfmt-nightly-x86_64-unknown-linux-gnu.tar.xz
---
   Compiling tidy v0.1.0 (/checkout/src/tools/tidy)
    Finished `release` profile [optimized] target(s) in 36.73s
##[endgroup]
fmt check
Diff in /checkout/library/core/src/escape.rs:228:
     /// `self` must contain an escape sequence.
     #[inline]
     unsafe fn as_escape_seq(&self) -> &[ascii::Char] {
-        // SAFETY: `self.data.escape_seq` contains an escape sequence, as is 
-        // guaranteed by the caller, and in that case `self.alive` is guaranteed 
+        // SAFETY: `self.data.escape_seq` contains an escape sequence, as is
+        // guaranteed by the caller, and in that case `self.alive` is guaranteed
         // to be a valid range for indexing it.
         unsafe {
             self.data
fmt: checked 5892 files
Build completed unsuccessfully in 0:01:19
  local time: Mon Mar 10 23:34:13 UTC 2025

@hkBst
Copy link
Member

hkBst commented Mar 11, 2025

-        // SAFETY: `self.data.escape_seq` contains an escape sequence, as is 
-        // guaranteed by the caller, and in that case `self.alive` is guaranteed 
+        // SAFETY: `self.data.escape_seq` contains an escape sequence, as is
+        // guaranteed by the caller, and in that case `self.alive` is guaranteed

I'm sure you know this, but if you highlight these lines, you can see that the trailing space is the problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants