Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserve triple quotes and prefixes for strings #15818

Merged
merged 29 commits into from
Feb 4, 2025
Merged

Conversation

ntBre
Copy link
Contributor

@ntBre ntBre commented Jan 29, 2025

Summary

This is a follow-up to #15726, #15778, and #15794 to preserve the triple quote and prefix flags in plain strings, bytestrings, and f-strings.

I also added a StringLiteralFlags::without_triple_quotes method to avoid passing along triple quotes in rules like SIM905 where it might not make sense, as discussed here.

Test Plan

Existing tests, plus many new cases in the generator::tests::quote test that should cover all combinations of quotes and prefixes, at least for simple string bodies.

Closes #7799 when combined with #15694, #15726, #15778, and #15794.

Copy link
Contributor

github-actions bot commented Jan 29, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

crates/ruff_python_ast/src/nodes.rs Outdated Show resolved Hide resolved
Comment on lines 172 to 175
self.p(flags.prefix().as_str());
self.p(flags.quote_str());
self.p(body);
self.p(flags.quote_str());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.p(flags.prefix().as_str());
self.p(flags.quote_str());
self.p(body);
self.p(flags.quote_str());
self.p(&flags.format_string_contents(body));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What did you think about the str::from_utf8 call above this? As far as I know this should be a safe unwrap or unreachable!, but I went with this approach just in case. I figured it was better to fall back on the normal escaping code than to panic if I were wrong, but it does complicate the flow a bit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now it doesn't seem like it's possible to construct an AST node for a literal-bytes expression that includes a non-ASCII character in it: https://play.ruff.rs/0b3b0efb-74cd-4e26-a885-9180e23e4506

However, our parser supports error recovery and the specifics of exactly what nodes representing invalid syntax will look like are an implementation detail. Similarly, we don't currently run any of our AST-based rules on files that contain invalid syntax (let alone try to autofix them), but that might well change in the future.

All told, I think I prefer your current approach here, though I'd also be interested in @dhruvmanila's thoughts. It might be good to add a comment, whatever we end up doing here!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, thanks! I expanded my comment in p_bytes_repr a bit more, hopefully capturing some of this information. Otherwise I'm interested in Dhruv's thoughts too!

This seems a bit separate, but I guess we could even make a stronger assertion that s.iter().all(u8::is_ascii) if we wanted. Although we'd still need to call from_utf8 to get something to write in our buffer anyway.

Copy link
Member

@dhruvmanila dhruvmanila Feb 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late reply. The main goal of error recovery would be to preserve all the information of the original source i.e., the parser shouldn't drop any of the tokens. This would happen on a case by case basis but, for example, in this scenario the parser could just store the raw bytes and set a flag that it contains non-ascii bytes. All downstream tools would then need to account for this additional information.

Another thing to point out is that the generator is mainly used to provide auto-fix and we might decide that even though we would provide diagnostics for source code with syntax errors, we won't be providing any auto-fix for it as it might prove even more difficult to consider all the cases.

I think the current logic is fine for now and we can think about this more when we start working on this.

crates/ruff_python_codegen/src/generator.rs Outdated Show resolved Hide resolved
crates/ruff_python_codegen/src/generator.rs Outdated Show resolved Hide resolved
crates/ruff_python_codegen/src/generator.rs Outdated Show resolved Hide resolved
crates/ruff_python_codegen/src/generator.rs Outdated Show resolved Hide resolved
crates/red_knot_python_semantic/src/types/display.rs Outdated Show resolved Hide resolved
@AlexWaygood AlexWaygood added bug Something isn't working fixes Related to suggested fixes for violations labels Jan 29, 2025
crates/ruff_python_codegen/src/generator.rs Outdated Show resolved Hide resolved
crates/ruff_python_literal/src/escape.rs Outdated Show resolved Hide resolved
crates/ruff_python_literal/src/escape.rs Outdated Show resolved Hide resolved
crates/ruff_python_literal/src/escape.rs Outdated Show resolved Hide resolved
Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! The only other thing I'd do is get rid of the Default implementation for AnyStringFlags, for a consistent interface to the other *Flags structs

crates/ruff_python_ast/src/nodes.rs Outdated Show resolved Hide resolved
@ntBre
Copy link
Contributor Author

ntBre commented Jan 30, 2025

Thank you! Sorry I neglected the Default change from our conversation earlier. It looks like AnyStringFlags::default is only used in AnyStringFlags::new. I added a private empty method instead for now, but I could also make empty public, and add all of the docs from the other Flags. What do you think?

@AlexWaygood
Copy link
Member

I added a private empty method instead for now

That works for me! Or we could just inline it into AnyStringFlags::new()

crates/ruff_python_ast/src/nodes.rs Outdated Show resolved Hide resolved
crates/ruff_python_ast/src/nodes.rs Outdated Show resolved Hide resolved
ntBre and others added 2 commits January 30, 2025 18:11
Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've mainly an understand question:

Who's responsibility is it to correctly handle nested quotes? Is it the generator or the code building the AST? If it is the generator, what happens in cases where it's impossible to pick the right quotes?

crates/ruff_python_ast/src/nodes.rs Outdated Show resolved Hide resolved
crates/ruff_python_codegen/src/generator.rs Outdated Show resolved Hide resolved
Comment on lines 153 to 155
if flags.prefix().is_raw() && self.p_raw_bytes(s, flags).is_ok() {
return;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this lead to a syntax error if flags uses a quote that is contained in the string? For example, what if I have b" and 'tert" and flags specify single quotes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is true, but the only way to get into that state is if the user of the Generator hard-coded a request for single quotes. The intention was to pass along flags from input string literals (so the input would have to be br' and 'tert', which is already a syntax error, or to use Checker::default_bytes_flags if there's not an existing literal to preserve, which I think should also handle this case.

Basically you'd have to write code like

BytesLiteral {
            value: Box::from(*b" and 'tert"),
            range: TextRange::default(),
            flags: BytesLiteralFlags::empty()
                .with_quote_style(Quote::Single)
                .with_prefix(ByteStringPrefix::Raw { uppercase_r: false }),
        }

to cause the issue, at least as far as I can think of.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exception to this would be if a refactor e.g moves a byte string into an f-string expression and the target python version is 3.12. But arguably, there isn't much the generator can do about this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh good point, I forgot about rules moving these into f-strings. I think UnicodeEscape will still get a chance to run in that case, as long as the outer f-string isn't raw too, but this is good to keep in mind.

@ntBre
Copy link
Contributor Author

ntBre commented Feb 3, 2025

Who's responsibility is it to correctly handle nested quotes? Is it the generator or the code building the AST? If it is the generator, what happens in cases where it's impossible to pick the right quotes?

It's still the generator's responsibility to handle nested quotes. All of the code paths except raw strings end up going through either UnicodeEscape::with_preferred_quote (in the case of normal strings and f-strings) or AsciiEscape::with_preferred_quote (for bytestrings). These try to use the quote you pass in but will change the outer quotes if needed or even start adding escaped quotes in the worst case.

I tested your split case above with SIM905 (after temporarily modifying the rule to preserve triple quotes) and the output was ['''"itemA"''', """'itemB'""", """'''itemC'''"""]. itemA is probably the most interesting because the outer triple double quotes from the input got converted to single quotes to preserve the inner double quotes.

Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Comment on lines +1030 to 1055
fn display_contents(self, contents: &str) -> DisplayFlags {
DisplayFlags {
prefix: self.prefix(),
quote_str: self.quote_str(),
contents,
}
}
}

pub struct DisplayFlags<'a> {
prefix: AnyStringPrefix,
quote_str: &'a str,
contents: &'a str,
}

impl std::fmt::Display for DisplayFlags<'_> {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
write!(
f,
"{prefix}{quote}{contents}{quote}",
prefix = self.prefix,
quote = self.quote_str,
contents = self.contents
)
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative here might be this (the diff is relative to your branch), which would mean the DisplayFlags struct would take up slightly less memory. Curious what @MichaReiser thinks of the pros and cons. I think there's almost certainly not much practical difference to what you have now:

diff --git a/crates/ruff_python_ast/src/nodes.rs b/crates/ruff_python_ast/src/nodes.rs
index 3471776f2..70b21e7db 100644
--- a/crates/ruff_python_ast/src/nodes.rs
+++ b/crates/ruff_python_ast/src/nodes.rs
@@ -977,7 +977,7 @@ impl Ranged for FStringPart {
     }
 }
 
-pub trait StringFlags: Copy {
+pub trait StringFlags: Copy + Into<AnyStringFlags> {
     /// Does the string use single or double quotes in its opener and closer?
     fn quote_style(self) -> Quote;
 
@@ -1029,16 +1029,14 @@ pub trait StringFlags: Copy {
 
     fn display_contents(self, contents: &str) -> DisplayFlags {
         DisplayFlags {
-            prefix: self.prefix(),
-            quote_str: self.quote_str(),
+            flags: self.into(),
             contents,
         }
     }
 }
 
 pub struct DisplayFlags<'a> {
-    prefix: AnyStringPrefix,
-    quote_str: &'a str,
+    flags: AnyStringFlags,
     contents: &'a str,
 }
 
@@ -1047,8 +1045,8 @@ impl std::fmt::Display for DisplayFlags<'_> {
         write!(
             f,
             "{prefix}{quote}{contents}{quote}",
-            prefix = self.prefix,
-            quote = self.quote_str,
+            prefix = self.flags.prefix(),
+            quote = self.flags.quote_str(),
             contents = self.contents
         )
     }
diff --git a/crates/ruff_python_parser/src/token.rs b/crates/ruff_python_parser/src/token.rs
index ecda19730..d741b15a4 100644
--- a/crates/ruff_python_parser/src/token.rs
+++ b/crates/ruff_python_parser/src/token.rs
@@ -777,6 +777,12 @@ impl TokenFlags {
     }
 }
 
+impl From<TokenFlags> for AnyStringFlags {
+    fn from(flags: TokenFlags) -> Self {
+        flags.as_any_string_flags()
+    }
+}
+

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too fond of trait chains. I'd rather have a into_any_string_flags method but I don't think it matters much. Display is rarely used.

}
}

pub struct DisplayFlags<'a> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love the name of this struct because it doesn't just display the flags -- it displays the string contents embedded in the prefix and quotes, which are supplied by the flags. DisplayStringLiteralWithQuotesAndPrefix is kinda verbose, though, so I'm not sure what a better name might be. (AKA: I'm fine with it being merged as-is!)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh good point. I also wanted to make the struct private, but it's in a public trait. One thing we could do is just return impl Display from display_contents? Then we could use as verbose an internal name as we want and arguably better capture the intention here without having to name a type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I like that idea!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The downside of that is that the trait is no longer object safe, but it might not matter here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh good point. I'm somewhat leaning toward leaving the code as it is here, for both the struct name and its contents (re the conversation above) if that's okay with you both.

I don't really love the new trait bound or From impl either, but it does make the struct smaller, so I'm happy either way, though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine to leave everything as is!

@ntBre ntBre merged commit b5e5271 into main Feb 4, 2025
21 checks passed
@ntBre ntBre deleted the brent/preserve-triple branch February 4, 2025 13:41
dcreager added a commit that referenced this pull request Feb 4, 2025
* main: (66 commits)
  [red-knot] Use ternary decision diagrams (TDDs) for visibility constraints (#15861)
  [`pyupgrade`] Rename private type parameters in PEP 695 generics (`UP049`) (#15862)
  Simplify the `StringFlags` trait (#15944)
  [`flake8-pyi`] Make `PYI019` autofixable for `.py` files in preview mode as well as stubs (#15889)
  Docs (`linter.md`): clarify that Python files are always searched for in subdirectories (#15882)
  [`flake8-pyi`] Make PEP-695 functions with multiple type parameters fixable by PYI019 again (#15938)
  [red-knot] Use unambiguous invalid-syntax-construct for suppression comment test (#15933)
  Make `Binding::range()` point to the range of a type parameter's name, not the full type parameter (#15935)
  Update black deviations (#15928)
  [red-knot] MDTest: Fix line numbers in error messages (#15932)
  Preserve triple quotes and prefixes for strings (#15818)
  [red-knot] Hand-written MDTest parser (#15926)
  [`pylint`] Fix missing parens in unsafe fix for `unnecessary-dunder-call` (`PLC2801`) (#15762)
  nit: docs for ignore & select (#15883)
  [airflow] `BashOperator` has been moved to `airflow.providers.standard.operators.bash.BashOperator` (AIR302) (#15922)
  [`flake8-logging`] `.exception()` and `exc_info=` outside exception handlers (`LOG004`, `LOG014`) (#15799)
  [red-knot] Enforce specifying paths for mdtest code blocks in a separate preceding line (#15890)
  [red-knot] Internal refactoring of visibility constraints API (#15913)
  [red-knot] Implicit instance attributes (#15811)
  [`flake8-comprehensions`] Handle extraneous parentheses around list comprehension (`C403`) (#15877)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working fixes Related to suggested fixes for violations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Code generator doesn't preserve individual string quote style
4 participants