-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Preserve triple quotes and prefixes for strings #15818
Conversation
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
self.p(flags.prefix().as_str()); | ||
self.p(flags.quote_str()); | ||
self.p(body); | ||
self.p(flags.quote_str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.p(flags.prefix().as_str()); | |
self.p(flags.quote_str()); | |
self.p(body); | |
self.p(flags.quote_str()); | |
self.p(&flags.format_string_contents(body)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What did you think about the str::from_utf8
call above this? As far as I know this should be a safe unwrap
or unreachable!
, but I went with this approach just in case. I figured it was better to fall back on the normal escaping code than to panic if I were wrong, but it does complicate the flow a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now it doesn't seem like it's possible to construct an AST node for a literal-bytes expression that includes a non-ASCII character in it: https://play.ruff.rs/0b3b0efb-74cd-4e26-a885-9180e23e4506
However, our parser supports error recovery and the specifics of exactly what nodes representing invalid syntax will look like are an implementation detail. Similarly, we don't currently run any of our AST-based rules on files that contain invalid syntax (let alone try to autofix them), but that might well change in the future.
All told, I think I prefer your current approach here, though I'd also be interested in @dhruvmanila's thoughts. It might be good to add a comment, whatever we end up doing here!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense, thanks! I expanded my comment in p_bytes_repr
a bit more, hopefully capturing some of this information. Otherwise I'm interested in Dhruv's thoughts too!
This seems a bit separate, but I guess we could even make a stronger assertion that s.iter().all(u8::is_ascii)
if we wanted. Although we'd still need to call from_utf8
to get something to write in our buffer anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late reply. The main goal of error recovery would be to preserve all the information of the original source i.e., the parser shouldn't drop any of the tokens. This would happen on a case by case basis but, for example, in this scenario the parser could just store the raw bytes and set a flag that it contains non-ascii bytes. All downstream tools would then need to account for this additional information.
Another thing to point out is that the generator is mainly used to provide auto-fix and we might decide that even though we would provide diagnostics for source code with syntax errors, we won't be providing any auto-fix for it as it might prove even more difficult to consider all the cases.
I think the current logic is fine for now and we can think about this more when we start working on this.
Co-authored-by: Alex Waygood <[email protected]>
Co-authored-by: Alex Waygood <[email protected]>
Co-authored-by: Alex Waygood <[email protected]>
Co-authored-by: Alex Waygood <[email protected]>
Co-authored-by: Alex Waygood <[email protected]>
Co-authored-by: Alex Waygood <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! The only other thing I'd do is get rid of the Default
implementation for AnyStringFlags
, for a consistent interface to the other *Flags
structs
Thank you! Sorry I neglected the |
That works for me! Or we could just inline it into |
Co-authored-by: Alex Waygood <[email protected]>
Co-authored-by: Alex Waygood <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've mainly an understand question:
Who's responsibility is it to correctly handle nested quotes? Is it the generator or the code building the AST? If it is the generator, what happens in cases where it's impossible to pick the right quotes?
crates/ruff_linter/src/rules/flake8_simplify/rules/split_static_string.rs
Show resolved
Hide resolved
if flags.prefix().is_raw() && self.p_raw_bytes(s, flags).is_ok() { | ||
return; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this lead to a syntax error if flags
uses a quote that is contained in the string? For example, what if I have b" and 'tert"
and flags specify single quotes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that is true, but the only way to get into that state is if the user of the Generator
hard-coded a request for single quotes. The intention was to pass along flags from input string literals (so the input would have to be br' and 'tert'
, which is already a syntax error, or to use Checker::default_bytes_flags
if there's not an existing literal to preserve, which I think should also handle this case.
Basically you'd have to write code like
BytesLiteral {
value: Box::from(*b" and 'tert"),
range: TextRange::default(),
flags: BytesLiteralFlags::empty()
.with_quote_style(Quote::Single)
.with_prefix(ByteStringPrefix::Raw { uppercase_r: false }),
}
to cause the issue, at least as far as I can think of.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The exception to this would be if a refactor e.g moves a byte string into an f-string expression and the target python version is 3.12. But arguably, there isn't much the generator can do about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh good point, I forgot about rules moving these into f-strings. I think UnicodeEscape
will still get a chance to run in that case, as long as the outer f-string isn't raw too, but this is good to keep in mind.
It's still the generator's responsibility to handle nested quotes. All of the code paths except raw strings end up going through either I tested your |
crates/ruff_linter/resources/test/fixtures/flake8_simplify/SIM905.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
fn display_contents(self, contents: &str) -> DisplayFlags { | ||
DisplayFlags { | ||
prefix: self.prefix(), | ||
quote_str: self.quote_str(), | ||
contents, | ||
} | ||
} | ||
} | ||
|
||
pub struct DisplayFlags<'a> { | ||
prefix: AnyStringPrefix, | ||
quote_str: &'a str, | ||
contents: &'a str, | ||
} | ||
|
||
impl std::fmt::Display for DisplayFlags<'_> { | ||
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { | ||
write!( | ||
f, | ||
"{prefix}{quote}{contents}{quote}", | ||
prefix = self.prefix, | ||
quote = self.quote_str, | ||
contents = self.contents | ||
) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative here might be this (the diff is relative to your branch), which would mean the DisplayFlags
struct would take up slightly less memory. Curious what @MichaReiser thinks of the pros and cons. I think there's almost certainly not much practical difference to what you have now:
diff --git a/crates/ruff_python_ast/src/nodes.rs b/crates/ruff_python_ast/src/nodes.rs
index 3471776f2..70b21e7db 100644
--- a/crates/ruff_python_ast/src/nodes.rs
+++ b/crates/ruff_python_ast/src/nodes.rs
@@ -977,7 +977,7 @@ impl Ranged for FStringPart {
}
}
-pub trait StringFlags: Copy {
+pub trait StringFlags: Copy + Into<AnyStringFlags> {
/// Does the string use single or double quotes in its opener and closer?
fn quote_style(self) -> Quote;
@@ -1029,16 +1029,14 @@ pub trait StringFlags: Copy {
fn display_contents(self, contents: &str) -> DisplayFlags {
DisplayFlags {
- prefix: self.prefix(),
- quote_str: self.quote_str(),
+ flags: self.into(),
contents,
}
}
}
pub struct DisplayFlags<'a> {
- prefix: AnyStringPrefix,
- quote_str: &'a str,
+ flags: AnyStringFlags,
contents: &'a str,
}
@@ -1047,8 +1045,8 @@ impl std::fmt::Display for DisplayFlags<'_> {
write!(
f,
"{prefix}{quote}{contents}{quote}",
- prefix = self.prefix,
- quote = self.quote_str,
+ prefix = self.flags.prefix(),
+ quote = self.flags.quote_str(),
contents = self.contents
)
}
diff --git a/crates/ruff_python_parser/src/token.rs b/crates/ruff_python_parser/src/token.rs
index ecda19730..d741b15a4 100644
--- a/crates/ruff_python_parser/src/token.rs
+++ b/crates/ruff_python_parser/src/token.rs
@@ -777,6 +777,12 @@ impl TokenFlags {
}
}
+impl From<TokenFlags> for AnyStringFlags {
+ fn from(flags: TokenFlags) -> Self {
+ flags.as_any_string_flags()
+ }
+}
+
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not too fond of trait chains. I'd rather have a into_any_string_flags
method but I don't think it matters much. Display
is rarely used.
} | ||
} | ||
|
||
pub struct DisplayFlags<'a> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't love the name of this struct because it doesn't just display the flags -- it displays the string contents embedded in the prefix and quotes, which are supplied by the flags. DisplayStringLiteralWithQuotesAndPrefix
is kinda verbose, though, so I'm not sure what a better name might be. (AKA: I'm fine with it being merged as-is!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh good point. I also wanted to make the struct private, but it's in a public trait. One thing we could do is just return impl Display
from display_contents
? Then we could use as verbose an internal name as we want and arguably better capture the intention here without having to name a type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I like that idea!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The downside of that is that the trait is no longer object safe, but it might not matter here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh good point. I'm somewhat leaning toward leaving the code as it is here, for both the struct name and its contents (re the conversation above) if that's okay with you both.
I don't really love the new trait bound or From
impl either, but it does make the struct smaller, so I'm happy either way, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine to leave everything as is!
* main: (66 commits) [red-knot] Use ternary decision diagrams (TDDs) for visibility constraints (#15861) [`pyupgrade`] Rename private type parameters in PEP 695 generics (`UP049`) (#15862) Simplify the `StringFlags` trait (#15944) [`flake8-pyi`] Make `PYI019` autofixable for `.py` files in preview mode as well as stubs (#15889) Docs (`linter.md`): clarify that Python files are always searched for in subdirectories (#15882) [`flake8-pyi`] Make PEP-695 functions with multiple type parameters fixable by PYI019 again (#15938) [red-knot] Use unambiguous invalid-syntax-construct for suppression comment test (#15933) Make `Binding::range()` point to the range of a type parameter's name, not the full type parameter (#15935) Update black deviations (#15928) [red-knot] MDTest: Fix line numbers in error messages (#15932) Preserve triple quotes and prefixes for strings (#15818) [red-knot] Hand-written MDTest parser (#15926) [`pylint`] Fix missing parens in unsafe fix for `unnecessary-dunder-call` (`PLC2801`) (#15762) nit: docs for ignore & select (#15883) [airflow] `BashOperator` has been moved to `airflow.providers.standard.operators.bash.BashOperator` (AIR302) (#15922) [`flake8-logging`] `.exception()` and `exc_info=` outside exception handlers (`LOG004`, `LOG014`) (#15799) [red-knot] Enforce specifying paths for mdtest code blocks in a separate preceding line (#15890) [red-knot] Internal refactoring of visibility constraints API (#15913) [red-knot] Implicit instance attributes (#15811) [`flake8-comprehensions`] Handle extraneous parentheses around list comprehension (`C403`) (#15877) ...
Summary
This is a follow-up to #15726, #15778, and #15794 to preserve the triple quote and prefix flags in plain strings, bytestrings, and f-strings.
I also added a
StringLiteralFlags::without_triple_quotes
method to avoid passing along triple quotes in rules like SIM905 where it might not make sense, as discussed here.Test Plan
Existing tests, plus many new cases in the
generator::tests::quote
test that should cover all combinations of quotes and prefixes, at least for simple string bodies.Closes #7799 when combined with #15694, #15726, #15778, and #15794.