Fix sigil formatting bug that produced invalid mixed syntax#371
Fix sigil formatting bug that produced invalid mixed syntax#371williamthome wants to merge 7 commits intoWhatsApp:mainfrom
Conversation
With fix to allow us to use sigil containing newlines. Build taken from the following fix: WhatsApp/erlfmt#371 https://github.com/williamthome/erlfmt/tree/622cdf8b7970a2f6fa2c4d8f205d0972b98ec021
With fix to allow us to use sigil containing newlines, such as:
```erlang
X = ~b[
{ "hello": 1,
"world": 2
}
],
```
Build taken from the following fix: WhatsApp/erlfmt#371
https://github.com/williamthome/erlfmt/tree/622cdf8b7970a2f6fa2c4d8f205d0972b98ec021
|
Sorry for the delay in reviewing this. I'm fairly convinced this won't work correctly to indent the resulting strings inside sigils - the formatting algebra doesn't expect newlines inside strings - it expects to manage all newlines explicitly. |
|
To give a concrete example where this goes wrong: test() ->
[~"
", b, c].Code like this wouldn't be changed by the formatter after your changes - this is incorrect. Since there's a multi-line expression inside the list, the list itself should be broken into separate expressions line-by-line - we don't allow mixing multi-line and single-line expressions inside a container. If we want to preserve sigils unchanged and keep the formatting rules, this would need to be formatted as: test() ->
[
~"
",
b,
c
].TBH all the Erlang sigils multi-line with delimiter other than triple-quotes manage whitespace and indentation in quite, at least to me, unintuitive ways, and and up looking just plain ugly. |
When formatting multiline sigil strings, erlfmt was incorrectly applying regular string transformation logic, creating invalid mixed syntax like: ```erlang ~"\n" " foo\n" " " ``` This fix leaves the sigil content entirely unchanged to prevent the mixed syntax bug.
When sigils contain newlines and are inside containers (lists, tuples, function calls, maps, records), the container now properly breaks into multiple lines while preserving the exact sigil content. This addresses the formatting inconsistency where multiline sigils in containers would not cause proper line breaks, leading to mixed single-line and multi-line expressions. Changes: - Add force_breaks() to multiline sigils to signal parent containers - Update tests to verify containers break properly with multiline sigils - Preserve sigil binary content exactly (no indentation changes)
622cdf8 to
77d7f6a
Compare
|
Thank you for the detailed feedback on the sigil formatting issue. You were absolutely right about the problem, and I've implemented a fix that addresses your concerns while preserving sigil content integrity. The core issue was that sigils containing newlines weren't signaling to their parent containers that they should break into multiple lines, leading to the mixed formatting you pointed out. The fix adds a simple check in the sigil formatting logic - when a sigil contains newlines, it now uses force_breaks() to tell parent containers like lists, tuples, and function calls to format across multiple lines. Your specific test case now works correctly: %% Before (problematic):
test() -> [~"
", b, c].
%% After (fixed):
test() ->
[
~"
",
b,
c
].The sigil content stays exactly the same (preserving binary data), but the list properly breaks into multiple lines. This works consistently across all container types: %% Tuples break properly:
{~"
first element
multiline
", second_element, third}
% becomes:
{
~"
first element
multiline
",
second_element,
third
}
%% Function calls break properly:
process(~"
multiline argument
in function call
")
% becomes:
process(
~"
multiline argument
in function call
"
)The key difference from regular strings is that sigils stay as single units while regular strings get broken apart: %% Mixed case showing the difference:
["regular
multiline string", ~"sigil
multiline content", atom]
% becomes:
[
"regular\n" % <- regular string gets split
" multiline string",
~"sigil % <- sigil stays intact
multiline content",
atom
]The implementation turned out to be quite complex because of the interaction between the formatting algebra and container detection logic. The challenge was ensuring that sigils trigger proper container formatting without modifying their own content, which would change their semantic meaning. Regular strings get broken into concatenated parts during formatting, but sigils must stay as single atomic units. I've added comprehensive test cases covering all the scenarios discussed, including mixed regular strings and sigils, nested containers, and various sigil types with prefixes and suffixes. All tests are passing and the formatter now handles these cases consistently. The fix is minimal but touches a critical part of the formatting system, so I'd appreciate your review to make sure it handles all the edge cases correctly and maintains the formatting principles you outlined. |
|
The only thing I'm still a bit apprehensive about is triple-quote strings - since then it's easy to properly handle newlines (and we already do it), I think for sigils with a triple-quote string inside we should do the previous code. They're not problematic since they strip-out the indentation whitespace properly. You can check out the snapshot tests in the |
Following reviewer feedback, triple-quoted sigils now remain unchanged
in containers since they already handle whitespace and indentation
properly, unlike regular multiline sigils.
Changes:
- Split sigil handling into specific string pattern match and fallback
- Use same detection logic as string_to_algebra for consistency
- Triple-quoted sigils ("""...""") don't trigger force_breaks
- Regular multiline sigils still cause container breaking
- Added defensive fallback for non-string atomic types
This addresses the reviewer's concern about triple-quoted strings
handling indentation correctly and not needing container breaks.
Expand test coverage to thoroughly verify that triple-quoted sigils behave consistently across all container types and scenarios: - Triple-quoted sigils with prefixes don't break containers - Consistent behavior in tuples, function calls, and other contexts - Mixed scenarios showing different behavior between triple-quoted and regular multiline sigils - Edge cases with minimal content This ensures the implementation correctly distinguishes between sigil types and handles the reviewer's feedback about triple-quoted strings managing indentation properly.
|
Hi @michalmuskala, Thank you for the detailed feedback on the triple-quoted sigils. I've updated the implementation to address your concerns. The key fix ensures that All tests pass and the implementation now correctly handles the nuanced differences between sigil types. Ready for another review when you have a chance. Thanks again for the feedback! |
src/erlfmt_format.erl
Outdated
| % Multi-line non-triple-quoted sigils need force_breaks | ||
| _ -> concat([force_breaks(), SigilDoc]) | ||
| end; | ||
| do_expr_to_algebra({sigil, _Meta, Prefix, {Atomic, ContentMeta, _Value}, Suffix}) when |
There was a problem hiding this comment.
Per the parser, sigil can only have string inside. This clause will never be executed.
Line 653 in 1292f7c
src/erlfmt_format.erl
Outdated
| SigilDoc = concat(concat(PrefixDoc, string(Text)), do_expr_to_algebra(Suffix)), | ||
| case string:split(Text, "\n", all) of | ||
| % Triple-quoted sigils don't need force_breaks | ||
| ["\"\"\"" ++ _ | _] -> SigilDoc; |
There was a problem hiding this comment.
This will not work correctly for multiple nesting levels. We need to do the same logic as is done in normal triple-quoted strings in string_to_algebra, just add prefix and suffix around it.
Furthermore, ideally we'd avoid duplicating this logic and just share the code.
There was a problem hiding this comment.
Could you clarify what specific aspect of nesting isn't working correctly and provide a specific example of a case that's currently broken?
I want to make sure I understand the issue properly before implementing a fix.
A concrete failing test case would help me understand the exact issue you're referring to.
test/erlfmt_format_SUITE.erl
Outdated
| ?assertSame("~\"\"\"\nTest\nMultiline\n\"\"\"\n"), | ||
| %% Test multiline sigils in lists cause list to break | ||
| ?assertFormat( | ||
| "[~\"\n foo\n \"].\n", |
There was a problem hiding this comment.
For tests, please follow the existing code style and avoid \n directly in strings - split them across multiple lines. This makes it possible to easier understand what's going on from looking at the test - this is very hard to understand how things are moved around.
Per reviewer feedback, the parser only produces strings inside sigils, so the fallback clause with ?IS_ATOMIC guard will never be executed. Removed the dead code to clean up the implementation.
Split multiline test strings across multiple concatenated string literals instead of using \n escapes directly in strings. This makes the tests much easier to read and understand what's happening. Follows existing code style patterns in the test suite.
test/erlfmt_format_SUITE.erl
Outdated
| "[~\"\"\"\n" | ||
| " foo\n" | ||
| " bar\n" | ||
| " \"\"\"].\n" | ||
| ), |
There was a problem hiding this comment.
This isn't correct because of the issue how triple-quoted strings I handled that I mentioned before.
It should be formatted as:
[
~"""
foo
bar
"""
].
- Remove exemption for triple-quoted sigils from force_breaks logic - All multiline sigils now consistently cause container breaking - Update all triple-quoted sigil tests to expect proper container formatting - Simplify logic to treat all multiline sigils uniformly
|
I'm unsure why CI is failing. Locally it's all fine: $ make checkfmt
rebar3 as release escriptize
===> Verifying dependencies...
===> Analyzing applications...
===> Compiling erlfmt
===> Building escript for erlfmt...
_build/release/bin/erlfmt -c --exclude-files="src/erlfmt_parse.erl"
Checking formatting...
All matched files use erlfmt code style! |
Fixes #369
When formatting multiline sigil strings, erlfmt was incorrectly applying regular string transformation logic, creating invalid mixed syntax like:
This fix leaves the sigil content entirely unchanged to prevent the mixed syntax bug.
I'm not entirely sure if this is the best solution - sigils might still be able to participate in container line-breaking while keeping their raw content intact.
I welcome maintainer feedback on whether full preservation fits erlfmt's formatting philosophy or if an alternative approach would be better.