
📎 Embedded language formatting #3334

Open
@ah-yu


Preface

Some popular libraries allow code snippets in other languages to be embedded within JavaScript code. Users want to format these embedded code snippets within JavaScript to enhance the development experience.

Design

Simply put, the idea is to extract the code snippets from template strings, format them with the respective language's formatter, and then insert the formatted result back into the template string.

Handling Interpolation

We need to parse the entire template string and then format it based on the parse result. However, a template string with interpolations is not valid CSS (using CSS as the example here). We therefore need to preprocess the interpolations so that the template string becomes parseable CSS. We plan to replace each interpolation with a special placeholder string and reinsert the original interpolation after formatting.

To maximize parsing success, we chose to replace interpolations with grit metavariables. The reasoning behind this choice is explained in #3228 (comment).
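The replace-then-reinsert step can be sketched roughly as follows. This is only an illustration under stated assumptions: the `__INTERP_n__` placeholder and the function names are hypothetical stand-ins, not the grit metavariable syntax and not Biome's implementation. The template literal is assumed to be already split into its static chunks and the source text of its interpolated expressions.

```rust
/// Hypothetical placeholder for the interpolation at `index`.
/// A stand-in for illustration; the proposal uses grit metavariables instead.
fn placeholder(index: usize) -> String {
    format!("__INTERP_{}__", index)
}

/// Joins the static chunks of a template literal, putting a placeholder
/// where each `${...}` interpolation used to be.
fn join_with_placeholders(chunks: &[&str]) -> String {
    let mut out = String::new();
    for (i, chunk) in chunks.iter().enumerate() {
        if i > 0 {
            // Interpolation `i - 1` sits between chunk `i - 1` and chunk `i`.
            out.push_str(&placeholder(i - 1));
        }
        out.push_str(chunk);
    }
    out
}

/// Puts the original interpolations back into the formatted text.
fn restore_interpolations(formatted: &str, expressions: &[&str]) -> String {
    let mut out = formatted.to_string();
    for (i, expr) in expressions.iter().enumerate() {
        // `${{` and `}}` are escaped braces, so this produces `${expr}`.
        out = out.replace(&placeholder(i), &format!("${{{}}}", expr));
    }
    out
}
```

In this sketch the placeholder text is passed through the foreign formatter untouched; a real formatter may also move or re-indent it, which is one reason a placeholder the parser understands natively is preferable.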

Changes to the Public API

Since the JavaScript formatter cannot directly format code in other languages, we need external tools to format that code. To achieve this, we designed a generic trait instead of relying on specific implementations, which keeps the different language formatters decoupled from each other.

enum JsForeignLanguage {
    Css,
}

trait JsForeignLanguageFormatter {
    fn format(&self, language: JsForeignLanguage, source: &str) -> FormatResult<Document>;
}

Then we can add a new parameter to the format_node function to pass in the formatter for other languages.

pub fn format_node(
    options: JsFormatOptions,
+   foreign_language_formatter: impl JsForeignLanguageFormatter,
    root: &JsSyntaxNode,
) -> FormatResult<Formatted<JsFormatContext>> {
    biome_formatter::format_node(
        root,
        JsFormatLanguage::new(options, foreign_language_formatter),
    )
}
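To illustrate the decoupling, here is a minimal, self-contained sketch of a caller supplying its own formatter through the trait. `Document` and `FormatResult` are simplified stand-ins (a `String` and a plain `Result`), and `ToyCssFormatter` and `format_embedded` are hypothetical names; none of this is Biome's real implementation.

```rust
// Stand-ins for Biome's `Document` and `FormatResult`; assumptions for this sketch only.
type Document = String;
type FormatResult<T> = Result<T, String>;

#[derive(Debug, Clone, Copy, PartialEq)]
enum JsForeignLanguage {
    Css,
}

trait JsForeignLanguageFormatter {
    fn format(&self, language: JsForeignLanguage, source: &str) -> FormatResult<Document>;
}

/// Hypothetical toy formatter: puts each CSS declaration on its own line.
struct ToyCssFormatter;

impl JsForeignLanguageFormatter for ToyCssFormatter {
    fn format(&self, language: JsForeignLanguage, source: &str) -> FormatResult<Document> {
        match language {
            JsForeignLanguage::Css => Ok(source
                .split(';')
                .map(str::trim)
                .filter(|decl| !decl.is_empty())
                .map(|decl| format!("{};", decl))
                .collect::<Vec<_>>()
                .join("\n")),
        }
    }
}

/// Hypothetical caller: the JavaScript side only sees the trait,
/// never a concrete CSS formatter.
fn format_embedded(
    formatter: &impl JsForeignLanguageFormatter,
    css: &str,
) -> FormatResult<Document> {
    formatter.format(JsForeignLanguage::Css, css)
}
```

Because the caller depends only on the trait, the CLI, the LSP, and tests could each pass a different implementation without the JavaScript formatter changing.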

CLI

When formatting JavaScript files, we need to be aware of other languages' settings. For example, when formatting CSS code, we need to know the CSS formatter's settings.

LSP

The LSP provides a feature called format_range that formats code snippets. This feature relies on SourceMarkers generated during the printing process. Generating a SourceMarker depends on the position information of tokens in the source code. This position information is contained in the following two FormatElements:

DynamicText {
    /// There's no need for the text to be mutable, using `Box<str>` saves 8 bytes over `String`.
    text: Box<str>,
    /// The start position of the dynamic token in the unformatted source code
    source_position: TextSize,
},
/// A token for a text that is taken as is from the source code (input text and formatted representation are identical).
/// Implemented by taking a slice from a `SyntaxToken` to avoid allocating a new string.
LocatedTokenText {
    /// The start position of the token in the unformatted source code
    source_position: TextSize,
    /// The token text
    slice: TokenText,
},

Because embedded languages are formatted by extracting, preprocessing, and then separately parsing and formatting them, the source_position values in these two FormatElements are inaccurate, and the entire template string is handled as a whole. I therefore recommend erasing these inaccurate source_position values. Erasing them is acceptable because the format_range function will still be able to find the SourceMarkers closest to the start and end of the range. If there is a need to format parts of the embedded code in the future, we can revisit this issue.
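The fallback to the closest markers can be illustrated with a small sketch. The `Marker` struct and both lookup functions are hypothetical simplifications using plain `u32` offsets; this is not Biome's actual format_range code, only a picture of why a range inside a template string without markers widens to the whole template.

```rust
/// Simplified stand-in for a source marker: an offset in the unformatted
/// source and the matching offset in the formatted output.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Marker {
    source: u32,
    dest: u32,
}

/// Closest marker at or before `pos` in the source.
fn marker_at_or_before(markers: &[Marker], pos: u32) -> Option<Marker> {
    markers
        .iter()
        .copied()
        .filter(|m| m.source <= pos)
        .max_by_key(|m| m.source)
}

/// Closest marker at or after `pos` in the source.
fn marker_at_or_after(markers: &[Marker], pos: u32) -> Option<Marker> {
    markers
        .iter()
        .copied()
        .filter(|m| m.source >= pos)
        .min_by_key(|m| m.source)
}
```

If a template string spans source offsets 10 to 40 and has no markers inside it, a requested range of 18 to 25 resolves to the markers at 10 and 40, so the template is formatted as one unit.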

Tasks

  • Introduce grit metavariables in CSS
  • Change the public API
  • Preprocess template strings and handle the generated format elements


Labels

A-Formatter (Area: formatter), L-JavaScript (Language: JavaScript and super languages), S-Feature (Status: new feature to implement)
