Skip to content

feat: Determine comment tokens from injection layers #12759

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 53 commits into
base: master
Choose a base branch
from

Conversation

nik-rev
Copy link
Contributor

@nik-rev nik-rev commented Feb 2, 2025

Previously, if you had a file like this:

<p>Some text 1234</p>
<script type="text/javascript">
  // bar();
  foo();
</script>

Pressing Space + c (toggle comment) on the JavaScript comment would've used the HTML comment token:

<p>Some text 1234</p>
<script type="text/javascript">
  <!-- // bar(); -->
  foo();
</script>

This PR fixes that. Now, the comment token is properly recognized:

<p>Some test 1234</p>
<script type="text/javascript">
  bar();
  foo();
</script>

It also works for continue comment functionality (when pressing o or adding a newline, for example)

Additionally, the PR adds a new :tree-sitter-injections command:

`:tree-sitter-injections` output

With this file:

<!doctype html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Multi-language Example</title>
    <style>
      /* CSS Example */
      body {
        font-family: Arial, sans-serif;
        background-color: #f0f0f0;
        margin: 0;
        padding: 0;
      }
      .container {
        max-width: 800px;
        margin: 20px auto;
        background: #fff;
        border: 1px solid #ddd;
        border-radius: 8px;
        box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
        padding: 20px;
      }
      .highlight {
        color: #ff5722;
        font-weight: bold;
      }
    </style>
    ### Limitations If you have a file like this:
  </head>
  <body>
    <div class="container">
      <h1>Welcome to Multi-language HTML</h1>
      <p>
        This is an example of <span class="highlight">embedded languages</span>.
      </p>
      <button id="regexButton">Test Regex</button>
      <pre id="regexResult"></pre>
    </div>
    <script>
      // JavaScript Example with JSDoc and Regex
      /**
       * Validates an email using a regex pattern.
       * @param {string} email - The email to validate.
       * @returns {boolean} True if valid, false otherwise.
       */
      function validateEmail(email) {
        const emailRegex = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/; // Regex Example
        return emailRegex.test(email);
      }

      // Event Listener Example
      document.getElementById("regexButton").addEventListener("click", () => {
        const testEmails = ["[email protected]", "invalid-email", "user@domain"];
        const results = testEmails.map(
          (email) => `${email}: ${validateEmail(email)}`,
        );
        document.getElementById("regexResult").textContent = results.join("\n");
      });
    </script>
  </body>
</html>

We get the following when we run :tree-sitter-injections:

image

Languages Tested

I've added a lot of tests, each of which actually helped me fix a bug or two while I was implementing this. So I ended up extracting all comment-relating integration tests into a separate module

The core functionality works. What's left is to make sure that we do all we can to have individual languages work as well. For example, Svelte's comment tokens had to be adjusted in this PR for the best experience.

I've tested these languages manually as well. If you have a language which relies on injections, be sure to test it and comment here, I'll add it to the list.

  • html
    • css
    • javascript
  • svelte
    • javascript
    • css
    • typescript
JSX and TSX

These languages don't use tree-sitter injections:

"use client";

import { useState } from "react";

export default function Home() {
  const [clickCount, setClickCount] = useState(0);

  return (
    <div>
      My counter:
      <button onClick={() => setClickCount(4)}>{clickCount}</button>
    </div>
  );
}

Has just this injection:

image

Which means this PR won't affect them
You can use Space + C to comment JSX at the moment. It'd be nice if you could use Space + c to make line comments that use the tokens {/* and */} though, as it uses // right now.

But to do this we could have .tsx and .jsx languages use ts and js as the file's "main" language, and then when we inject tsx and jsx languages when we encounter a tag.

Not exactly sure how to accomplish this though, and it's not really related to this PR (can be done as a follow up if I or someone figures out how to do this)

Implementation

Before:

  1. Get the comment tokens for the file
  2. Map each Selection's Range to a Change
  3. Generate a Transaction from those changes

Now

  1. Get the comment tokens for the file
  2. For each Range in Selection get its comment tokens
  3. ...then flatmap each Range to a Vec<Change>
  4. Generate a Transaction from those changes

Getting an injection layer's comment tokens

  • We have a Syntax.
  • Syntax has a Map<LayerId, LanguageId> and a Loader
  • Loader has a Map<LanguageId, LanguageConfiguration>
  • LanguageConfiguration has the comment tokens

Closes #7364
Closes #11647
Related to to #9425

@nik-rev nik-rev force-pushed the determine-comment-tokens branch from 63e26be to 3281c81 Compare February 2, 2025 22:36
@nik-rev nik-rev changed the title feat: Determine comment tokens from injection layers fix: Determine comment tokens from injection layers Feb 2, 2025
@nik-rev nik-rev changed the title fix: Determine comment tokens from injection layers feat: Determine comment tokens from injection layers Feb 2, 2025
@nik-rev nik-rev marked this pull request as draft February 2, 2025 23:14
@nik-rev nik-rev marked this pull request as ready for review February 2, 2025 23:14
@nik-rev nik-rev force-pushed the determine-comment-tokens branch from 019fe67 to c585fca Compare February 2, 2025 23:19
@nik-rev nik-rev marked this pull request as draft February 2, 2025 23:23
@nik-rev nik-rev marked this pull request as ready for review February 3, 2025 13:04
@nik-rev nik-rev force-pushed the determine-comment-tokens branch from c160b1e to dd5d4e8 Compare February 4, 2025 16:54
@nik-rev nik-rev force-pushed the determine-comment-tokens branch from 1afd4c5 to 8f9fcc1 Compare February 28, 2025 14:57
@Nikita0x
Copy link
Contributor

Nikita0x commented Mar 20, 2025

Thank you for this PR! I really hope it lands ASAP and its gonna be finally truly possible to work with .vue files with Helix!
I found some issues with comments

It uses incorrect comment tokens if there are whitespaces in beginning or end? Not sure.

css.example.mp4
when.line.wraps.incorrect.tokens.mp4

@nik-rev
Copy link
Contributor Author

nik-rev commented Mar 20, 2025

Thank you for this PR! I really hope it lands ASAP and its gonna be finally truly possible to work with .vue files with Helix!
I found some issues with comments

It uses incorrect comment tokens if there are whitespaces in beginning or end? Not sure.

css.example.mp4
when.line.wraps.incorrect.tokens.mp4

Thanks for checking out the PR!

When you select the full line like that, what actually happens is that your line has 2 injections in it: the CSS (inner) and the HTML (outer)

That's because the CSS injection doesnt start at the beginning of the line. It starts after a few spaces (Its weird, but it isnt a bug)

Here, the PR chooses comment tokens based upon the most tightly encompassing injection range. Because the CSS injection isnt fully encompassing the selection, it chooses the HTML comment tokens because that layer is fully encompassing

@nik-rev
Copy link
Contributor Author

nik-rev commented Mar 20, 2025

You can verify this with :tree-sitter-injections

@Nikita0x
Copy link
Contributor

Nikita0x commented Mar 20, 2025

You can verify this with :tree-sitter-injections

Yeah, I checked that, indeed.
If it is not a bug, maybe it can be improved, because in Zed and Neovim (which both use tree-sitter) it comments correctly even with whitespaces

or maybe they comment correctly with the help of their LSP which is more sophisticated? (Im just guessing, I have no idea)

yo.mp4

@the-mikedavis the-mikedavis added A-core Area: Helix core improvements S-waiting-on-pr Status: This is waiting on another PR to be merged first labels Mar 23, 2025
@nik-rev nik-rev force-pushed the determine-comment-tokens branch from 8f9fcc1 to d09f773 Compare March 25, 2025 14:41
@Necryl
Copy link

Necryl commented Mar 26, 2025

waiting for this one to land...it is a much needed quality of life in web dev

@nik-rev
Copy link
Contributor Author

nik-rev commented Mar 31, 2025

waiting for this one to land...it is a much needed quality of life in web dev

just fyi, this PR is waiting on #12972 at the moment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-core Area: Helix core improvements S-waiting-on-pr Status: This is waiting on another PR to be merged first
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Comments in JSX/TSX Determine comment tokens from injection layers
4 participants