Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(transformers): fix matching indices for word-highlight #909

Merged
merged 4 commits into from
Feb 18, 2025

Conversation

artt
Copy link
Contributor

@artt artt commented Jan 27, 2025

Description

Partially fixes #908 and add tests.

Additional context

Users still cannot freely choose which instance the highlight should be on (regex will choose whatever comes up first).

It would be nice to allow users to manually specify which character (by index) the highlight should be. For example, `js {1[6:12],3-4} to highlight characters 6 through 12 of the first line, and lines 3+4. But that seems like a lot more work. At least this PR will eliminate the error.

Copy link

netlify bot commented Jan 27, 2025

Deploy Preview for shiki-next ready!

Name Link
🔨 Latest commit 91a2ba6
🔍 Latest deploy log https://app.netlify.com/sites/shiki-next/deploys/67b44c3ae119500008b92593
😎 Deploy Preview https://deploy-preview-909--shiki-next.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

Copy link

netlify bot commented Jan 27, 2025

Deploy Preview for shiki-matsu ready!

Name Link
🔨 Latest commit 91a2ba6
🔍 Latest deploy log https://app.netlify.com/sites/shiki-matsu/deploys/67b44c3a0535820008e72c43
😎 Deploy Preview https://deploy-preview-909--shiki-matsu.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

// eslint-disable-next-line no-cond-assign
while ((i = str.indexOf(substr, i + 1)) !== -1)
indexes.push(i)
const re = new RegExp(substr, 'g')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we need to properly escape the substr if we want to use that approach.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is already handled by this function parseMetaHighlightWords. If you could help give some examples that would be awesome.

https://github.com/shikijs/shiki/blob/0b9137f8413eb1f276928707c0ac92de0664a314/packages/transformers/src/transformers/meta-highlight-word.ts#L3C1-L13C2

And, do we want to use this approach?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The safest way to escape regex special characters is to use Regex+ and interpolate the string to escape. Nearly as safe but with noisier output would be to use native RegExp.escape (an ES proposal), which isn't supported by browsers yet. The most popular way to escape regex special characters (based on npm download stats) is to use the unsafe but lightweight escape-string-regexp. Context safety isn't relevant when the entire regex pattern is an escaped string, though, so no worries.

If it was me, I might just add an inline .replace(/[|\\{}()[\]^$+*?.]/g, '\\$&') with no libraries. This doesn't escape chars that might need to be escaped based on context (-, ,, digits, etc.), since no context awareness is needed when the escaped string is being used as the entire regex pattern.

But then, I don't understand why you're moving from string search to regex search in the first place.

parseMetaHighlightWords isn't escaping special characters; it seems to be matching JS regex literals? I also notice it has a couple minor issue issues with that: it will inappropriately match /foo\/ (no unescaped trailing /) and it will not correctly match /[/]/ (unescaped / is allowed in JS character classes, unless the regex uses flag v).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then, I don't understand why you're moving from string search to regex search in the first place.

Problem from #908. But yes, this is one out of the many ways to solve it. And it doesn't solve all the problems. Maybe I'll close this first and move it to discussion to see what people think first.

// eslint-disable-next-line no-cond-assign
while ((i = str.indexOf(substr, i + 1)) !== -1)
indexes.push(i)
const re = new RegExp(substr, 'g')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The safest way to escape regex special characters is to use Regex+ and interpolate the string to escape. Nearly as safe but with noisier output would be to use native RegExp.escape (an ES proposal), which isn't supported by browsers yet. The most popular way to escape regex special characters (based on npm download stats) is to use the unsafe but lightweight escape-string-regexp. Context safety isn't relevant when the entire regex pattern is an escaped string, though, so no worries.

If it was me, I might just add an inline .replace(/[|\\{}()[\]^$+*?.]/g, '\\$&') with no libraries. This doesn't escape chars that might need to be escaped based on context (-, ,, digits, etc.), since no context awareness is needed when the escaped string is being used as the entire regex pattern.

But then, I don't understand why you're moving from string search to regex search in the first place.

parseMetaHighlightWords isn't escaping special characters; it seems to be matching JS regex literals? I also notice it has a couple minor issue issues with that: it will inappropriately match /foo\/ (no unescaped trailing /) and it will not correctly match /[/]/ (unescaped / is allowed in JS character classes, unless the regex uses flag v).

indexes.push(i)
const re = new RegExp(substr, 'g')
let match = re.exec(str)
while (match !== null) {
Copy link
Collaborator

@slevithan slevithan Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will loop forever if re matches an empty string. Is that possible here?

@antfu antfu changed the title fix(transformerMetaWordHighlight): use regex instead of substring to find matching indices fix(transformerMetaWordHighlight): fix matching indices Feb 18, 2025
@antfu antfu changed the title fix(transformerMetaWordHighlight): fix matching indices fix(transformers): fix matching indices for word-highlight Feb 18, 2025
@antfu
Copy link
Member

antfu commented Feb 18, 2025

Now I see the problem. It runs out to be a bug, which we don't need regex for fixing

@antfu antfu merged commit 57a09ad into shikijs:main Feb 18, 2025
12 checks passed
antfu added a commit that referenced this pull request Feb 18, 2025
* fix(transformerMetaWordHighlight): use regex instead of substring to find matching indices; add tests

* chore: fix

* chore: update

---------

Co-authored-by: Anthony Fu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

transformerMetaWordHighlight fail to highlight overlapping text instances
3 participants