Skip to content

PEAR/Squiz/InlineComment: document handling of emoji hash comments #802

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

jrfnl
Copy link
Member

@jrfnl jrfnl commented Jan 31, 2025

Description

Based on some rumblings on the interwebs, I've done some investigating.

It was basically claimed that the emoji keycap sign for the hash character could be used for hash comments.
Based on my investigation, this turns out to be true.
Technical explanation: as the emoji keycap sign is a combined unicode character of which the first codepoint is the # character, the PHP tokenizer will see whatever comes after as a comment and tokenize it as such.

In a follow-up message, it was claimed that using the emoji keycap sign for the hash character could also be used for attributes.
Based on my investigation, this (unsurprisingly) turns out to be false.
Technical explanation: for attributes, the # sign and the [ bracket need to be next to each without any characters between them for the syntax to be recognized as the start of an attribute. This was implemented like so to keep the BC-break with hash comments as small as possible when attributes were introduced in PHP 8.0.
As the emoji keycap sign is a combined character (U+23 U+FE0F U+20E3) , there are multiple other codepoints between the # and the [, which means that this will not tokenize as T_ATTRIBUTE in PHP and therefore shouldn't in PHPCS either.

For now, I'm just adding a test to both the PEAR and Squiz InlineComment sniffs to document how emoji-hash comments are handled by PHPCS.
I am aware that the fixer output is not "clean", i.e. it leaves the second and third code point of the emoji in place.

I did consider adding special handling in the fixers, but decided against this for the following reasons:

  1. This is likely to be a rare edge case.
  2. I suspect the most reliable way to handle this would require the intl extension to use graphmeme functions. I'm not inclined to make the intl extension a requirement for PHPCS at this time.
  3. An alternative way to handle this could be via unicode escape codes, but those are a PHP 7.0+ feature and cannot be used in PHPCS (yet).

For now, I believe documenting the handling will need to suffice.

Suggested changelog entry

N/A

Related issues/external references

Inspired by the following tweet: https://x.com/christophrumpel/status/1862568698730401986 and it's response: https://x.com/joshmanders/status/1862884555910160451 and their mention in the January 2025 PHP Annotated.

Copy link
Member

@fredden fredden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One query regarding a new error on an unchanged line within the test file.

@jrfnl jrfnl force-pushed the feature/pear-squiz-inlinecomments-add-tests-with-emoji-hash-comments branch from 10ac2a0 to 31bee90 Compare January 31, 2025 23:01
Copy link
Member

@fredden fredden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we move that "must be last in file" test to its own file - like we do for intentional parse errors?

I'm not a fan of the swearing.

Based on some rumblings on the interwebs, I've done some investigating.

It was basically claimed that the emoji keycap sign for the hash character could be used for hash comments.
Based on my investigation, this turns out to be **true**.
Technical explanation: as the emoji keycap sign is a combined unicode character of which the first codepoint is the `#` character, the PHP tokenizer will see whatever comes after as a comment and tokenize it as such.

In a follow-up message, it was claimed that using the emoji keycap sign for the hash character could also be used for attributes.
Based on my investigation, this (unsurprisingly) turns out to be **false**.
Technical explanation: for attributes, the `#` sign and the `[` bracket need to be next to each without any characters between them for the syntax to be recognized as the start of an attribute. This was implemented like so to keep the BC-break with hash comments as small as possible when attributes were introduced in PHP 8.0.
As the emoji keycap sign is a combined character (U+23 U+FE0F U+20E3) , there are multiple other codepoints between the `#` and the `[`, which means that this will not tokenizer as `T_ATTRIBUTE` in PHP and therefore shouldn't in PHPCS either.

For now, I'm just adding a test to both the PEAR and Squiz `InlineComment` sniffs to document how emoji-hash comments are handled by PHPCS.
I am aware that the fixer output is not "clean", i.e. it leaves the second and third code point of the emoji in place.

I did consider adding special handling in the fixers, but decided against this for the following reasons:
1. This is likely to be a rare edge case.
2. I suspect the most reliable way to handle this would require the `intl` extension to use `graphmeme` functions. I'm not inclined to make the `intl` extension a requirement for PHPCS at this time.
3. An alternative way to handle this could be via unicode escape codes, but those are a PHP 7.0+ feature and cannot be used in PHPCS (yet).

For now, I believe documenting the handling will need to suffice.

---

Inspired by the following tweet: https://x.com/christophrumpel/status/1862568698730401986 and it's response: https://x.com/joshmanders/status/1862884555910160451 and their mention in the [January 2025 PHP Annotated](https://blog.jetbrains.com/phpstorm/2025/01/php-annotated-january-2025/).
@jrfnl
Copy link
Member Author

jrfnl commented Feb 5, 2025

Should we move that "must be last in file" test to its own file - like we do for intentional parse errors?

I think that would be a good idea, but outside the scope of this PR. Probably good if this is done when the sniff is reviewed for code coverage and missing tests at some point in the future.

I'm not a fan of the swearing.

Fair point and you're right, I shouldn't let my frustration about the silly things people do with PHP get the better of me. Fixed now.

@jrfnl jrfnl force-pushed the feature/pear-squiz-inlinecomments-add-tests-with-emoji-hash-comments branch from 31bee90 to fdedc77 Compare February 5, 2025 17:28
@jrfnl jrfnl merged commit fc1f7c0 into master Feb 5, 2025
57 checks passed
@jrfnl jrfnl deleted the feature/pear-squiz-inlinecomments-add-tests-with-emoji-hash-comments branch February 5, 2025 21:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants