-
-
Notifications
You must be signed in to change notification settings - Fork 75
PEAR/Squiz/InlineComment: document handling of emoji hash comments #802
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PEAR/Squiz/InlineComment: document handling of emoji hash comments #802
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One query regarding a new error on an unchanged line within the test file.
10ac2a0
to
31bee90
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we move that "must be last in file" test to its own file - like we do for intentional parse errors?
I'm not a fan of the swearing.
Based on some rumblings on the interwebs, I've done some investigating. It was basically claimed that the emoji keycap sign for the hash character could be used for hash comments. Based on my investigation, this turns out to be **true**. Technical explanation: as the emoji keycap sign is a combined unicode character of which the first codepoint is the `#` character, the PHP tokenizer will see whatever comes after as a comment and tokenize it as such. In a follow-up message, it was claimed that using the emoji keycap sign for the hash character could also be used for attributes. Based on my investigation, this (unsurprisingly) turns out to be **false**. Technical explanation: for attributes, the `#` sign and the `[` bracket need to be next to each without any characters between them for the syntax to be recognized as the start of an attribute. This was implemented like so to keep the BC-break with hash comments as small as possible when attributes were introduced in PHP 8.0. As the emoji keycap sign is a combined character (U+23 U+FE0F U+20E3) , there are multiple other codepoints between the `#` and the `[`, which means that this will not tokenizer as `T_ATTRIBUTE` in PHP and therefore shouldn't in PHPCS either. For now, I'm just adding a test to both the PEAR and Squiz `InlineComment` sniffs to document how emoji-hash comments are handled by PHPCS. I am aware that the fixer output is not "clean", i.e. it leaves the second and third code point of the emoji in place. I did consider adding special handling in the fixers, but decided against this for the following reasons: 1. This is likely to be a rare edge case. 2. I suspect the most reliable way to handle this would require the `intl` extension to use `graphmeme` functions. I'm not inclined to make the `intl` extension a requirement for PHPCS at this time. 3. An alternative way to handle this could be via unicode escape codes, but those are a PHP 7.0+ feature and cannot be used in PHPCS (yet). For now, I believe documenting the handling will need to suffice. --- Inspired by the following tweet: https://x.com/christophrumpel/status/1862568698730401986 and it's response: https://x.com/joshmanders/status/1862884555910160451 and their mention in the [January 2025 PHP Annotated](https://blog.jetbrains.com/phpstorm/2025/01/php-annotated-january-2025/).
I think that would be a good idea, but outside the scope of this PR. Probably good if this is done when the sniff is reviewed for code coverage and missing tests at some point in the future.
Fair point and you're right, I shouldn't let my frustration about the silly things people do with PHP get the better of me. Fixed now. |
31bee90
to
fdedc77
Compare
Description
Based on some rumblings on the interwebs, I've done some investigating.
It was basically claimed that the emoji keycap sign for the hash character could be used for hash comments.
Based on my investigation, this turns out to be true.
Technical explanation: as the emoji keycap sign is a combined unicode character of which the first codepoint is the
#
character, the PHP tokenizer will see whatever comes after as a comment and tokenize it as such.In a follow-up message, it was claimed that using the emoji keycap sign for the hash character could also be used for attributes.
Based on my investigation, this (unsurprisingly) turns out to be false.
Technical explanation: for attributes, the
#
sign and the[
bracket need to be next to each without any characters between them for the syntax to be recognized as the start of an attribute. This was implemented like so to keep the BC-break with hash comments as small as possible when attributes were introduced in PHP 8.0.As the emoji keycap sign is a combined character (U+23 U+FE0F U+20E3) , there are multiple other codepoints between the
#
and the[
, which means that this will not tokenize asT_ATTRIBUTE
in PHP and therefore shouldn't in PHPCS either.For now, I'm just adding a test to both the PEAR and Squiz
InlineComment
sniffs to document how emoji-hash comments are handled by PHPCS.I am aware that the fixer output is not "clean", i.e. it leaves the second and third code point of the emoji in place.
I did consider adding special handling in the fixers, but decided against this for the following reasons:
intl
extension to usegraphmeme
functions. I'm not inclined to make theintl
extension a requirement for PHPCS at this time.For now, I believe documenting the handling will need to suffice.
Suggested changelog entry
N/A
Related issues/external references
Inspired by the following tweet: https://x.com/christophrumpel/status/1862568698730401986 and it's response: https://x.com/joshmanders/status/1862884555910160451 and their mention in the January 2025 PHP Annotated.