Repost from squizlabs/PHP_CodeSniffer#3601:
The following code sample:

    <?php

        // comment.
        function foo() {}

... will tokenize as follows:

    Ptr | Ln | Col  | Cond | ( #) | Token Type                 | [len]: Content
    -------------------------------------------------------------------------
      0 | L1 | C  1 | CC 0 | ( 0) | T_OPEN_TAG                 | [  5]: <?php
      1 | L2 | C  1 | CC 0 | ( 0) | T_WHITESPACE               | [  0]:
      2 | L3 | C  1 | CC 0 | ( 0) | T_WHITESPACE               | [  4]: ⸱⸱⸱⸱
      3 | L3 | C  5 | CC 0 | ( 0) | T_COMMENT                  | [ 11]: // comment.
      4 | L4 | C  1 | CC 0 | ( 0) | T_WHITESPACE               | [  4]: ⸱⸱⸱⸱
      5 | L4 | C  5 | CC 0 | ( 0) | T_FUNCTION                 | [  8]: function
      6 | L4 | C 13 | CC 0 | ( 0) | T_WHITESPACE               | [  1]: ⸱
      7 | L4 | C 14 | CC 0 | ( 0) | T_STRING                   | [  3]: foo
      8 | L4 | C 17 | CC 0 | ( 0) | T_OPEN_PARENTHESIS         | [  1]: (
      9 | L4 | C 18 | CC 0 | ( 0) | T_CLOSE_PARENTHESIS        | [  1]: )
     10 | L4 | C 19 | CC 0 | ( 0) | T_WHITESPACE               | [  1]: ⸱
     11 | L4 | C 20 | CC 0 | ( 0) | T_OPEN_CURLY_BRACKET       | [  1]: {
     12 | L4 | C 21 | CC 0 | ( 0) | T_CLOSE_CURLY_BRACKET      | [  1]: }
     13 | L4 | C 22 | CC 0 | ( 0) | T_WHITESPACE               | [  0]:

Looking at the above raised a question for me regarding the `length` provided in the token array: it does not seem to include new line characters. Is this intentional?
I never did get an answer to this question.
Some background behind the question:
- The `length` key is typically an efficiency feature, which can prevent lots of calls to the `strlen()` function.
- The `length` key is often used instead of an indent calculation. However, for tokens which can include a new line character and may include leading indentation (like, amongst others, `T_INLINE_HTML` and `T_COMMENT`), an extra calculation is needed: `length - strlen(ltrim(content))`. In that calculation the `length` is unreliable, as the actual length may be longer due to the new line character not being included. This means that if these types of tokens could be the target of the length determination, you will still always need to call `strlen()` instead of being able to use the pre-calculated `length`.
This is counter-intuitive and inefficient, and it means that contributors need to have that detailed token knowledge.
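
To make the indent calculation problem concrete, here is a minimal sketch. The token entry is hand-built and hypothetical: it only mimics the `content` and `length` keys, with `length` assumed to exclude the trailing new line as the dump above suggests. The helper names `indentFromLength()` and `indentFromContent()` are made up for this illustration and are not part of PHP_CodeSniffer.

```php
<?php
// Hypothetical token entry, NOT produced by the real tokenizer.
// It mimics a T_INLINE_HTML token with four spaces of leading
// indentation and a trailing new line in its content.
$token = [
    'type'    => 'T_INLINE_HTML',
    'content' => "    <div>\n",
    // Assumption: 'length' counts the 9 visible characters only,
    // i.e. it leaves out the trailing new line.
    'length'  => 9,
];

// Indent calculation based on the pre-calculated length.
// ltrim() strips the leading spaces but keeps the trailing new line,
// so the two operands are not measured on the same basis.
function indentFromLength(array $token): int
{
    return $token['length'] - strlen(ltrim($token['content']));
}

// Indent calculation based on strlen() of the full content.
function indentFromContent(array $token): int
{
    return strlen($token['content']) - strlen(ltrim($token['content']));
}

echo indentFromLength($token), PHP_EOL;  // 3: one short of the real indent
echo indentFromContent($token), PHP_EOL; // 4: the real indent
```

Under those assumptions the `length`-based result is off by one, which is why the extra `strlen()` call cannot be avoided for this kind of token.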