-
Notifications
You must be signed in to change notification settings - Fork 501
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve preg_split()
function ReturnType
#3757
base: 2.1.x
Are you sure you want to change the base?
Conversation
This pull request has been marked as ready for review. |
} else { | ||
$flagType = $scope->getType($flagArg->value); | ||
$flags = $flagType->getConstantScalarValues(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By replacing it as follows, type checking within multiple Constant loops will no longer be necessary.
$flags = [];
$flagType = $scope->getType($flagArg->value);
foreach ($flagType->getConstantScalarValues() as $flag) {
if (!is_int()) {
return new ErrorType();
}
$flags[] = $flag;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resolved 8cb3030
resources/functionMap.php
Outdated
@@ -9081,7 +9081,7 @@ | |||
'preg_replace' => ['string|array|null', 'regex'=>'string|array', 'replace'=>'string|array', 'subject'=>'string|array', 'limit='=>'int', '&w_count='=>'int'], | |||
'preg_replace_callback' => ['string|array|null', 'regex'=>'string|array', 'callback'=>'callable(array<int|string, string>):string', 'subject'=>'string|array', 'limit='=>'int', '&w_count='=>'int'], | |||
'preg_replace_callback_array' => ['string|array|null', 'pattern'=>'array<string,callable>', 'subject'=>'string|array', 'limit='=>'int', '&w_count='=>'int'], | |||
'preg_split' => ['list<string>|false', 'pattern'=>'string', 'subject'=>'string', 'limit='=>'?int', 'flags='=>'int'], | |||
'preg_split' => ['__benevolent<list<string>|list<array{string, int<0, max>}>|false>', 'pattern'=>'string', 'subject'=>'string', 'limit='=>'?int', 'flags='=>'int'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other preg_ method are not using benevolent union, so I would think more consistent to not use a benevolent union here too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand. I would like to remove benevolent union.
On the other hand, I think that preg_split should not return false unless there is an issue with the regular expression. Furthermore, in this PR, I have modified the code so that if the regular expression is incorrect, an error is returned early in the parsing process.
Therefore, if the regular expression is correct, I am considering not adding false as a Union.
(In this case, this bug can also be fixed.
phpstan-src/tests/PHPStan/Analyser/AnalyserIntegrationTest.php
Lines 890 to 900 in 76740fd
public function testBug7554(): void | |
{ | |
$errors = $this->runAnalyse(__DIR__ . '/data/bug-7554.php'); | |
$this->assertCount(2, $errors); | |
$this->assertSame(sprintf('Parameter #1 $%s of function count expects array|Countable, list<array<int, int<0, max>|string>>|false given.', PHP_VERSION_ID < 80000 ? 'var' : 'value'), $errors[0]->getMessage()); | |
$this->assertSame(26, $errors[0]->getLine()); | |
$this->assertSame('Cannot access offset int<1, max> on list<array{string, int<0, max>}>|false.', $errors[1]->getMessage()); | |
$this->assertSame(27, $errors[1]->getLine()); | |
} |
If you think not to use benevolent union, do you think it would be fine to remove false? I would like to hear your opinion on this. I would like to get your opinion before making any modifications.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like any other preg method I think it can returns false if an internal error occurs like
- some memory limit is reached
- too many recursion
- some invalid encoding
And in the pho.ini there is some config like pcre.recursion_limit or pcre.backtrack_limit.
So I would keep a non-benevolent union AND false.
If we decide to remove false from the signature it should be removed from all the preg methods. But I dont think we should go this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand.
I modified it to keep a non-benevolent union AND false.
@staabm As this is about regexes, can you please review it? Thank you. |
|
||
if ( | ||
count($patternConstantTypes) > 0 | ||
&& @preg_match($patternConstantTypes[0]->getValue(), '') === false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we usually us Strings::match
$subjectType = $scope->getType($subjectArg->value); | ||
$subjectConstantTypes = $subjectType->getConstantStrings(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would move the subject type declartion below the early returning if in the next block
$flagType = $scope->getType($flagArg->value); | ||
foreach ($flagType->getConstantScalarValues() as $flag) { | ||
if (!is_int($flag)) { | ||
return new ErrorType(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this case is missing a test
$limitType = $scope->getType($limitArg->value); | ||
foreach ($limitType->getConstantScalarValues() as $limit) { | ||
if (!is_int($limit)) { | ||
return new ErrorType(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this case is missing a test
foreach ($subjectConstantTypes as $subjectConstantType) { | ||
foreach ($limits as $limit) { | ||
foreach ($flags as $flag) { | ||
$result = @preg_split($patternConstantType->getValue(), $subjectConstantType->getValue(), $limit, $flag); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use Strings::split
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think using Strings::split here is not right because the limit is fixed to -1.
} else { | ||
$limitType = $scope->getType($limitArg->value); | ||
foreach ($limitType->getConstantScalarValues() as $limit) { | ||
if (!is_int($limit)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
numeric-string $limit is not an error
|
||
if ( | ||
count($patternConstantTypes) > 0 | ||
&& @preg_match($patternConstantTypes[0]->getValue(), '') === false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to check all patterns not only the first
} else { | ||
$flagType = $scope->getType($flagArg->value); | ||
foreach ($flagType->getConstantScalarValues() as $flag) { | ||
if (!is_int($flag)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to be consistent with limit, this might also allow numeric-string
} | ||
|
||
return null; | ||
return TypeCombinator::union(...$resultTypes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are missing false
. in the preg_match
inference we decided this can get false even if all args a valid and static analysis time known, because a regex pattern might be super inefficient (or pattern based attacks might trick the regex engine into return false)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the above comment is still true and we are missing the false
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, does this mean that every possible result of preg_split includes the possibility of false, and therefore we need to add false to the union type?
I had implemented it to return an Error if preg_split returns false, as a warning.
So, does this mean I should include false in all cases, instead of returning Error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return type of false
at runtime is necessary because the preg_split
call can fail even if we know everything IIRC.
The current "return ErrorType" could be turned into "return null" in case other rules will already report a phpstan error for the code examples.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have fixed about that in the following commit.
307cf54
Additionally, since handling for the false case is no longer necessary, I have removed if ($result === false)
.
} | ||
} | ||
|
||
if (count($patternConstantTypes) === 0 || count($subjectConstantTypes) === 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this if-branch might be factored out into a private method for readability
if ($result === false) { | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if one of the static analysis time values make preg_split
return false
we should give-up instead of ignoring this fact
@staabm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I set __benevolent<list|list<array{string, int<0, max>}>|false> to the basic return type.
it seems this is no longer true
$limits = [-1]; | ||
} else { | ||
$limitType = $scope->getType($limitArg->value); | ||
if (!$limitType->isInteger()->yes() && !$limitType->isString()->yes()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this won't work for a limit like 5|'17'
. could work with ->toInteger()
beforehand.
same for $flag
please add tests for this cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realized that using toInteger() beforehand wasn't ideal because it would convert array, null or some inputs into int(0), which is incorrect. So, I added a check using isConstantScalarValue() to ensure we can accurately handle values like 5|'14'.
|
||
$returnInternalValueType = $returnStringType; | ||
if ($flagArg !== null) { | ||
$flagState = $this->bitwiseFlagAnalyser->bitwiseOrContainsConstant($flagArg->value, $scope, 'PREG_SPLIT_OFFSET_CAPTURE'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$flagState = $this->bitwiseFlagAnalyser->bitwiseOrContainsConstant($flagArg->value, $scope, 'PREG_SPLIT_OFFSET_CAPTURE'); | |
$capturesOffset = $this->bitwiseFlagAnalyser->bitwiseOrContainsConstant($flagArg->value, $scope, 'PREG_SPLIT_OFFSET_CAPTURE'); |
$returnNonEmptyStrings = $flagArg !== null && $this->bitwiseFlagAnalyser->bitwiseOrContainsConstant($flagArg->value, $scope, 'PREG_SPLIT_NO_EMPTY')->yes(); | ||
if ($returnNonEmptyStrings) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would inline this only once used variable to ease reading the code
} | ||
|
||
return null; | ||
return TypeCombinator::union(...$resultTypes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the above comment is still true and we are missing the false
here
I developed further extended the extension for the
preg_split()
function.The
preg_split()
function can specify ReturnType with the following cases:$pattern
argument is invalid.$pattern
and$subject
arguments are constants.$subject
argument is non-empty-string.$flag
argument is set to one or more of the following:PREG_SPLIT_OFFSET_CAPTURE
,PREG_SPLIT_NO_EMPTY
, orPREG_SPLIT_DELIM_CAPTURE
.The detailed cases are specified in the test cases.