feat: Support parsing all describable nodes in gherkin-32 mode#361
feat: Support parsing all describable nodes in gherkin-32 mode#361acoulton merged 8 commits intoBehat:masterfrom
gherkin-32 mode#361Conversation
gherkin-32 modegherkin-32 mode
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #361 +/- ##
============================================
- Coverage 95.86% 95.60% -0.26%
- Complexity 669 675 +6
============================================
Files 44 44
Lines 1934 1980 +46
============================================
+ Hits 1854 1893 +39
- Misses 80 87 +7
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
|
See #362 for my proposal for the ArrayLoader. |
318caa5 to
b87921e
Compare
gherkin-32 modegherkin-32 mode
| * @internal | ||
| */ | ||
| public function shouldRemoveFeatureDescriptionPadding(): bool | ||
| public function shouldRemoveDescriptionPadding(): bool |
There was a problem hiding this comment.
Renamed because in legacy mode, padding is also removed from descriptions that get combined into title on elements that did not previously support a description.
b87921e to
8ea63e4
Compare
| $title = trim($token['value'] ?? ''); | ||
| $description = null; | ||
| ['title' => $title, 'description' => $description] = $this->parseTitleAndDescription($token); |
There was a problem hiding this comment.
Trimming and coalescing to & from null now all happens in parseTitleAndDescription depending on the compatibility mode.
| if (is_string($node)) { | ||
| if ($this->compatibilityMode->shouldRemoveFeatureDescriptionPadding()) { | ||
| $text = preg_replace('/^\s{0,' . ($token['indent'] + 2) . '}|\s*$/', '', $node); | ||
| $description .= ($description !== null ? "\n" : '') . $text; | ||
| continue; | ||
| } | ||
|
|
||
| if ($node === "\n" && $description === null) { | ||
| // Ignore empty lines before the start of the description | ||
| continue; | ||
| } | ||
|
|
||
| // It must be part of the feature description (text & newlines later in the document will be consumed as | ||
| // part of parsing Background / Scenario before execution returns to this loop). | ||
| $description .= $node; | ||
| if ($node !== "\n") { | ||
| // Text nodes do not end with a newline, add one. The final trailing newline is rtrimmed below. | ||
| $description .= "\n"; | ||
| } |
There was a problem hiding this comment.
Any Text lines that are validly part of the Feature description will have been consumed by parseTitleAndDescription before we enter the loop.
Once in the loop, the only string lines we can get are blank lines - everything else will be consumed (or rejected) as part of processing a Background or Scenario / Scenario Outline child.
| $allowedTokenTypes = ['Step', 'Newline', 'Text', 'Comment']; | ||
| while (in_array($this->predictTokenType(), $allowedTokenTypes)) { | ||
| // NB: Technically, we do not support `Text` inside this loop. However, there is no situation where `Text` | ||
| // can be a direct child or immediately following a Scenario. Therefore, we consume it here as the most | ||
| // logical context for throwing an UnexpectedParserNodeException. | ||
|
|
There was a problem hiding this comment.
For all of the child node types, as with parseFeature any Text lines that are validly part of the description will now be consumed before we enter the loop.
However, I have left the loops accepting Text nodes (which will then trigger the UnexpectedParserNodeException) for cases where there is text at an invalid position in the feature file e.g.:
# tests/Cucumber/extra_testdata/bad/unknown_step_type.feature
Feature:
Scenario:
Given some step
Aaand some step
In cases like this:
a) IMO it is anyway clearer to throw Expected Step, Examples table, or end of Scenario but got ... as now, than exiting the loop and doing the throw from parseFeature with a message like Expected Background, Scenario or Outline
b) There is much less risk of unexpected impact (in particular on legacy mode) if we keep consuming and validating the same node types in each method, so that control flow moves between parser methods on the same lines of each file as it did before.
| \assert(\array_key_exists('keyword', $token)); | ||
| $keyword = $token['keyword']; | ||
| $tags = empty($this->tags) ? [] : $this->popTags(); | ||
| ['title' => $title, 'description' => $description] = $this->parseTitleAndDescription($token); |
There was a problem hiding this comment.
Note, this will actually produce a small behaviour change in legacy mode.
Previously, we ignored any title on an Examples: line (the $token['value'] was parsed but just discarded). Control flow then moves to parseTableRows which only accepts TableRow|Newline|Comment.
Therefore if there was text on the line below Examples: this would have caused a ParserException due to the unexpected Text content.
Now, even in legacy mode, those lines will be read and the ExampleTableNode will be created with a (possibly multiline) name and a null description (the same as we parse Scenario: etc in legacy mode).
I think this is OK since any files with syntax like that would have been invalid until now?
There was a problem hiding this comment.
Accepting files that were previously parse errors is indeed OK to me from a BC point of view.
| }; | ||
|
|
||
| if ($this->compatibilityMode->shouldRemoveDescriptionPadding()) { | ||
| $text = preg_replace('/^\s{0,' . ($keywordToken['indent'] + 2) . '}|\s*$/', '', $text); |
There was a problem hiding this comment.
This is the regex that the individual methods were originally using to un-indent / trim text lines when parsing their nodes.
|
The code coverage errors are due to the new @stof do you have an opinion on whether
I am leaning to matching both, so that scenarios are filtered the same as now regardless of the parser mode, but I'm not sure? |
8ea63e4 to
8abb1f6
Compare
|
I think NameFilter needs to match the name and description combined, to match the same thing than before (when the name was multiline). And we can think about the future handling of filters in Behat as a separate point. |
8abb1f6 to
7408ded
Compare
| /** | ||
| * @return ?string | ||
| */ | ||
| public function getDescription(); |
There was a problem hiding this comment.
Unfortunately this interface method can only have a soft typehint to avoid a BC break when applying it to the existing FeatureNode::getDescription().
I have, however, given all the new implementations of the method a hard typehint.
|
@stof I've rebased this and finished the work on the filters : I think it is now complete. |
|
Based on code coverage reports, I have the feeling that the case of empty lines between steps of the background is not covered by tests. |
|
@acoulton any chance to add the missing test coverage there ? |
|
@stof yeah, I'm hoping to get back to this asap - I also want to check if we have enough coverage of the parsing in |
|
Waiting for #392 first |
Adds the properties, getters, and supporting code ahead of implementing support for parsing these values from the feature file.
We can now parse descriptions for Background, Scenario, Scenario Outline and Examples nodes in the gherkin file. Previously, these were parsed but were included as a multiline title.
This is an edge case, but cucumber/gherkin allows the `Examples:` keyword and table rows to appear within feature and background descriptions. This is because these elements are parsed as text unless already in a context that supports Examples (e.g. a Scenario / Scenario Outline).
When parsing in `GherkinCompatibilityMode::GHERKIN_32`, multiline text for a Scenario (or Outline) will be split across the title and description. This could cause the NameFilter to unexpectedly stop matching scenarios with multiline text if the new parsing mode has been enabled. This would be a breaking change (and could cause false positives in CI systems if scenarios that were assumed to be running were no longer tested). Therefore, NameFilter is instead changed to consider both title and description to mirror existing behaviour. This required introducing a new interface to identify nodes that may have a description, since `NameFilter::isScenarioMatch` accepts any `ScenarioInterface` and that interface (as well as e.g. ExampleNode) does not contain a `getDescription` method. I have added that interface to all nodes - even those that are not relevant for this feature - to avoid future confusion.
7408ded to
d6a66f7
Compare
| // The only time we use $token['value'] is if we got a `Text` token. | ||
| // ->expectTokenType('Text') is tagged as returning a `TStringValueToken`, where 'value' cannot be null | ||
| // However PHPStan cannot follow the chain through predictTokenType -> expectTokenType -> $token['type'] | ||
| assert($text !== null, 'Text token value should not be null'); |
There was a problem hiding this comment.
Without this, PHPStan was complaining that the $text argument to preg_replace could be null (when in fact it cannot be). I'm not sure if there's a cleverer way to solve this.
| name: |- | ||
| examples with description | ||
| This is an examples description | ||
| description: ~ |
There was a problem hiding this comment.
As discussed originally (e.g. https://github.com/Behat/Gherkin/pull/361/files#r2107874660) there are now some cases where legacy mode will now parse files that would previously have caused an Exception, due to multiline text or reserved words in unexpected places.
In those cases, they'll be parsed with a multiline name / title and a null description, consistent with how we parse other nodes in legacy mode.
| Background: That is described | ||
|
|
||
| With multiple paragraphs | ||
|
|
||
| Of description | ||
|
|
||
| Given something | ||
|
|
||
| And something else |
There was a problem hiding this comment.
This new feature file covers presence of blank lines between steps in a Background (and in all other parts of a gherkin file).
| if ($node === "\n") { | ||
| continue; | ||
| } |
There was a problem hiding this comment.
This line is not covered by tests.
I don't think it can actually ever happen:
- Before we get to this point,
parseTitleAndDescriptionhas already consumed any comments or newlines after theBackground:line and before the firstStep. - Therefore, the only ones of our
$allowedTokenTypesthat we can actually receive for the first iteration is aSteporText. - If it is a
Text, we throw. - If it is a
Step, we callparseExpression(), which internally callsparseStep(). parseStep()itself consumes any comments or blank lines that follow the step as part of scanning ahead to find potential PyString / TableRow nodes.- Therefore on our subsequent iterations the only
$allowedTokenTypeswe can actually receive are againSteporText.
If I change $allowedTypes to just ['Step', 'Text'] and remove this line, all the tests still pass.
I don't think there are any edge cases that could break this - we could potentially change it now, or keep it as un-covered code for absolute safety.
My temptation is to leave it as-is in this branch, then remove it in a separate PR for clarity.
|
@stof this is updated and extra test case added for a feature with blank lines everywhere but it hasn't increased the coverage - see this explanation The other missing coverage is just the new getters. |
Historically, we have not supported parsing descriptions as a separate concept for Examples, Background, Scenario or Scenario Outline. Text following the keyword line was instead parsed as a multi-line title.
This PR will implement support for capturing these as standalone properties when running in cucumber/gherkin mode.
I have borrowed from the work done by @jojo1981 in #254, but I have started from a clean branch as there have been a lot of changes to constructors, property types etc since then. I'll ensure their input is credited in the final release notes.
Fixes #154 #211