feat: Support parsing all describable nodes in gherkin-32 mode #361

Merged
acoulton merged 8 commits into Behat:master from acoulton:feat-describable-nodes
Nov 24, 2025

Conversation

@acoulton
Contributor

@acoulton acoulton commented May 26, 2025

Historically, we have not supported parsing descriptions as a separate concept for Examples, Background, Scenario or Scenario Outline. Text following the keyword line was instead parsed as a multi-line title.

This PR will implement support for capturing these as standalone properties when running in cucumber/gherkin mode.

I have borrowed from the work done by @jojo1981 in #254, but I have started from a clean branch as there have been a lot of changes to constructors, property types etc since then. I'll ensure their input is credited in the final release notes.

Fixes #154 #211
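
As a rough illustration of the difference between the two modes, the sketch below splits the keyword-line text and the free text lines beneath it. The function `splitTitleAndDescription()` and its boolean flag are hypothetical and are not the parser's actual `parseTitleAndDescription()`; it only assumes that legacy mode folds the text into a multi-line title and that gherkin-32 mode keeps it as a separate description.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper for illustration only (not part of Behat\Gherkin).
 *
 * @param list<string> $textLines Text lines found below the keyword line, already un-indented
 *
 * @return array{title: string, description: ?string}
 */
function splitTitleAndDescription(string $keywordLineText, array $textLines, bool $gherkin32): array
{
    $title = trim($keywordLineText);

    if ($gherkin32) {
        // Assumed gherkin-32 behaviour: the following text is a standalone description.
        $description = $textLines === [] ? null : implode("\n", $textLines);

        return ['title' => $title, 'description' => $description];
    }

    // Assumed legacy behaviour: the following text becomes part of a multi-line title.
    $title = implode("\n", [$title, ...$textLines]);

    return ['title' => $title, 'description' => null];
}
```

With the same input, the legacy branch returns everything in `title` with a null `description`, while the gherkin-32 branch returns the first line as `title` and the rest as `description`.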

@acoulton acoulton changed the title from "[WIPfeat: Support parsing all describable nodes in gherkin-32 mode" to "[WIP] feat: Support parsing all describable nodes in gherkin-32 mode" May 26, 2025
@codecov

codecov bot commented May 26, 2025

Codecov Report

❌ Patch coverage is 91.35802% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.60%. Comparing base (9dec914) to head (a1fb0be).
⚠️ Report is 5 commits behind head on master.

Files with missing lines        Patch %   Lines
src/Node/ExampleTableNode.php   63.63%    4 Missing ⚠️
src/Node/BackgroundNode.php     0.00%     2 Missing ⚠️
src/Parser.php                  97.61%    1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master     #361      +/-   ##
============================================
- Coverage     95.86%   95.60%   -0.26%     
- Complexity      669      675       +6     
============================================
  Files            44       44              
  Lines          1934     1980      +46     
============================================
+ Hits           1854     1893      +39     
- Misses           80       87       +7     
Flag Coverage Δ
php8.1 95.60% <91.35%> (-0.26%) ⬇️
php8.1--with=symfony/yaml:^5.4 95.60% <91.35%> (-0.26%) ⬇️
php8.1--with=symfony/yaml:^6.4 95.60% <91.35%> (-0.26%) ⬇️
php8.2 95.60% <91.35%> (-0.26%) ⬇️
php8.3 95.60% <91.35%> (-0.26%) ⬇️
php8.4 95.60% <91.35%> (-0.26%) ⬇️
php8.5 95.60% <91.35%> (-0.26%) ⬇️

@stof
Member

stof commented May 26, 2025

See #362 for my proposal for the ArrayLoader.

@acoulton acoulton force-pushed the feat-describable-nodes branch 2 times, most recently from 318caa5 to b87921e Compare May 26, 2025 21:23
@acoulton acoulton changed the title from "[WIP] feat: Support parsing all describable nodes in gherkin-32 mode" to "feat: Support parsing all describable nodes in gherkin-32 mode" May 26, 2025
   * @internal
   */
- public function shouldRemoveFeatureDescriptionPadding(): bool
+ public function shouldRemoveDescriptionPadding(): bool
Contributor Author

Renamed because, in legacy mode, padding is also removed from descriptions that get combined into the title on elements that did not previously support a description.

@acoulton acoulton force-pushed the feat-describable-nodes branch from b87921e to 8ea63e4 Compare May 26, 2025 21:27
Comment on lines -205 to +238
- $title = trim($token['value'] ?? '');
- $description = null;
+ ['title' => $title, 'description' => $description] = $this->parseTitleAndDescription($token);
Contributor Author

Trimming and coalescing to & from null now all happens in parseTitleAndDescription depending on the compatibility mode.

Comment on lines -219 to -269
if (is_string($node)) {
    if ($this->compatibilityMode->shouldRemoveFeatureDescriptionPadding()) {
        $text = preg_replace('/^\s{0,' . ($token['indent'] + 2) . '}|\s*$/', '', $node);
        $description .= ($description !== null ? "\n" : '') . $text;
        continue;
    }

    if ($node === "\n" && $description === null) {
        // Ignore empty lines before the start of the description
        continue;
    }

    // It must be part of the feature description (text & newlines later in the document will be consumed as
    // part of parsing Background / Scenario before execution returns to this loop).
    $description .= $node;
    if ($node !== "\n") {
        // Text nodes do not end with a newline, add one. The final trailing newline is rtrimmed below.
        $description .= "\n";
    }
Contributor Author

Any Text lines that are validly part of the Feature description will have been consumed by parseTitleAndDescription before we enter the loop.

Once in the loop, the only string lines we can get are blank lines - everything else will be consumed (or rejected) as part of processing a Background or Scenario / Scenario Outline child.

Comment on lines 282 to +317
$allowedTokenTypes = ['Step', 'Newline', 'Text', 'Comment'];
while (in_array($this->predictTokenType(), $allowedTokenTypes)) {
    // NB: Technically, we do not support `Text` inside this loop. However, there is no situation where `Text`
    // can be a direct child or immediately following a Scenario. Therefore, we consume it here as the most
    // logical context for throwing an UnexpectedParserNodeException.

Contributor Author

For all of the child node types, as with parseFeature any Text lines that are validly part of the description will now be consumed before we enter the loop.

However, I have left the loops accepting Text nodes (which will then trigger the UnexpectedParserNodeException) for cases where there is text at an invalid position in the feature file e.g.:

# tests/Cucumber/extra_testdata/bad/unknown_step_type.feature
Feature:

    Scenario:
        Given some step
        Aaand some step

In cases like this:
a) IMO it is anyway clearer to throw "Expected Step, Examples table, or end of Scenario but got ..." as now, than exiting the loop and doing the throw from parseFeature with a message like "Expected Background, Scenario or Outline".
b) There is much less risk of unexpected impact (in particular on legacy mode) if we keep consuming and validating the same node types in each method, so that control flow moves between parser methods on the same lines of each file as it did before.

\assert(\array_key_exists('keyword', $token));
$keyword = $token['keyword'];
$tags = empty($this->tags) ? [] : $this->popTags();
['title' => $title, 'description' => $description] = $this->parseTitleAndDescription($token);
Contributor Author

Note, this will actually produce a small behaviour change in legacy mode.

Previously, we ignored any title on an Examples: line (the $token['value'] was parsed but just discarded). Control flow then moved to parseTableRows, which only accepts TableRow|Newline|Comment.

Therefore if there was text on the line below Examples: this would have caused a ParserException due to the unexpected Text content.

Now, even in legacy mode, those lines will be read and the ExampleTableNode will be created with a (possibly multiline) name and a null description (the same as we parse Scenario: etc in legacy mode).

I think this is OK since any files with syntax like that would have been invalid until now?

Member

Accepting files that were previously parse errors is indeed OK to me from a BC point of view.

};

if ($this->compatibilityMode->shouldRemoveDescriptionPadding()) {
$text = preg_replace('/^\s{0,' . ($keywordToken['indent'] + 2) . '}|\s*$/', '', $text);
Contributor Author

This is the regex that the individual methods were originally using to un-indent / trim text lines when parsing their nodes.
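
To make the effect of that regex concrete, here is a minimal sketch that wraps it in a standalone function. The function name `removeDescriptionPadding()` and the `$keywordIndent` parameter are hypothetical stand-ins for the parser method and `$keywordToken['indent']`; only the pattern itself is taken from the diff above.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical wrapper around the regex quoted in the diff (illustration only).
 */
function removeDescriptionPadding(string $text, int $keywordIndent): string
{
    // Remove at most ($keywordIndent + 2) leading whitespace characters,
    // and all trailing whitespace. Deeper indentation is left in place.
    $pattern = '/^\s{0,' . ($keywordIndent + 2) . '}|\s*$/';

    return (string) preg_replace($pattern, '', $text);
}
```

For a keyword indented by 2 columns, a text line indented by 6 spaces keeps its 2 extra leading spaces, while a line indented by only 2 spaces loses all of its leading whitespace.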

@acoulton
Contributor Author

The code coverage errors are due to the new getDescription getters which are never called in tests as the assertions read the DTO state directly.

@stof do you have an opinion on whether NameFilter should:

  • match the title and description combined
  • only match the title (which will obviously include the description if the file was parsed in legacy mode)

I am leaning to matching both, so that scenarios are filtered the same as now regardless of the parser mode, but I'm not sure?

@acoulton acoulton marked this pull request as ready for review May 26, 2025 21:54
@acoulton acoulton requested a review from stof May 26, 2025 21:54
@acoulton acoulton force-pushed the feat-describable-nodes branch from 8ea63e4 to 8abb1f6 Compare May 27, 2025 08:48
@stof
Member

stof commented Jun 3, 2025

I think NameFilter needs to match the name and description combined, so that it matches the same thing as before (when the name was multiline).

And we can think about the future handling of filters in Behat as a separate point.
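
A minimal sketch of the "match name and description combined" idea follows. It is not the real `NameFilter`: the function names are hypothetical, and it uses a plain substring check as a simplifying assumption, whatever matching rules the actual filter applies.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: rebuilds the text a legacy-mode multi-line title would have contained.
 */
function filterableText(?string $title, ?string $description): string
{
    // Keep only the parts that are present, then join them with a newline.
    $parts = array_filter(
        [$title, $description],
        static fn (?string $part): bool => $part !== null && $part !== ''
    );

    return implode("\n", $parts);
}

/**
 * Hypothetical matcher: simplified to a substring check for illustration.
 */
function isNameMatch(string $filter, ?string $title, ?string $description): bool
{
    return str_contains(filterableText($title, $description), $filter);
}
```

Because the combined text has the same shape as the old multi-line title, a filter string that matched a scenario in legacy mode should still match it when the text is split across title and description.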

@acoulton acoulton force-pushed the feat-describable-nodes branch from 8abb1f6 to 7408ded Compare June 21, 2025 13:13
/**
* @return ?string
*/
public function getDescription();
Contributor Author

Unfortunately this interface method can only have a soft typehint to avoid a BC break when applying it to the existing FeatureNode::getDescription().

I have, however, given all the new implementations of the method a hard typehint.
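
The soft/hard typehint split can be sketched as below. All class and interface names here are hypothetical; only the `getDescription()` method name comes from the diff. The sketch relies on the PHP rule that an implementation may add a native return type when the interface method declares none, whereas adding a native type to the interface would force every existing implementation to declare a compatible one.

```php
<?php
declare(strict_types=1);

// Hypothetical interface: the return type is documented only in the docblock.
interface DescribableSketch
{
    /**
     * @return ?string
     */
    public function getDescription();
}

// Stands in for an existing class whose untyped method signature must not change.
class LegacyFeatureSketch implements DescribableSketch
{
    public function __construct(private $description = null)
    {
    }

    public function getDescription()
    {
        return $this->description;
    }
}

// A new implementation is free to declare the native return type.
final class NewScenarioSketch implements DescribableSketch
{
    public function __construct(private readonly ?string $description = null)
    {
    }

    public function getDescription(): ?string
    {
        return $this->description;
    }
}
```

Both classes satisfy the interface, so callers can rely on `getDescription()` being present while existing untyped signatures stay valid.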

@acoulton
Contributor Author

@stof I've rebased this and finished the work on the filters: I think it is now complete.

@stof
Member

stof commented Sep 11, 2025

Based on code coverage reports, I have the feeling that the case of empty lines between steps of the background is not covered by tests.

@stof
Member

stof commented Nov 3, 2025

@acoulton any chance to add the missing test coverage there?

@acoulton
Contributor Author

acoulton commented Nov 3, 2025

@stof yeah, I'm hoping to get back to this asap - I also want to check if we have enough coverage of the parsing in legacy mode to be sure this hasn't changed anything...

@acoulton
Contributor Author

acoulton commented Nov 5, 2025

Waiting for #392 first

Adds the properties, getters, and supporting code ahead of implementing
support for parsing these values from the feature file.

We can now parse descriptions for Background, Scenario, Scenario Outline
and Examples nodes in the gherkin file. Previously, these were parsed
but were included as a multiline title.

This is an edge case, but cucumber/gherkin allows the `Examples:`
keyword and table rows to appear within feature and background
descriptions. This is because these elements are parsed as text unless
already in a context that supports Examples (e.g. a Scenario / Scenario
Outline).

When parsing in `GherkinCompatibilityMode::GHERKIN_32`, multiline text
for a Scenario (or Outline) will be split across the title and
description.

This could cause the NameFilter to unexpectedly stop matching scenarios
with multiline text if the new parsing mode has been enabled. This would
be a breaking change (and could cause false positives in CI systems if
scenarios that were assumed to be running were no longer tested).

Therefore, NameFilter is instead changed to consider both title and
description to mirror existing behaviour.

This required introducing a new interface to identify nodes that may
have a description, since `NameFilter::isScenarioMatch` accepts any
`ScenarioInterface` and that interface (as well as e.g. ExampleNode)
does not contain a `getDescription` method.

I have added that interface to all nodes - even those that are not
relevant for this feature - to avoid future confusion.
@acoulton acoulton force-pushed the feat-describable-nodes branch from 7408ded to d6a66f7 Compare November 6, 2025 11:10
Comment on lines +681 to +684
// The only time we use $token['value'] is if we got a `Text` token.
// ->expectTokenType('Text') is tagged as returning a `TStringValueToken`, where 'value' cannot be null
// However PHPStan cannot follow the chain through predictTokenType -> expectTokenType -> $token['type']
assert($text !== null, 'Text token value should not be null');
Contributor Author

Without this, PHPStan was complaining that the $text argument to preg_replace could be null (when in fact it cannot be). I'm not sure if there's a cleverer way to solve this.

Comment on lines +140 to +143
name: |-
  examples with description
  This is an examples description
description: ~
Contributor Author

As discussed originally (e.g. https://github.com/Behat/Gherkin/pull/361/files#r2107874660) there are some cases where legacy mode will now parse files that would previously have caused an Exception, due to multiline text or reserved words in unexpected places.

In those cases, they'll be parsed with a multiline name / title and a null description, consistent with how we parse other nodes in legacy mode.

Comment on lines +13 to +21
Background: That is described

With multiple paragraphs

Of description

Given something

And something else
Contributor Author

This new feature file covers presence of blank lines between steps in a Background (and in all other parts of a gherkin file).

Comment on lines 325 to 327
if ($node === "\n") {
    continue;
}
Contributor Author

@acoulton acoulton Nov 6, 2025

This line is not covered by tests.

I don't think it can actually ever happen:

If I change $allowedTypes to just ['Step', 'Text'] and remove this line, all the tests still pass.

I don't think there are any edge cases that could break this - we could potentially change it now, or keep it as un-covered code for absolute safety.

My temptation is to leave it as-is in this branch, then remove it in a separate PR for clarity.

@acoulton
Contributor Author

acoulton commented Nov 6, 2025

@stof this is updated, with an extra test case added for a feature with blank lines everywhere, but it hasn't increased the coverage - see this explanation

The other missing coverage is just the new getters.

@acoulton acoulton merged commit fad3283 into Behat:master Nov 24, 2025
9 of 11 checks passed
@acoulton acoulton deleted the feat-describable-nodes branch November 24, 2025 08:31
Development

Successfully merging this pull request may close these issues.

Examples description not supported

2 participants