feat: Support parsing all describable nodes in `gherkin-32` mode by acoulton · Pull Request #361 · Behat/Gherkin

acoulton · 2025-05-26T16:56:15Z

Historically, we have not supported parsing descriptions as a separate concept for Examples, Background, Scenario or Scenario Outline. Text following the keyword line was instead parsed as a multi-line title.

This PR will implement support for capturing these as standalone properties when running in cucumber/gherkin mode.

I have borrowed from the work done by @jojo1981 in #254, but I have started from a clean branch as there have been a lot of changes to constructors, property types etc since then. I'll ensure their input is credited in the final release notes.

Fixes #154 #211

- Finish Parser implementation
- Review impact of / on "Some keywords should be allowed as free text within node descriptions" #329
- Consider whether we need to support descriptions in the ArrayLoader
- Consider impact on filters (e.g. should name filter continue to match only on title, or should it match full title & description)

codecov · 2025-05-26T16:57:10Z

Codecov Report

❌ Patch coverage is 91.35802% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.60%. Comparing base (9dec914) to head (a1fb0be).
⚠️ Report is 5 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/Node/ExampleTableNode.php | 63.63% | 4 Missing ⚠️ |
| src/Node/BackgroundNode.php | 0.00% | 2 Missing ⚠️ |
| src/Parser.php | 97.61% | 1 Missing ⚠️ |

Additional details and impacted files

@@             Coverage Diff              @@
##             master     #361      +/-   ##
============================================
- Coverage     95.86%   95.60%   -0.26%     
- Complexity      669      675       +6     
============================================
  Files            44       44              
  Lines          1934     1980      +46     
============================================
+ Hits           1854     1893      +39     
- Misses           80       87       +7

| Flag | Coverage Δ | |
|---|---|---|
| php8.1 | `95.60% <91.35%> (-0.26%)` | ⬇️ |
| php8.1--with=symfony/yaml:^5.4 | `95.60% <91.35%> (-0.26%)` | ⬇️ |
| php8.1--with=symfony/yaml:^6.4 | `95.60% <91.35%> (-0.26%)` | ⬇️ |
| php8.2 | `95.60% <91.35%> (-0.26%)` | ⬇️ |
| php8.3 | `95.60% <91.35%> (-0.26%)` | ⬇️ |
| php8.4 | `95.60% <91.35%> (-0.26%)` | ⬇️ |
| php8.5 | `95.60% <91.35%> (-0.26%)` | ⬇️ |

stof · 2025-05-26T17:17:21Z

See #362 for my proposal for the ArrayLoader.

acoulton · 2025-05-26T21:26:54Z

src/GherkinCompatibilityMode.php

     * @internal
     */
-    public function shouldRemoveFeatureDescriptionPadding(): bool
+    public function shouldRemoveDescriptionPadding(): bool


Renamed because in legacy mode, padding is also removed from descriptions that get combined into title on elements that did not previously support a description.

acoulton · 2025-05-26T21:29:16Z

src/Parser.php

-        $title = trim($token['value'] ?? '');
-        $description = null;
+        ['title' => $title, 'description' => $description] = $this->parseTitleAndDescription($token);


Trimming and coalescing to & from null now all happens in parseTitleAndDescription depending on the compatibility mode.
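
To make the mode-dependent behaviour concrete, here is a minimal sketch of how a title and description might be split. It is not the real `parseTitleAndDescription`: the function name `splitTitleAndDescription`, its parameters, and the boolean `$legacyMode` flag are all hypothetical stand-ins, and the exact trimming rules are an assumption based only on the behaviour described in this PR.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical, simplified model of the title/description split.
 * Not the Behat/Gherkin implementation.
 *
 * @param list<string> $textLines free-text lines found below the keyword line
 * @return array{title: string, description: ?string}
 */
function splitTitleAndDescription(string $keywordLineValue, array $textLines, bool $legacyMode): array
{
    $title = trim($keywordLineValue);

    if ($legacyMode) {
        // Legacy behaviour as described in this PR: the following text
        // becomes part of a multi-line title and the description stays null.
        $lines = array_map('trim', $textLines);

        return [
            'title' => trim(implode("\n", [$title, ...$lines])),
            'description' => null,
        ];
    }

    // gherkin-32 behaviour: keep the text as a separate description,
    // coalescing "no text at all" to null (assumed rule).
    $description = implode("\n", $textLines);

    return [
        'title' => $title,
        'description' => trim($description) === '' ? null : $description,
    ];
}

$legacy = splitTitleAndDescription(' examples with description', ['This is an examples description'], true);
$gherkin32 = splitTitleAndDescription(' examples with description', ['This is an examples description'], false);
```

In this sketch `$legacy` carries a two-line title with a null description, while `$gherkin32` keeps the first line as the title and the second as the description.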

acoulton · 2025-05-26T21:31:49Z

src/Parser.php

-            if (is_string($node)) {
-                if ($this->compatibilityMode->shouldRemoveFeatureDescriptionPadding()) {
-                    $text = preg_replace('/^\s{0,' . ($token['indent'] + 2) . '}|\s*$/', '', $node);
-                    $description .= ($description !== null ? "\n" : '') . $text;
-                    continue;
-                }
-
-                if ($node === "\n" && $description === null) {
-                    // Ignore empty lines before the start of the description
-                    continue;
-                }
-
-                // It must be part of the feature description (text & newlines later in the document will be consumed as
-                // part of parsing Background / Scenario before execution returns to this loop).
-                $description .= $node;
-                if ($node !== "\n") {
-                    // Text nodes do not end with a newline, add one. The final trailing newline is rtrimmed below.
-                    $description .= "\n";
-                }


Any Text lines that are validly part of the Feature description will have been consumed by parseTitleAndDescription before we enter the loop.

Once in the loop, the only string lines we can get are blank lines - everything else will be consumed (or rejected) as part of processing a Background or Scenario / Scenario Outline child.

acoulton · 2025-05-26T21:39:37Z

src/Parser.php

        $allowedTokenTypes = ['Step', 'Newline', 'Text', 'Comment'];
        while (in_array($this->predictTokenType(), $allowedTokenTypes)) {
+            // NB: Technically, we do not support `Text` inside this loop. However, there is no situation where `Text`
+            // can be a direct child or immediately following a Scenario. Therefore, we consume it here as the most
+            // logical context for throwing an UnexpectedParserNodeException.
+


For all of the child node types, as with parseFeature any Text lines that are validly part of the description will now be consumed before we enter the loop.

However, I have left the loops accepting Text nodes (which will then trigger the UnexpectedParserNodeException) for cases where there is text at an invalid position in the feature file e.g.:

    # tests/Cucumber/extra_testdata/bad/unknown_step_type.feature
    Feature:
      Scenario:
        Given some step
        Aaand some step

In cases like this:
a) IMO it is clearer anyway to throw `Expected Step, Examples table, or end of Scenario but got ...` as we do now, rather than exiting the loop and throwing from parseFeature with a message like `Expected Background, Scenario or Outline`.
b) There is much less risk of unexpected impact (in particular on legacy mode) if we keep consuming and validating the same node types in each method, so that control flow moves between parser methods on the same lines of each file as it did before.

acoulton · 2025-05-26T21:48:01Z

src/Parser.php

        \assert(\array_key_exists('keyword', $token));
        $keyword = $token['keyword'];
        $tags = empty($this->tags) ? [] : $this->popTags();
+        ['title' => $title, 'description' => $description] = $this->parseTitleAndDescription($token);


Note, this will actually produce a small behaviour change in legacy mode.

Previously, we ignored any title on an Examples: line (the $token['value'] was parsed but just discarded). Control flow then moves to parseTableRows which only accepts TableRow|Newline|Comment.

Therefore if there was text on the line below Examples: this would have caused a ParserException due to the unexpected Text content.

Now, even in legacy mode, those lines will be read and the ExampleTableNode will be created with a (possibly multiline) name and a null description (the same as we parse Scenario: etc in legacy mode).

I think this is OK since any files with syntax like that would have been invalid until now?

Accepting files that were previously parse errors is indeed OK to me from a BC point of view.

acoulton · 2025-05-26T21:50:08Z

src/Parser.php

+            };
+
+            if ($this->compatibilityMode->shouldRemoveDescriptionPadding()) {
+                $text = preg_replace('/^\s{0,' . ($keywordToken['indent'] + 2) . '}|\s*$/', '', $text);


This is the regex that the individual methods were originally using to un-indent / trim text lines when parsing their nodes.
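
As an illustration of what that pattern does, the sketch below applies it to a few sample strings. The wrapper function `removeDescriptionPadding` and the sample inputs are hypothetical; only the regular expression itself is taken from the diff above.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of Behat/Gherkin): strips up to
 * ($keywordIndent + 2) leading whitespace characters and all trailing
 * whitespace, using the same pattern as the diff above.
 */
function removeDescriptionPadding(string $text, int $keywordIndent): string
{
    $pattern = '/^\s{0,' . ($keywordIndent + 2) . '}|\s*$/';

    // preg_replace() returns null only on a regex error; fall back to the input.
    return preg_replace($pattern, '', $text) ?? $text;
}

// The keyword is indented by 2 columns, so at most 4 leading whitespace characters are removed.
$exact = removeDescriptionPadding('    Exactly four spaces of indent  ', 2);
$deeper = removeDescriptionPadding('      Six spaces of indent', 2);
$shallow = removeDescriptionPadding(' One space of indent', 2);
```

Text indented further than the keyword indent plus two keeps its extra leading spaces (`$deeper` still starts with two spaces), which lets nested indentation inside the text survive the un-indent.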

acoulton · 2025-05-26T21:54:05Z

The code coverage errors are due to the new getDescription getters which are never called in tests as the assertions read the DTO state directly.

@stof do you have an opinion on whether NameFilter should:

- match the title and description combined
- only match the title (which will obviously include the description if the file was parsed in legacy mode)

I am leaning to matching both, so that scenarios are filtered the same as now regardless of the parser mode, but I'm not sure?

stof · 2025-06-03T16:48:57Z

I think NameFilter needs to match the name and description combined, to match the same thing as before (when the name was multiline).

And we can think about the future handling of filters in Behat as a separate point.

acoulton · 2025-06-21T13:17:35Z

src/Node/DescribableNodeInterface.php

+    /**
+     * @return ?string
+     */
+    public function getDescription();


Unfortunately this interface method can only have a soft typehint to avoid a BC break when applying it to the existing FeatureNode::getDescription().

I have, however, given all the new implementations of the method a hard typehint.
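
The soft/hard typehint distinction can be sketched as follows. All names here (`HasDescription`, `ExistingNode`, `NewNode`) are hypothetical and only mirror the shape of the change. The point is that PHP lets an implementation add a native return type when the interface declares none, whereas a native return type on the interface would reject any implementation that lacks one.

```php
<?php
declare(strict_types=1);

// Hypothetical names throughout; they only mirror the shape of the change.
interface HasDescription
{
    /**
     * @return ?string
     */
    public function getDescription();
}

// Stand-in for a pre-existing class whose method never declared a return type.
class ExistingNode implements HasDescription
{
    public function getDescription()
    {
        return null;
    }
}

// A new implementation is free to narrow the signature with a native return type.
final class NewNode implements HasDescription
{
    public function __construct(private readonly ?string $description = null)
    {
    }

    public function getDescription(): ?string
    {
        return $this->description;
    }
}

$existing = (new ExistingNode())->getDescription();
$new = (new NewNode('Some description'))->getDescription();
```

If the interface method were declared as `getDescription(): ?string`, the untyped `ExistingNode::getDescription()` would no longer be a compatible implementation, which is the kind of break the docblock-only typehint avoids.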

acoulton · 2025-06-21T13:22:38Z

@stof I've rebased this and finished the work on the filters: I think it is now complete.

stof · 2025-09-11T12:00:52Z

Based on code coverage reports, I have the feeling that the case of empty lines between steps of the background is not covered by tests.

stof · 2025-11-03T17:35:13Z

@acoulton any chance to add the missing test coverage there?

acoulton · 2025-11-03T17:55:36Z

@stof yeah, I'm hoping to get back to this asap - I also want to check if we have enough coverage of the parsing in legacy mode to be sure this hasn't changed anything...

acoulton · 2025-11-05T10:40:30Z

Waiting for #392 first

Adds the properties, getters, and supporting code ahead of implementing support for parsing these values from the feature file.

We can now parse descriptions for Background, Scenario, Scenario Outline and Examples nodes in the gherkin file. Previously, these were parsed but were included as a multiline title.

This is an edge case, but cucumber/gherkin allows the `Examples:` keyword and table rows to appear within feature and background descriptions. This is because these elements are parsed as text unless already in a context that supports Examples (e.g. a Scenario / Scenario Outline).

When parsing in `GherkinCompatibilityMode::GHERKIN_32`, multiline text for a Scenario (or Outline) will be split across the title and description. This could cause the NameFilter to unexpectedly stop matching scenarios with multiline text if the new parsing mode has been enabled. This would be a breaking change (and could cause false positives in CI systems if scenarios that were assumed to be running were no longer tested).

Therefore, NameFilter is instead changed to consider both title and description to mirror existing behaviour.

This required introducing a new interface to identify nodes that may have a description, since `NameFilter::isScenarioMatch` accepts any `ScenarioInterface` and that interface (as well as e.g. ExampleNode) does not contain a `getDescription` method. I have added that interface to all nodes - even those that are not relevant for this feature - to avoid future confusion.
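
The effect of matching on title and description together can be shown with a small sketch. This is not the real NameFilter: `matchesNameFilter` is a hypothetical helper that only does plain substring matching, and joining the two parts with a newline is an assumption about how the combined text might be rebuilt.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: matches a plain-text filter against the title and
 * description combined. Not the Behat/Gherkin NameFilter implementation.
 */
function matchesNameFilter(string $filter, ?string $title, ?string $description): bool
{
    // Re-join the parts with a newline so the haystack resembles the old multi-line title.
    $parts = array_filter(
        [$title, $description],
        static fn (?string $part): bool => $part !== null && $part !== ''
    );
    $haystack = implode("\n", $parts);

    return str_contains($haystack, $filter);
}

// The same scenario text, as each parsing mode might represent it.
$legacyMatch = matchesNameFilter('second line', "Scenario title\nsecond line of text", null);
$gherkin32Match = matchesNameFilter('second line', 'Scenario title', 'second line of text');

// Matching on the title alone would stop selecting the scenario in gherkin-32 mode.
$titleOnlyMatch = str_contains('Scenario title', 'second line');
```

Both `$legacyMatch` and `$gherkin32Match` are true here, while `$titleOnlyMatch` is false, which is the silent loss of matches the commit message warns about.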

acoulton · 2025-11-06T11:36:06Z

src/Parser.php

+            // The only time we use $token['value'] is if we got a `Text` token.
+            // ->expectTokenType('Text') is tagged as returning a `TStringValueToken`, where 'value' cannot be null
+            // However PHPStan cannot follow the chain through predictTokenType -> expectTokenType -> $token['type']
+            assert($text !== null, 'Text token value should not be null');


Without this, PHPStan was complaining that the $text argument to preg_replace could be null (when in fact it cannot be). I'm not sure if there's a cleverer way to solve this.
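
For readers unfamiliar with the pattern, here is a minimal sketch of using `assert()` to narrow a nullable value for static analysis. The function `trimTextToken` and the token array shape are hypothetical simplifications, not the parser's real types.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical token shape: 'value' is nullable in the general type,
 * even though Text tokens are expected to always carry a string.
 *
 * @param array{type: string, value: ?string} $token
 */
function trimTextToken(array $token): string
{
    $text = $token['value'];

    // Narrows ?string to string for static analysis. At runtime this only
    // fires if the "Text tokens always have a value" invariant is broken
    // and assertions are enabled.
    assert($text !== null, 'Text token value should not be null');

    // preg_replace() returns null only on a regex error; fall back to the input.
    return preg_replace('/\s*$/', '', $text) ?? $text;
}

$trimmed = trimTextToken(['type' => 'Text', 'value' => 'Some description text   ']);
```

An alternative would be an explicit `if ($text === null) { throw ... }` guard, which also narrows the type and does not depend on the `zend.assertions` setting, at the cost of a branch that can never be covered by tests.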

acoulton · 2025-11-06T11:39:52Z

tests/Cucumber/expected_variants/legacy/descriptions.feature.expected.yaml

+                            name: |-
+                                examples with description
+                                This is an examples description
+                            description: ~


As discussed originally (e.g. https://github.com/Behat/Gherkin/pull/361/files#r2107874660), there are now some cases where legacy mode will parse files that would previously have caused an Exception, due to multiline text or reserved words in unexpected places.

In those cases, they'll be parsed with a multiline name / title and a null description, consistent with how we parse other nodes in legacy mode.

acoulton · 2025-11-06T11:41:33Z

tests/Cucumber/extra_testdata/good/extra_blank_lines_everywhere.feature

+    Background: That is described
+
+        With multiple paragraphs
+
+        Of description
+
+        Given something
+
+        And something else


This new feature file covers presence of blank lines between steps in a Background (and in all other parts of a gherkin file).

acoulton · 2025-11-06T12:22:54Z

src/Parser.php

            if ($node === "\n") {
                continue;
            }


This line is not covered by tests.

I don't think it can actually ever happen:

Before we get to this point, parseTitleAndDescription has already consumed any comments or newlines after the Background: line and before the first Step.

Therefore, the only ones of our $allowedTokenTypes that we can actually receive for the first iteration are a Step or Text.

If it is a Text, we throw.

If it is a Step, we call parseExpression(), which internally calls parseStep().

parseStep() itself consumes any comments or blank lines that follow the step as part of scanning ahead to find potential PyString / TableRow nodes.

Therefore on our subsequent iterations the only $allowedTokenTypes we can actually receive are again Step or Text.

If I change $allowedTokenTypes to just ['Step', 'Text'] and remove this line, all the tests still pass.

I don't think there are any edge cases that could break this - we could potentially change it now, or keep it as un-covered code for absolute safety.

My temptation is to leave it as-is in this branch, then remove it in a separate PR for clarity.

acoulton · 2025-11-06T12:26:01Z

@stof this is updated, and an extra test case has been added for a feature with blank lines everywhere, but it hasn't increased the coverage - see this explanation

The other missing coverage is just the new getters.

acoulton changed the title ~~[WIPfeat: Support parsing all describable nodes in gherkin-32 mode~~ [WIP] feat: Support parsing all describable nodes in gherkin-32 mode May 26, 2025

acoulton marked this pull request as draft May 26, 2025 17:00

acoulton mentioned this pull request May 26, 2025

Add descriptions support for the nodes: ExampleTable, Outline and Sce… #254

Closed

acoulton force-pushed the feat-describable-nodes branch 2 times, most recently from 318caa5 to b87921e Compare May 26, 2025 21:23

acoulton changed the title ~~[WIP] feat: Support parsing all describable nodes in gherkin-32 mode~~ feat: Support parsing all describable nodes in gherkin-32 mode May 26, 2025

acoulton force-pushed the feat-describable-nodes branch from b87921e to 8ea63e4 Compare May 26, 2025 21:27

acoulton marked this pull request as ready for review May 26, 2025 21:54

acoulton requested a review from stof May 26, 2025 21:54

acoulton force-pushed the feat-describable-nodes branch from 8ea63e4 to 8abb1f6 Compare May 27, 2025 08:48

acoulton force-pushed the feat-describable-nodes branch from 8abb1f6 to 7408ded Compare June 21, 2025 13:13

acoulton added 7 commits November 6, 2025 11:06

refactor: Add wither method to clone ExampleTableNode with new table

5e11a0c

feat: Support name and description properties on describable Nodes

79eeb5c

Adds the properties, getters, and supporting code ahead of implementing support for parsing these values from the feature file.

feat: Parse descriptions for all describable nodes in gherkin-32 mode

6e07716

We can now parse descriptions for Background, Scenario, Scenario Outline and Examples nodes in the gherkin file. Previously, these were parsed but were included as a multiline title.

test: Update expected parsing results for cucumber variants

659bcc0

test: Cover behaviour when feature files contain extra blank lines

d6a66f7

acoulton force-pushed the feat-describable-nodes branch from 7408ded to d6a66f7 Compare November 6, 2025 11:10

fix: PHPStan failures

a1fb0be

stof approved these changes Nov 23, 2025

acoulton merged commit fad3283 into Behat:master Nov 24, 2025
9 of 11 checks passed

acoulton deleted the feat-describable-nodes branch November 24, 2025 08:31

This was referenced Nov 24, 2025

Background and scenario descriptions not supported #211

Closed

Comments before description is not parsed the same than in cucumber #330

Closed

Some keywords should be allowed as free text within node descriptions #329

Closed
