Skip to content

fix: Parse tags correctly in gherkin-32 compatibility mode#400

Merged
acoulton merged 3 commits intoBehat:masterfrom
acoulton:feat-tag-parsing-parity
Nov 26, 2025
Merged

fix: Parse tags correctly in gherkin-32 compatibility mode#400
acoulton merged 3 commits intoBehat:masterfrom
acoulton:feat-tag-parsing-parity

Conversation

@acoulton
Copy link
Contributor

When parsing in gherkin-32 compatibility mode:

  • The leading @ will be retained in the tag value in the parsed nodes (in legacy mode, we remove this).
  • We will throw a ParserException if a tag contains whitespace (in legacy mode, we trigger a deprecation).

To accommodate this, TagFilter will now match tags with or without the leading @ so that it works as expected in either parsing mode.

This may produce a behavioural change (for users who have opted-in to the new parsing mode) if they are interacting with the tags array directly. There is no change to behaviour in legacy mode.

Fixes #336.

@codecov
Copy link

codecov bot commented Nov 25, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.66%. Comparing base (fad3283) to head (2972973).
⚠️ Report is 4 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master     #400      +/-   ##
============================================
+ Coverage     95.60%   95.66%   +0.06%     
- Complexity      675      682       +7     
============================================
  Files            44       45       +1     
  Lines          1980     2009      +29     
============================================
+ Hits           1893     1922      +29     
  Misses           87       87              
Flag Coverage Δ
php8.1 95.66% <100.00%> (+0.06%) ⬆️
php8.1--with=symfony/yaml:^5.4 95.66% <100.00%> (+0.06%) ⬆️
php8.1--with=symfony/yaml:^6.4 95.66% <100.00%> (+0.06%) ⬆️
php8.2 95.66% <100.00%> (+0.06%) ⬆️
php8.3 95.66% <100.00%> (+0.06%) ⬆️
php8.4 95.66% <100.00%> (+0.06%) ⬆️
php8.5 95.66% <100.00%> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Comment on lines -124 to -125
$tag = str_replace('@', '', trim($tag));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We no longer need to remove the @ from the tag filter expression, because the tags we're comparing to will now always have their leading @.

Comment on lines +838 to +844
$tags = preg_split('/(?=@)/u', $line);
assert($tags !== false);
// Remove the empty content before the first tag prefix
array_shift($tags);

// Note: checking for whitespace in tags is done in the Parser to fit with existing logic
$token['tags'] = array_map(trim(...), $tags);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This broadly follows the logic in cucumber/gherkin except I have used a lookahead regex to save having to add the @ back onto the split parts. It also omits checking for whitespace here as it seemed more logical to put that alongside the existing deprecation in the Parser, at least until we remove that.


trigger_error(
sprintf('Whitespace in tags is deprecated, found "%s"', $tag),
sprintf('Whitespace in tags is deprecated, found "%s" in %s', $tag, $this->file ?? 'unknown file'),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed the deprecation message did not include the name of the file (if available) - I've added that to help users trace any of these before enabling the new mode.

@acoulton acoulton requested a review from stof November 25, 2025 02:10
* @return list<FeatureNode>
*/
public function load(string $resource): array
public function load(string $resource, GherkinCompatibilityMode $compatibilityMode): array
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we use a setter like for the Parser, to avoid having to pass it around everywhere ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good plan, that's much cleaner. It took me a moment to notice that although load is an instance method, the rest of the implementation it calls was all in static methods.

In theory for our current usage of the class I could have used a static setter & static property to hold the mode, but that felt like a bit of a hack. So I've converted all the static methods / calls to instance methods (almost all of them are involved in a call chain that ultimately needs to call getTags). That'll make it cleaner / easier to access the mode from other parts of the loader in future if we need to.

@acoulton acoulton force-pushed the feat-tag-parsing-parity branch 2 times, most recently from 6cc9cf2 to 501fbc4 Compare November 25, 2025 22:38
cucumber/gherkin parses tags exactly as they appear in the file,
including the leading `@` which we have historically stripped. Make our
parsing the same as cucumber's when in gherkin-32 compatibility mode.
When parsing tags, we (and cucumber/gherkin) split on the `@` character.
So `@some tag @values` will parse as `['@some tag', '@value']`. This may
be unexpected if e.g. a user has simply missed a `@` character.

Therefore cucumber/gherkin will throw if a tag contains any whitespace.
Historically we have accepted this, although we have triggered a
deprecation since v4.9.0.

Update the parser to throw if in `gherkin-32` compatibility mode.
Update the filtering to match tags regardless of the gherkin parsing
mode.
@acoulton acoulton force-pushed the feat-tag-parsing-parity branch from 501fbc4 to 2972973 Compare November 25, 2025 22:54
@acoulton acoulton requested a review from stof November 25, 2025 22:55
@acoulton acoulton merged commit d9957c7 into Behat:master Nov 26, 2025
11 checks passed
@acoulton acoulton deleted the feat-tag-parsing-parity branch November 26, 2025 09:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Tags are not parsed the same than in cucumber

2 participants