java: Optimize GherkinLine performance #372

Merged 54 commits on Apr 4, 2025

Commits
9a1d709
fix: corrected misc performance issues for #361
Feb 11, 2025
f895ae1
fix: improved encoding detection performance for #361
Feb 11, 2025
17c54fc
feat: added release info for #361
Feb 11, 2025
2e7829b
fix: corrected Parser razor generation for #361
Feb 12, 2025
a842703
fix: corrected Parser razor generation (2) for #361
Feb 12, 2025
ffb825c
fix: corrected Parser razor generation (3) for #361
Feb 12, 2025
3ff3ae7
fix: corrected Parser razor generation (4) for #361
Feb 12, 2025
999af84
Merge branch 'refs/heads/main' into gerkinline_optimization
Feb 12, 2025
6e0de0d
fix: improved performance to get the default dialect for #361
Feb 27, 2025
17f35df
fix: variable reusing and minor rewrite for #361
Feb 27, 2025
240351d
fix: misc optimizations for #361
Mar 11, 2025
e1aaa11
Merge branch 'refs/heads/main' into gerkinline_optimization
Mar 11, 2025
086cd36
fix: additional optimizations for #361
Mar 11, 2025
80aad9c
fix: improved charset detection and string concatenation performance …
Mar 20, 2025
a26229b
Merge branch 'refs/heads/main' into gerkinline_optimization
Mar 20, 2025
4d294ff
fix: corrected code according to PR comments and minor optimization f…
Mar 20, 2025
6da49c6
fix: removed unused import for #361
Mar 20, 2025
bd6e580
fix: corrected according to PR comments for #361
Mar 20, 2025
ae86a67
fix: corrected according to PR comments for #361
Mar 20, 2025
c5ae66d
Optimize and review GherkinDialect
mpkorstanje Mar 28, 2025
8ca2f2b
Optimize GherkinDialect
mpkorstanje Mar 28, 2025
b66ce6f
Update CHANGELOG
mpkorstanje Mar 28, 2025
de7365b
Naming
mpkorstanje Mar 28, 2025
af4861b
Merge branch 'java-optimize-gherkin-dialect' into gerkinline_optimiza…
mpkorstanje Mar 28, 2025
e173337
Merge remote-tracking branch 'origin/main' into gerkinline_optimization
mpkorstanje Mar 28, 2025
18a6b98
Merge remote-tracking branch 'origin/main' into gerkinline_optimization
mpkorstanje Mar 30, 2025
f3e16e7
Merge remote-tracking branch 'origin/main' into gerkinline_optimization
mpkorstanje Mar 30, 2025
3843111
Clean up merge
mpkorstanje Mar 30, 2025
be7458c
Only use package private
mpkorstanje Mar 30, 2025
c093edf
Merge remote-tracking branch 'origin/main' into gerkinline_optimization
mpkorstanje Mar 30, 2025
ca11927
fix: applied PR comments for #372
Apr 2, 2025
a522c01
fix: applied copilot PR comments for #387
Apr 3, 2025
a32cab4
Merge remote-tracking branch 'origin/main' into gerkinline_optimization
mpkorstanje Apr 4, 2025
30b1bd3
Touchups
mpkorstanje Apr 4, 2025
f374f81
Touchups
mpkorstanje Apr 4, 2025
0458cd1
Remove unnecessary null checks
mpkorstanje Apr 4, 2025
efa5071
Remove unnecessary null checks
mpkorstanje Apr 4, 2025
e7b5e4f
Merge remote-tracking branch 'origin/main' into gerkinline_optimization
mpkorstanje Apr 4, 2025
e62d985
Use tuple for result of trimAndIndent
mpkorstanje Apr 4, 2025
f0ee341
Polish
mpkorstanje Apr 4, 2025
2190d1b
Polish GherkinLine
mpkorstanje Apr 4, 2025
c8abad5
Polish GherkinLine
mpkorstanje Apr 4, 2025
44331de
Preserve trailing whitespace in wrongly indented docstrings
mpkorstanje Apr 4, 2025
b4af01a
Polish GherkinLine
mpkorstanje Apr 4, 2025
e1ac003
Note codepoint mismatch
mpkorstanje Apr 4, 2025
fb9aeb2
Polish GherkinLine
mpkorstanje Apr 4, 2025
e0dddf3
Minimize diff
mpkorstanje Apr 4, 2025
d47f22a
Merge remote-tracking branch 'origin/main' into gerkinline_optimization
mpkorstanje Apr 4, 2025
c060883
Update CHANGELOG
mpkorstanje Apr 4, 2025
650624a
Remove redundant test prefix
mpkorstanje Apr 4, 2025
8560031
Minimize diff
mpkorstanje Apr 4, 2025
a4253d3
java: Remove redundant public modifiers from package private classes
mpkorstanje Apr 4, 2025
d91ca4a
java: Remove redundant public modifiers from package private classes
mpkorstanje Apr 4, 2025
6f5d923
Merge remote-tracking branch 'origin/main' into gerkinline_optimization
mpkorstanje Apr 4, 2025
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -8,6 +8,7 @@ This document is formatted according to the principles of [Keep A CHANGELOG](htt

 ## [Unreleased]
 ### Changed
+- [Java] Optimize GherkinLine performance ([#361](https://github.com/cucumber/gherkin/issues/361))
 - [Java] Optimize number of array copies ([#388](https://github.com/cucumber/gherkin/pull/388))
 - [Java] Optimize Location performance ([#385](https://github.com/cucumber/gherkin/pull/385))
 - [Java] Optimize AstNode performance ([#383](https://github.com/cucumber/gherkin/pull/383))
3 changes: 2 additions & 1 deletion java/src/main/java/io/cucumber/gherkin/GherkinLanguageConstants.java
@@ -2,7 +2,8 @@

 interface GherkinLanguageConstants {
     String TAG_PREFIX = "@";
-    String COMMENT_PREFIX = "#";
+    char COMMENT_PREFIX_CHAR = '#';
+    String COMMENT_PREFIX = "" + COMMENT_PREFIX_CHAR;
     String TITLE_KEYWORD_SEPARATOR = ":";
     String TABLE_CELL_SEPARATOR = "|";
     String DOCSTRING_SEPARATOR = "\"\"\"";
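
Note on the new `char` constant: a `char` lets callers scan with `String.indexOf(int, int)` instead of building a regex, which is presumably what `StringUtils.removeComments` (called from `GherkinLine.parseTags` later in this diff) relies on. The body of `removeComments` is not part of the hunks shown here, so the following is only a minimal sketch of what such a helper could look like. The class name `RemoveCommentsSketch` is hypothetical, and `Character.isWhitespace` is an assumption that does not match the regex class `\s` exactly.

```java
class RemoveCommentsSketch {
    // Mirrors the constant added in this hunk.
    static final char COMMENT_PREFIX_CHAR = '#';

    // Hypothetical stand-in for StringUtils.removeComments (not shown in this diff).
    // Returns the text before the first '#' that is preceded by whitespace,
    // roughly what text.split("\\s#", 2)[0] did in the old getTags().
    static String removeComments(String text) {
        // Start at 1: a '#' at index 0 has no preceding whitespace.
        int index = text.indexOf(COMMENT_PREFIX_CHAR, 1);
        while (index >= 0) {
            // Assumption: Character.isWhitespace approximates the regex class \s.
            if (Character.isWhitespace(text.charAt(index - 1))) {
                return text.substring(0, index - 1);
            }
            index = text.indexOf(COMMENT_PREFIX_CHAR, index + 1);
        }
        return text;
    }

    public static void main(String[] args) {
        System.out.println(removeComments("@smoke @slow #not a tag")); // prints "@smoke @slow"
        System.out.println(removeComments("@issue#42"));               // prints "@issue#42"
    }
}
```

A `#` that directly follows a non-whitespace character stays part of the tag, which matches the old `\s#` pattern.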
114 changes: 70 additions & 44 deletions java/src/main/java/io/cucumber/gherkin/GherkinLine.java
@@ -4,86 +4,113 @@

 import java.util.ArrayList;
 import java.util.List;
+import java.util.Map.Entry;
 import java.util.PrimitiveIterator;

-import static io.cucumber.gherkin.GherkinLanguageConstants.COMMENT_PREFIX;
 import static io.cucumber.gherkin.GherkinLanguageConstants.TAG_PREFIX;
-import static io.cucumber.gherkin.StringUtils.ltrim;
-import static io.cucumber.gherkin.StringUtils.ltrimKeepNewLines;
+import static io.cucumber.gherkin.GherkinLanguageConstants.TITLE_KEYWORD_SEPARATOR;
+import static io.cucumber.gherkin.Locations.COLUMN_OFFSET;
+import static io.cucumber.gherkin.StringUtils.containsWhiteSpace;
 import static io.cucumber.gherkin.StringUtils.rtrim;
-import static io.cucumber.gherkin.StringUtils.rtrimKeepNewLines;
-import static io.cucumber.gherkin.StringUtils.symbolCount;
-import static io.cucumber.gherkin.StringUtils.trim;
+import static io.cucumber.gherkin.StringUtils.trimAndIndent;
+import static io.cucumber.gherkin.StringUtils.trimAndIndentKeepNewLines;
+import static java.util.Collections.emptyList;
 import static java.util.Objects.requireNonNull;

 class GherkinLine {
-    // TODO: set this to 0 when/if we change to 0-indexed columns
-    private static final int OFFSET = 1;
-    private final String lineText;
-    private final String trimmedLineText;
+
+    /**
+     * The line text, including all leading and trailing whitespace characters.
+     */
+    private final String rawText;
+    private final Location location;
+    private final boolean empty;
+
+    /**
+     * The line text with any whitespace characters trimmed.
+     */
+    private final String text;
+
+    /**
+     * The offset in code-points of the first non-whitespace character in this
+     * line.
+     */
     private final int indent;
-    private final Location line;

-    public GherkinLine(String lineText, Location line) {
-        this.lineText = requireNonNull(lineText);
-        this.trimmedLineText = trim(lineText);
-        this.line = requireNonNull(line);
-        indent = symbolCount(lineText) - symbolCount(ltrim(lineText));
+    GherkinLine(String rawText, Location location) {
+        this.rawText = requireNonNull(rawText);
+        this.location = requireNonNull(location);
+        Entry<String, Integer> trimmedIndent = trimAndIndent(rawText);
+        this.text = trimmedIndent.getKey();
+        this.indent = trimmedIndent.getValue();
+        this.empty = text.isEmpty();
     }

-    public int indent() {
+    int getIndent() {
         return indent;
     }

-    public String getLineText(int indentToRemove) {
-        if (indentToRemove < 0 || indentToRemove > indent())
-            return trimmedLineText;
-        return lineText.substring(indentToRemove);
+    String getText() {
+        return text;
     }

+    String getRawText() {
+        return rawText;
+    }
+
-    public boolean isEmpty() {
-        return trimmedLineText.isEmpty();
+    String getRawTextSubstring(int beginIndex) {
+        return rawText.substring(beginIndex);
     }

-    public boolean startsWith(String prefix) {
-        return trimmedLineText.startsWith(prefix);
+    boolean isEmpty() {
+        return empty;
     }

-    public String getRestTrimmed(int length) {
-        return trimmedLineText.substring(length).trim();
+    boolean startsWith(String prefix) {
+        return text.startsWith(prefix);
     }

-    public List<GherkinLineSpan> getTags() {
+    String substringTrimmed(int beginIndex) {
+        return text.substring(beginIndex).trim();
+    }

-        String uncommentedLine = trimmedLineText.split("\\s" + COMMENT_PREFIX, 2)[0];
-        List<GherkinLineSpan> tags = new ArrayList<>();
+    List<GherkinLineSpan> parseTags() {
+        // in most cases, the line contains no tag, so the code is optimized for this situation
+        if (empty) {
+            return emptyList();
+        }
+        String uncommentedLine = StringUtils.removeComments(text);
         int indexInUncommentedLine = 0;

         String[] elements = uncommentedLine.split(TAG_PREFIX);
+        if (elements.length == 0) {
+            return emptyList();
+        }
+        List<GherkinLineSpan> tags = new ArrayList<>(elements.length);
         for (String element : elements) {
             String token = rtrim(element);
             if (token.isEmpty()) {
                 continue;
             }
             int symbolLength = uncommentedLine.codePointCount(0, indexInUncommentedLine);
-            int column = indent() + symbolLength + 1;
-            if (!token.matches("^\\S+$")) {
-                throw new ParserException("A tag may not contain whitespace", Locations.atColumn(line, column));
+            int column = indent + symbolLength + COLUMN_OFFSET;
+            if (containsWhiteSpace(token)) {
+                throw new ParserException("A tag may not contain whitespace", Locations.atColumn(location, column));
             }
             tags.add(new GherkinLineSpan(column, TAG_PREFIX + token));
             indexInUncommentedLine += element.length() + 1;
         }
         return tags;
     }

-    public List<GherkinLineSpan> getTableCells() {
+    List<GherkinLineSpan> parseTableCells() {
         List<GherkinLineSpan> lineSpans = new ArrayList<>();
         StringBuilder cellBuilder = new StringBuilder();
         boolean beforeFirst = true;
         int col = 0;
         int cellStart = 0;
         boolean escape = false;
-        PrimitiveIterator.OfInt iterator = lineText.codePoints().iterator();
+        PrimitiveIterator.OfInt iterator = text.codePoints().iterator();
         while (iterator.hasNext()) {
             int c = iterator.next();
             if (escape) {
@@ -112,10 +139,9 @@ public List<GherkinLineSpan> getTableCells() {
                     // Skip the first empty span
                     beforeFirst = false;
                 } else {
-                    String cell = cellBuilder.toString();
-                    String leftTrimmedCell = ltrimKeepNewLines(cell);
-                    int cellIndent = symbolCount(cell) - symbolCount(leftTrimmedCell);
-                    lineSpans.add(new GherkinLineSpan(cellStart + cellIndent + OFFSET, rtrimKeepNewLines(leftTrimmedCell)));
+                    Entry<String, Integer> trimmedCellIndent = trimAndIndentKeepNewLines(cellBuilder.toString());
+                    int column = indent + cellStart + trimmedCellIndent.getValue() + COLUMN_OFFSET;
+                    lineSpans.add(new GherkinLineSpan(column, trimmedCellIndent.getKey()));
                 }
                 cellBuilder = new StringBuilder();
                 cellStart = col + 1;
@@ -128,11 +154,11 @@ public List<GherkinLineSpan> getTableCells() {
         return lineSpans;
     }

-    public boolean startsWithTitleKeyword(String text) {
-        int textLength = text.length();
-        return trimmedLineText.length() > textLength &&
-                trimmedLineText.startsWith(text) &&
-                trimmedLineText.startsWith(GherkinLanguageConstants.TITLE_KEYWORD_SEPARATOR, textLength);
+    boolean startsWithTitleKeyword(String keyword) {
+        int keywordLength = keyword.length();
+        return text.length() > keywordLength &&
+                text.startsWith(keyword) &&
+                text.startsWith(TITLE_KEYWORD_SEPARATOR, keywordLength);
     }

 }
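
Note on `trimAndIndent`: the constructor now gets the trimmed text and the indent from one call, where the old code called `trim`, `ltrim` and `symbolCount` (twice) separately. `StringUtils.trimAndIndent` itself is not part of the hunks shown, so the following is only a sketch of how a single-pass version might work. The class name `TrimAndIndentSketch`, the use of `Character.isWhitespace` as the whitespace test, and `SimpleEntry` as the tuple type are all assumptions.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map.Entry;

class TrimAndIndentSketch {

    // Hypothetical stand-in for StringUtils.trimAndIndent (not shown in this diff).
    // Key: the input with leading and trailing whitespace removed.
    // Value: the number of leading whitespace code points.
    static Entry<String, Integer> trimAndIndent(String input) {
        int length = input.length();
        int start = 0;
        int indent = 0;
        // Walk forward one code point at a time, counting leading whitespace.
        while (start < length) {
            int codePoint = input.codePointAt(start);
            if (!Character.isWhitespace(codePoint)) { // assumption: the real whitespace set may differ
                break;
            }
            start += Character.charCount(codePoint);
            indent++;
        }
        // Walk backward to drop trailing whitespace.
        int end = length;
        while (end > start) {
            int codePoint = input.codePointBefore(end);
            if (!Character.isWhitespace(codePoint)) {
                break;
            }
            end -= Character.charCount(codePoint);
        }
        return new SimpleEntry<>(input.substring(start, end), indent);
    }

    public static void main(String[] args) {
        Entry<String, Integer> result = trimAndIndent("   Given a step  ");
        System.out.println("[" + result.getKey() + "] indent=" + result.getValue());
        // prints "[Given a step] indent=3"
    }
}
```

The sketch allocates one substring and one entry per line and never materializes the left-trimmed intermediate string, which seems to be the point of the change.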
14 changes: 9 additions & 5 deletions java/src/main/java/io/cucumber/gherkin/GherkinLineSpan.java
@@ -1,13 +1,17 @@
 package io.cucumber.gherkin;

 class GherkinLineSpan {
-    // One-based line position
-    public final int column;
+    /**
+     * Index-1 based position in codepoints.
+     */
+    final int column;

-    // text part of the line
-    public final String text;
+    /**
+     * Text part of the line
+     */
+    final String text;

-    public GherkinLineSpan(int column, String text) {
+    GherkinLineSpan(int column, String text) {
         this.column = column;
         this.text = text;
     }
5 changes: 5 additions & 0 deletions java/src/main/java/io/cucumber/gherkin/Locations.java
@@ -6,6 +6,11 @@

 class Locations {

+    /**
+     * Columns are index-1 based.
+     */
+    static final int COLUMN_OFFSET = 1;
+
     /**
      * Cache of Long objects for the range 0-4095. This is used
      * to avoid creating a huge amount of Long objects in getLocation().
12 changes: 8 additions & 4 deletions java/src/main/java/io/cucumber/gherkin/ParserException.java
@@ -6,6 +6,7 @@

 import io.cucumber.messages.types.Location;

+import static io.cucumber.gherkin.Locations.COLUMN_OFFSET;
 import static io.cucumber.gherkin.Locations.atColumn;

 class ParserException extends RuntimeException {
@@ -58,13 +59,16 @@ static class UnexpectedTokenException extends ParserException {
         private static String getMessage(Token receivedToken, List<String> expectedTokenTypes) {
             return String.format("expected: %s, got '%s'",
                     String.join(", ", expectedTokenTypes),
-                    receivedToken.getTokenValue().trim());
+                    receivedToken.getTokenValue()
+            );
         }

         private static Location getLocation(Token receivedToken) {
-            return receivedToken.location.getColumn().isPresent()
-                    ? receivedToken.location
-                    : atColumn(receivedToken.location, receivedToken.line.indent() + 1);
+            if (receivedToken.location.getColumn().isPresent()) {
+                return receivedToken.location;
+            }
+            int column = COLUMN_OFFSET + receivedToken.line.getIndent();
+            return atColumn(receivedToken.location, column);
         }
     }
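
Note on the column arithmetic: columns in this PR are 1-based and counted in code points (`indent + symbolLength + COLUMN_OFFSET` in `parseTags`, `COLUMN_OFFSET + getIndent()` here), while `String` indexes count UTF-16 `char`s. The sketch below shows where the two diverge. The class name `ColumnSketch` and the helper `column` are hypothetical and only mirror the expression used in `parseTags`.

```java
class ColumnSketch {
    // Same value as Locations.COLUMN_OFFSET in this PR: columns are 1-based.
    static final int COLUMN_OFFSET = 1;

    // Hypothetical helper: converts a char index inside the trimmed text
    // into a 1-based column counted in code points.
    static int column(String text, int indent, int charIndex) {
        return indent + text.codePointCount(0, charIndex) + COLUMN_OFFSET;
    }

    public static void main(String[] args) {
        // "@😀 @b": the emoji is one code point but two UTF-16 chars.
        String text = "@\uD83D\uDE00 @b";
        int charIndex = text.lastIndexOf('@');
        System.out.println(charIndex);                  // prints 4
        System.out.println(column(text, 2, charIndex)); // prints 6
    }
}
```

With a naive `indent + charIndex + 1` the second tag would be reported at column 7 rather than 6.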

1 change: 0 additions & 1 deletion java/src/main/java/io/cucumber/gherkin/PickleCompiler.java
@@ -27,7 +27,6 @@

 import java.util.ArrayList;
 import java.util.Arrays;
-import java.util.Collections;
 import java.util.EnumMap;
 import java.util.List;
 import java.util.Map;