Add filename tracking on Token, TokenSource, and Lexer (Java runtime) #4935

Open
veriktig wants to merge 1 commit into antlr:dev from veriktig:filename-support

What

Adds an optional filename property to the Java runtime so tokens can carry the source file they originated from.

  • Token.getFile() — new default method returning "" (backwards-compatible for third-party Token implementations).
  • TokenSource.getFile() — new default method returning "".
  • WritableToken.setFile(String) — new default no-op method.
  • CommonToken: file field, getter/setter, propagated through the (Pair, ...) and copy constructors.
  • Lexer: getFile/setFile delegating to the interpreter.
  • LexerATNSimulator: file field saved/restored across speculative predicate evaluation; threaded through SimState. The protected accept(...) signature is unchanged — subclasses overriding it read file via the simulator's field.
  • ListTokenSource.getFile(): mirrors the existing line/column logic.
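The default-method approach in the list above can be sketched as follows. This is a minimal illustration, not the PR's code: `SketchToken`, `SketchWritableToken`, `LegacyToken`, and `FileAwareToken` are hypothetical stand-ins for the runtime types, reduced to a single `getText` member so the example compiles on its own.

```java
// Hypothetical stand-ins; these are NOT the real org.antlr.v4.runtime interfaces.
interface SketchToken {
    String getText();

    // Added later as a default method, so older implementors inherit "".
    default String getFile() {
        return "";
    }
}

interface SketchWritableToken extends SketchToken {
    void setText(String text);

    // Added later as a default no-op, so older implementors still compile.
    default void setFile(String file) {
        // intentionally empty
    }
}

// An implementation written before the file methods existed.
final class LegacyToken implements SketchWritableToken {
    private String text;
    LegacyToken(String text) { this.text = text; }
    @Override public String getText() { return text; }
    @Override public void setText(String text) { this.text = text; }
}

// An implementation that opts in by overriding both methods.
final class FileAwareToken implements SketchWritableToken {
    private String text;
    private String file = "";
    FileAwareToken(String text) { this.text = text; }
    @Override public String getText() { return text; }
    @Override public void setText(String text) { this.text = text; }
    @Override public String getFile() { return file; }
    @Override public void setFile(String file) { this.file = file; }
}

final class DefaultMethodDemo {
    public static void main(String[] args) {
        SketchWritableToken legacy = new LegacyToken("module");
        legacy.setFile("top.v"); // accepted, silently ignored
        System.out.println("[" + legacy.getFile() + "]");

        SketchWritableToken aware = new FileAwareToken("module");
        aware.setFile("top.v");
        System.out.println("[" + aware.getFile() + "]");
    }
}
```

The legacy implementation needs no source change and keeps reporting an empty filename, while the opt-in implementation round-trips the value it was given.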

Why

Grammars that preprocess multiple input files (Verilog `include, SystemVerilog package files, C-style #include preludes, etc.) want to surface the originating filename in diagnostics without maintaining a parallel stream-to-filename map. We've been carrying this delta on a fork since 2018 and would like to upstream it.

Compatibility

  • No source/binary break. New methods on Token, TokenSource, WritableToken are default. Any third-party implementor of these interfaces continues to compile and link unchanged.
  • No subclass break. LexerATNSimulator.accept(CharStream, LexerActionExecutor, int, int, int, int) keeps its existing signature. The previous (saved) file is written to this.file immediately before accept is invoked, so subclass overrides see the right value through the field.
  • No allocation regression on the hot path. One extra reference field on CommonToken and one on the simulator. LexerATNSimulator initialises file to the interned literal "" (not new String("")).
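The "no subclass break" point can be illustrated with a deliberately simplified model. Everything below is an assumption made for illustration: `SketchSimState`, `SketchSimulator`, `captureAcceptState`, and `finishMatch` are hypothetical names, and `accept` takes a single `int` instead of the real six-parameter signature. The sketch only shows the ordering described above: the saved file is written back to the field before `accept` runs, so an override reads it from the field.

```java
// Hypothetical, heavily simplified model; not the real LexerATNSimulator.
final class SketchSimState {
    int line = 1;
    String file = "";
}

class SketchSimulator {
    protected int line = 1;
    protected String file = ""; // string literal, so no per-instance allocation
    private final SketchSimState prevAccept = new SketchSimState();

    void setFile(String file) { this.file = file; }
    void setLine(int line) { this.line = line; }

    // Remember the state at the last accepting position.
    void captureAcceptState() {
        prevAccept.line = line;
        prevAccept.file = file;
    }

    // Roll back to the saved state, then call accept with its usual arguments.
    String finishMatch(int tokenType) {
        this.line = prevAccept.line;
        this.file = prevAccept.file;
        return accept(tokenType);
    }

    // No file parameter: overrides read this.file instead.
    protected String accept(int tokenType) {
        return tokenType + "@" + file + ":" + line;
    }
}

final class SimStateDemo {
    public static void main(String[] args) {
        SketchSimulator sim = new SketchSimulator();
        sim.setFile("a.v");
        sim.setLine(3);
        sim.captureAcceptState();
        sim.setFile("b.v"); // state touched during speculative lookahead
        sim.setLine(9);
        System.out.println(sim.finishMatch(7));
    }
}
```

Passing the file through the field rather than through a new `accept` parameter is what lets existing overrides keep compiling unchanged.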

Tests

runtime-testsuite/test/org/antlr/v4/test/runtime/java/api/TestTokenFile.java adds 8 cases:

  1. Lexer.setFile propagates to tokens emitted afterward.
  2. CommonToken copy constructor preserves file.
  3. CommonToken(Pair, ...) carries file from the source TokenSource.
  4. ListTokenSource.getFile() reflects the current position's file.
  5. ListTokenSource.getFile() returns "" (not null) when empty.
  6. WritableToken.setFile/getFile round-trip on CommonToken.
  7. A minimal Token impl with no getFile override returns "" via the default method.
  8. A minimal WritableToken impl with no setFile override accepts the call as a no-op.
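Cases 4 and 5 exercise the position-based lookup. A rough sketch of that behaviour, assuming a fallback order of current token, then last token, then "" (the PR only says it mirrors the line/column logic, so the exact order is a guess), with a hypothetical `SketchFileListSource` that stores just each token's filename:

```java
import java.util.List;

// Hypothetical stand-in for a list-backed token source; tracks filenames only.
final class SketchFileListSource {
    private final List<String> tokenFiles; // filename of each token, in order
    private int i;                         // index of the next token to return

    SketchFileListSource(List<String> tokenFiles) {
        this.tokenFiles = tokenFiles;
    }

    // Consume one token, stopping at the end of the list.
    void advance() {
        if (i < tokenFiles.size()) {
            i++;
        }
    }

    String getFile() {
        if (i < tokenFiles.size()) {
            return tokenFiles.get(i);                     // current position
        }
        if (!tokenFiles.isEmpty()) {
            return tokenFiles.get(tokenFiles.size() - 1); // past the end: last token
        }
        return "";                                        // empty source: never null
    }
}

final class ListSourceDemo {
    public static void main(String[] args) {
        SketchFileListSource src = new SketchFileListSource(List.of("a.v", "inc.vh"));
        System.out.println(src.getFile());
        src.advance();
        System.out.println(src.getFile());
        src.advance();
        System.out.println(src.getFile()); // past the end, falls back to the last token
    }
}
```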

mvn -pl runtime-testsuite test -Dtest='TestTokenStream,TestTokenStreamRewriter,TestExpectedTokens,TestVisitors,TestTokenFile' → 61/61 pass locally.

Cross-runtime

Scoped to the Java runtime intentionally. Happy to land follow-ups for C++/C#/Python/JS/Go/Swift/PHP/Dart once the API shape is approved here, or to pivot to a marker sub-interface (SourcedToken extends Token) if that's more palatable than adding default methods to the existing interfaces.

Out of scope (intentionally not included)

  • A no-arg ParserRuleContext.getTokens() overload — separate concern, will file separately if useful.
  • A Cpp.stg tweak — unrelated to filename support; current upstream Cpp work likely supersedes it.

Adds an optional filename property to the Java runtime so tokens can
carry the source file they originated from. Useful for grammars that
preprocess multiple input files (e.g. Verilog `include, SystemVerilog
package files) and want to surface the originating filename in
diagnostics without maintaining a parallel stream-to-filename map.

API surface:
- Token.getFile() - new default method returning "" for backwards
  compatibility with third-party Token implementations.
- TokenSource.getFile() - new default method returning "".
- WritableToken.setFile(String) - new default no-op method.
- CommonToken: file field, getter/setter, propagated through the
  (Pair, ...) ctor and the (Token oldToken) copy ctor.
- Lexer: getFile/setFile delegating to the interpreter.
- LexerATNSimulator: file field saved/restored across speculative
  predicate evaluation; threaded through SimState. The protected
  accept(...) signature is unchanged - subclasses overriding it
  read the file via the simulator's field.
- ListTokenSource.getFile() - mirrors the line/column logic.

Tests in runtime-testsuite cover Lexer.setFile propagation,
CommonToken copy + Pair-ctor propagation, ListTokenSource.getFile
empty/non-empty, WritableToken.setFile round-trip, and the
default-method paths on Token and WritableToken (i.e. that
implementations predating these methods continue to work).

Scope: Java runtime only. Other runtimes can land in follow-ups
once the API shape is approved.

Signed-off-by: Veriktig <veriktig@gmail.com>