Add filename tracking on Token, TokenSource, and Lexer (Java runtime) #4935
Open
veriktig wants to merge 1 commit into
Adds an optional filename property to the Java runtime so tokens can carry the source file they originated from. Useful for grammars that preprocess multiple input files (e.g. Verilog `include, SystemVerilog package files) and want to surface the originating filename in diagnostics without maintaining a parallel stream-to-filename map.

API surface:
- Token.getFile() - new default method returning "" for backwards compatibility with third-party Token implementations.
- TokenSource.getFile() - new default method returning "".
- WritableToken.setFile(String) - new default no-op method.
- CommonToken: file field, getter/setter, propagated through the (Pair, ...) ctor and the (Token oldToken) copy ctor.
- Lexer: getFile/setFile delegating to the interpreter.
- LexerATNSimulator: file field saved/restored across speculative predicate evaluation; threaded through SimState. The protected accept(...) signature is unchanged - subclasses overriding it read the file via the simulator's field.
- ListTokenSource.getFile() - mirrors the line/column logic.

Tests in runtime-testsuite cover Lexer.setFile propagation, CommonToken copy + Pair-ctor propagation, ListTokenSource.getFile empty/non-empty, WritableToken.setFile round-trip, and the default-method paths on Token and WritableToken (i.e. that implementations predating these methods continue to work).

Scope: Java runtime only. Other runtimes can land in follow-ups once the API shape is approved.

Signed-off-by: Veriktig <veriktig@gmail.com>
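The backwards-compatibility claim for the default methods can be illustrated with a small self-contained sketch. The nested Token and WritableToken interfaces below are simplified stand-ins, not the real org.antlr.v4.runtime types, and LegacyToken and FileAwareToken are hypothetical names used only for illustration.

```java
class DefaultMethodSketch {
    // Simplified stand-in for the runtime's Token interface (assumption, not the real type).
    interface Token {
        String getText();

        // Added later with a default body, so older implementations need no changes.
        default String getFile() { return ""; }
    }

    // Simplified stand-in for WritableToken.
    interface WritableToken extends Token {
        void setText(String text);

        // Added later as a no-op; ignored unless an implementation opts in.
        default void setFile(String file) { }
    }

    // Hypothetical third-party token written before getFile/setFile existed.
    static class LegacyToken implements WritableToken {
        private String text;
        LegacyToken(String text) { this.text = text; }
        @Override public String getText() { return text; }
        @Override public void setText(String text) { this.text = text; }
    }

    // Hypothetical token that opts in by overriding both methods.
    static class FileAwareToken extends LegacyToken {
        private String file = "";
        FileAwareToken(String text) { super(text); }
        @Override public String getFile() { return file; }
        @Override public void setFile(String file) { this.file = file; }
    }

    public static void main(String[] args) {
        WritableToken legacy = new LegacyToken("module");
        legacy.setFile("top.sv");                       // no-op via the default method
        System.out.println(legacy.getFile().isEmpty()); // prints true

        WritableToken aware = new FileAwareToken("module");
        aware.setFile("top.sv");
        System.out.println(aware.getFile());            // prints top.sv
    }
}
```

LegacyToken compiles without mentioning either new method, which is the property the default-method approach relies on.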
What
Adds an optional filename property to the Java runtime so tokens can carry the source file they originated from.
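As a rough illustration of what a per-token filename enables, a diagnostic formatter could prefix messages with the originating file. This is only a sketch: the nested Token interface is a minimal stand-in for the runtime's interface, and format and token are hypothetical helpers, not part of this PR.

```java
class DiagnosticSketch {
    // Minimal stand-in for the parts of Token used here (assumption, not the real type).
    interface Token {
        int getLine();
        int getCharPositionInLine();
        default String getFile() { return ""; }
    }

    // Formats "file:line:col: message", dropping the file prefix when none was recorded.
    static String format(Token t, String message) {
        String pos = t.getLine() + ":" + t.getCharPositionInLine();
        String file = t.getFile();
        String prefix = (file == null || file.isEmpty()) ? pos : file + ":" + pos;
        return prefix + ": " + message;
    }

    // Hypothetical helper that builds a token for the demo.
    static Token token(String file, int line, int col) {
        return new Token() {
            @Override public int getLine() { return line; }
            @Override public int getCharPositionInLine() { return col; }
            @Override public String getFile() { return file; }
        };
    }

    public static void main(String[] args) {
        // prints defs.vh:12:4: unexpected token
        System.out.println(format(token("defs.vh", 12, 4), "unexpected token"));
        // prints 3:0: unexpected token
        System.out.println(format(token("", 3, 0), "unexpected token"));
    }
}
```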
- Token.getFile(): new default method returning "" (backwards-compatible for third-party Token implementations).
- TokenSource.getFile(): new default method returning "".
- WritableToken.setFile(String): new default no-op method.
- CommonToken: file field, getter/setter, propagated through the (Pair, ...) and copy constructors.
- Lexer: getFile/setFile delegating to the interpreter.
- LexerATNSimulator: file field saved/restored across speculative predicate evaluation; threaded through SimState. The protected accept(...) signature is unchanged; subclasses overriding it read file via the simulator's field.
- ListTokenSource.getFile(): mirrors the existing line/column logic.

Why
Grammars that preprocess multiple input files (Verilog `include, SystemVerilog package files, C-style #include-style preludes, etc.) want to surface the originating filename in diagnostics without maintaining a parallel stream-to-filename map. We've been carrying this delta on a fork since 2018 and would like to upstream it.

Compatibility
- The new methods on Token, TokenSource, and WritableToken are default. Any third-party implementor of these interfaces continues to compile and link unchanged.
- LexerATNSimulator.accept(CharStream, LexerActionExecutor, int, int, int, int) keeps its existing signature. The previous (saved) file is written to this.file immediately before accept is invoked, so subclass overrides see the right value through the field.
- One new field on CommonToken and one on the simulator. LexerATNSimulator initialises file to the interned literal "" (not new String("")).

Tests
runtime-testsuite/test/org/antlr/v4/test/runtime/java/api/TestTokenFile.java adds 8 cases:
- Lexer.setFile propagates to tokens emitted afterward.
- CommonToken copy constructor preserves file.
- CommonToken(Pair, ...) carries file from the source TokenSource.
- ListTokenSource.getFile() reflects the current position's file.
- ListTokenSource.getFile() returns "" (not null) when empty.
- WritableToken.setFile/getFile round-trip on CommonToken.
- Token impl with no getFile override returns "" via the default method.
- WritableToken impl with no setFile override accepts the call as a no-op.

mvn -pl runtime-testsuite test -Dtest='TestTokenStream,TestTokenStreamRewriter,TestExpectedTokens,TestVisitors,TestTokenFile' → 61/61 pass locally.

Cross-runtime
Scoped to the Java runtime intentionally. Happy to land follow-ups for C++/C#/Python/JS/Go/Swift/PHP/Dart once the API shape is approved here, or to pivot to a marker sub-interface (SourcedToken extends Token) if that's more palatable than adding default methods to the existing interfaces.

Out of scope (intentionally not included)
- ParserRuleContext.getTokens() overload: separate concern, will file separately if useful.
- Cpp.stg tweak: unrelated to filename support; current upstream Cpp work likely supersedes it.
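For comparison, the marker sub-interface alternative mentioned under Cross-runtime might look roughly like the sketch below. Only the name SourcedToken comes from the description; its getFile method, the fileOf helper, and the PlainToken and FileToken classes are assumptions made for illustration, and Token is a simplified stand-in for the runtime interface.

```java
class SourcedTokenSketch {
    // Stand-in for the existing interface, left untouched in this variant.
    interface Token {
        String getText();
    }

    // Opt-in sub-interface; the shape shown here is assumed, not taken from the PR.
    interface SourcedToken extends Token {
        String getFile();
    }

    // Callers check the type instead of relying on a default method.
    static String fileOf(Token t) {
        return (t instanceof SourcedToken) ? ((SourcedToken) t).getFile() : "";
    }

    // Hypothetical token with no file information.
    static final class PlainToken implements Token {
        private final String text;
        PlainToken(String text) { this.text = text; }
        @Override public String getText() { return text; }
    }

    // Hypothetical token that carries its originating file.
    static final class FileToken implements SourcedToken {
        private final String text;
        private final String file;
        FileToken(String text, String file) { this.text = text; this.file = file; }
        @Override public String getText() { return text; }
        @Override public String getFile() { return file; }
    }

    public static void main(String[] args) {
        System.out.println(fileOf(new FileToken("wire", "pkg.sv")));   // prints pkg.sv
        System.out.println(fileOf(new PlainToken("wire")).isEmpty());  // prints true
    }
}
```

The trade-off in this variant is that the existing interfaces stay unchanged, while every consumer needs an instanceof check (or a helper like fileOf) before reading the filename.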