Problem
The tokenizer has no protection against pathological input. If a grammar contains a regex prone to catastrophic backtracking, or the user pastes an extremely long line, tokenizeString() can block for an unbounded amount of time. On Android, when this happens on the main thread, it causes an ANR.
vscode-textmate solves this with an optional timeLimit parameter on tokenizeLine(). If a single line exceeds the limit, tokenization stops early and returns partial results with stoppedEarly: true.
Tokenizer.kt line 23 already acknowledges this: Pending: injection grammars, time limits.
What to implement
Grammar API
// Add optional timeLimit parameter
fun tokenizeLine(
    lineText: String,
    prevState: StateStack?,
    timeLimit: Int = 0 // milliseconds, 0 = no limit
): TokenizeLineResult

data class TokenizeLineResult(
    val tokens: List<Token>,
    val ruleStack: StateStack,
    val stoppedEarly: Boolean = false // true if timeLimit exceeded
)
Tokenizer loop
In tokenizeString(), check elapsed time periodically (e.g., every N iterations or after each match cycle):
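val startNanos = System.nanoTime()
val limitNanos = timeLimit * 1_000_000L // ms -> ns; only used when timeLimit > 0
// Inside the main loop (compare elapsed time rather than absolute nanoTime() values, which may wrap):
if (timeLimit > 0 && System.nanoTime() - startNanos > limitNanos) {
    // Produce tokens for remaining text under current scope
    // Return with stoppedEarly = true
}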
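To make the early-exit path concrete, here is a minimal, self-contained sketch of a budgeted loop. It is not the real tokenizer: SketchToken, SketchResult, tokenizeWithBudget, the fixed-width "match" and the injectable clock parameter are all hypothetical stand-ins, chosen so the timeout branch can be exercised deterministically.

```kotlin
// Hypothetical stand-ins for the real Token / TokenizeLineResult types.
data class SketchToken(val startIndex: Int, val endIndex: Int, val scope: String)
data class SketchResult(val tokens: List<SketchToken>, val stoppedEarly: Boolean)

/**
 * Toy tokenizer: emits one token per [chunk] characters and checks the
 * time budget once per loop iteration. [clock] returns nanoseconds and is
 * injectable so tests do not depend on wall-clock timing.
 */
fun tokenizeWithBudget(
    line: String,
    timeLimitMs: Int = 0,                        // 0 = no limit, as in the proposal
    chunk: Int = 4,                              // width of the toy "match"
    clock: () -> Long = { System.nanoTime() }
): SketchResult {
    require(chunk > 0) { "chunk must be positive" }
    val start = clock()
    val budgetNanos = timeLimitMs * 1_000_000L
    val tokens = mutableListOf<SketchToken>()
    var pos = 0
    while (pos < line.length) {
        // Compare elapsed time, not absolute timestamps.
        if (timeLimitMs > 0 && clock() - start > budgetNanos) {
            // Budget exceeded: cover the rest of the line with one token and stop.
            tokens += SketchToken(pos, line.length, "current.scope")
            return SketchResult(tokens, stoppedEarly = true)
        }
        val end = minOf(pos + chunk, line.length)
        tokens += SketchToken(pos, end, "toy.chunk")
        pos = end
    }
    return SketchResult(tokens, stoppedEarly = false)
}

fun main() {
    // Fake clock advancing 1 ms per call: a 2 ms budget trips on the third check.
    var now = 0L
    val fakeClock = { now += 1_000_000L; now }
    println(tokenizeWithBudget("abcdefghijklmnopqrstuvwxyz", timeLimitMs = 2, clock = fakeClock))
}
```

In the real loop the check would probably run once per match cycle rather than per fixed-width chunk; the point of the sketch is only that the remaining text is always covered by a token before returning.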
Compose-ui layer
CodeHighlighter.highlight() should pass a reasonable default time limit (e.g., 500ms per line) to avoid freezing the UI thread. When stoppedEarly is true, the remaining text on that line gets the current scope's style (graceful degradation).
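A rough sketch of how the UI layer might consume the result, assuming nothing about the actual CodeHighlighter internals. StyledSpan, LineTokens, spansCoveringLine, highlightLines and the "source" fallback scope are hypothetical names used only for illustration. The padding step is defensive: if the tokenizer already emits a token for the remaining text, it is a no-op.

```kotlin
// Hypothetical stand-ins; the real CodeHighlighter and Token types are not shown in this issue.
data class StyledSpan(val start: Int, val end: Int, val scope: String)
data class LineTokens(val spans: List<StyledSpan>, val stoppedEarly: Boolean)

const val DEFAULT_TIME_LIMIT_MS = 500 // per-line default suggested in this issue

/** Returns spans covering the whole line, padding the tail with [fallbackScope] if tokenization stopped early. */
fun spansCoveringLine(line: String, result: LineTokens, fallbackScope: String): List<StyledSpan> {
    val coveredUpTo = result.spans.maxOfOrNull { it.end } ?: 0
    if (!result.stoppedEarly || coveredUpTo >= line.length) return result.spans
    return result.spans + StyledSpan(coveredUpTo, line.length, fallbackScope)
}

/** Tokenizes each line with a per-line time limit and guarantees full coverage of every line. */
fun highlightLines(
    lines: List<String>,
    timeLimitMs: Int = DEFAULT_TIME_LIMIT_MS,
    tokenize: (line: String, timeLimitMs: Int) -> LineTokens
): List<List<StyledSpan>> =
    lines.map { line -> spansCoveringLine(line, tokenize(line, timeLimitMs), "source") }

fun main() {
    // Stub tokenizer that "times out" after three characters on lines longer than ten.
    val stub = { line: String, _: Int ->
        if (line.length > 10) LineTokens(listOf(StyledSpan(0, 3, "keyword")), stoppedEarly = true)
        else LineTokens(listOf(StyledSpan(0, line.length, "keyword")), stoppedEarly = false)
    }
    println(highlightLines(listOf("val x = 1", "val pathological = 12345"), tokenize = stub))
}
```

A real implementation would also need to carry ruleStack from one line to the next, which this sketch leaves out.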
Reference
- vscode-textmate: timeLimit parameter in Grammar._tokenize() and _tokenizeString() in src/grammar/grammar.ts
- Current stub: core/src/main/kotlin/dev/textmate/grammar/tokenize/Tokenizer.kt line 23