Add tokenization time limits to prevent ANR on pathological input #17

@ivan-magda

Problem

The tokenizer has no protection against pathological input. If a grammar contains a regex prone to catastrophic backtracking, or the user pastes an extremely long line, tokenizeString() can block for an unbounded amount of time. On Android, when this happens on the main thread, it triggers an ANR (Application Not Responding).

vscode-textmate solves this with an optional timeLimit parameter on tokenizeLine(). If a single line exceeds the limit, tokenization stops early and returns partial results with stoppedEarly: true.

Tokenizer.kt line 23 already acknowledges this with the comment "Pending: injection grammars, time limits."

What to implement

Grammar API

// Add optional timeLimit parameter
fun tokenizeLine(
    lineText: String,
    prevState: StateStack?,
    timeLimit: Int = 0  // milliseconds, 0 = no limit
): TokenizeLineResult

data class TokenizeLineResult(
    val tokens: List<Token>,
    val ruleStack: StateStack,
    val stoppedEarly: Boolean = false  // true if timeLimit exceeded
)
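
To illustrate how a caller might consume this API, here is a minimal sketch that tokenizes a document line by line, threads ruleStack from one line to the next, and records which lines stopped early. StateStack, Token, LineTokenizer, DocumentResult and tokenizeDocument are hypothetical stand-ins written for this example, not the project's real types.

```kotlin
// Hypothetical stand-ins for the real types, kept minimal for illustration.
class StateStack(val depth: Int)
data class Token(val startIndex: Int, val endIndex: Int, val scopes: List<String>)

data class TokenizeLineResult(
    val tokens: List<Token>,
    val ruleStack: StateStack,
    val stoppedEarly: Boolean = false,
)

// Assumed shape of the grammar entry point proposed in this issue.
fun interface LineTokenizer {
    fun tokenizeLine(lineText: String, prevState: StateStack?, timeLimit: Int): TokenizeLineResult
}

data class DocumentResult(
    val results: List<TokenizeLineResult>,
    val truncatedLines: List<Int>, // indexes of lines where stoppedEarly was true
)

fun tokenizeDocument(
    lines: List<String>,
    tokenizer: LineTokenizer,
    timeLimitMs: Int,
): DocumentResult {
    val results = ArrayList<TokenizeLineResult>(lines.size)
    val truncated = ArrayList<Int>()
    var state: StateStack? = null
    for ((index, line) in lines.withIndex()) {
        val result = tokenizer.tokenizeLine(line, state, timeLimitMs)
        if (result.stoppedEarly) truncated += index
        // The returned stack is still passed on, even after an early stop,
        // so later lines keep tokenizing from the best known state.
        state = result.ruleStack
        results += result
    }
    return DocumentResult(results, truncated)
}
```

Passing the rule stack on after an early stop is a design choice of this sketch. A caller could instead reset the state or retry the line later with a larger limit.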

Tokenizer loop

In tokenizeString(), check elapsed time periodically (e.g., every N iterations or after each match cycle):

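// timeLimit is in milliseconds; 0 disables the check
val startNanos = System.nanoTime()
val limitNanos = timeLimit * 1_000_000L

// Inside the main loop. Compare elapsed time rather than absolute values,
// since System.nanoTime() has an arbitrary origin and may overflow:
if (timeLimit > 0 && System.nanoTime() - startNanos > limitNanos) {
    // Produce tokens for remaining text under current scope
    // Return with stoppedEarly = true
}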
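
As a rough, self-contained illustration of the time check, the sketch below runs a scanning loop that stops once a limit is exceeded. It measures elapsed time as a difference of System.nanoTime() readings, the comparison style the Java documentation recommends because the values can overflow. ScanResult, scanWithTimeLimit, the injectable nanoTime clock and the matchNext callback are all hypothetical names used only for this example; the real loop in tokenizeString() will look different.

```kotlin
data class ScanResult(val stoppedAt: Int, val stoppedEarly: Boolean)

fun scanWithTimeLimit(
    length: Int,
    timeLimitMs: Int = 0,                        // 0 = no limit
    nanoTime: () -> Long = System::nanoTime,     // injectable so tests can use a fake clock
    matchNext: (Int) -> Int,                     // stand-in for one match cycle; returns new position
): ScanResult {
    val startNanos = nanoTime()
    val limitNanos = timeLimitMs * 1_000_000L
    var pos = 0
    while (pos < length) {
        // Check elapsed time once per match cycle.
        if (timeLimitMs > 0 && nanoTime() - startNanos > limitNanos) {
            return ScanResult(pos, stoppedEarly = true)
        }
        val next = matchNext(pos)
        // Guard against a match cycle that fails to advance.
        pos = if (next > pos) next else pos + 1
    }
    return ScanResult(pos, stoppedEarly = false)
}
```

Injecting the clock keeps the early-stop path testable without relying on wall-clock timing. Checking every cycle is the simplest option; checking every N cycles would reduce the number of clock reads at the cost of a slightly later stop.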

Compose-ui layer

CodeHighlighter.highlight() should pass a reasonable default time limit (e.g., 500ms per line) to avoid freezing the UI thread. When stoppedEarly is true, the remaining text on that line gets the current scope's style (graceful degradation).
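
The graceful degradation described above could be sketched as follows: if the tokens returned for a line do not reach the end of the line, one trailing token carrying the current scopes is appended so the rest of the text still receives a style. Token and coverRemainder are hypothetical names for this example, and the sketch assumes tokens are ordered with an exclusive endIndex.

```kotlin
// Hypothetical minimal token type: [startIndex, endIndex) plus its scope names.
data class Token(val startIndex: Int, val endIndex: Int, val scopes: List<String>)

fun coverRemainder(
    tokens: List<Token>,
    lineLength: Int,
    currentScopes: List<String>,
): List<Token> {
    // Position up to which the line is already tokenized.
    val covered = tokens.lastOrNull()?.endIndex ?: 0
    if (covered >= lineLength) return tokens
    // One token for everything left, styled with the scopes active at the stop point.
    return tokens + Token(covered, lineLength, currentScopes)
}
```

With this shape, the UI layer can map every token to a style as usual and needs no special rendering path for lines that stopped early.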

Reference

  • vscode-textmate: timeLimit parameter in Grammar._tokenize() and _tokenizeString() in src/grammar/grammar.ts
  • Current stub: core/src/main/kotlin/dev/textmate/grammar/tokenize/Tokenizer.kt line 23

Labels: P1 High-value features
