Expose ts_parser_set_encoding on the Parser API #61

@saket


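Tree-sitter's C API can parse UTF-16 input as well as UTF-8. The encoding is selected through TSInputEncoding (TSInputEncodingUTF16 in 0.24), which is passed to ts_parser_parse_string_encoding for in-memory strings or set in the encoding field of TSInput for callback-driven parsing. With UTF-16 input, node offsets count bytes of the UTF-16 buffer, so they map onto UTF-16 code units by dividing by two instead of walking the text. This is a natural fit for JVM languages, where String indices are already UTF-16 code units.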
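As far as I understand the C API, offsets stay in bytes even in UTF-16 mode (bytes of the UTF-16 buffer, two per code unit), so turning a node offset into a Kotlin String index becomes plain arithmetic. A minimal sketch; the helper names are hypothetical, and the offsets are typed as UInt only because the C API reports them as uint32_t:

```kotlin
// Converts a tree-sitter byte offset into a UTF-16 buffer to a Kotlin String index.
// Each UTF-16 code unit occupies 2 bytes, so a valid offset is always even.
fun utf16ByteOffsetToIndex(byteOffset: UInt): Int {
    require(byteOffset % 2u == 0u) { "UTF-16 byte offsets should be even: $byteOffset" }
    return (byteOffset / 2u).toInt()
}

// Hypothetical helper: the text of a node, given the byte range tree-sitter
// reported for it after parsing `source` as UTF-16.
fun sliceUtf16(source: String, startByte: UInt, endByte: UInt): String =
    source.substring(utf16ByteOffsetToIndex(startByte), utf16ByteOffsetToIndex(endByte))
```

No lookup table and no per-parse allocation are needed; surrogate pairs simply span 4 bytes, matching their two Kotlin chars.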

ktreesitter v0.24.1 hardcodes UTF-8 in both parse paths:

parse(source: String)
parse(oldTree, callback)

This forces Kotlin callers to maintain a byte-to-char offset table on every parse, just to translate tree-sitter's UTF-8 byte offsets into Kotlin's UTF-16 string indices. Exposing an encoding option would let callers skip that step entirely.
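For reference, this is roughly what that workaround involves: a minimal stdlib-only sketch (not taken from ktreesitter; function names are hypothetical) that maps every UTF-8 byte offset to the Kotlin String index of the character containing that byte. It assumes the parser saw the string as standard UTF-8 and that the input is well-formed UTF-16.

```kotlin
// Number of UTF-8 bytes needed to encode one code point.
fun utf8Width(codePoint: Int): Int = when {
    codePoint < 0x80 -> 1
    codePoint < 0x800 -> 2
    codePoint < 0x10000 -> 3
    else -> 4
}

/**
 * Maps every UTF-8 byte offset of [source] to the UTF-16 index (Kotlin String
 * index) of the code point that byte belongs to. The table has one extra entry
 * so that an end offset equal to the UTF-8 length maps to source.length.
 * Assumes well-formed UTF-16: a lone surrogate is counted as 3 bytes here.
 */
fun buildUtf8ToUtf16Table(source: String): IntArray {
    // First pass: total UTF-8 length, using the same width rule as below.
    var utf8Length = 0
    var i = 0
    while (i < source.length) {
        val cp = source.codePointAt(i)
        utf8Length += utf8Width(cp)
        i += Character.charCount(cp)
    }

    // Second pass: every byte of a code point maps to that code point's first char index.
    val table = IntArray(utf8Length + 1)
    var byteOffset = 0
    var charIndex = 0
    while (charIndex < source.length) {
        val cp = source.codePointAt(charIndex)
        val width = utf8Width(cp)
        for (k in 0 until width) table[byteOffset + k] = charIndex
        byteOffset += width
        charIndex += Character.charCount(cp)
    }
    table[byteOffset] = source.length
    return table
}
```

A node's text is then source.substring(table[startByte], table[endByte]). The cost is an O(n) pass plus an IntArray as large as the UTF-8 input on every parse, which is the overhead a UTF-16 option would remove.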

Is there interest in this? Happy to contribute a PR if the direction sounds reasonable.
