RFC 0001: Weaver for Apache Thrift Tooling Platform (thriftfmt, thriftlint, thriftls, VS Code Extension)
- Status: Accepted
- Authors: Dmytro Shteflyuk
- Created: 2026-02-23
- Target release: Beta (date TBD)
This RFC proposes Weaver for Apache Thrift, a standalone tooling project for Apache Thrift IDL editing and formatting, consisting of:
- `thriftfmt`: a stable, lossless-aware formatter for `.thrift` files
- `thriftlint`: a diagnostics-oriented linter for `.thrift` files
- `thriftls`: an LSP server for editor integrations
- a VS Code extension with syntax highlighting and LSP integration
The project will be implemented primarily in Go and designed around a reusable syntax/formatting engine. Parsing will use a tree-sitter grammar (for incremental, error-tolerant parsing suitable for LSP) plus a custom lossless lexer/token-trivia layer (for formatter fidelity).
The public product name is Weaver for Apache Thrift. Repository and module identifiers remain thrift-weaver.
The current Apache Thrift C++ compiler frontend is optimized for semantic compilation and code generation, not source-preserving formatting:
- whitespace and regular comments are discarded early
- some syntax is normalized into semantic representations
- top-level declarations are stored in typed collections rather than source order
These are good compiler design choices, but they make a poor foundation for a modern formatter/LSP stack.
Building a dedicated tooling project allows:
- lossless parsing/trivia preservation for formatting
- error-tolerant incremental parsing for editors
- a cleaner Go-based developer experience
- independent release cadence from the Apache Thrift compiler
- Provide a deterministic, idempotent Apache Thrift formatter (`thriftfmt`)
- Provide an Apache Thrift linter CLI (`thriftlint`) that reuses parser diagnostics and lint rules
- Provide baseline structural lint rules for duplicate explicit field IDs, duplicate field names, and other deprecated/unsafe constructs detectable within a single document
- Provide bounded single-document semantic diagnostics for locally resolvable type and service constraints without requiring workspace indexing
- Provide a production-quality LSP server (`thriftls`) for editors
- Provide a VS Code extension with syntax highlighting and LSP client integration
- Preserve comments and syntax fidelity where formatter policy permits
- Support invalid/incomplete code in editor workflows
- Validate formatted output compatibility against the official Apache Thrift compiler in CI
- Replacing the official Apache Thrift compiler
- Whole-program semantic type checking, include-graph resolution, and cross-file indexing in v1
- Cross-file indexing, go-to-definition, rename in v1
- A perfect source-preserving rewriter (formatter may normalize whitespace and selected style choices)
- Embedding formatter/LSP into the existing `thrift` binary
The platform is a shared engine with two frontends (CLI + LSP), plus a VS Code client.
+----------------------+
| VS Code Plugin       |
| TextMate + LSP client|
+----------+-----------+
           |
           | JSON-RPC (LSP)
           v
+----------------------+
| thriftls             |
| LSP transport/API    |
+----------+-----------+
           |
+----------v-----------+
| Shared Go Engine     |
| lexer + tokens       |
| tree-sitter parser   |
| CST wrappers         |
| diagnostics          |
| formatter            |
+----------+-----------+
           ^
           |
+----------+-----------+
| thriftfmt            |
| CLI (check/write)    |
+----------------------+
- Implementation language: Go
- Parser runtime: embedded tree-sitter wasm executed in-process via `wazero`
- Rationale:
- rapid iteration and testing
- straightforward CLI/LSP packaging
- strong ecosystem for tooling and CI
- Use `tree-sitter` for syntax parsing and incremental updates
- Add a custom lossless lexer for trivia and exact token lexemes
Rationale:
- `tree-sitter` gives incremental/error-tolerant parsing and node spans
- custom lexer gives formatter-grade trivia preservation and lexeme fidelity
- hybrid approach reduces risk versus hand-rolling a fully incremental parser
Normative v1 decision:
- `tree-sitter` is the structural parser only; the custom lexer is the token/trivia source of truth for formatting.
- All formatter output decisions must be derived from:
- CST structure (node kinds + spans)
- lossless token/trivia spans
- formatter policy
- No formatter logic may depend on `tree-sitter` tokenization internals.
- Internal primary representation for formatting/LSP: CST-oriented syntax tree + lossless token stream
- No semantic AST required for v1 formatter/LSP
- Deterministic pretty-printer with doc-algebra style layout
- Preserve comments and token lexemes where policy allows
- Regenerate whitespace/indentation
- Support full-document and range formatting
- Snapshot-based document model keyed by URI+version
- Full reparse on change in v1 (designed to allow incremental optimization later)
- Error-tolerant parsing and partial diagnostics for malformed code
Normative v1 decisions:
- LSP text sync mode: `Incremental` (`textDocument/didChange` with ranged edits)
- Internal parse mode: full reparse from reconstructed document text after each accepted change
- Formatting on invalid syntax:
  - `textDocument/formatting` and `rangeFormatting` may return an LSP error (`RequestFailed`) when formatting is unsafe
  - diagnostics continue to be published asynchronously via `publishDiagnostics`
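A minimal sketch of the "ranged edits in, full reparse out" flow described above. It assumes the LSP UTF-16 ranges have already been converted to byte offsets; `ByteEdit` here is a simplified, hypothetical stand-in for the engine's edit type, and the parse call itself is left out.

```go
package main

import "fmt"

// ByteEdit is a hypothetical, already-converted ranged change: the LSP
// UTF-16 range has been mapped to a half-open byte span [Start, End).
type ByteEdit struct {
	Start, End int
	NewText    string
}

// ApplyChanges applies edits in order. As with textDocument/didChange
// content changes, each edit's offsets refer to the text produced by the
// previous edit.
func ApplyChanges(src []byte, edits []ByteEdit) ([]byte, error) {
	for _, e := range edits {
		if e.Start < 0 || e.End < e.Start || e.End > len(src) {
			return nil, fmt.Errorf("edit [%d,%d) out of bounds for %d-byte document", e.Start, e.End, len(src))
		}
		out := make([]byte, 0, len(src)-(e.End-e.Start)+len(e.NewText))
		out = append(out, src[:e.Start]...)
		out = append(out, e.NewText...)
		out = append(out, src[e.End:]...)
		src = out
	}
	return src, nil
}

func main() {
	text, err := ApplyChanges([]byte("struct A {}"), []ByteEdit{
		{Start: 7, End: 8, NewText: "User"},
		{Start: 0, End: 6, NewText: "union"},
	})
	if err != nil {
		panic(err)
	}
	// The reconstructed text would then be handed to a full parse.
	fmt.Printf("%s\n", text)
}
```

Rejecting an out-of-bounds edit instead of clamping it keeps the server's view of the document from silently diverging from the client's.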
Proposed repository root (new project, separate from Apache Thrift repo):
thrift-weaver/
README.md
LICENSE
go.mod
go.sum
.github/
workflows/
ci.yml
release.yml
docs/
architecture.md
formatting-style.md
release.md
rfcs/
0001-thrift-tooling-platform.md
cmd/
thriftfmt/
main.go
thriftlint/
main.go
thriftls/
main.go
internal/
text/
line_index.go
positions.go
edits.go
lexer/
token.go
trivia.go
lexer.go
lexer_test.go
syntax/
kinds.go
parse.go
diagnostics.go
cst.go
query.go
treesitter/
parser.go
language.go
node.go
format/
doc.go
printer.go
comments.go
policy.go
format.go
range_format.go
format_test.go
lsp/
server.go
handlers.go
transport_stdio.go
snapshots.go
workspace.go
capabilities.go
diagnostics.go
formatting.go
symbols.go
folding.go
semantic_tokens.go
testutil/
corpus.go
goldens.go
thrift_oracle.go
grammar/
tree-sitter-thrift/
grammar.js
src/
queries/
highlights.scm
folds.scm
symbols.scm
editors/
vscode/
package.json
src/
extension.ts
client.ts
config.ts
syntaxes/
thrift.tmLanguage.json
language-configuration.json
scripts/
package-binaries.ts
README.md
CHANGELOG.md
testdata/
corpus/
valid/
invalid/
editor/
format/
input/
expected/
lsp/
scenarios/
scripts/
bootstrap.sh
generate-tree-sitter.sh
sync-thrift-corpus.sh
Purpose:
- Line index and offset math
- Byte offset <-> UTF-8 line/column
- Byte offset <-> LSP UTF-16 positions
- Text edit utilities and diff helpers
Constraints:
- This package is the only place that understands LSP UTF-16 conversions.
- Parser/formatter APIs should use byte offsets internally.
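To illustrate the byte-offset to UTF-16 conversion this package owns, here is a small sketch for a single line of text. The function name `UTF16Column` is an assumption for illustration, not part of the proposed API.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// UTF16Column returns the number of UTF-16 code units that precede byteCol
// on a single line of UTF-8 text. byteCol is clamped to len(line); a byteCol
// that falls inside a multi-byte rune counts that whole rune.
func UTF16Column(line []byte, byteCol int) int {
	if byteCol > len(line) {
		byteCol = len(line)
	}
	units := 0
	for i := 0; i < byteCol; {
		r, size := utf8.DecodeRune(line[i:])
		if r >= 0x10000 {
			units += 2 // encoded as a surrogate pair in UTF-16
		} else {
			units++ // BMP rune, or RuneError for an invalid byte
		}
		i += size
	}
	return units
}

func main() {
	line := []byte("// 😀 ok")
	// 10 bytes of UTF-8, but the emoji takes two UTF-16 code units.
	fmt.Println(len(line), UTF16Column(line, len(line)))
}
```

Keeping this conversion in one package means the parser and formatter never need to know that LSP positions are not byte columns.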
Purpose:
- Produce a lossless token stream with trivia and raw spans
- Provide stable token kinds independent of `tree-sitter` internals
Key responsibilities:
- Exact lexeme slicing from source
- Comment classification (`//`, `#`, `/* */`, `/** */`)
- Whitespace/newline trivia capture
- Robust handling of malformed strings/comments (emit error tokens + diagnostics)
v1 decision:
- Use a leading-trivia-only storage model unless a concrete formatter bug requires trailing trivia.
- `Trailing` fields in example APIs below are illustrative and may be omitted from implementation.
Purpose:
- Wrap `tree-sitter` parse tree with project-specific CST API
- Merge tree nodes with token stream
- Produce diagnostics and syntax queries for editor features
Key responsibilities:
- Parse source into `Tree` (CST root + token stream + diagnostics)
- Provide node iteration/query helpers
- Support parse recovery and error nodes
- Track stable spans for range formatting and editor features
Critical invariant:
- Every non-synthetic CST node span must map to a contiguous source byte range.
- `FirstToken`/`LastToken` must reference tokens whose spans are within the node span.
- Error/recovery nodes must still preserve source order in `Children`.
Purpose:
- Format source using CST + token/trivia model
- Return full-file output and precise edits
Key responsibilities:
- Doc-algebra builder/printer
- Comment placement and preservation rules
- Full and range formatting
- Idempotence guarantees
Critical invariant:
- Formatter must never emit text outside the input document's declared encoding assumptions (UTF-8 bytes in, UTF-8 bytes out).
Purpose:
- LSP server implementation over shared engine
- Snapshot lifecycle and request routing
Key responsibilities:
- document lifecycle (`didOpen`, `didChange`, `didClose`)
- diagnostics publishing
- formatting handlers
- symbols/folds/selection ranges
- semantic tokens
- cancellation and version consistency
- structured logging/trace hooks for debugging and support
v1 concurrency model:
- requests may be handled concurrently across different documents
- operations for the same document must resolve against a single immutable snapshot version
- stale formatting requests (older version than current snapshot) may return `ContentModified`
Purpose:
- VS Code client and packaging
- Syntax highlighting (TextMate baseline plus semantic-token overlay from `thriftls`)
- Launch managed-install or user-provided `thriftls`
Key responsibilities:
- register language and grammar
- spawn server
- configure transport and settings
- surface logs/errors to users
This section defines the core data model for the engine.
package text

type ByteOffset int

type Span struct {
	Start ByteOffset // inclusive
	End   ByteOffset // exclusive
}

type Point struct {
	Line   int // 0-based
	Column int // byte column
}

type Range struct {
	Start Point
	End   Point
}

// LSP-facing UTF-16 position/range, kept at edges only.
type UTF16Position struct {
	Line      int
	Character int
}

type UTF16Range struct {
	Start UTF16Position
	End   UTF16Position
}

package lexer
type TokenKind uint16

type TriviaKind uint8

const (
	TriviaWhitespace TriviaKind = iota
	TriviaNewline
	TriviaLineComment
	TriviaHashComment
	TriviaBlockComment
	TriviaDocComment
)

type Trivia struct {
	Kind TriviaKind
	Span text.Span
}

type Token struct {
	Kind    TokenKind
	Span    text.Span
	Leading []Trivia
	Flags   TokenFlags // e.g. malformed, synthesized, recovered
}

type TokenFlags uint8

Notes:
- Token text is recovered via `source[token.Span.Start:token.Span.End]`.
- Trivia also points into source via spans; no duplicated strings by default.
- A leading-trivia-only model is acceptable in v1 if comment placement remains stable.
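As a rough illustration of the leading-trivia-only model, the toy lexer below splits on whitespace and `//` comments only (far less than real Thrift lexing), attaches every trivia run to the next token, and uses a zero-width EOF token to hold trailing trivia. The EOF-token convention and all names are assumptions of this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

type Span struct{ Start, End int }

type Token struct {
	Span    Span
	Leading []Span // whitespace and comment trivia preceding the token
}

func isSpace(c byte) bool { return c == ' ' || c == '\t' || c == '\n' || c == '\r' }

func startsComment(src string, i int) bool {
	return i+1 < len(src) && src[i] == '/' && src[i+1] == '/'
}

// Lex returns tokens with leading trivia; the final zero-width EOF token
// carries whatever trivia follows the last real token.
func Lex(src string) []Token {
	var tokens []Token
	var leading []Span
	i := 0
	for i < len(src) {
		start := i
		switch {
		case isSpace(src[i]):
			for i < len(src) && isSpace(src[i]) {
				i++
			}
			leading = append(leading, Span{start, i})
		case startsComment(src, i):
			for i < len(src) && src[i] != '\n' {
				i++
			}
			leading = append(leading, Span{start, i})
		default:
			for i < len(src) && !isSpace(src[i]) && !startsComment(src, i) {
				i++
			}
			tokens = append(tokens, Token{Span{start, i}, leading})
			leading = nil
		}
	}
	return append(tokens, Token{Span{len(src), len(src)}, leading})
}

// Reconstruct rebuilds the source purely from spans, which is the
// losslessness property the formatter relies on.
func Reconstruct(src string, tokens []Token) string {
	var b strings.Builder
	for _, t := range tokens {
		for _, tr := range t.Leading {
			b.WriteString(src[tr.Start:tr.End])
		}
		b.WriteString(src[t.Span.Start:t.Span.End])
	}
	return b.String()
}

func main() {
	src := "// id\nstruct  A {} // end"
	tokens := Lex(src)
	fmt.Println(len(tokens), Reconstruct(src, tokens) == src)
}
```

Since tokens and trivia hold only spans, round-tripping the source costs no string copies until output is actually written.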
package syntax

type NodeKind uint16

type NodeID uint32

const NoNode NodeID = 0
// Real node IDs are 1-based. NodeID is not required to equal the slice index.

type ChildRef struct {
	IsToken bool
	Index   uint32 // token index or node index
}

type Node struct {
	ID         NodeID
	Kind       NodeKind
	Span       text.Span
	FirstToken uint32     // inclusive token index
	LastToken  uint32     // inclusive token index
	Parent     NodeID     // NoNode for root
	Children   []ChildRef // original source order
	Flags      NodeFlags  // error/recovered/synthetic
}

type NodeFlags uint8

type Tree struct {
	URI         string
	Version     int32
	Source      []byte
	Tokens      []lexer.Token
	Nodes       []Node
	Root        NodeID
	Diagnostics []Diagnostic
	LineIndex   *text.LineIndex
}

Design notes:
- Tree is immutable after parse.
- Nodes are stored in slices for cache locality and stable indexing.
- Parent pointers enable quick ancestor widening for range formatting.
- `Children` preserve exact syntax order, even for malformed or recovered regions.
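The ancestor-widening idea behind the parent pointers can be sketched as follows. This is illustrative only: the `FormatSafe` flag is hypothetical, and the sketch assumes the node with ID `n` is stored at `nodes[n-1]`, which the design above explicitly does not require.

```go
package main

import "fmt"

type NodeID uint32

const NoNode NodeID = 0

type Span struct{ Start, End int }

type Node struct {
	ID         NodeID
	Span       Span
	Parent     NodeID
	FormatSafe bool // hypothetical: declaration/block/list nodes
}

// Widen walks parent pointers from start and returns the nearest ancestor
// (or start itself) that is format-safe and fully covers r. It returns
// NoNode when no such ancestor exists.
func Widen(nodes []Node, start NodeID, r Span) NodeID {
	for id := start; id != NoNode; id = nodes[id-1].Parent {
		n := nodes[id-1]
		if n.FormatSafe && n.Span.Start <= r.Start && r.End <= n.Span.End {
			return id
		}
	}
	return NoNode
}

func main() {
	nodes := []Node{
		{ID: 1, Span: Span{0, 40}, Parent: NoNode},               // document root
		{ID: 2, Span: Span{0, 30}, Parent: 1, FormatSafe: true},  // struct declaration
		{ID: 3, Span: Span{12, 20}, Parent: 2},                   // field type
	}
	fmt.Println(Widen(nodes, 3, Span{14, 16})) // widens to the struct
	fmt.Println(Widen(nodes, 3, Span{25, 35})) // no safe ancestor covers this
}
```

A `NoNode` result corresponds to the refusal case in which the selected range cannot be widened to a format-safe ancestor.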
package syntax

type Severity uint8

const (
	SeverityError Severity = iota + 1
	SeverityWarning
	SeverityInfo
)

type DiagnosticCode string

type Diagnostic struct {
	Code        DiagnosticCode
	Message     string
	Severity    Severity
	Span        text.Span
	Related     []RelatedDiagnostic
	Source      string // "lexer", "parser", "formatter"
	Recoverable bool
}

type RelatedDiagnostic struct {
	Message string
	Span    text.Span
}

package format
type Options struct {
	LineWidth           int
	Indent              string // default: "  " (two spaces)
	MaxBlankLines       int
	PreserveCommentCols bool // v2, experimental
}

type Result struct {
	Output      []byte
	Changed     bool
	Diagnostics []syntax.Diagnostic
}

type RangeResult struct {
	Edits       []text.ByteEdit
	Diagnostics []syntax.Diagnostic
}

`text.ByteEdit` (referenced above) is defined as:

package text

type ByteEdit struct {
	Span    Span
	NewText []byte
}

package lsp
type Snapshot struct {
	URI       string
	Version   int32
	Tree      *syntax.Tree
	UpdatedAt time.Time
}

type DocumentStore interface {
	Get(uri string) (*Snapshot, bool)
	Put(snapshot *Snapshot)
	Delete(uri string)
}

The examples below define the intended engine/package contracts for implementation. v1 does not commit to a public/stable Go library API; packages remain internal until post-beta.
package syntax

type ParseOptions struct {
	URI            string
	Version        int32
	IncludeQueries bool // parse tree-sitter query metadata if needed
}

func Parse(ctx context.Context, src []byte, opts ParseOptions) (*Tree, error)

// Future incremental API; v1 may parse from scratch.
func Reparse(ctx context.Context, old *Tree, src []byte, opts ParseOptions) (*Tree, error)

Behavior:
- Returns a `Tree` even if syntax errors exist (best-effort), unless parsing infrastructure fails catastrophically.
- Parser errors appear in `Tree.Diagnostics`.
- `error` is reserved for internal failures (cancellation, parser initialization, invariant violations).
- `Reparse` is an optimization API. It must remain behaviorally equivalent to `Parse` for the same input bytes and options.
package format

func Document(ctx context.Context, tree *syntax.Tree, opts Options) (Result, error)

func Range(ctx context.Context, tree *syntax.Tree, r text.Span, opts Options) (RangeResult, error)

// Convenience wrapper for CLI paths.
func Source(ctx context.Context, src []byte, uri string, opts Options) (Result, error)

Behavior:
- `Document` may refuse to format if parse errors exceed a safety threshold (configurable policy).
- `Range` widens to the nearest format-safe ancestor (declaration/block/list node).
- Both functions are deterministic and idempotent given the same tree/options.
Formatting refusal contract:
- Refusal due to unsafe syntax is not a process error.
- Engine API will return a typed error (e.g., `ErrUnsafeToFormat`) for unsafe formatting requests.
- LSP/CLI layers map `ErrUnsafeToFormat` to protocol/UX behavior (LSP `RequestFailed`, CLI exit code `2`) while continuing to surface diagnostics from parsing.
package lsp

type ServerOptions struct {
	Logf          func(string, ...any)
	FormatOptions format.Options
}

type Server struct {
	// internal state
}

func NewServer(opts ServerOptions) *Server

func (s *Server) RunStdio(ctx context.Context) error

Request handlers (internal signatures):
func (s *Server) DidOpen(ctx context.Context, p DidOpenParams) error
func (s *Server) DidChange(ctx context.Context, p DidChangeParams) error
func (s *Server) DidClose(ctx context.Context, p DidCloseParams) error
func (s *Server) Formatting(ctx context.Context, p DocumentFormattingParams) ([]TextEdit, error)
func (s *Server) RangeFormatting(ctx context.Context, p DocumentRangeFormattingParams) ([]TextEdit, error)
func (s *Server) DocumentSymbol(ctx context.Context, p DocumentSymbolParams) ([]DocumentSymbol, error)
func (s *Server) FoldingRange(ctx context.Context, p FoldingRangeParams) ([]FoldingRange, error)
func (s *Server) SelectionRange(ctx context.Context, p SelectionRangeParams) ([]SelectionRange, error)
func (s *Server) SemanticTokensFull(ctx context.Context, p SemanticTokensParams) (*SemanticTokens, error) // phase 2

LSP protocol contract (normative v1):
- `initialize` advertises incremental sync, document/range formatting, document symbols, folding ranges, and selection ranges.
- `initialize` must not advertise unsupported methods behind placeholders.
- `shutdown` is graceful and idempotent; `exit` terminates process.
- `textDocument/formatting` and `textDocument/rangeFormatting`:
  - return `RequestFailed` when formatting is unsafe (`ErrUnsafeToFormat`)
  - return `ContentModified` when request version is stale relative to current snapshot
- Unknown methods return standard JSON-RPC method-not-found behavior.
- Server must remain responsive under cancellation and treat cancellation as non-fatal.
The formatter will:
- normalize indentation
- normalize horizontal spacing
- normalize blank line counts
- preserve comments
- preserve declaration and member order
- preserve token lexemes where possible:
- string quote style and escapes
- hex/decimal literal spelling
- deprecated spellings (`async`, `byte`) unless an explicit normalize option is added
The formatter will not (v1):
- reorder imports/includes/namespaces
- rewrite deprecated syntax
- enforce semantic style (e.g. field ids ordering)
These defaults are normative for the first implementation and for golden tests unless changed by a future RFC:
- `LineWidth = 100`
- `Indent = "  "` (two spaces)
- `MaxBlankLines = 2`
- top-level declarations separated by one blank line
- members (fields/functions/enum values) formatted one per line
- preserve existing separator lexeme when syntactically equivalent (`,` vs `;`) in v1
- preserve literal spellings and comment text
- invalid-code formatting in LSP defaults to fail-closed (`RequestFailed`) unless formatting is provably safe
If a syntax construct cannot be formatted without choosing a canonical separator, choose semicolon for declarations and comma for list/map/annotation items, and document the exception in tests.
Internal printer primitives:
- `Text`
- `Line` (hard break)
- `SoftLine` (space or line)
- `Indent`
- `Group`
- `Concat`
- `IfBreak` (optional in v2)
This enables:
- stable wrapping at configurable width
- consistent nested formatting (types, annotations, const literals)
- reuse across full/range formatting
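A deliberately small sketch of how such primitives could fit together. It is not the proposed printer: the fit check only measures the group's own content (not text that follows it), widths are counted in bytes, and `IfBreak` is omitted. The type names mirror the primitive list above; everything else is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// Doc is one of the node types declared below.
type Doc interface{}

type (
	Text     string             // literal text without newlines
	Line     struct{}           // hard break
	SoftLine struct{}           // space when the enclosing group is flat, break otherwise
	Indent   struct{ Body Doc } // one extra indent level for breaks inside Body
	Group    struct{ Body Doc } // laid out flat if it fits, broken otherwise
	Concat   []Doc
)

// flatWidth returns the width of d on a single line; ok is false when d
// contains a hard Line and therefore cannot be laid out flat.
func flatWidth(d Doc) (width int, ok bool) {
	switch v := d.(type) {
	case Text:
		return len(v), true
	case SoftLine:
		return 1, true
	case Line:
		return 0, false
	case Indent:
		return flatWidth(v.Body)
	case Group:
		return flatWidth(v.Body)
	case Concat:
		for _, c := range v {
			w, fits := flatWidth(c)
			if !fits {
				return 0, false
			}
			width += w
		}
	}
	return width, true
}

type printer struct {
	out    strings.Builder
	width  int
	unit   string
	column int
}

func (p *printer) newline(level int) {
	p.out.WriteByte('\n')
	p.out.WriteString(strings.Repeat(p.unit, level))
	p.column = level * len(p.unit)
}

func (p *printer) print(d Doc, level int, flat bool) {
	switch v := d.(type) {
	case Text:
		p.out.WriteString(string(v))
		p.column += len(v)
	case Line:
		p.newline(level)
	case SoftLine:
		if flat {
			p.out.WriteByte(' ')
			p.column++
		} else {
			p.newline(level)
		}
	case Indent:
		p.print(v.Body, level+1, flat)
	case Group:
		w, ok := flatWidth(v.Body)
		p.print(v.Body, level, ok && p.column+w <= p.width)
	case Concat:
		for _, c := range v {
			p.print(c, level, flat)
		}
	}
}

// Render lays out d at the given line width using unit as one indent level.
func Render(d Doc, width int, unit string) string {
	p := &printer{width: width, unit: unit}
	p.print(d, 0, false)
	return p.out.String()
}

func main() {
	doc := Group{Concat{
		Text("throws"),
		Indent{Concat{SoftLine{}, Text("(1: NotFound nf,"), SoftLine{}, Text("2: Denied d)")}},
	}}
	fmt.Println(Render(doc, 100, "  "))
	fmt.Println(Render(doc, 20, "  "))
}
```

The same document value renders flat at width 100 and broken at width 20, which is what makes the layout deterministic for a given tree and options.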
Comment fidelity is a formatter-critical requirement.
Policy:
- comments are lexed as trivia with spans
- formatter emits comments at token boundaries based on trivia ownership
- blank-line preservation is conservative (cap at `MaxBlankLines`)
- no comment text rewriting in v1
Edge cases to support:
- comments between type and identifier
- trailing comments on fields and enum values
- doc comments preceding declarations and members
- comments inside const maps/lists
Normative rules:
- Input bytes are treated as UTF-8 for parsing/formatting.
- UTF-8 BOM at file start is preserved if present.
- Invalid UTF-8 bytes:
- parser/lexer may emit diagnostics
- formatter must refuse (`ErrUnsafeToFormat`) rather than rewrite bytes
- Newline style:
- preserve dominant file newline style (`LF` or `CRLF`) for formatter-emitted line breaks
- mixed newline input may be normalized to the dominant style and should emit a diagnostic (non-fatal if formatting is otherwise safe)
- Formatter must not introduce NUL bytes.
The tree-sitter grammar must support:
- current Apache Thrift syntax
- common deprecated syntax forms tolerated in practice (as parseable nodes/tokens)
- error recovery around top-level declarations and container/literal boundaries
grammar/tree-sitter-thrift/queries/ will include:
- `highlights.scm` for syntax highlighting (future semantic overlay optional)
- `folds.scm` for folding ranges
- `symbols.scm` for declarations (services, structs, enums, typedefs, consts)
tree-sitter integration introduces C code.
Plan:
- vendor/generated parser C sources in repo
- vendor the tree-sitter core C runtime sources used for wasm artifact generation
- build embedded parser wasm artifacts and ship pure-Go (`CGO_ENABLED=0`) binaries
- test builds on macOS/Linux/Windows in CI before extension packaging work starts
Risk mitigation:
- lock `tree-sitter` runtime/parser versions
- add a dedicated parser build smoke test in CI
Windows ARM64 note:
- Building `windows/arm64` is straightforward for the shipped binaries because runtime execution is pure Go and does not require `cgo`.
- Grammar wasm generation still needs the pinned wasm toolchain in development/CI, but not in end-user environments.
Because parsing is hybrid (tree-sitter + custom lexer), alignment rules must be explicit:
- all CST node spans are in byte offsets over the same source buffer used by the lexer
- lexer token spans must form a monotonically increasing sequence ending at EOF
- formatter lookup from CST node -> covering token range must be deterministic
- any span mismatch between lexer and parser is a parser bug and should surface as an internal diagnostic/test failure
Implementation note:
- create a small conformance test suite that asserts CST node spans align with expected token boundaries for representative grammar forms (declarations, nested containers, comments, malformed inputs)
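The alignment rules lend themselves to small executable checks. The sketch below assumes a zero-width EOF token closes the token stream and that gaps between tokens are trivia; both function names are hypothetical.

```go
package main

import "fmt"

type Span struct{ Start, End int }

// CheckTokenSpans verifies that token spans are well-formed, do not overlap,
// appear in increasing order, and that the stream ends exactly at EOF.
// Gaps between tokens are allowed because trivia lives there.
func CheckTokenSpans(tokens []Span, srcLen int) error {
	prevEnd := 0
	for i, t := range tokens {
		if t.Start < prevEnd || t.End < t.Start {
			return fmt.Errorf("token %d: span [%d,%d) is malformed or precedes offset %d", i, t.Start, t.End, prevEnd)
		}
		prevEnd = t.End
	}
	if prevEnd != srcLen {
		return fmt.Errorf("token stream ends at %d, want EOF at %d", prevEnd, srcLen)
	}
	return nil
}

// CoveringTokens returns the inclusive index range of non-empty tokens that
// lie inside node. ok is false when the node boundary cuts through a token
// or when the node covers no token at all.
func CoveringTokens(tokens []Span, node Span) (first, last int, ok bool) {
	first, last = -1, -1
	for i, t := range tokens {
		if t.Start == t.End {
			continue // zero-width tokens such as EOF never anchor a node
		}
		inside := t.Start >= node.Start && t.End <= node.End
		overlaps := t.Start < node.End && t.End > node.Start
		if overlaps && !inside {
			return -1, -1, false
		}
		if inside {
			if first < 0 {
				first = i
			}
			last = i
		}
	}
	return first, last, first >= 0
}

func main() {
	// Tokens for "struct A {}": struct, A, {, }, EOF.
	tokens := []Span{{0, 6}, {7, 8}, {9, 10}, {10, 11}, {11, 11}}
	fmt.Println(CheckTokenSpans(tokens, 11))
	fmt.Println(CoveringTokens(tokens, Span{9, 11})) // the {} block
	fmt.Println(CoveringTokens(tokens, Span{3, 8}))  // cuts through "struct"
}
```

Running checks like these over every CST node in the corpus would turn a lexer/parser span mismatch into a test failure rather than a formatting bug.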
- `initialize`
- `shutdown`, `exit`
- `textDocument/didOpen`
- `textDocument/didChange`
- `textDocument/didClose`
- `textDocument/publishDiagnostics`
- `textDocument/formatting`
- `textDocument/rangeFormatting`
- `textDocument/documentSymbol`
- `textDocument/foldingRange`
- `textDocument/selectionRange`
- `textDocument/semanticTokens/full`
- `workspace/didChangeConfiguration` (configuration reload only; no complex workspace features)
- `textDocument/onTypeFormatting` (optional)
- richer diagnostics and quick fixes (e.g., deprecated syntax hints)
- go-to-definition
- references
- rename
- code actions requiring cross-file indexing
- Register `thrift` language
- Provide TextMate syntax highlighting (`syntaxes/thrift.tmLanguage.json`)
- Start `thriftls` via `vscode-languageclient`
- Manage `thriftls` installation/version selection (managed install tool flow) or use user-provided path
- Route formatting requests to LSP
- Expose settings:
  - `thrift.server.path`
  - `thrift.server.args`
  - `thrift.format.lineWidth`
  - `thrift.trace.server`
Non-goal in v1:
- Implementing language semantics in the extension. All parsing/formatting/diagnostics logic lives in `thriftls`.
v1 decision (managed install):
- Do not bundle `thriftls` binaries inside the `.vsix` by default.
- Publish per-platform `thriftls` binaries as release artifacts.
- VS Code extension downloads/installs the matching `thriftls` binary on demand (or via explicit command), similar to established Go tool installation flows.
- Store managed binaries in extension-managed storage/cache.
- Allow override via user-specified external path (`thrift.server.path`).
- Optional in v1 if CI/toolchain is ready: Windows `arm64` artifact publication.
Managed install contract (normative v1):
- Extension downloads `thriftls` only from a trusted release manifest URL or user-configured override endpoint.
- Manifest must include:
- manifest schema version
- tool version
- platform/arch tuple
- download URL
- SHA-256 checksum
- file size (bytes)
- Default managed manifest/download endpoints must use HTTPS; non-HTTPS endpoints are allowed only via explicit user override for development or air-gapped mirrors.
- Extension verifies checksum before install and rejects mismatches.
- Install/update is atomic:
- download to temp file
- verify checksum
- replace managed binary via atomic rename where supported
- preserve last-known-good binary for rollback on failed update
- Archive extraction (if used) must reject path traversal entries and unexpected file layouts.
- Extension must clearly surface offline/download/verification errors and allow manual `thrift.server.path` fallback.
- Artifact signing/provenance verification (e.g., signatures/attestations) is recommended and may be added before beta if release automation is ready; v1 minimum requirement is checksum verification.
Tradeoffs:
- Managed install keeps `.vsix` small and aligns with established Go tooling UX
- Requires robust download/version/checksum handling in extension
- External path still provides enterprise/offline escape hatch
- v1: TextMate baseline plus `textDocument/semanticTokens/full` from `thriftls`
- v2: expand semantic-token quality/coverage as needed; keep TextMate as fallback
Primary usage:
- `thriftfmt path/to/file.thrift`
- `thriftfmt --write path/to/file.thrift`
- `thriftfmt --check path/to/file.thrift`
- `thriftfmt --stdin --assume-filename foo.thrift`
- `thriftfmt --line-width 100`
Flags:
- `--write`, `-w`: write result in-place
- `--check`: non-zero exit if changes would be made
- `--stdin`: read source from stdin
- `--stdout`: explicit stdout (default if no `-w`)
- `--assume-filename`: URI/name for diagnostics and parser context
- `--line-width`: max width
- `--range start:end` (optional in v1 CLI; required by API, not required by CLI)
  - v1 syntax (if implemented): byte offsets, half-open `[start,end)`, zero-based (e.g. `--range 120:240`)
  - future line/column syntax, if added, must use a distinct flag to avoid ambiguity
- `--debug-tokens`
- `--debug-cst`
Exit codes:
- `0`: success; no changes (or write success)
- `1`: formatting changes required in `--check`
- `2`: syntax errors prevented formatting
- `3`: internal error
Input/output conflict rules (normative):
- `--write` and `--stdin` may not be used together
- `--check` and `--write` may not be used together
- formatting multiple files in one invocation is deferred unless explicitly added later
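A sketch of how the flag set and conflict rules might be wired with Go's standard `flag` package. Only a subset of the flags is shown, and the `Options` struct and `ParseArgs` function are hypothetical names for this example.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
)

type Options struct {
	Write, Check, Stdin bool
	LineWidth           int
	AssumeFilename      string
	Files               []string
}

// ParseArgs parses thriftfmt-style arguments and enforces the conflict rules.
func ParseArgs(args []string) (Options, error) {
	var o Options
	fs := flag.NewFlagSet("thriftfmt", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // the caller decides how to report errors
	fs.BoolVar(&o.Write, "write", false, "write result in-place")
	fs.BoolVar(&o.Write, "w", false, "shorthand for --write")
	fs.BoolVar(&o.Check, "check", false, "non-zero exit if changes would be made")
	fs.BoolVar(&o.Stdin, "stdin", false, "read source from stdin")
	fs.StringVar(&o.AssumeFilename, "assume-filename", "", "URI/name for diagnostics")
	fs.IntVar(&o.LineWidth, "line-width", 100, "max width")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	o.Files = fs.Args()

	switch {
	case o.Write && o.Stdin:
		return o, errors.New("--write and --stdin may not be used together")
	case o.Check && o.Write:
		return o, errors.New("--check and --write may not be used together")
	case len(o.Files) > 1:
		return o, errors.New("formatting multiple files in one invocation is not supported")
	}
	return o, nil
}

func main() {
	opts, err := ParseArgs([]string{"--check", "--line-width", "80", "a.thrift"})
	fmt.Println(opts.Check, opts.LineWidth, opts.Files, err)

	_, err = ParseArgs([]string{"-w", "--stdin"})
	fmt.Println(err)
}
```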
- By default, refuse formatting if syntax tree is too broken to ensure safe output
- Emit syntax diagnostics to stderr
- Return exit code `2`
- Always attempt parse and publish diagnostics
- Formatting handlers may:
- return no edits when already formatted
- return LSP `RequestFailed`/`ContentModified` when unsafe or stale
- Never crash on malformed input
Formatter may refuse when:
- unterminated block/string causes tokenization desync
- root tree is mostly recovery/error nodes
- selected range cannot be widened to a format-safe ancestor
Exact thresholds should be documented in docs/formatting-style.md and covered by tests.
Minimum v1 threshold policy (to avoid implementation ambiguity):
- full-document formatting is allowed if lexer reaches EOF and root parse tree exists, even with recoverable parse diagnostics, unless unterminated string/block comment prevents reliable tokenization
- range formatting requires a format-safe ancestor with fully bounded token coverage
- if refusal occurs, diagnostics must indicate the blocking region when possible
Targets are for local editor interaction and CI formatting runs.
- Parse + diagnostics for typical files (<2k LOC): p95 <50 ms on reference hardware (warm)
- Full document format for typical files: p95 <100 ms on reference hardware (warm)
- `didChange` handling and diagnostic refresh: perceived responsive under normal typing (debounce allowed); target p95 <75 ms parse+diagnostics on typical files after debounce
- No unbounded memory growth across repeated open/change/close cycles in LSP session
These targets are not contractual guarantees for v1, but meeting them is required for beta sign-off.
Measurement rules (required for beta sign-off):
- Publish benchmark corpus definitions (at least: small, typical, large Apache Thrift files; malformed-file set).
- Record hardware/OS baseline for reported numbers in CI or release notes.
- Report p50/p95 latency for parse and format benchmarks.
- Track steady-state RSS (or equivalent process memory metric) during repeated LSP open/change/close test loops.
- lexer tokenization and trivia capture
- UTF-16 position mapping
- parser node wrappers and queries
- formatter doc-printer behavior
- LSP handler utilities
- `input.thrift` -> `expected.thrift`
- idempotence: `fmt(fmt(x)) == fmt(x)`
- malformed syntax recovery fixtures
- range-format widening fixtures
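The idempotence property can be expressed as a reusable check that golden tests run over every fixture. In this sketch `CheckIdempotent` and `FormatFunc` are hypothetical names, and `trimTrailing` is a stand-in formatter used only to exercise the check.

```go
package main

import (
	"bytes"
	"fmt"
)

// FormatFunc is the shape of any formatter under test.
type FormatFunc func(src []byte) ([]byte, error)

// CheckIdempotent verifies fmt(fmt(x)) == fmt(x) for one input.
func CheckIdempotent(format FormatFunc, input []byte) error {
	once, err := format(input)
	if err != nil {
		return fmt.Errorf("first pass: %w", err)
	}
	twice, err := format(once)
	if err != nil {
		return fmt.Errorf("second pass: %w", err)
	}
	if !bytes.Equal(once, twice) {
		return fmt.Errorf("not idempotent:\nfirst:  %q\nsecond: %q", once, twice)
	}
	return nil
}

// trimTrailing is a stand-in formatter that strips trailing spaces and tabs
// from every line.
func trimTrailing(src []byte) ([]byte, error) {
	lines := bytes.Split(src, []byte("\n"))
	for i, l := range lines {
		lines[i] = bytes.TrimRight(l, " \t")
	}
	return bytes.Join(lines, []byte("\n")), nil
}

func main() {
	fmt.Println(CheckIdempotent(trimTrailing, []byte("struct A {}  \n")))
}
```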
- Parse large sets of real-world `.thrift` files
- Include compiler fixtures and custom edge-case corpus
- Validate formatted output parses with the official Apache Thrift compiler (`thrift`)
- CI job should fail if formatter emits syntax not accepted by official compiler
Version pinning requirement:
- CI must pin the oracle compiler version (container image or released binary) to avoid silent behavior drift.
- A separate scheduled job may run against latest upstream for early-warning compatibility signals.
- `didOpen` diagnostics
- versioned `didChange` ordering
- formatting and range formatting responses
- cancellation handling
- UTF-16 edit correctness
- `initialize` capability advertisement matches implemented handlers
- formatting request failure semantics (`RequestFailed`, `ContentModified`) are covered by integration tests
- extension activation
- server launch
- diagnostics visible
- formatting command works
- syntax highlighting grammar loads
- managed `thriftls` install/update flow works against test manifest
- checksum verification failure is surfaced and blocks activation of managed binary
- fuzz lexer and parser for panics/crashes
- fuzz formatting on arbitrary token streams/trees (best effort)
- `go test ./...`
- `golangci-lint` (or equivalent)
- parser generation drift check (`tree-sitter` generated files committed and up to date)
- corpus parse tests
- golden formatter tests
- compatibility oracle tests (with `thrift` compiler installed in job)
- VS Code extension build smoke test
- cross-platform binary build smoke (at least compile)
- release manifest/checksum generation and verification smoke (for managed `thriftls` install flow)
Recommended additions (required before beta):
- race detector run for LSP/document-store packages (`go test -race` on supported CI runners)
- VS Code extension integration smoke against a packaged `.vsix`
- `thriftfmt` binaries (macOS/Linux/Windows)
- `thriftlint` binaries (macOS/Linux/Windows)
- `thriftls` binaries (macOS/Linux/Windows)
- `thriftls` release manifest (machine-readable platform matrix + checksums)
- checksums file (SHA-256) for published binaries/artifacts
- VS Code extension package (`.vsix`)
- SemVer across CLI, LSP server, and VS Code extension
- v1 uses a shared repo version; the VS Code extension version tracks the repo release version
- Release versions are proposed by a bot-managed release PR derived from merged Conventional Commit-style PR titles
- The release PR is the source of truth for version bumps; merging it creates the `vX.Y.Z` tag that starts the publish workflow
- VS Code extension user-facing notes are maintained in `editors/vscode/CHANGELOG.md` under `Unreleased` and rolled into the released version during release preparation
Scope:
- repo scaffold
- CI skeleton
- test harness
- RFC + architecture docs
Acceptance criteria:
- repository structure exists and builds
- CI runs lint + unit test placeholders
- golden test harness can execute sample fixtures
- `tree-sitter` parser generation script stubbed and documented
- chosen Go `tree-sitter` binding and version are pinned in repo docs/build files
Scope:
- lossless lexer
- `tree-sitter` grammar v1
- CST wrapper
- syntax diagnostics
Acceptance criteria:
- parser returns `Tree` with tokens and nodes for valid fixtures
- parser returns recoverable diagnostics for invalid fixtures
- no panics on corpus parse test
- node spans map correctly to source bytes and LSP positions (tests)
- at least top-level declarations and members are represented in CST query APIs
- parser/lexer alignment invariants are enforced by dedicated tests
Scope:
- full-document formatter
- comment preservation
- CLI `thriftfmt`
Acceptance criteria:
- supports includes/namespaces/typedefs/enums/consts/structs/exceptions/services
- formatter is idempotent on formatter corpus
- comments are preserved in output (golden tests)
- `--check` exit codes behave as specified
- formatted output parses with official `thrift` compiler across corpus subset
- formatter refusal behavior (`ErrUnsafeToFormat` or equivalent) is finalized and tested
Scope:
- diagnostics + formatting + range formatting
- document symbols/folding/selection ranges
Acceptance criteria:
- `thriftls` handles LSP open/change/close lifecycle without crashes
- diagnostics update on edits for valid and invalid files
- `textDocument/formatting` and `rangeFormatting` return valid edits
- range formatting widens to safe ancestors and is covered by tests
- document symbols and folding ranges are returned for core declarations
- formatting request failure semantics (`RequestFailed`, `ContentModified`) are covered by integration tests
- `initialize` advertises only implemented v1 capabilities
Scope:
- syntax highlighting
- LSP client integration
- server binary management/install flow (managed install)
Acceptance criteria:
- opening `.thrift` file activates extension
- syntax highlighting works with TextMate grammar
- diagnostics and formatting work via extension-managed `thriftls` install (or configured external path)
- managed install validates manifest/checksum and preserves last-known-good binary on failed update
- offline/download/verification failures produce actionable user-facing errors and do not corrupt existing managed binary
- extension works on macOS/Linux/Windows in smoke tests
- user can override server path via settings
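
The checksum and last-known-good criteria above boil down to two rules: verify before touching the installed binary, and replace it with a single rename. The sketch below shows that logic in Go for consistency with the rest of the project, although the extension itself would implement it in its own language. `installVerified`, `errChecksumMismatch`, and `demo` are hypothetical names, and the SHA-256 digest is an assumed checksum format.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

var errChecksumMismatch = errors.New("checksum mismatch")

// installVerified writes data to dest only if its SHA-256 digest matches
// wantHex. The payload is staged in a temp file in the same directory and
// renamed over dest, so a failed update leaves the existing binary as is.
func installVerified(dest string, data []byte, wantHex string) error {
	sum := sha256.Sum256(data)
	if hex.EncodeToString(sum[:]) != wantHex {
		return errChecksumMismatch
	}
	tmp, err := os.CreateTemp(filepath.Dir(dest), ".thriftls-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // harmless once the rename has succeeded
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	if err := os.Chmod(tmp.Name(), 0o755); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), dest)
}

// demo installs a good payload, attempts a corrupted update, and returns
// what is left on disk afterwards.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "weaver-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	dest := filepath.Join(dir, "thriftls")

	good := []byte("v1")
	sum := sha256.Sum256(good)
	want := hex.EncodeToString(sum[:])
	if err := installVerified(dest, good, want); err != nil {
		return "", err
	}
	// Simulated bad download: payload does not match the manifest digest.
	if err := installVerified(dest, []byte("corrupt"), want); !errors.Is(err, errChecksumMismatch) {
		return "", fmt.Errorf("expected checksum mismatch, got %v", err)
	}
	onDisk, err := os.ReadFile(dest)
	return string(onDisk), err
}

func main() {
	fmt.Println(demo())
}
```

Staging in the destination directory keeps the rename on one filesystem; on Windows the rename can still fail while the old binary is running, which is one reason actionable error reporting is part of the criteria.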
Scope:
- performance tuning
- crash hardening and release automation
Acceptance criteria:
- beta performance targets met on representative corpora
- no known crashers from fuzz/corpus suites
- release pipeline produces signed/publishable artifacts (or documented unsigned process)
- user documentation covers install, format, and VS Code setup
- Project hosting and governance:
  - Resolved for v1: start in `github.com/kpumuk/thrift-weaver` and evaluate upstreaming later.
- `tree-sitter` distribution policy:
  - Superseded by RFC 0002: ship embedded wasm parser artifacts and pure-Go (`CGO_ENABLED=0`) binaries; no cgo parser backend remains.
  - Follow-up: keep wasm artifact drift and runtime ABI checks green in CI.
- Formatter v1 style strictness:
  - Resolved for v1: preserve separator lexemes and deprecated spellings by default; canonicalize whitespace/indentation only.
- Library API stability:
  - Resolved for v1: keep implementation packages internal until post-beta; no public/stable Go library API commitment in v1.
- Invalid-code formatting policy in editors:
  - Resolved for v1: fail closed (no edits + explicit error) unless formatting is provably safe.
- Release orchestration:
  - Resolved for v1: use release PR automation driven by Conventional Commit-style PR titles, then publish from the created `vX.Y.Z` tag.
- VS Code changelog ownership:
  - Resolved for v1: keep a curated in-repo changelog for extension-user-visible changes and roll `Unreleased` into the released version during release preparation.
Remaining non-blocking question (can be decided in M3/M4):
- Linux managed binary compatibility policy for VS Code extension:
  - Resolved direction: follow managed install/distribution patterns rather than bundling.
  - Remaining detail (M3/M4): define Linux binary baseline(s) and fallback guidance (`glibc` floor and/or alternate artifacts).
No M0-blocking open questions remain.
These are narrower than the open questions above and directly block scaffolding work:
- Resolved: repository home/module path starts at `github.com/kpumuk/thrift-weaver`
- Superseded by RFC 0002: use embedded tree-sitter wasm with `wazero`; keep tree-sitter core/runtime sources vendored for wasm generation
- Resolved: use RFC v1 default style profile and preserve separators/deprecated spellings
- Resolved: LSP invalid-format behavior defaults to fail-closed
- Resolved: Windows `arm64` artifact publication is supported in the pure-Go release matrix
- Resolved: release automation reads Conventional Commit-style PR titles from squash merges; individual local commits remain unconstrained
- Resolved: the shared repo version also updates `editors/vscode/package.json` and the root package-lock metadata fields during release preparation
Rejected for this project scope because:
- frontend discards trivia and normalizes syntax too early for formatter needs
- significant refactor would be required
- editor/LSP incremental parsing remains unsolved
Deferred (possible future alternative) because:
- simpler pure-Go distribution
- but higher risk/time for error recovery + incremental/LSP-friendly behavior
Rejected because editor integration is a primary requirement and affects parser architecture choices from day one.
- Implement engine and CLI first (`thriftfmt`) to stabilize formatting semantics.
- Add `thriftls` on top of same engine.
- Ship VS Code extension with managed `thriftls` install (plus external-path fallback).
- Iterate on editor features (semantic tokens, code actions, navigation).
- `internal/text` (line index, UTF-16 conversions, byte edits)
- `internal/lexer` (lossless tokens/trivia + tests)
- `grammar/tree-sitter-thrift` skeleton + parser generation pipeline
- `internal/syntax` parse wrapper + diagnostics + CST queries
- `internal/format` doc printer + declaration formatting
- `cmd/thriftfmt` + golden tests + compiler compatibility CI
- `internal/lsp` core server + formatting/diagnostics handlers
- `editors/vscode` extension with TextMate + managed-install `thriftls`
- Hardening, performance, release automation