[kotlin] kotlin grammars. #4888

@kaby76

There are two versions of the Kotlin grammar on the kotlinlang.org website.

❯ Claude, please read https://kotlinlang.org/grammar/ and https://kotlinlang.org/spec/syntax-and-grammar.html#syntax-and-grammar.
Extract the EBNF from both web pages. Compare the EBNF and report differences.

● Fetch(https://kotlinlang.org/grammar/)
  ⎿  Received 663.9KB (200 OK)

● Fetch(https://kotlinlang.org/spec/syntax-and-grammar.html#syntax-and-grammar)
  ⎿  Received 218.8KB (200 OK)

● Here is a detailed comparison. The two pages represent different versions and serve different purposes:

  - kotlinlang.org/grammar/ — v2.4.0, ANTLR-style notation (reference grammar)
  - kotlinlang.org/spec/… — v1.9-rfc+0.1, classic EBNF notation (language spec)

  ---
  Notation Differences

  ┌──────────────┬──────────────────────────────┬─────────────────────────────────────┐
  │   Concept    │       grammar/ (ANTLR)       │            spec/ (EBNF)             │
  ├──────────────┼──────────────────────────────┼─────────────────────────────────────┤
  │ Zero or more │ rule*                        │ {rule}                              │
  ├──────────────┼──────────────────────────────┼─────────────────────────────────────┤
  │ One or more  │ rule+                        │ rule {rule}                         │
  ├──────────────┼──────────────────────────────┼─────────────────────────────────────┤
  │ Optional     │ rule?                        │ [rule]                              │
  ├──────────────┼──────────────────────────────┼─────────────────────────────────────┤
  │ Newlines     │ Implicit (handled by parser) │ {NL} inserted explicitly throughout │
  └──────────────┴──────────────────────────────┴─────────────────────────────────────┘

  ---
  Rules Present in grammar/ (v2.4.0) but Absent from spec/ (v1.9)

  ┌────────────────────────────────┬─────────────────────────────────────────────────────────────────────────────┐
  │              Rule              │                                 Definition                                  │
  ├────────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
  │ contextParameterList           │ 'context' '(' functionValueParameter (',' functionValueParameter)* ','? ')' │
  ├────────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
  │ functionTypeContext            │ 'context' '(' type (',' type)* ','? ')'                                     │
  ├────────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
  │ whenSubject                    │ '(' (annotation* 'val' variableDeclaration '=')? expression ')'             │
  ├────────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
  │ whenGuard                      │ 'if' expression                                                             │
  ├────────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
  │ MULTI_DOLLAR_QUOTE_OPEN        │ '$'+ '"' (multi-dollar string interpolation)                                │
  ├────────────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
  │ MULTI_DOLLAR_TRIPLE_QUOTE_OPEN │ '$'+ '"""'                                                                  │
  └────────────────────────────────┴─────────────────────────────────────────────────────────────────────────────┘

  ---
  Rules Present in spec/ but Absent from grammar/

  ┌──────────────────────────────┬────────────────────────────────────────────────────────────────────────────────────────────────┐
  │             Rule             │                                           Definition                                           │
  ├──────────────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ NL                           │ LF | (CR [LF]) — grammar handles this implicitly                                               │
  ├──────────────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ NullLiteral                  │ 'null' as a standalone lexer token — grammar inlines it in literalConstant                     │
  ├──────────────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ elseif                       │ 'else' {NL} 'if' {NL} '(' expression ')' {NL} controlStructureBody                             │
  ├──────────────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ destructuringLambdaParameter │ multiVariableDeclaration (separate rule in spec)                                               │
  ├──────────────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ QuotedSymbol                 │ <any character excluding CR, LF and `>                                                         │
  ├──────────────────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────┤
  │ Many named operator tokens   │ DOT, COMMA, LPAREN, LANGLE, EQEQ, EXCL_EQ, AS_SAFE, etc. (all explicit in spec, inlined as     │
  │                              │ literals in grammar)                                                                           │
  └──────────────────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────┘

  ---
  Rules That Differ in Definition

  classModifier

  # grammar/ (v2.4.0) — adds 'value':
  classModifier: 'enum' | 'sealed' | 'annotation' | 'data' | 'inner' | 'value'

  # spec/ (v1.9):
  classModifier: 'enum' | 'sealed' | 'annotation' | 'data' | 'inner'

  memberModifier

  # grammar/ — 'tailrec' absent here (it's only in functionModifier):
  memberModifier: 'override' | 'lateinit'

  # spec/ — 'tailrec' appears in both memberModifier AND functionModifier (appears duplicated):
  memberModifier: 'override' | 'lateinit' | 'tailrec'

  companionObject

  # grammar/ — supports 'data companion object':
  companionObject: modifiers? 'companion' 'data'? 'object' simpleIdentifier? ...

  # spec/ — no 'data' keyword:
  (no 'data' in companion object rule)

  objectLiteral

  # grammar/ — supports 'data object', classBody is optional:
  objectLiteral: 'data'? 'object' (':' delegationSpecifiers)? classBody?

  # spec/ — no 'data', classBody is required:
  objectLiteral: 'object' [{NL} ':' {NL} delegationSpecifiers] {NL} classBody

  functionType

  # grammar/ — prepends functionTypeContext (context receivers):
  functionType: functionTypeContext? (receiverType '.')? functionTypeParameters '->' type

  # spec/ — no context:
  functionType: [receiverType {NL} '.' {NL}] functionTypeParameters {NL} '->' {NL} type

  anonymousFunction

  # grammar/ — includes context parameters and suspend interplay:
  anonymousFunction: ((contextParameterList 'suspend'?) | ('suspend' contextParameterList?))?
                     'fun' (type '.')? parametersWithOptionalType (':' type)? typeConstraints? functionBody?

  # spec/ — no context, includes typeParameters:
  anonymousFunction: 'fun' [{NL} typeParameters] [{NL} receiverType {NL} '.']
                     functionValueParameters [{NL} ':' {NL} type] [{NL} typeConstraints] [functionBody]

  modifiers

  # grammar/ — contextParameterList is a modifier:
  modifiers: annotation | modifier | contextParameterList+

  # spec/:
  modifiers: {modifier}     (modifier includes annotation inside)

  whenExpression

  # grammar/ — uses whenSubject rule, supports whenGuard:
  whenExpression: 'when' whenSubject? '{' whenEntry* '}'
  whenEntry: whenCondition (',' whenCondition)* ','? whenGuard? '->' controlStructureBody semi?
           | 'else' whenGuard? '->' controlStructureBody semi?

  # spec/ — no whenSubject or whenGuard, uses elseif pattern differently:
  whenExpression: 'when' {NL} [(expression | ['(' expression ')']) {NL}] '{' {NL} {whenEntry} '}'
  whenEntry: whenCondition {{NL} ',' {NL} whenCondition} [{NL} ','] {NL} '->' {NL} statements

  ifExpression

  # grammar/ — flat alternation:
  ifExpression: 'if' '(' expression ')'
                (controlStructureBody | (controlStructureBody? ';'? 'else' (controlStructureBody | ';')) | ';')

  # spec/ — introduces elseif chain rule:
  ifExpression: 'if' {NL} '(' expression ')' {NL} controlStructureBody
                [{{NL} elseif} {NL} 'else' {NL} controlStructureBody]
  elseif: 'else' {NL} 'if' {NL} '(' expression ')' {NL} controlStructureBody

  rangeExpression

  # grammar/ — adds '..<' (range-until operator, Kotlin 1.9+):
  rangeExpression: additiveExpression (('..' | '..<') additiveExpression)*

  # spec/ — only '..':
  rangeExpression: additiveExpression {'..' {NL} additiveExpression}

  catchBlock

  # grammar/ — catch type is 'type' (any type):
  catchBlock: 'catch' '(' annotation* simpleIdentifier ':' type ','? ')' block

  # spec/ — catch type is restricted to 'userType':
  catchBlock: 'catch' {NL} '(' {annotation} simpleIdentifier ':' userType ')' {NL} block

  jumpExpression

  # grammar/ — uses RETURN_AT / CONTINUE_AT / BREAK_AT tokens:
  jumpExpression: 'throw' expression | ('return' | RETURN_AT) expression?
                | 'continue' | CONTINUE_AT | 'break' | BREAK_AT

  # spec/ — uses label syntax:
  jumpExpression: 'throw' {NL} expression | 'return' [label] [{NL} expression]
                | 'continue' [label] | 'break' [label]

  thisExpression / superExpression

  # grammar/ — uses dedicated tokens:
  thisExpression: 'this' | THIS_AT
  superExpression: 'super' ('<' type '>')? ('@' simpleIdentifier)? | SUPER_AT

  # spec/ — uses label syntax:
  thisExpression: 'this' [label]
  superExpression: 'super' [typeArguments] [[label]]

  lambdaLiteral / lambdaParameters / lambdaParameter

  # grammar/:
  lambdaLiteral: '{' (lambdaParameters? '->')? statements '}'
  lambdaParameters: lambdaParameter (',' lambdaParameter)* ','?
  lambdaParameter: variableDeclaration | multiVariableDeclaration (':' type)?

  # spec/ — arrow required when params present, destructuring is a separate rule:
  lambdaLiteral: '{' {NL} [lambdaParameters] {NL} '->' {NL} statements {NL} '}'
  lambdaParameters: lambdaParameter {{NL} ',' {NL} lambdaParameter} [{NL} ',']
                 | destructuringLambdaParameter {{NL} ',' {NL} destructuringLambdaParameter} [{NL} ',']
  lambdaParameter: variableDeclaration | destructuringLambdaParameter
  destructuringLambdaParameter: multiVariableDeclaration

  callableReference

  # grammar/ — simpler, receiverType is optional prefix:
  callableReference: receiverType? '::' (simpleIdentifier | 'class')

  # spec/ — enumerates three forms explicitly:
  callableReference: userType [typeArguments] '::' simpleIdentifier
                   | functionType '::' simpleIdentifier
                   | simpleIdentifier '::' simpleIdentifier

  elvis

  # grammar/:  elvis: '?' ':'
  # spec/:     elvis: QUEST_NO_WS ':'   (explicitly no whitespace between ? and :)

  semi / semis

  # grammar/ — implicit (handled by the parser's hidden channel):
  semi:  (implicit)
  semis: (implicit)

  # spec/ — explicit:
  semi:  (';' | NL) {NL}
  semis: ';' | NL {';' | NL}

  ---
  Summary

  ┌─────────────────────────────────────────────┬───────┐
  │                  Category                   │ Count │
  ├─────────────────────────────────────────────┼───────┤
  │ Rules only in grammar/ (v2.4.0 additions)   │ 6     │
  ├─────────────────────────────────────────────┼───────┤
  │ Rules only in spec/ (v1.9)                  │ ~6    │
  ├─────────────────────────────────────────────┼───────┤
  │ Rules with substantive semantic differences │ ~18   │
  └─────────────────────────────────────────────┴───────┘

  The dominant themes are:
  1. v2.4.0 adds: context parameters (context(...)), value class modifier, data objects/companions, whenGuard (if guard in when), ..< range-until, multi-dollar string interpolation.
  2. NL handling: spec is explicit with {NL} everywhere; grammar page uses implicit newline handling.
  3. Label vs. _AT tokens: spec uses [label] syntax; grammar uses dedicated lexer tokens (THIS_AT, RETURN_AT, etc.).
  4. catchBlock: spec restricts to userType, grammar allows any type.
  5. callableReference: grammar is more permissive; spec restricts to three enumerated forms.

✻ Worked for 2m 35s
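
Because the two pages use different notations, a plain text diff of the rules is noisy until one side is normalized. The following is a minimal sketch, assuming the spec rule bodies are available as plain strings. The helper name `spec_to_antlr` is hypothetical; it rewrites `{x}` to `x*` and `[x]` to `x?`, and drops standalone `{NL}` groups so that newline markers do not dominate the diff. It is not how the comparison above was produced, and it does not cover every construct (for example, a quoted literal that itself contains a single quote).

```python
import re

# Tokens: quoted literals, single bracket characters, '|', or bare words.
TOKEN = re.compile(r"'[^']*'|[{}\[\]()|]|[^\s{}\[\]()|']+")

# Opening bracket -> (closing bracket, ANTLR-style suffix operator).
GROUPS = {"{": ("}", "*"), "[": ("]", "?"), "(": (")", "")}


def spec_to_antlr(rhs: str) -> str:
    """Rewrite a spec-style EBNF rule body into ANTLR-style suffix notation."""
    tokens = TOKEN.findall(rhs)
    pos = 0

    def parse_seq(closer):
        nonlocal pos
        out = []
        while pos < len(tokens) and tokens[pos] != closer:
            tok = tokens[pos]
            pos += 1
            if tok not in GROUPS:
                out.append(tok)  # literal, rule name, or '|'
                continue
            close, suffix = GROUPS[tok]
            inner = parse_seq(close)
            pos += 1  # step over the closing bracket
            if not inner or (inner == ["NL"] and tok != "("):
                continue  # drop {NL} / [NL] markers entirely
            body = " ".join(inner)
            if tok == "(" or len(inner) > 1:
                body = f"({body})"  # keep grouping when it matters
            out.append(body + suffix)
        return out

    return " ".join(parse_seq(None))
```

With this normalization, the spec's `rangeExpression` body becomes `additiveExpression ('..' additiveExpression)*`, which differs from the grammar page's version only by the `'..<'` alternative.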

There are two versions of the Kotlin grammar here in this repo, which I noted previously (#3965).
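
One way to cross-check two grammar versions mechanically is to diff their rule names. The following is a minimal sketch, assuming both grammars are ANTLR4 `.g4` texts in which every rule name starts at column 0 and is followed (possibly on the next line) by `:`. The helpers `rule_names` and `compare` are hypothetical and not part of any tooling in this repo; comments that happen to look like rule heads would produce false positives.

```python
import re

# A rule head: optional 'fragment', a name at the start of a line, then ':'
# (the colon may sit on the following line, as in many .g4 files).
RULE_HEAD = re.compile(r"^(?:fragment\s+)?([A-Za-z_]\w*)\s*:", re.MULTILINE)


def rule_names(grammar_text: str) -> set[str]:
    """Return the set of rule names found in an ANTLR4 grammar text."""
    return set(RULE_HEAD.findall(grammar_text))


def compare(a: str, b: str) -> dict[str, list[str]]:
    """List rule names that appear in only one of the two grammar texts."""
    names_a, names_b = rule_names(a), rule_names(b)
    return {
        "only_in_a": sorted(names_a - names_b),
        "only_in_b": sorted(names_b - names_a),
    }
```

To use it, read the two grammar files into strings (for example with `pathlib.Path(...).read_text()`) and pass them to `compare`. Rules that share a name but differ in definition still need a body-level diff.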

There are no ANTLR grammars for Kotlin itself in the JetBrains/kotlin repo, but it does contain a copy of the javascript/javascript grammar. That copy is also slightly out of date.

06/09-08:16:39 ~/issues
$ git clone https://github.com/JetBrains/kotlin.git g4-kotlin
Cloning into 'g4-kotlin'...
remote: Enumerating objects: 5575736, done.
remote: Counting objects: 100% (3562/3562), done.
remote: Compressing objects: 100% (1417/1417), done.
remote: Total 5575736 (delta 2957), reused 2153 (delta 2135), pack-reused 5572174 (from 3)
Receiving objects: 100% (5575736/5575736), 4.58 GiB | 23.62 MiB/s, done.
Resolving deltas: 100% (3068761/3068761), done.
Updating files: 100% (105611/105611), done.
06/09-08:44:59 ~/issues
$ cd !$
cd g4-kotlin
06/09-09:01:25 ~/issues/g4-kotlin
$ find . -iname '*.g4'
./js/js.parser/src/main/antlr/JavaScriptLexer.g4
./js/js.parser/src/main/antlr/JavaScriptParser.g4
06/09-09:01:43 ~/issues/g4-kotlin
$ find . -iname '*grammar*'
./license/third_party/antlr_js_grammar_license.txt
06/09-09:02:14 ~/issues/g4-kotlin
