kaitai-io/kaitai_yaml

Kaitai YAML

A lightweight YAML parser for Scala that handles a practical subset of YAML 1.2. Built on FastParse, it cross-compiles for JVM, Scala.js, and Scala Native.

Plenty of YAML parsers already exist for Scala. Here is what sets Kaitai YAML apart:

  • Portable across the whole Scala ecosystem (JVM, Scala.js, Scala Native).
  • No type coercion: all scalar values are plain strings. This sidesteps most of what YAML is routinely criticized for, e.g. the infamous Norway problem, version numbers being read as doubles, etc.
  • Source positions: every YAML node carries its position, which can be used for precise offset/line/column error reporting.
  • Deterministic key order: mappings preserve document order.
  • Integration with FastParse: the parser can be embedded into other FastParse parsers or, conversely, extended with them.
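
Because every scalar arrives as a plain string, any typing becomes an explicit decision in application code. The following is a minimal sketch using only the Scala standard library; the helpers asInt and asBool are hypothetical names and not part of Kaitai YAML. It assumes Scala 2.13 or later for toIntOption (on 2.12, scala.util.Try(raw.toInt).toOption would be a substitute).

```scala
// Hypothetical helpers, not part of Kaitai YAML: the library hands back plain
// strings, so the caller decides if and how a scalar is converted.
def asInt(raw: String): Option[Int] = raw.toIntOption // Scala 2.13+

def asBool(raw: String): Option[Boolean] = raw match {
  case "true"  => Some(true)
  case "false" => Some(false)
  case _       => None // "NO", "yes", "on" and similar stay plain strings
}

// Values as they would appear in a scalar's value field:
val country = "NO"   // stays the string "NO", never becomes a boolean
val version = "1.10" // stays "1.10", never becomes the double 1.1

println(asBool(country)) // None
println(asInt("8080"))   // Some(8080)
println(asBool("false")) // Some(false)
```

Returning Option keeps a failed conversion visible to the caller instead of silently falling back to a default.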

If you're looking for a slightly different feature set, consider using any of these libraries:

Installation

The library is available for Scala 2.12, 2.13, and 3.x.

sbt

// JVM
libraryDependencies += "io.kaitai" %% "kaitai-yaml" % "0.1"

// Scala.js / Scala Native
libraryDependencies += "io.kaitai" %%% "kaitai-yaml" % "0.1"

Mill

ivy"io.kaitai::kaitai-yaml:0.1"

Gradle

implementation("io.kaitai:kaitai-yaml_2.13:0.1")

Usage

Parsing

import fastparse._
import io.kaitai.yaml._

val input = """
name: my-service
version: 1.0
tags: [scala, yaml]
config:
  debug: false
  ports:
    - 8080
    - 8443
""".trim

YamlParser.parse(input) match {
  case Parsed.Success(root, _) => println(root)
  case f: Parsed.Failure       => System.err.println(f.trace().longAggregateMsg)
}

YamlParser.parse returns fastparse.Parsed[YamlNode] — the standard FastParse result type:

  • Parsed.Success carries the parsed AST and the index where parsing stopped.
  • Parsed.Failure provides the failure index and a .trace() with detailed diagnostics.

Navigating the AST

val Parsed.Success(root, _) = YamlParser.parse(input): @unchecked

// YamlMap.apply throws on missing key; .get returns Option
val name = root.asInstanceOf[YamlMap]("name")        // YamlScalar("my-service", ...)
val tags = root.asInstanceOf[YamlMap]("tags")        // YamlSeq([...], ...)
val debug = root.asInstanceOf[YamlMap]("config")
  .asInstanceOf[YamlMap]("debug")                    // YamlScalar("false", ...)

println(name.asInstanceOf[YamlScalar].value)         // "my-service"
println(debug.asInstanceOf[YamlScalar].value)        // "false"
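
The casts above throw if a node has an unexpected shape. As a sketch of a more defensive style, a path lookup can be built on the Option-returning get. The helpers lookup and scalarAt are hypothetical. The AST types are re-declared here as stand-ins, following the AST reference in this README, so that the snippet compiles on its own. The stand-in implementation of get is an assumption about the library's behavior.

```scala
// Stand-in definitions mirroring the "AST reference" section so the sketch
// compiles on its own; in real code, use `import io.kaitai.yaml._` instead.
sealed trait YamlNode { def pos: Int }
final case class YamlScalar(value: String, pos: Int) extends YamlNode
final case class YamlSeq(items: List[YamlNode], pos: Int) extends YamlNode
final case class YamlMap(fields: List[(YamlScalar, YamlNode)], pos: Int) extends YamlNode {
  // Assumed behavior: first field whose key matches, if any.
  def get(key: String): Option[YamlNode] =
    fields.collectFirst { case (k, v) if k.value == key => v }
}

// Hypothetical helper: follow a path of mapping keys, returning None as soon
// as a key is missing or an intermediate node is not a mapping.
def lookup(node: YamlNode, path: List[String]): Option[YamlNode] = path match {
  case Nil => Some(node)
  case key :: rest =>
    node match {
      case m: YamlMap => m.get(key).flatMap(lookup(_, rest))
      case _          => None
    }
}

// Hypothetical helper: read the string value of the scalar at the end of a path.
def scalarAt(node: YamlNode, path: String*): Option[String] =
  lookup(node, path.toList).collect { case YamlScalar(v, _) => v }

// Hand-built tree standing in for a parse result; positions are placeholders.
val root: YamlNode = YamlMap(List(
  (YamlScalar("name", 0), YamlScalar("my-service", 0)),
  (YamlScalar("config", 0), YamlMap(List(
    (YamlScalar("debug", 0), YamlScalar("false", 0))
  ), 0))
), 0)

println(scalarAt(root, "config", "debug")) // Some(false)
println(scalarAt(root, "config", "ports")) // None
```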

Pattern matching

Pattern matching is the idiomatic way to work with the AST:

val Parsed.Success(root, _) = YamlParser.parse(input): @unchecked

root match {
  case YamlMap(fields, _) =>
    fields.foreach { case (key, value) =>
      value match {
        case YamlScalar(v, _)  => println(s"${key.value} = $v")
        case YamlSeq(items, _) => println(s"${key.value} has ${items.size} items")
        case YamlMap(_, _)     => println(s"${key.value} is a nested map")
      }
    }
  case _ => ()
}
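
The same three-way match extends naturally to a recursive traversal. As an illustration, the sketch below flattens a tree into path/value pairs in document order. The helper flatten and its path syntax are hypothetical, and the AST types are stand-ins re-declared from the AST reference so that the snippet compiles on its own.

```scala
// Stand-in definitions mirroring the "AST reference" section so the sketch
// compiles on its own; in real code, use `import io.kaitai.yaml._` instead.
sealed trait YamlNode { def pos: Int }
final case class YamlScalar(value: String, pos: Int) extends YamlNode
final case class YamlSeq(items: List[YamlNode], pos: Int) extends YamlNode
final case class YamlMap(fields: List[(YamlScalar, YamlNode)], pos: Int) extends YamlNode

// Hypothetical helper: flatten a tree into (path, value) pairs, keeping the
// order in which fields and items appear in the tree.
def flatten(node: YamlNode, prefix: String = ""): List[(String, String)] = node match {
  case YamlScalar(v, _) => List(prefix -> v)
  case YamlSeq(items, _) =>
    items.zipWithIndex.flatMap { case (item, i) => flatten(item, s"$prefix[$i]") }
  case YamlMap(fields, _) =>
    fields.flatMap { case (key, value) =>
      val path = if (prefix.isEmpty) key.value else s"$prefix.${key.value}"
      flatten(value, path)
    }
}

// Hand-built tree standing in for a parse result; positions are placeholders.
val root: YamlNode = YamlMap(List(
  (YamlScalar("name", 0), YamlScalar("my-service", 0)),
  (YamlScalar("ports", 0), YamlSeq(List(YamlScalar("8080", 0), YamlScalar("8443", 0)), 0))
), 0)

flatten(root).foreach { case (path, value) => println(s"$path = $value") }
// name = my-service
// ports[0] = 8080
// ports[1] = 8443
```

Because YamlNode is sealed, the compiler can warn if one of the three cases is left out of the match.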

Error handling

On failure you get a Parsed.Failure with the character index where parsing stopped and a full trace:

YamlParser.parse("- [unterminated") match {
  case Parsed.Success(node, _) => // ...
  case f: Parsed.Failure =>
    println(s"Error at index ${f.index}")
    println(f.trace().longAggregateMsg)
}
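
A bare character index is hard to read in an error message. One possible way to present it is to print the offending source line with a caret under the failing position. The helper renderCaret below is a hypothetical sketch using only the standard library; it is not part of Kaitai YAML or FastParse.

```scala
// Hypothetical helper: show the source line that contains a zero-based
// character offset, with a caret on the next line pointing at that offset.
def renderCaret(input: String, index: Int): String = {
  // Clamp so that an out-of-range index still yields a usable message.
  val clamped   = math.max(0, math.min(index, input.length))
  // Start of the line: one past the previous line break (or 0 if none).
  val lineStart = input.lastIndexOf('\n', clamped - 1) + 1
  // End of the line: the next line break, or end of input.
  val nextBreak = input.indexOf('\n', clamped)
  val lineEnd   = if (nextBreak == -1) input.length else nextBreak
  val line      = input.substring(lineStart, lineEnd)
  line + "\n" + (" " * (clamped - lineStart)) + "^"
}

// Offset 8 is the '[' on the second line.
println(renderCaret("a: 1\nb: [x", 8))
// b: [x
//    ^
```

In the failure branch above, this could be called as renderCaret(input, f.index), where input is the string that was passed to the parser.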

Position tracking

Every node records where it appeared in the source text as a zero-based character offset:

val Parsed.Success(root, _) = YamlParser.parse("key: value"): @unchecked
val offset = root.pos  // 0

This is the same offset scheme used by FastParse itself. To convert an offset to a line:column string, use ParserInput.prettyIndex:

val input = "key: value"
val pi = fastparse.ParserInput.fromString(input)
println(pi.prettyIndex(root.pos))  // "1:1"
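
When numeric line and column values are needed rather than a formatted string, the conversion can also be done by hand. The helper lineCol below is a hypothetical sketch that counts line breaks before the offset. It assumes "\n" line endings and uses the same one-based convention as the "1:1" output above.

```scala
// Hypothetical helper: convert a zero-based character offset into a
// one-based (line, column) pair by counting line breaks before the offset.
def lineCol(input: String, offset: Int): (Int, Int) = {
  // Text preceding the offset, with the offset clamped to the input bounds.
  val before = input.substring(0, math.max(0, math.min(offset, input.length)))
  val line   = before.count(_ == '\n') + 1
  // Column: characters since the last line break, counted from 1.
  val column = before.length - (before.lastIndexOf('\n') + 1) + 1
  (line, column)
}

println(lineCol("key: value", 0))   // (1,1)
println(lineCol("a: 1\nb: 2", 8))   // (2,4)
```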

Composing with other FastParse parsers

The YamlParser.document parser is public, so you can embed YAML parsing inside a larger FastParse grammar:

import fastparse._, NoWhitespace._

def myFormat[A: P]: P[(String, YamlNode)] =
  P("---BEGIN---\n" ~ CharsWhile(_ != '\n').! ~ "\n" ~ YamlParser.document)

val input = "---BEGIN---\nheader-value\nkey: value\n"
fastparse.parse(input, myFormat(_)) match {
  case Parsed.Success((header, yaml), _) => println(s"$header -> $yaml")
  case f: Parsed.Failure                 => println(f.trace().longAggregateMsg)
}

AST reference

YamlNode (sealed trait)
  +-- YamlScalar(value: String, pos: Int)
  +-- YamlSeq(items: List[YamlNode], pos: Int)
  +-- YamlMap(fields: List[(YamlScalar, YamlNode)], pos: Int)
        get(key: String): Option[YamlNode]
        apply(key: String): YamlNode   // throws NoSuchElementException

pos is a zero-based character offset into the source input.

Supported YAML

See SUPPORTED_YAML.md for the full specification of which YAML features are and are not supported.

Building from source

Requires sbt 1.x and JDK 11+.

# Run tests across all Scala versions (JVM)
sbt "+kaitaiYamlJVM/test"

# Compile for a specific platform
sbt kaitaiYamlJS/compile
sbt kaitaiYamlNative/compile

# Format and lint
sbt scalafmtAll
sbt scalafixAll

License

MIT -- see LICENSE for details.
