Skip to content

Implement new enums syntax (enforced underlying type, doc / doc-ref support)#335

Draft
generalmimon wants to merge 9 commits intomasterfrom
new-enums-syntax
Draft

Implement new enums syntax (enforced underlying type, doc / doc-ref support)#335
generalmimon wants to merge 9 commits intomasterfrom
new-enums-syntax

Conversation

@generalmimon
Copy link
Copy Markdown
Member

@generalmimon generalmimon commented Apr 16, 2026

Resolves kaitai-io/kaitai_struct#1288, resolves kaitai-io/kaitai_struct#358

Related PR in the KST repo to update the test suite accordingly: kaitai-io/kaitai_struct_tests#140

As explained in kaitai-io/kaitai_struct#1288, this is a breaking change: the current enums syntax is declared old and unsupported ("legacy pre-v0.12 enum syntax"), and only the new syntax is accepted.

I can imagine that this will have a big impact, because suddenly almost everyone will have to update their .ksy specs. I suppose it would be possible to maintain support for the old syntax somehow, but that would significantly limit adoption of the new one... So I'm not sure if that's really what we want.

generalmimon added a commit to kaitai-io/kaitai_struct_tests that referenced this pull request Apr 18, 2026
See kaitai-io/kaitai_struct#1288

This commit fixes only the tests that were broken compared to the
`master` branch when running `sbt test` in a compiler that requires the
new syntax for enum definitions:
kaitai-io/kaitai_struct_compiler#335
Example error message:

```
formats/enum_values_out_of_type_range_s2.ksy: /enums/animal/values/-32769:
        error: integer constant -32769 is out of range -32768..32767 of the enum's underlying type `s2`
```
It was recognized that allowing enums to be applied only to integer
types that match the enum's underlying type is unnecessarily restrictive
and serves no practical purpose - see
kaitai-io/kaitai_struct#1288 (comment)
generalmimon added a commit to kaitai-io/kaitai_struct_tests that referenced this pull request Apr 23, 2026
See kaitai-io/kaitai_struct#1288

This commit fixes only the tests that were broken compared to the
`master` branch when running `sbt test` in a compiler that requires the
new syntax for enum definitions:
kaitai-io/kaitai_struct_compiler#335
See kaitai-io/kaitai_struct#1288

Closes #248
(in Go, we no longer use the `int` type with platform-dependent width)

Fixes kaitai-io/kaitai_struct#862

Fixes kaitai-io/kaitai_struct#92
See kaitai-io/kaitai_struct#959

As mentioned in the linked issue, this is only possible in C++11 (in
C++98, it would be invalid syntax). This means that the undefined
behavior still remains in the C++98 code, and therefore
kaitai-io/kaitai_struct#959 hasn't been fully
resolved yet.
@generalmimon
Copy link
Copy Markdown
Member Author

generalmimon commented May 1, 2026

Let me summarize the results of this PR. The syntax changes and semantics are explained in kaitai-io/kaitai_struct#1288, so I won't repeat them here. Instead, I will focus on the observable changes in the generated code.

Summary

  1. As per Documentation support for enums kaitai_struct#358 (comment), enum-level doc and doc-ref are written into the generated code as docstrings: 5c6b444

    The Docstrings and DocstringsDocref tests were extended to demonstrate the use of enum-level doc / doc-ref keys: kaitai-io/kaitai_struct_tests@056e408

  2. To enable full support for 32-bit/64-bit signed/unsigned integers and thereby fix errors in the tests

    in C#, Go and Zig, enums in these languages are now declared with fixed underlying type: eca0c22

  3. To avoid undefined behavior in C++11 when casting out-of-range integers to our generated enum types (see C++: loading out-of-range values to enums without fixed underlying type is undefined behavior kaitai_struct#959), the C++11 generated code now always declares enums with fixed underlying type too: 02e26f1

    The C++98 code still declares enums without fixed underlying type (just as before), because declaring a fixed underlying type is a feature available only since C++11, as I mentioned in C++: loading out-of-range values to enums without fixed underlying type is undefined behavior kaitai_struct#959.

Changes in test results

C++11 - cpp_stl_11/clang18-linux-x86_64

The Enum{Int,Long}Range{S,U} tests were already passing in C++11 previously, so there is no difference in test results during a normal test run. To verify that kaitai-io/kaitai_struct#959 has been fixed for C++11, we need to run the test suite under UndefinedBehaviorSanitizer (UBSan).

Running the test suite with UBSan

This is what I used to enable UBSan in spec/cpp_stl_11/CMakeLists.txt:

diff --git i/spec/cpp_stl_11/CMakeLists.txt w/spec/cpp_stl_11/CMakeLists.txt
index cc5af639..1716f9c4 100644
--- i/spec/cpp_stl_11/CMakeLists.txt
+++ w/spec/cpp_stl_11/CMakeLists.txt
@@ -9,6 +9,7 @@ find_package(Iconv REQUIRED)
 # Enforce UTF-8 source files encoding for MSVC
 add_compile_options("$<$<CXX_COMPILER_ID:MSVC>:/utf-8>")

+set(CMAKE_BUILD_TYPE Debug)
 set(CMAKE_CXX_STANDARD 11)
 set(CMAKE_CXX_STANDARD_REQUIRED ON)
 set(CMAKE_CXX_EXTENSIONS OFF)
@@ -69,6 +70,9 @@ target_compile_options(ks_tests PRIVATE
        >
 )

+target_compile_options(ks_tests PUBLIC -fno-omit-frame-pointer -fno-optimize-sibling-calls -O1 -fsanitize=undefined)
+target_link_options(ks_tests PUBLIC -fno-omit-frame-pointer -fno-optimize-sibling-calls -O1 -fsanitize=undefined)
+
 add_test(ks_tests ks_tests)

 enable_testing()

However, it turned out that if you try to run the test suite now using our Docker image kaitai-cpp_stl-clang18-linux-x86_64 (./docker-ci -t cpp_stl -u _11 -i clang18 -r) built according to src/cpp_stl/clang18-linux-x86_64/Dockerfile, the linking phase fails:

[  1%] Linking CXX executable ks_tests
/usr/bin/cmake -E cmake_link_script CMakeFiles/ks_tests.dir/link.txt --verbose=1
/usr/bin/c++ -g -fno-omit-frame-pointer -fno-optimize-sibling-calls -O1 -fsanitize=undefined CMakeFiles/ks_tests.dir/runtime/cpp_stl/kaitai/kaitaistream.cpp.o (...) -o ks_tests (...)
/usr/bin/ld: cannot find /usr/lib/llvm-18/lib/clang/18/lib/linux/libclang_rt.ubsan_standalone-x86_64.a: No such file or directory
/usr/bin/ld: cannot find /usr/lib/llvm-18/lib/clang/18/lib/linux/libclang_rt.ubsan_standalone_cxx-x86_64.a: No such file or directory
c++: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[2]: *** [CMakeFiles/ks_tests.dir/build.make:9731: ks_tests] Error 1

So the missing libclang-rt-dev package needs to be installed:

diff --git i/src/cpp_stl/clang18-linux-x86_64/Dockerfile w/src/cpp_stl/clang18-linux-x86_64/Dockerfile
index 389b6e8..8254d4a 100644
--- i/src/cpp_stl/clang18-linux-x86_64/Dockerfile
+++ w/src/cpp_stl/clang18-linux-x86_64/Dockerfile
@@ -6,6 +6,7 @@ COPY 4img/* ./
 # Install all dependencies
 RUN apt-get update && apt-get install -y --no-install-recommends \
        clang \
+       libclang-rt-dev \
        make \
        cmake \
        libboost-test-dev \

The resulting diff of the test binary's output before and after 02e26f1 follows:

diff --git 1/test_out/cpp_stl_11-clang18-old/test_run-1.stdout 2/test_out/cpp_stl_11-clang18/test_run-1.stdout
index 06f086bf..d4097c88 100644
--- 1/test_out/cpp_stl_11-clang18-old/test_run-1.stdout
+++ 2/test_out/cpp_stl_11-clang18/test_run-1.stdout
@@ -1,19 +1,3 @@
-/tests/compiled/cpp_stl_11/switch_manual_enum_invalid.h:98:43: runtime error: load of value 255, which is not a valid value for type 'code_enum_t'
-SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /tests/compiled/cpp_stl_11/switch_manual_enum_invalid.h:98:43
-/tests/compiled/cpp_stl_11/switch_manual_enum_invalid.h:98:43: runtime error: load of value 255, which is not a valid value for type 'code_enum_t'
-SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /tests/compiled/cpp_stl_11/switch_manual_enum_invalid.h:98:43
-/usr/include/boost/test/tools/old/impl.hpp:107:12: runtime error: load of value 255, which is not a valid value for type 'const switch_manual_enum_invalid_t::opcode_t::code_enum_t'
-SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /usr/include/boost/test/tools/old/impl.hpp:107:12
-/usr/include/boost/test/tools/old/impl.hpp:107:20: runtime error: load of value 255, which is not a valid value for type 'const switch_manual_enum_invalid_t::opcode_t::code_enum_t'
-SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /usr/include/boost/test/tools/old/impl.hpp:107:20
-/tests/compiled/cpp_stl_11/switch_manual_enum_invalid_else.h:121:43: runtime error: load of value 255, which is not a valid value for type 'code_enum_t'
-SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /tests/compiled/cpp_stl_11/switch_manual_enum_invalid_else.h:121:43
-/tests/compiled/cpp_stl_11/switch_manual_enum_invalid_else.h:121:43: runtime error: load of value 255, which is not a valid value for type 'code_enum_t'
-SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /tests/compiled/cpp_stl_11/switch_manual_enum_invalid_else.h:121:43
-/usr/include/boost/test/tools/old/impl.hpp:107:12: runtime error: load of value 255, which is not a valid value for type 'const switch_manual_enum_invalid_else_t::opcode_t::code_enum_t'
-SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /usr/include/boost/test/tools/old/impl.hpp:107:12
-/usr/include/boost/test/tools/old/impl.hpp:107:20: runtime error: load of value 255, which is not a valid value for type 'const switch_manual_enum_invalid_else_t::opcode_t::code_enum_t'
-SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /usr/include/boost/test/tools/old/impl.hpp:107:20

 Test module "C++/STL test units for Kaitai Struct" has failed with:
   255 test cases out of 260 passed

As you can see, all the errors detected by UBSan have been fixed.

C# - csharp/net10.0-linux-x86_64

  • 🟢 3 tests fixed:
    • EnumIntRangeU
      -    {"status"=>"format_build_failed"}
      +    {"status"=>"passed"}
    • EnumLongRangeS
      -    {"status"=>"format_build_failed"}
      +    {"status"=>"passed"}
    • EnumLongRangeU
      -    {"status"=>"format_build_failed"}
      +    {"status"=>"passed"}

Go - go/1.26-linux-x86_64

  • ➕ 1 tests added:
    • EnumLongRangeU
      {"status"=>"passed", "elapsed"=>0.0, "is_kst"=>true}
  • ➖ 1 tests removed:
    • EnumToIInvalid

      {"status"=>"passed", "elapsed"=>0.0, "is_kst"=>true}

      New compile error:

      ../../compiled/go/src/test_formats/enum_to_i_invalid.go:111:33: 32768 (untyped int constant) overflows uint8
      
      $ cat -n compiled/go/src/test_formats/enum_to_i_invalid.go
        (...)
          11  type EnumToIInvalid_Animal uint8
        (...)
          21  type EnumToIInvalid struct {
          22          Pet1 EnumToIInvalid_Animal
          23          Pet2 EnumToIInvalid_Animal
        (...)
         106  func (this *EnumToIInvalid) Pet2Mod() (v int, err error) {
         107          if (this._f_pet2Mod) {
         108                  return this.pet2Mod, nil
         109          }
         110          this._f_pet2Mod = true
       * 111          this.pet2Mod = int(this.Pet2 + 32768)
         112          return this.pet2Mod, nil
         113  }

      Unfortunately, this kind of error (integer type mismatch) is very common in the currently generated Go code, and using the correct underlying type for the animal enum just surfaced another instance. This is a more general issue and has nothing to do specifically with enums, and therefore is unrelated to this PR.

Zig - zig/0.15.2-linux-x86_64

  • 🟢 3 tests fixed:
    • EnumIntRangeU
      -    {"status"=>"format_build_failed"}
      +    {"status"=>"passed"}
    • EnumLongRangeS
      -    {"status"=>"format_build_failed"}
      +    {"status"=>"passed"}
    • EnumLongRangeU
      -    {"status"=>"format_build_failed"}
      +    {"status"=>"passed"}
  • 🔴 1 tests broken:
    • EnumToI

      -    {"status"=>"passed"}
      +    {"status"=>"format_build_failed"}

      This is a similar problem as in Go:

      formats/enum_to_i.zig:67:39: error: expected type 'i32', found 'u32'
              _v = @intFromEnum(self.pet_1) + 32768;
                   ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~
      formats/enum_to_i.zig:67:39: note: signed 32-bit int cannot represent all possible unsigned 32-bit values
      referenced by:
          test.EnumToI: tests/enum_to_i_test.zig:21:59
          comptime: tests.zig:64:17
      
      $ cat -n compiled/zig/enum_to_i.zig
          63      pub fn pet1Mod(self: *EnumToI) !i32 {
          64          if (self._m_pet_1_mod) |_v|
          65              return _v;
          66          var _v: i32 = undefined;
       *  67          _v = @intFromEnum(self.pet_1) + 32768;
          68          self._m_pet_1_mod = _v;
          69          return _v;
          70      }

      If you look at it, the Zig compiler is absolutely right here: it makes no sense to store u32 + <const> in an i32 variable. So the problem is that our compiler derived the resulting type of this value instance as CalcIntType (which maps to i32) instead of a more specific type based on the types of operands (related issues: CalcIntType "int" might lead to loss of data (in Java) kaitai_struct#510, [C++] Computed instances are int32_t kaitai_struct#617). This error is therefore also unrelated to this PR.

Nim - nim/2.2.10-linux-x86_64

Although the latest CI status of the Enum{Int,Long}Range{S,U} tests in Nim is the following (see https://ci.kaitai.io/ and search for tests containing "range", and here's the permalink to ci.json: https://github.com/kaitai-io/ci_artifacts/blob/39831e9b7b0a405d8c7f5b5ff5eee937e851f717/test_out/nim/ci.json#L265-L301):

  1. EnumIntRangeS - passed
  2. EnumIntRangeU - passed
  3. EnumLongRangeS - passed
  4. EnumLongRangeU - failed

... in reality the "passed" status of 2. EnumIntRangeU and 3. EnumLongRangeS is a lie - they should be actually failed as well. I explained in detail in the commit message of kaitai-io/kaitai_struct_tests@e55f4a0 why they passed in Nim 2.2.10 (the latest version as of this writing) until now and how I improved the tests to prevent this from happening. The relevant excerpt is:

(...)
~/kaitai_struct_tests/spec/nim# grep 'typedef.*EnumLongRangeS_Constants' /root/.cache/nim/tenum_long_range_s_d/@m..@s..@scompiled@snim@senum_long_range_s.nim.c
typedef NI32 tyEnum_EnumLongRangeS_Constants__ZSLAsl9awA9cujexdPBWfDiA;

NI32 is Nim's internal name for int32.

Consequently, the assertions in spec/nim/tenum_long_range_s.nim (or spec/nim/tenum_int_range_u.nim) don't work properly, because both sides of each comparison are truncated to signed 32-bit integers:

~/kaitai_struct_tests/spec/nim# grep '==.*EnumLongRangeS_Constants' /root/.cache/nim/tenum_long_range_s_d/@mtenum_long_range_s.nim.c
                if (!!(((*r__tenum95long95range95s_u218).f1 == ((tyEnum_EnumLongRangeS_Constants__ZSLAsl9awA9cujexdPBWfDiA)(IL64(-9223372036854775807) - IL64(1)))))) goto LA4_;
                if (!!(((*r__tenum95long95range95s_u218).f2 == ((tyEnum_EnumLongRangeS_Constants__ZSLAsl9awA9cujexdPBWfDiA)IL64(-2147483649))))) goto LA8_;
                if (!!(((*r__tenum95long95range95s_u218).f3 == ((tyEnum_EnumLongRangeS_Constants__ZSLAsl9awA9cujexdPBWfDiA)(-2147483647 -1))))) goto LA12_;
                if (!!(((*r__tenum95long95range95s_u218).f4 == ((tyEnum_EnumLongRangeS_Constants__ZSLAsl9awA9cujexdPBWfDiA)0)))) goto LA16_;
                if (!!(((*r__tenum95long95range95s_u218).f5 == ((tyEnum_EnumLongRangeS_Constants__ZSLAsl9awA9cujexdPBWfDiA)2147483647)))) goto LA20_;
                if (!!(((*r__tenum95long95range95s_u218).f6 == ((tyEnum_EnumLongRangeS_Constants__ZSLAsl9awA9cujexdPBWfDiA)IL64(2147483648))))) goto LA24_;
                if (!!(((*r__tenum95long95range95s_u218).f7 == ((tyEnum_EnumLongRangeS_Constants__ZSLAsl9awA9cujexdPBWfDiA)IL64(9223372036854775807))))) goto LA28_;

Of course, all of these comparisons are satisfied, so the test passes. But since the EnumLongRangeS test was specifically designed to verify that signed 64-bit integer enums work correctly, it did a poor job in this case, because it was unable to detect this incorrect Nim behavior.

While digging into this, I discovered that enums in Nim 2.2.10 are severely broken and mostly unusable for any values beyond the range of the signed 32-bit integer type. This all stems from the fact that unlike any other statically typed language we support, you cannot reliably specify a fixed underlying type of Nim enums. The underlying type is automatically decided for you by this piece of horrible code in the Nim 2.2.10 compiler:

compiler/ccgtypes.nim

        if firstOrd(m.config, t) < 0:
          m.s[cfsTypes].addf("typedef NI32 $1;$n", [result])
          size = 4
        else:
          size = int(getSize(m.config, t))
          case size
          of 1: m.s[cfsTypes].addf("typedef NU8 $1;$n", [result])
          of 2: m.s[cfsTypes].addf("typedef NU16 $1;$n", [result])
          of 4: m.s[cfsTypes].addf("typedef NI32 $1;$n", [result])
          of 8: m.s[cfsTypes].addf("typedef NI64 $1;$n", [result])
          else: internalError(m.config, t.sym.info, "getTypeDescAux: enum")

This means that if you have any negative value in the enum, the underlying type will always be NI32 (bye bye support for int64-based enums with negative values). If all enum values are non-negative and they fit into 32 bits, the underlying type will always be NI32 (bye bye support for uint32-based enums). If they are non-negative and fit into 64 bits, you get NI64. So besides int32-based enums, which work fully, we have support for int64 with non-negative values, and that's it (so of course, uint64-based enums don't work either).

I think that this abomination really shows just how much the required underlying type as proposed in kaitai-io/kaitai_struct#1288 is important - otherwise, you get a very poorly designed half-baked solution, as Nim so vividly demonstrates here.

So here are the changes in test results:

  • 🔴 2 tests broken:
    • EnumIntRangeU
      -    {"status"=>"passed"}
      +    {"status"=>"failed",
      +     "failure"=>
      +      {"file_name"=>nil,
      +       "line_num"=>nil,
      +       "message"=>
      +        "/tests/spec/nim/tenum_int_range_u.nim(12) tenum_int_range_u\n" +
      +        "/opt/nim/lib/std/assertions.nim(45) failedAssertImpl\n" +
      +        "/opt/nim/lib/std/assertions.nim(40) raiseAssert\n" +
      +        "/opt/nim/lib/system/fatal.nim(62) sysFatal\n" +
      +        "Error: unhandled exception: /tests/spec/nim/tenum_int_range_u.nim(12, 1) `ord(r.f2) == 4294967295&apos;i64`  [AssertionDefect]\n",
      +       "trace"=>""}}
    • EnumLongRangeS
      -    {"status"=>"passed"}
      +    {"status"=>"failed",
      +     "failure"=>
      +      {"file_name"=>nil,
      +       "line_num"=>nil,
      +       "message"=>
      +        "/tests/spec/nim/tenum_long_range_s.nim(9) tenum_long_range_s\n" +
      +        "/opt/nim/lib/std/assertions.nim(45) failedAssertImpl\n" +
      +        "/opt/nim/lib/std/assertions.nim(40) raiseAssert\n" +
      +        "/opt/nim/lib/system/fatal.nim(62) sysFatal\n" +
      +        "Error: unhandled exception: /tests/spec/nim/tenum_long_range_s.nim(9, 1) `ord(r.f1) == -9223372036854775808&apos;i64`      [AssertionDefect]\n",
      +       "trace"=>""}}

As expected, these two tests now fail, which is a good thing, because it finally reflects reality.

Rust - rust/1.95-linux-x86_64

In Rust, enums are currently always i64-based. Therefore, the EnumLongRangeU has status format_build_failed (see https://github.com/kaitai-io/ci_artifacts/blob/a08bf924dd1c66aa4e443ba3ba010b1fca806f10/test_out/rust/ci.json#L281), because the generated format code fails to compile. build-2.log:25-26 contains the specific error messages:

error: literal out of range for `i64`
  --> src/formats/enum_long_range_u.rs:93:13
   |
93 |             18446744073709551615 => Ok(EnumLongRangeU_Constants::LongMax),
   |             ^^^^^^^^^^^^^^^^^^^^
   |
   = note: the literal `18446744073709551615` does not fit into the type `i64` whose range is `-9223372036854775808..=9223372036854775807`
   = help: consider using the type `u64` instead
   = note: `#[deny(overflowing_literals)]` on by default


error: literal out of range for `i64`
   --> src/formats/enum_long_range_u.rs:105:50
    |
105 |             EnumLongRangeU_Constants::LongMax => 18446744073709551615,
    |                                                  ^^^^^^^^^^^^^^^^^^^^
    |
    = note: the literal `18446744073709551615` does not fit into the type `i64` whose range is `-9223372036854775808..=9223372036854775807`
    = help: consider using the type `u64` instead

I tried switching to the specific underlying type for each enum just like in C#, Go and Zig (see eca0c22). It was easy to do, but testing revealed that this breaks 25 tests: all tests with enums except EnumLongRangeS and ExprEnum. This is because i64 is currently hardcoded by the compiler (RustCompiler.scala:1205-1208), so we cannot easily use various underlying types of enums without removing these hardcoded (... as i64) casts (which would probably require changes in other places as well). Therefore, it's best to leave Rust as it is for now.

@generalmimon
Copy link
Copy Markdown
Member Author

I believe this PR is finished and ready for review - I don't think there is anything left to change before merging. However, I will keep it in the "draft" state, because I don't know yet whether we'll want to merge it into master or somewhere else, and I don't want to be merged by mistake before that is decided. AFAIK this is the first backwards-incompatible change in the .ksy language in the history of 0.x releases, so as @GreyCat pointed out in kaitai-io/kaitai_struct#1288 (comment), we shouldn't just release it in 0.12, but we should probably continue with the 0.x-compatible releases as 1.x and this should go into 2.x.

To be honest, I've never had to work with multiple release branches. I suppose there will be something like 1.x and 2.x Git branches, we will primarily work on 2.x and backport some changes to 1.x if possible. I don't know what role (if any) the master branch plays in this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

New enums syntax to allow enum-level properties (doc / doc-ref, required underlying type) Documentation support for enums

1 participant