Implement Garbage Collector type system by zherczeg · Pull Request #2607 · WebAssembly/wabt

zherczeg · 2025-05-23T10:59:10Z

This patch supports parsing the new GC types:

- Abstract types
- Recursive types
- Composite types

The patch also improves type comparison.

zherczeg · 2025-05-23T16:01:04Z

include/wabt/binary-reader.h

 };
 using TypeMutVector = std::vector<TypeMut>;

+// Garbage Collector specific type information


This is a core part of the patch. It contains the (sub ...) part of the type. It is declared as a structure, because it is not mandatory.

zherczeg · 2025-05-23T16:02:27Z

include/wabt/binary-reader.h

  virtual Result OnTypeCount(Index count) = 0;
+  virtual Result OnRecursiveRange(Index start_index, Index type_count) = 0;
  virtual Result OnFuncType(Index index,
+                            GCTypeExtension* gc_ext,


This structure is passed as the second argument because it is a header, but it could also be the last argument, since it is extra (and optional) information. Which one do you prefer?

zherczeg · 2025-05-23T16:03:41Z

include/wabt/type-checker.h

+  struct RecursiveRange {
+    Index start_index;
+    Index type_count;
+  };


This is another core structure to encode (rec ...) constructs. It represents the range.

zherczeg · 2025-05-23T16:04:58Z

include/wabt/type-checker.h

+    std::vector<FuncType> func_types;
+    std::vector<StructType> struct_types;
+    std::vector<ArrayType> array_types;
+    std::vector<RecursiveRange> recursive_ranges;


The ranges are stored in an ordered array. They cannot be stored as part of the types, because a zero-length (rec) range is allowed for whatever reason.
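A minimal sketch of how such an ordered array could be queried, assuming the ranges are sorted by start_index. RecursiveRange mirrors the struct quoted above; FindRecursiveRange is a hypothetical helper written for illustration and is not taken from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

using Index = uint32_t;

struct RecursiveRange {
  Index start_index;
  Index type_count;
};

// Hypothetical helper: returns the (rec ...) range that contains type_index.
// Zero-length ranges never contain an index, so they are skipped.
inline std::optional<RecursiveRange> FindRecursiveRange(
    const std::vector<RecursiveRange>& ranges, Index type_index) {
  // Precondition assumed by this sketch: sorted by start_index.
  assert(std::is_sorted(ranges.begin(), ranges.end(),
                        [](const RecursiveRange& a, const RecursiveRange& b) {
                          return a.start_index < b.start_index;
                        }));
  // First range that starts after type_index.
  auto it = std::upper_bound(
      ranges.begin(), ranges.end(), type_index,
      [](Index index, const RecursiveRange& range) {
        return index < range.start_index;
      });
  // Walk backwards: every range visited starts at or before type_index.
  while (it != ranges.begin()) {
    --it;
    if (type_index - it->start_index < it->type_count) return *it;
    // A non-empty range that ends before type_index rules out earlier ones.
    if (it->type_count != 0) break;
  }
  return std::nullopt;
}
```

Because an empty range owns no type, it has to live in its own list entry; the lookup simply steps over it.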

zherczeg · 2025-06-03T10:20:04Z

I have reworked the type validation system of the patch. It can now detect, for every type, the index of the first type that is equal to it. This first index is called the canonical index. Given two types t1 and t2 whose canonical indices have been computed, type comparison is simply t1.canonical_index == t2.canonical_index. Subtype indices can also be turned into canonical subtype indices. This is useful not only for validation but also for high-speed execution, since it simplifies type comparison a lot. To compute the canonical indices, a hash code is computed for each type; two types with different hash codes are never equal. My hash computation algorithm might not be good, as I don't have much experience with these algorithms.
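A minimal sketch of the canonical index idea, not the code from the patch. SimpleType, HashType and ComputeCanonicalIndices are hypothetical names, and the types are assumed to be flat lists of value-type codes with no references to other types and no rec groups.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical, flattened stand-in for a defined type.
struct SimpleType {
  std::vector<uint8_t> params;
  std::vector<uint8_t> results;
  bool operator==(const SimpleType& other) const {
    return params == other.params && results == other.results;
  }
};

// FNV-1a-like mixing. Any hash works as long as equal types hash equally.
inline uint64_t HashType(const SimpleType& type) {
  uint64_t hash = 14695981039346656037ull;
  auto mix = [&hash](uint64_t value) {
    hash ^= value;
    hash *= 1099511628211ull;
  };
  mix(type.params.size());
  for (uint8_t p : type.params) mix(p);
  mix(type.results.size());
  for (uint8_t r : type.results) mix(r);
  return hash;
}

// For each type index, returns the index of the first equal type.
inline std::vector<uint32_t> ComputeCanonicalIndices(
    const std::vector<SimpleType>& types) {
  std::vector<uint32_t> canonical(types.size());
  // hash code -> canonical indices seen so far with that hash code
  std::unordered_map<uint64_t, std::vector<uint32_t>> buckets;
  for (uint32_t i = 0; i < types.size(); ++i) {
    canonical[i] = i;
    std::vector<uint32_t>& bucket = buckets[HashType(types[i])];
    bool found = false;
    for (uint32_t candidate : bucket) {
      // Equal hash codes can still collide, so compare the full structure.
      if (types[candidate] == types[i]) {
        canonical[i] = candidate;
        found = true;
        break;
      }
    }
    if (!found) bucket.push_back(i);
    assert(canonical[i] <= i);
  }
  return canonical;
}
```

With the indices computed, comparing two types of the same module reduces to comparing two integers. The hash only narrows the candidates; correctness rests on the structural comparison.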

zherczeg · 2025-06-03T13:09:40Z

The type-* GC tests are running, except for the runtime part of type-subtyping.wast.
I think the type system in the validator and the interpreter is OK now. This is another huge change, with 1500 lines of new code.

rossberg · 2025-06-03T13:18:52Z

It sounds like you are canonicalising wrt type indices. But type indices are meaningless in any program that consists of more than one module. Type canonicalisation must happen globally, across module boundaries, based on the types' structure. I suspect that is the reason for the link/run-time tests failing.
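A rough sketch of what canonicalising by structure across modules could look like, under strong simplifying assumptions: LocalType, GlobalTypeTable and Canonicalize are hypothetical names, type references may only point backwards, and rec groups (which real GC types must be canonicalised by) are left out entirely.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical structural description of one defined type.
struct LocalType {
  std::vector<uint8_t> value_types;  // encoded value types
  std::vector<uint32_t> type_refs;   // module-local indices of earlier types
};

// Interning table shared by every module loaded into the same store.
class GlobalTypeTable {
 public:
  // Returns one global id per local type. Equal structure yields an equal
  // id, no matter which module or local index the type came from.
  std::vector<uint32_t> Canonicalize(
      const std::vector<LocalType>& module_types) {
    std::vector<uint32_t> global_ids;
    global_ids.reserve(module_types.size());
    for (const LocalType& type : module_types) {
      Key key;
      key.value_types = type.value_types;
      for (uint32_t ref : type.type_refs) {
        // Sketch limitation: only backward references are supported.
        assert(ref < global_ids.size());
        // Replace the local index by a global id, so the key no longer
        // depends on module-local numbering.
        key.global_refs.push_back(global_ids[ref]);
      }
      auto result = ids_.emplace(key, static_cast<uint32_t>(ids_.size()));
      global_ids.push_back(result.first->second);
    }
    return global_ids;
  }

 private:
  struct Key {
    std::vector<uint8_t> value_types;
    std::vector<uint32_t> global_refs;
    bool operator<(const Key& other) const {
      if (value_types != other.value_types) {
        return value_types < other.value_types;
      }
      return global_refs < other.global_refs;
    }
  };
  std::map<Key, uint32_t> ids_;
};
```

Two modules that declare the same types in a different order end up with the same global ids, so an import/export check becomes an integer comparison.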

zherczeg · 2025-06-03T19:15:24Z

The link tests are not failing, although the interpreter does a slow type comparison for imports/exports. As far as I understand, the interpreter here is just a demonstration, so this is probably OK. The runtime tests fail because the operations (such as ref.cast) are not implemented. I will do that in a follow-up patch.

The global type canonicalisation sounds like a very good idea! A high performance engine should do that!

zherczeg · 2025-06-05T04:44:14Z

@sbc100 there is a fuzzer issue in the code. The code is correct though.

https://github.com/WebAssembly/wabt/blob/main/src/interp/binary-reader-interp.cc#L772
As for the fuzzer generated test, it wants to allocate 16190847 entries, which is pretty large for a 38 byte input, but not an invalid value in general.

https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/fuzzer/FuzzerLoop.cpp
LLVM considers this a large value and reports it as an error. There is an -rss_limit_mb option to modify this limit.

What shall I do?

sbc100 · 2025-06-05T18:54:14Z

We don't tend to have time to worry about fixing all the fuzz test issues, unless they could conceivably show up in real-world programs, i.e. we tend to assume trusted and safe inputs, since we don't have the resources to harden wabt against other things.

Having said that we obviously would be happy to accept fixes for such issues if folks come up with them.

zherczeg · 2025-06-05T19:11:32Z

There is nothing to fix here: the code is correct (and not related to this patch). It is simply a limitation of the fuzzer, which assumes that a very large memory allocation is likely a bug.

zherczeg · 2025-11-14T18:17:07Z

@sbc100 This is the first patch of the GC support. It triggers the fuzzer failure I described above. Reserving memory is correct for valid wasm files. However, a random value generated by a fuzzer causes a large memory allocation, which the fuzzer considers an error. The fuzzer has an option to raise this allocation limit. What do you suggest?

zherczeg · 2025-11-15T04:27:19Z

Note: the issue is visible on the fuzzer backtrace:

    #11 0x56421033ba2f in __allocate_at_least<std::__1::allocator<wabt::interp::DataDesc> > /usr/local/bin/../include/c++/v1/__memory/allocate_at_least.h:41:19
    #12 0x56421033ba2f in __split_buffer /usr/local/bin/../include/c++/v1/__split_buffer:330:25
    #13 0x56421033ba2f in std::__1::vector<wabt::interp::DataDesc, std::__1::allocator<wabt::interp::DataDesc>>::reserve(unsigned long) /usr/local/bin/../include/c++/v1/__vector/vector.h:1109:49
    #14 0x564210325038 in wabt::interp::(anonymous namespace)::BinaryReaderInterp::OnDataCount(unsigned int) /src/wabt/src/interp/binary-reader-interp.cc:925:17
    #15 0x5642103d8474 in wabt::(anonymous namespace)::BinaryReader::ReadDataCountSection(unsigned long) /src/wabt/src/binary-reader.cc:3113:3

The OnDataCount gets a huge number, runs the .reserve() and aborts.

zherczeg

The patch is finally green. I have fixed some fuzzer bugs, which are unrelated to the patch. They are in the interpreter, whose main purpose is testing.

zherczeg · 2025-11-25T11:01:13Z

src/interp/binary-reader-interp.cc


 Result BinaryReaderInterp::OnFunctionCount(Index count) {
-  module_.funcs.reserve(count);
+  module_.funcs.reserve(std::min(count, 1024u));


Fuzzer fix: avoid allocating too much memory. If count is really > 1024, the array will still grow; it just happens later. However, some elements at the end of the buffer might not be used. I don't know what we should do here. There are similar cases below.

Why is this relevant to this change?

Surely this list will need to grow to count elements anyway? Why not allocate it all here?

Without this change, the fuzzer CI fails. The random byte sequence contains a huge "count" value, and when the reserve is executed, the fuzzer reports it as a "too big allocation". I tried to find a configuration parameter for the fuzzer, but I could not. I would also prefer a fuzzer change rather than this code change. At least it is in the interpreter, and not in the main code.

But that issue seems unrelated to adding the GC type system, no? Perhaps this could be split into a separate PR?

I'm not so sure failing with a too-big allocation is such a bad outcome for such crazy inputs anyway.

If we do use some hardcoded value here, it might make sense for it to be the implementation limit on the number of functions.

I really don't know what the best action is here. The current code is perfect for correct WebAssembly files. The best thing would be to tell the fuzzer that large allocations are valid. OK, I will move this to another patch.
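For reference, the trade-off discussed in this thread can be sketched as follows. ReserveClamped and kMaxInitialReserve are hypothetical names for illustration; the diff itself calls std::min(count, 1024u) inline.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using Index = uint32_t;

// Hypothetical cap. The thread suggests that an implementation limit might
// be a better choice than an arbitrary constant such as 1024.
constexpr Index kMaxInitialReserve = 1024;

// Treats the declared count as a hint only: it comes from the input file
// and may be arbitrarily large when the input is fuzzer generated.
template <typename T>
void ReserveClamped(std::vector<T>& vec, Index declared_count) {
  const Index wanted = std::min(declared_count, kMaxInitialReserve);
  vec.reserve(wanted);
  assert(vec.capacity() >= wanted);
}
```

The vector still grows past the cap through its normal reallocation when more elements are pushed, so valid modules with more than 1024 entries keep working; they only pay for a few extra reallocations.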

zherczeg · 2025-11-25T11:02:36Z

src/interp/binary-reader-interp.cc

-  CHECK_RESULT(
-      validator_.OnFunction(GetLocation(), Var(sig_index, GetLocation())));
-  FuncType& func_type = module_.func_types[sig_index];
+  Result result =


Fuzzer bug: we need to add something to the list, even if the validation fails. The module_.funcs will be used later.

sbc100

Wow! This is awesome.

There is a lot going on here though. I didn't get time to look through all of it yet.

@tlively could you take a quick look over this and see if anything jumps out? Does the general approach look reasonable?

sbc100 · 2025-12-10T21:51:02Z

include/wabt/binary-reader-nop.h

  Result OnStructType(Index index,
                      Index field_count,
-                      TypeMut* fields) override {
+                      TypeMut* fields,


I wonder if we should be using C++ std::array rather than size + ptr in these APIs? But I see it's a pre-existing thing, so no worries for this PR.
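To illustrate the two API styles being compared, here is a small sketch. The TypeMut below is a simplified stand-in (its exact shape is an assumption), CountMutablePtr and CountMutableVec are hypothetical functions, and a const std::vector reference is only one possible container choice.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Index = uint32_t;

// Simplified stand-in for wabt's TypeMut, for illustration only.
struct TypeMut {
  int type;
  bool mutable_;
};

// Existing style: the element count and the pointer are separate arguments,
// and the caller must keep them consistent.
inline Index CountMutablePtr(Index field_count, const TypeMut* fields) {
  assert(field_count == 0 || fields != nullptr);
  Index result = 0;
  for (Index i = 0; i < field_count; ++i) {
    if (fields[i].mutable_) ++result;
  }
  return result;
}

// Container style: the size travels with the data.
inline Index CountMutableVec(const std::vector<TypeMut>& fields) {
  Index result = 0;
  for (const TypeMut& field : fields) {
    if (field.mutable_) ++result;
  }
  return result;
}
```

Both functions compute the same value; the difference is only in what the signature lets the caller get wrong.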

include/wabt/binary-reader.h

include/wabt/interp/interp-inl.h

include/wabt/interp/interp.h

include/wabt/ir.h

include/wabt/type-checker.h

zherczeg · 2026-02-04T08:34:53Z

@sbc100 may I ask whether you have time to review this patch?

sbc100

There is a lot of code here. I've not yet had a chance to look at all of it but LGTM so far.

sbc100 · 2026-03-04T22:11:10Z

include/wabt/interp/interp-inl.h

+    : ExternType(ExternKind::Func),
+      params(params),
+      results(results),
+      func_types(nullptr) {}


This line looks like it did not change? Unless I'm missing something? Maybe revert this line?

The line is longer than 80 columns; it surprises me that it is not a style error.

Hmm, I wonder if we should clang-format the codebase as a separate PR?

I have no objection.

sbc100 · 2026-03-04T22:12:58Z

include/wabt/interp/interp.h

+  };
+
+  // To simplify the implementation, FuncType may also represent
+  // Struct and Array types. In the latter case, the mutability


Can you explain a little more? Why would FuncType represent struct or array?

FuncType is a type that is used in a lot of code in the interpreter. When I worked on this patch, I wanted to avoid adding a lot of new code to the interpreter (the patch is large enough), whose primary purpose is just demonstration, as far as I understood. This could be reworked later if we want the extra code. One option is simply renaming FuncType to something generic.

OK, maybe add a TODO to refactor or rename it?

sbc100 · 2026-03-04T22:23:43Z

src/interp/binary-reader-interp.cc


 Result BinaryReaderInterp::OnFunctionCount(Index count) {
-  module_.funcs.reserve(count);
+  module_.funcs.reserve(std::min(count, 1024u));


Why is this relevant to this change?

Surely this list will need to grow to count elements anyway? Why not allocate it all here?

src/shared-validator.cc

test/roundtrip/rec-groups.txt

zherczeg · 2026-03-05T10:58:59Z

Thank you for the review. I have updated the patch.

zherczeg · 2026-03-06T19:43:04Z

I have updated this patch. If there is follow-up work I should do (e.g. changing the param/result types for functions from pointer to vector reference), please open issues for it and assign them to me. It is easy to forget these things.

zherczeg · 2026-03-13T05:56:13Z

I made some small changes. I hope this patch is in good shape now. I will fix issues related to this patch.

sbc100 mentioned this pull request Jun 5, 2025

wasm-decompile: unexpected opcode 0x12 #2613

Closed

zherczeg marked this pull request as ready for review June 11, 2025 10:11

zherczeg mentioned this pull request Mar 6, 2026

Prevent fuzzer allocation errors #2713

Merged

Implement Garbage Collector type system

51a42dc
