Implement Garbage Collector type system#2607
Implement Garbage Collector type system#2607zherczeg wants to merge 1 commit intoWebAssembly:mainfrom
Conversation
243ec44 to
68fe37e
Compare
include/wabt/binary-reader.h
Outdated
| }; | ||
| using TypeMutVector = std::vector<TypeMut>; | ||
|
|
||
| // Garbage Collector specific type information |
There was a problem hiding this comment.
This is a core part of the patch. It contains the (sub ...) part of the type. It is declared as a structure, because it is not mandatory.
include/wabt/binary-reader.h
Outdated
| virtual Result OnTypeCount(Index count) = 0; | ||
| virtual Result OnRecursiveRange(Index start_index, Index type_count) = 0; | ||
| virtual Result OnFuncType(Index index, | ||
| GCTypeExtension* gc_ext, |
There was a problem hiding this comment.
This structure is passed as a second argument, because it is a header, but it could go to the last argument since it is an extra (and optional) information. Which one you prefer?
| struct RecursiveRange { | ||
| Index start_index; | ||
| Index type_count; | ||
| }; |
There was a problem hiding this comment.
This is another core structure to encode (rec ...) constructs. It represents the range.
include/wabt/type-checker.h
Outdated
| std::vector<FuncType> func_types; | ||
| std::vector<StructType> struct_types; | ||
| std::vector<ArrayType> array_types; | ||
| std::vector<RecursiveRange> recursive_ranges; |
There was a problem hiding this comment.
It is stored in an ordered array. It cannot be stored as part of the types, because zero length (rec) range is allowed for whatever reason.
e0ce7f8 to
adcbdf7
Compare
|
I have reworked the type validation system of the patch. Now it is capable of detecting the first type index for all equal types. This first type index is called canonical index. If I have two types ( |
75bdfe5 to
890a316
Compare
|
The |
|
It sounds like you are canonicalising wrt type indices. But type indices are meaningless in any program that consists of more than one module. Type canonicalisation must happen globally, across module boundaries, based on the types' structure. I suspect that is the reason for the link/run-time tests failing. |
|
The link tests are not failing, although the interpreter do a slow type comparison for import/exports. As far as I understand the interpreter here is just a demonstration, so this is probably ok. The runtime tests fail because the operations (such as The global type canonicalisation sounds like a very good idea! A high performance engine should do that! |
cc8e21f to
097046d
Compare
|
@sbc100 there is a fuzzer issue in the code. The code is correct though. https://github.com/WebAssembly/wabt/blob/main/src/interp/binary-reader-interp.cc#L772 https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/fuzzer/FuzzerLoop.cpp What shall I do? |
We don't tend to have time to worry about fixing all the fuzz tests issues, unless they could conceivable show up in real world programs. i.e. we tend to assume trusted and save inputs, since we don't have the resources the harden wabt against other things. Having said that we obviously would be happy to accept fixes for such issues if folks come up with them. |
|
There is nothing to fix here, the code is correct (and not related to this patch). It is simply a limitation of the fuzzer, it assumes too much memory allocation is likely a bug. |
68c749c to
b7fc45c
Compare
|
@sbc100 This is the first patch of the GC support. It triggers a fuzzer fail I described above. Reserving memory is correct for valid wasm files. However, a random value generated by a fuzzer causes a large memory allocation, which is considered as an error by the fuzzer. The fuzzer has an option to raise this allocation limit. What do you suggest? |
1510410 to
f73967c
Compare
|
Note: the issue is visible on the fuzzer backtrace: The OnDataCount gets a huge number, runs the |
1bc9c5e to
0222374
Compare
zherczeg
left a comment
There was a problem hiding this comment.
The patch is finally green. I have fixed some fuzzer bugs, which are unrelated to the patch. These are interpreter related, which main purpose is testing.
src/interp/binary-reader-interp.cc
Outdated
|
|
||
| Result BinaryReaderInterp::OnFunctionCount(Index count) { | ||
| module_.funcs.reserve(count); | ||
| module_.funcs.reserve(std::min(count, 1024u)); |
There was a problem hiding this comment.
Fuzzer fix: avoid allocating too much memory. If count is really > 1024, the array will still grow, just it happens later. However, some elements at the end of the buffer might not be used. I don't know what should we do here. There are similar cases below.
There was a problem hiding this comment.
Why is this relevant to this change?
Surely this list will need to grow to count elements anyway? Why not allocate it all here?
There was a problem hiding this comment.
Without this change, the fuzzer CI fails. The random byte sequence contains a huge "count" value, and when the reserve is executed, the fuzzer reports it as a "too big allocation". I tried to find a configuration parameter for the fuzzer, but I could not. I would also prefer a fuzzer change rather than this code change. At least it is in the interpreter, and not in the main code.
There was a problem hiding this comment.
But that issues seems unrelated to adding the GC type system no? Perhaps this could be split into a separate PR?
I'm not so sure failing with too big allocation is such a bad outcome from such crazy inputs anyway.
If we do use some hardcode value here it might make sense for it be the implementation limit on the number of function.
There was a problem hiding this comment.
I really don't know what is the best action here. The current code is perfect for correct WebAssembly files. The best thing would be to tell to the fuzzer that large allocations are valid. Ok I will move this to another patch.
| CHECK_RESULT( | ||
| validator_.OnFunction(GetLocation(), Var(sig_index, GetLocation()))); | ||
| FuncType& func_type = module_.func_types[sig_index]; | ||
| Result result = |
There was a problem hiding this comment.
Fuzzer bug: we need to add something to the list, even if the validation fails. The module_.funcs will be used later.
| Result OnStructType(Index index, | ||
| Index field_count, | ||
| TypeMut* fields) override { | ||
| TypeMut* fields, |
There was a problem hiding this comment.
I wonder if we should be using C++ std::array rather then size + ptr in these API ? But I see its a pre-existing thing so no worries for this PR.
031fa07 to
9eeb962
Compare
|
@sbc100 may I ask whether you have time to review this patch? |
sbc100
left a comment
There was a problem hiding this comment.
There is a lot of code here. I've not yet had a chance to look at all of it but LGTM so far.
include/wabt/interp/interp-inl.h
Outdated
| : ExternType(ExternKind::Func), | ||
| params(params), | ||
| results(results), | ||
| func_types(nullptr) {} |
There was a problem hiding this comment.
This line looks like it did not change? Unless I'm missing something? Maybe revert this line?
There was a problem hiding this comment.
The line is longer than 80 columns, it is surprising for me that it is not a style error.
There was a problem hiding this comment.
Hmm.. i wonder if we should clang-format the codebase as a separate PR?
There was a problem hiding this comment.
I have no objection.
include/wabt/interp/interp.h
Outdated
| }; | ||
|
|
||
| // To simplify the implementation, FuncType may also represent | ||
| // Struct and Array types. In the latter case, the mutability |
There was a problem hiding this comment.
Can you explain a little more? Why would FuncType represent struct or array?
There was a problem hiding this comment.
FuncType is a type, which is used in a lot of code in the interpreter. When I worked on this patch, I wanted to avoid adding a lot of new code to the interpreter (the patch is large enough), which primary purpose is just demonstration as far as I understood. This could be reworked later if we want the extra code. An option is just renaming FuncType to something generic.
There was a problem hiding this comment.
OK, maybe add a TODO to refactor or rename it?
src/interp/binary-reader-interp.cc
Outdated
|
|
||
| Result BinaryReaderInterp::OnFunctionCount(Index count) { | ||
| module_.funcs.reserve(count); | ||
| module_.funcs.reserve(std::min(count, 1024u)); |
There was a problem hiding this comment.
Why is this relevant to this change?
Surely this list will need to grow to count elements anyway? Why not allocate it all here?
8325b85 to
475c983
Compare
|
Thank you for the review. I have updated the patch. |
|
I have updated this patch. If there are follow-up works I should do (e.g changing the param/result types for functions from pointer to vector reference), please open issues for them and assign me. It is easy to forget these things. |
This patch supports parsing the new GC types - Abstract types - Recursive types - Composite types The patch also improves type comparison.
|
I made some small changes. I hope this patch is in good shape now. I will fix issues related to this patch. |
This patch supports parsing the new GC types
The patch also improves type comparison.