Description
There are currently two primary reasons that bits and pieces of a wasm module end up being "double parsed":
- First, the `BinaryReader` has some `skip_*` methods which are used. These methods typically exist to conform to Rust's iterator protocol. For example, when you call `read` on `ElementSectionReader` it acts as an iterator, repositioning at the next element. The consumer, however, processes the returned element only afterwards, so `read` must skip over the element to be positioned for the next call to `read`, and the consumer then parses those same bytes again. Affected locations are:
  - `ElementSectionReader::read`
  - `DataSectionReader::read`
  - `GlobalSectionReader::read`
  - `FunctionBody::get_operators_reader`
  - `FunctionLocalReader::read` (and probably more in this file)
  - `InstanceSectionReader::read`
- Second, the API design of the `Validator` type is such that it will always parse "header" sections, and consuming applications (like wasmtime) are then likely to re-parse the content of the section. For example, `Validator::import_section` will parse the import section, but then wasmtime will also iterate over the import section, re-parsing everything.
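To make the first kind of double parsing concrete, here is a minimal sketch using a toy format, not wasmparser's real types. It assumes a section made of a LEB128 item count followed by items, where each item is a LEB128 count followed by that many LEB128 values. All names (`Reader`, `Item`, `SectionReader`, `parse_all`) are hypothetical. Because items carry no byte length, the iterator-style `read` can only find the next item by decoding the current one, and the consumer then decodes it again.

```rust
use std::cell::Cell;

/// Toy byte cursor (hypothetical, not wasmparser's `BinaryReader`).
/// It counts every LEB128 value it decodes. Panics on malformed input.
struct Reader<'a> {
    data: &'a [u8],
    pos: usize,
    decoded: &'a Cell<usize>,
}

impl<'a> Reader<'a> {
    fn read_var_u32(&mut self) -> u32 {
        let mut result = 0u32;
        let mut shift = 0;
        loop {
            let byte = self.data[self.pos];
            self.pos += 1;
            result |= u32::from(byte & 0x7f) << shift;
            if byte & 0x80 == 0 {
                self.decoded.set(self.decoded.get() + 1);
                return result;
            }
            shift += 7;
        }
    }
}

/// One item, handed to the consumer unparsed.
struct Item<'a> {
    body: Reader<'a>,
}

impl<'a> Item<'a> {
    /// The consumer's parse: decodes the count and every value.
    fn values(mut self) -> Vec<u32> {
        let n = self.body.read_var_u32();
        (0..n).map(|_| self.body.read_var_u32()).collect()
    }
}

/// Iterator-style section reader: `read` returns an item and skips past it.
struct SectionReader<'a> {
    reader: Reader<'a>,
    remaining: u32,
}

impl<'a> SectionReader<'a> {
    fn new(data: &'a [u8], decoded: &'a Cell<usize>) -> Self {
        let mut reader = Reader { data, pos: 0, decoded };
        let remaining = reader.read_var_u32();
        SectionReader { reader, remaining }
    }

    fn read(&mut self) -> Option<Item<'a>> {
        if self.remaining == 0 {
            return None;
        }
        self.remaining -= 1;
        let start = self.reader.pos;
        // The "skip" step: decode the whole item just to find where it ends.
        let n = self.reader.read_var_u32();
        for _ in 0..n {
            self.reader.read_var_u32();
        }
        let data: &'a [u8] = self.reader.data;
        let body = Reader {
            data: &data[..self.reader.pos],
            pos: start,
            decoded: self.reader.decoded,
        };
        Some(Item { body })
    }
}

/// Returns the parsed values and the total number of LEB128 decodes performed.
fn parse_all(data: &[u8]) -> (Vec<Vec<u32>>, usize) {
    let decoded = Cell::new(0);
    let mut section = SectionReader::new(data, &decoded);
    let mut out = Vec::new();
    while let Some(item) = section.read() {
        out.push(item.values());
    }
    (out, decoded.get())
}
```

For the input `[2, 2, 1, 0xAC, 0x02, 1, 7]` the section holds six LEB128 values in total (the item count, two per-item counts, and three values), yet `parse_all` reports eleven decodes: everything except the leading item count is decoded twice.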
In general this isn't a massive concern because the header sections are likely all dwarfed in size by the code section so parsing is quite fast. Nonetheless we should strive to parse everything in a wasm file precisely once. I think to fix this we'll need two features, one for each problem above:
- For the first issue, I think we'll want to move towards a more advancing-style API rather than an iterator-based API. For example, we'd have a dedicated type for reading the element section, and you'd say "read the header" followed by "read the elements". We might be able to use `Drop` and clever trickery to skip over data that wasn't explicitly read, or we could simply panic if methods aren't called in the right order. The downside is that consumers are likely to get a little more complicated, but this may be fixable with clever API design. I'm not sure exactly what this would look like.
- For the second issue, we'll want to add more APIs to the validator. For example, instead of taking the import section as a whole, we'd probably want to add something like "the import section is starting with this many items", which gives you a "sub-validator" that is used to validate each import after parsing. What I'm roughly imagining is that the application does all the parsing and, just after parsing, feeds everything into the validator. Another possible alternative is a "validating parser" which automatically feeds parsed values into the validator before handing them to the application. I'm not sure this alternative is compatible with "parse everything precisely once", however, since for example the element section ideally should be parsed just once, not twice.
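One possible shape for these two ideas together, as a rough sketch rather than a proposal for the actual wasmparser API. It assumes a toy section format (one count byte, then for each item a length byte followed by that many value bytes) and a made-up validation rule (no value may exceed `max_value`). Every name here (`AdvancingReader`, `SectionValidator`, `section_start`, `parse_and_validate`) is hypothetical. The reader panics if "read the header" is called before the previous item's values were read, and the application feeds each item to a sub-validator right after parsing it.

```rust
/// Hypothetical advancing-style reader: the caller must alternate
/// `read_header` and `read_values`. Panics on truncated input.
struct AdvancingReader<'a> {
    data: &'a [u8],
    pos: usize,
    /// Value bytes of the current item that have not been read yet.
    pending: usize,
}

impl<'a> AdvancingReader<'a> {
    fn new(data: &'a [u8]) -> Self {
        AdvancingReader { data, pos: 0, pending: 0 }
    }

    fn read_byte(&mut self) -> u8 {
        let byte = self.data[self.pos];
        self.pos += 1;
        byte
    }

    /// Reads the number of items in the section.
    fn read_count(&mut self) -> u32 {
        u32::from(self.read_byte())
    }

    /// "Read the header": returns how many values the next item has.
    fn read_header(&mut self) -> usize {
        assert_eq!(self.pending, 0, "previous item's values were not read");
        self.pending = usize::from(self.read_byte());
        self.pending
    }

    /// "Read the elements": returns the current item's values exactly once.
    fn read_values(&mut self) -> &'a [u8] {
        let data: &'a [u8] = self.data;
        let values = &data[self.pos..self.pos + self.pending];
        self.pos += self.pending;
        self.pending = 0;
        values
    }
}

/// Hypothetical validator with a made-up rule: no value may exceed `max_value`.
struct Validator {
    max_value: u8,
}

/// Sub-validator created by "the section is starting with this many items".
struct SectionValidator<'v> {
    validator: &'v Validator,
    remaining: u32,
}

impl Validator {
    fn section_start(&self, count: u32) -> SectionValidator<'_> {
        SectionValidator { validator: self, remaining: count }
    }
}

impl SectionValidator<'_> {
    /// Validates one already-parsed item; the validator never touches raw bytes.
    fn item(&mut self, values: &[u8]) -> Result<(), String> {
        if self.remaining == 0 {
            return Err("more items than the section declared".to_string());
        }
        self.remaining -= 1;
        match values.iter().find(|&&v| v > self.validator.max_value) {
            Some(v) => Err(format!("value {} exceeds maximum {}", v, self.validator.max_value)),
            None => Ok(()),
        }
    }

    fn finish(self) -> Result<(), String> {
        if self.remaining == 0 {
            Ok(())
        } else {
            Err(format!("{} items missing", self.remaining))
        }
    }
}

/// The application parses each item once and feeds it to the sub-validator.
/// Returns the parsed items and the number of bytes consumed.
fn parse_and_validate(data: &[u8], max_value: u8) -> Result<(Vec<Vec<u8>>, usize), String> {
    let mut reader = AdvancingReader::new(data);
    let count = reader.read_count();
    let validator = Validator { max_value };
    let mut section = validator.section_start(count);
    let mut items = Vec::new();
    for _ in 0..count {
        reader.read_header();
        let values = reader.read_values();
        section.item(values)?;
        items.push(values.to_vec());
    }
    section.finish()?;
    Ok((items, reader.pos))
}
```

In this sketch the number of bytes consumed equals the input length, so each byte is read exactly once. The cost is the one mentioned above: the consumer has to call the reader's methods in the right order instead of just iterating.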