Refactor wasmparser to avoid "double parsing" #188

There are currently two primary reasons that bits and pieces of a wasm module end up being "double parsed":

* First, the `BinaryReader` has some `skip_*` methods, which are typically used to conform to Rust's iterator protocol. For example, `ElementSectionReader::read` acts like an iterator: each call returns one element and repositions the reader at the next one. The returned element, however, is only processed by the consumer afterwards. This means that `read` must skip over the element's contents to be ready for the next call to `read`, and the consumer then parses those same bytes a second time. Affected locations are:
  * `ElementSectionReader::read`
  * `DataSectionReader::read`
  * `GlobalSectionReader::read`
  * `FunctionBody::get_operators_reader`
  * `FunctionLocalReader::read` (and probably more in this file)
  * `InstanceSectionReader::read`

* Second, the API design of the `Validator` type is such that it always parses "header" sections itself, and consuming applications (like wasmtime) are then likely to re-parse the contents of the same section. For example, `Validator::import_section` parses the import section, but wasmtime *also* iterates over the import section, re-parsing everything.
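
To make the first point concrete, here is a minimal, self-contained sketch of the skip-then-reparse pattern. It does not use wasmparser's real types: `ToyReader`, `read_item`, and `double_parse_demo` are hypothetical names, and the "section" is just a run of count-prefixed LEB128 lists. The byte counter is there only to show that every byte ends up decoded twice.

```rust
/// Toy cursor over a byte slice that counts how many bytes it decodes.
/// Illustrative stand-in only, not wasmparser's actual `BinaryReader`.
struct ToyReader<'a> {
    data: &'a [u8],
    pos: usize,
    bytes_decoded: usize,
}

impl<'a> ToyReader<'a> {
    fn new(data: &'a [u8]) -> Self {
        ToyReader { data, pos: 0, bytes_decoded: 0 }
    }

    /// Decode one unsigned LEB128 `u32` (assumes well-formed input).
    fn read_var_u32(&mut self) -> u32 {
        let mut result = 0u32;
        let mut shift = 0;
        loop {
            let byte = self.data[self.pos];
            self.pos += 1;
            self.bytes_decoded += 1;
            result |= u32::from(byte & 0x7f) << shift;
            if byte & 0x80 == 0 {
                return result;
            }
            shift += 7;
        }
    }
}

/// Iterator-style `read`: returns the bytes of one item and repositions the
/// reader at the next item. Finding the item's end requires decoding it.
fn read_item<'a>(reader: &mut ToyReader<'a>) -> &'a [u8] {
    let data = reader.data;
    let start = reader.pos;
    let count = reader.read_var_u32();
    for _ in 0..count {
        reader.read_var_u32(); // "skip": decoded only to find the end
    }
    &data[start..reader.pos]
}

/// Returns (bytes in the section body, bytes decoded in total).
fn double_parse_demo() -> (usize, usize) {
    // Two items: [count = 2, 5, 6] and [count = 1, 7].
    let section = [2u8, 5, 6, 1, 7];
    let mut outer = ToyReader::new(&section);
    let mut decoded = 0;
    let mut indices = Vec::new();
    while outer.pos < section.len() {
        let item = read_item(&mut outer);
        // The consumer now parses the item it was handed: a second pass
        // over the same bytes.
        let mut inner = ToyReader::new(item);
        let count = inner.read_var_u32();
        for _ in 0..count {
            indices.push(inner.read_var_u32());
        }
        decoded += inner.bytes_decoded;
    }
    decoded += outer.bytes_decoded;
    assert_eq!(indices, [5, 6, 7]);
    (section.len(), decoded)
}
```

Calling `double_parse_demo()` from a `main` function returns the section length alongside the number of bytes decoded; in this toy the second number is twice the first, since the skip pass and the consumer pass each touch every byte.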

In general this isn't a massive concern because the header sections are likely all dwarfed in size by the code section, so parsing them is quite fast. Nonetheless we should strive to parse everything in a wasm file precisely once. I think to fix this we'll need two features, one for each problem above:

1. For the first issue I think we'll want to move towards a more advancing-style API rather than an iterator-based API. For example we'd have a dedicated type for reading the element section, and you'd say "read the header" followed by "read the elements". We might be able to use `Drop` and clever trickery to skip over data that wasn't explicitly read, or we could simply panic if methods aren't called in the right order. The downside is that consumers are likely to get a little more complicated, but this may be fixable with clever API design. I'm not sure exactly what this would look like.

2. For the second issue we'll want to add more APIs to the validator. For example, instead of taking the import section as a whole, we'd probably want to add something like "the import section is starting with this many items", which gives you a "sub-validator" that is used to validate each import after parsing. What I'm roughly imagining is that the application does all the parsing and then, just after parsing, feeds everything into the validator. Another alternative is a "validating parser" which automatically feeds parsed values into the validator before handing them to the application. I'm not sure if this alternative is compatible with "parse everything precisely once", however, since for example the element section ideally shouldn't be parsed twice, just once.
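
As a rough illustration of how the two ideas could fit together, here is a minimal sketch under heavy assumptions. Nothing in it is wasmparser's API: `ToySectionReader`, `ToySubValidator`, and `parse_and_validate` are hypothetical names, and the section is simplified to a count followed by LEB128 type indices. The reader takes the "panic if methods are called out of order" option, and the application feeds each parsed value to the sub-validator right after parsing it.

```rust
/// Decode one unsigned LEB128 `u32` at `*pos` (assumes well-formed input).
fn read_var_u32(data: &[u8], pos: &mut usize) -> u32 {
    let mut result = 0u32;
    let mut shift = 0;
    loop {
        let byte = data[*pos];
        *pos += 1;
        result |= u32::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return result;
        }
        shift += 7;
    }
}

/// Hypothetical advancing-style reader: the header first, then the items.
struct ToySectionReader<'a> {
    data: &'a [u8],
    pos: usize,
    remaining: Option<u32>, // `None` until the header has been read
}

impl<'a> ToySectionReader<'a> {
    fn new(data: &'a [u8]) -> Self {
        ToySectionReader { data, pos: 0, remaining: None }
    }

    /// Read the item count. Panics if called more than once.
    fn read_header(&mut self) -> u32 {
        assert!(self.remaining.is_none(), "header already read");
        let count = read_var_u32(self.data, &mut self.pos);
        self.remaining = Some(count);
        count
    }

    /// Read the next item, or `None` once all items have been read.
    /// Panics if the header has not been read yet.
    fn read_item(&mut self) -> Option<u32> {
        let remaining = self
            .remaining
            .as_mut()
            .expect("read_header must be called first");
        if *remaining == 0 {
            return None;
        }
        *remaining -= 1;
        Some(read_var_u32(self.data, &mut self.pos))
    }
}

/// Hypothetical "sub-validator" for one section: it is told the item count
/// up front and then fed each already-parsed item.
struct ToySubValidator {
    num_types: u32,
    expected: u32,
    seen: u32,
}

impl ToySubValidator {
    fn item(&mut self, type_index: u32) -> Result<(), String> {
        if type_index >= self.num_types {
            return Err(format!("unknown type {}", type_index));
        }
        self.seen += 1;
        Ok(())
    }

    fn finish(self) -> Result<(), String> {
        if self.seen == self.expected {
            Ok(())
        } else {
            Err(format!("expected {} items, saw {}", self.expected, self.seen))
        }
    }
}

/// The application drives parsing and hands each value to the validator.
fn parse_and_validate(section: &[u8], num_types: u32) -> Result<Vec<u32>, String> {
    let mut reader = ToySectionReader::new(section);
    let count = reader.read_header();
    let mut validator = ToySubValidator { num_types, expected: count, seen: 0 };
    let mut items = Vec::new();
    while let Some(type_index) = reader.read_item() {
        validator.item(type_index)?; // validate the value just parsed
        items.push(type_index); // then use it; no second parse
    }
    validator.finish()?;
    Ok(items)
}
```

For example, `parse_and_validate(&[2, 0, 1], 2)` yields the two type indices, while `parse_and_validate(&[1, 3], 2)` fails validation because type 3 does not exist. Each byte is decoded once, at the cost of the consumer having to call things in the right order.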
