Refactor wasmparser to avoid "double parsing" #188

Open
@alexcrichton

There are currently two primary reasons that bits and pieces of a wasm module end up being "double parsed":

  • First, the BinaryReader has some skip_* methods, which are typically used to conform to Rust's iterator protocol. For example, calling read on ElementSectionReader acts like an iterator: it returns the current element and repositions the reader at the next one. The consumer only processes the returned element afterwards, so read itself must skip over that element to be ready for the next call to `read`. The element's contents are therefore parsed once by the skip and again by the consumer. Affected locations are:

    • ElementSectionReader::read
    • DataSectionReader::read
    • GlobalSectionReader::read
    • FunctionBody::get_operators_reader
    • FunctionLocalReader::read (and probably more in this file)
    • InstanceSectionReader::read
  • Second, the API design of the Validator type is such that it always parses "header" sections itself, and consuming applications (like wasmtime) are then likely to re-parse the contents of those sections. For example, Validator::import_section will parse the import section, but wasmtime will then also iterate over the import section, re-parsing everything.
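
To make the first problem concrete, here is a minimal, self-contained sketch of the skip-then-reparse pattern. It does not use wasmparser's real types: `Reader`, `read_item`, `sum_item`, and the toy item layout (a count byte followed by that many LEB128 values) are all assumptions made up for illustration. A shared counter records how many values get decoded.

```rust
use std::cell::Cell;

/// Toy cursor over a byte slice (hypothetical, not wasmparser's BinaryReader).
/// `decoded` counts how many LEB128 values have been decoded in total.
struct Reader<'a> {
    data: &'a [u8],
    pos: usize,
    decoded: &'a Cell<usize>,
}

impl<'a> Reader<'a> {
    fn read_u8(&mut self) -> u8 {
        let b = self.data[self.pos];
        self.pos += 1;
        b
    }

    /// Decode one unsigned LEB128 value (no overflow checks in this toy).
    fn read_var_u32(&mut self) -> u32 {
        self.decoded.set(self.decoded.get() + 1);
        let mut result = 0u32;
        let mut shift = 0;
        loop {
            let byte = self.read_u8();
            result |= u32::from(byte & 0x7f) << shift;
            if byte & 0x80 == 0 {
                return result;
            }
            shift += 7;
        }
    }
}

/// Iterator-style read: hands back the item's raw bytes and repositions the
/// section reader at the next item. Finding where the item ends requires
/// decoding (and discarding) every value in it.
fn read_item<'a>(section: &mut Reader<'a>) -> Reader<'a> {
    let data = section.data;
    let start = section.pos;
    let count = section.read_u8();
    for _ in 0..count {
        section.read_var_u32(); // skip pass: the value is thrown away
    }
    Reader { data: &data[start..section.pos], pos: 0, decoded: section.decoded }
}

/// Consumer: decodes the same values a second time to actually use them.
fn sum_item(mut item: Reader<'_>) -> u32 {
    let count = item.read_u8();
    (0..count).map(|_| item.read_var_u32()).sum()
}

/// Returns (sum of all values, number of LEB128 decodes performed).
fn demo() -> (u32, usize) {
    // Two items: [1, 300] and [7]. 300 is 0xAC 0x02 in LEB128.
    let bytes = [2, 1, 0xAC, 0x02, 1, 7];
    let decoded = Cell::new(0);
    let mut section = Reader { data: &bytes, pos: 0, decoded: &decoded };
    let mut total = 0;
    while section.pos < bytes.len() {
        total += sum_item(read_item(&mut section));
    }
    (total, decoded.get())
}
```

The input holds three values, but `demo()` reports six decodes: each value is decoded once by the skip inside `read_item` and once more by the consumer.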

In general this isn't a massive concern: the header sections are likely all dwarfed in size by the code section, so parsing them is quite fast. Nonetheless we should strive to parse everything in a wasm file precisely once. I think to fix this we'll need two features, one for each problem above:

  1. For the first issue I think we'll want to move towards a more advancing-style API rather than an iterator-based API. For example, we'd have a dedicated type for reading the element section, and you'd say "read the header" followed by "read the elements". We might be able to use Drop and clever trickery to skip over data that wasn't explicitly read, or we could simply panic if methods aren't called in the right order. The downside is that consumers are likely to get a little more complicated, but this may be fixable with clever API design. I'm not sure exactly what this would look like.

  2. For the second issue we'll want to add more APIs to the validator. For example, instead of taking the import section as a whole, we'd probably want to add something like "the import section is starting with this many items", which gives you a "sub-validator" that is used to validate each import after parsing. What I'm roughly imagining is that the application does all the parsing and, just after parsing, feeds everything to the validator. Another alternative is a "validating parser" which automatically feeds parsed values into the validator before handing them to the application. I'm not sure whether this alternative is compatible with "parse everything precisely once", however, since for example the element section ideally should be parsed once, not twice.
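
As a rough illustration of how the two ideas could fit together, the sketch below pairs an advancing-style reader with a "sub-validator". Every name here (`SectionReader`, `read_header`, `read_item`, `SectionValidator`, `parse_and_validate`) is hypothetical and not an existing or proposed wasmparser API. The section layout (a count byte followed by one byte per item) and the validation rule are likewise toy assumptions. It takes the "panic if methods aren't called in the right order" option rather than the Drop-based one.

```rust
/// Hypothetical advancing-style reader: the header is read first, then each
/// item exactly once. Nothing is skipped, so nothing is parsed twice.
struct SectionReader<'a> {
    data: &'a [u8],
    pos: usize,
    remaining: Option<u8>, // None until the header has been read
}

impl<'a> SectionReader<'a> {
    fn new(data: &'a [u8]) -> Self {
        SectionReader { data, pos: 0, remaining: None }
    }

    /// "Read the header": returns the item count. Panics if called twice.
    fn read_header(&mut self) -> u8 {
        assert!(self.remaining.is_none(), "header already read");
        let count = self.data[self.pos];
        self.pos += 1;
        self.remaining = Some(count);
        count
    }

    /// "Read the next item". Panics if called out of order or too often.
    fn read_item(&mut self) -> u8 {
        let left = self.remaining.expect("read_header must be called first");
        assert!(left > 0, "no items left in this section");
        let item = self.data[self.pos];
        self.pos += 1;
        self.remaining = Some(left - 1);
        item
    }
}

/// Hypothetical sub-validator: created when a section starts "with this many
/// items", then fed each item after the application has parsed it.
struct SectionValidator {
    expected: u8,
    seen: u8,
}

impl SectionValidator {
    fn begin(expected: u8) -> Self {
        SectionValidator { expected, seen: 0 }
    }

    fn item(&mut self, value: u8) -> Result<(), String> {
        if self.seen == self.expected {
            return Err("more items than the header declared".to_string());
        }
        if value > 0x7f {
            // Toy rule standing in for real validation logic.
            return Err(format!("invalid item {:#x}", value));
        }
        self.seen += 1;
        Ok(())
    }

    fn finish(self) -> Result<(), String> {
        if self.seen == self.expected {
            Ok(())
        } else {
            Err(format!("expected {} items, saw {}", self.expected, self.seen))
        }
    }
}

/// The application drives parsing and hands each parsed value to the
/// validator, so every byte of the section is decoded exactly once.
fn parse_and_validate(bytes: &[u8]) -> Result<Vec<u8>, String> {
    let mut reader = SectionReader::new(bytes);
    let count = reader.read_header();
    let mut validator = SectionValidator::begin(count);
    let mut items = Vec::new();
    for _ in 0..count {
        let item = reader.read_item(); // parsed once
        validator.item(item)?;         // validated from the parsed value
        items.push(item);              // used by the application
    }
    validator.finish()?;
    Ok(items)
}
```

In this shape the validator never touches raw bytes. It only sees values the application has already parsed, which is what removes the second pass. The cost is the one noted above: the consumer now has to call things in the right order. A real reader would also return errors on truncated input rather than panicking on an out-of-bounds index as this toy does.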

Labels: wasmparser (Related to the binary format of WebAssembly)
