Code generated by wasmtime doesn't cache-align loops #4883

Open
@koute

The problem

Currently wasmtime/cranelift doesn't cache-align the loops it generates (unlike, e.g., LLVM, which AFAIK doesn't have this problem). This can lead to huge performance regressions if a hot loop accidentally ends up spanning multiple cache lines.

Background

Recently we were updating from wasmtime 0.38 to 0.40 and saw a peculiar performance regression. One of our benchmarks took almost 2x as long to run, and a lot of the others took around 45% longer. A huge regression. Ultimately it turned out to be unrelated to the 0.38 -> 0.40 upgrade itself. We tracked the problem down to memset within the WASM (we're currently not using the bulk memory ops extension) suddenly taking a lot more time to run for no apparent reason. Depending on the exact address at which wasmtime placed the generated code for memset (which is essentially random, although consistent for the same code with the same flags in the same environment), the benchmarks were either slow or fast, and it all boiled down to whether the hot loop of the memset spanned multiple cache lines or not.
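To make the "spans multiple cache lines" condition concrete, here is a minimal sketch of the arithmetic involved. It is not cranelift code: the function names are hypothetical, and the 64-byte cache line size is an assumption (typical for current x86-64 CPUs, but not stated anywhere in this issue).

```rust
/// Assumed cache line size in bytes. 64 is an assumption, not a value
/// taken from wasmtime/cranelift.
const CACHE_LINE: u64 = 64;

/// Number of cache lines touched by a code region that is `len` bytes
/// long and starts at address `start`. (Hypothetical helper.)
fn lines_spanned(start: u64, len: u64) -> u64 {
    if len == 0 {
        return 0;
    }
    let first = start / CACHE_LINE;
    let last = (start + len - 1) / CACHE_LINE;
    last - first + 1
}

/// Number of padding bytes that would have to be emitted before `start`
/// so that the loop begins exactly on a cache line boundary.
/// (Hypothetical helper.)
fn padding_to_align(start: u64) -> u64 {
    (CACHE_LINE - start % CACHE_LINE) % CACHE_LINE
}
```

For a hypothetical 24-byte loop, `lines_spanned(0x1000, 24)` is 1, while `lines_spanned(0x1030, 24)` is 2. The same loop body fits in one cache line or straddles two depending only on where it was placed, and `padding_to_align(0x1030)` gives the 16 bytes of padding that would move it to the next boundary.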

You can find a detailed analysis of the problem in this comment and this comment of mine.

Labels

cranelift (Issues related to the Cranelift code generator)
cranelift:E-easy (Issues suitable for newcomers to investigate, including Rust newcomers!)
cranelift:goal:optimize-speed (Focus area: the speed of the code produced by Cranelift.)
enhancement
performance
