
Translating thread and wait to js and test harness for multi-threaded programs #229


Merged — 6 commits merged into WebAssembly:upstream-rebuild on Jun 23, 2025

Conversation

zilinc commented Mar 7, 2025

This PR fixes #218 and supersedes #219.

It includes changes in the following aspects:

  1. It updates the JS test harness (test/harness/testharness*) to a newer version from the upstream WPT repo. This is required because the newer version fixes several problems around fetch_tests_from_workers and tests getting stuck (problems which also affect the upstream WebAssembly/spec repo).
  2. It fixes the JS test harness so that it supports Either expected results.
  3. It implements the generation of JS for the thread and wait constructs in wast scripts.
  4. It makes the test harness (test/harness/async_*) work with multi-threaded test cases. The names of the JS files under test are included in the test reporting, because otherwise the harness complains about duplicate test names coming from different threads.
  5. It also fixes several race conditions in the JS test harness that are present not only here but also in the upstream WebAssembly/spec repo (main branch). For instance, the assert_unlinkable function does not wait for the preceding registry updates to complete, which non-deterministically produces wrong test results (false negatives and false positives) depending on the timing and order of the tests. (A sketch of this kind of ordering fix follows the table below.)
  6. It updates the build scripts (test/build.py) with additional flags. (In the upstream WebAssembly/spec repo, some JS files are generated in the wrong paths, resulting in missing files when running the test suite; this behaviour is not present in this branch of this repo.)
  7. With all the fixes in place, there are still several failing tests in the test suite, due to differences between production web browsers and our interpreter (Chrome and Firefox give the same results). The failing tests are:
| test file | # failed tests |
| --- | --- |
| import.wast | 6 |
| global.wast | 4 |
| elem.wast | 4 |
| binary.wast | 16 |
| data.wast | 4 |
| memory.wast | 4 |
| threads/atomic.wast | 2 |
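
For illustration, below is a minimal sketch of the kind of ordering fix that point 5 refers to, assuming a promise-chained harness in which registry updates and assertions are serialised on a single shared chain. The identifiers (chain, registry) and the function shapes are hypothetical, not the harness's actual API:

```js
// Illustrative sketch only — not the actual harness code.
let chain = Promise.resolve();   // one shared chain serialises all async steps
const registry = {};             // maps registered names to exports objects

function register(name, instancePromise) {
  // Queue the registry update on the shared chain so later steps observe it.
  chain = chain.then(async () => {
    registry[name] = (await instancePromise).exports;
  });
}

function assert_unlinkable(bytes) {
  // Queue the assertion as well, so it only runs after all preceding registry
  // updates have completed; without this, the outcome depends on timing.
  chain = chain.then(async () => {
    const mod = new WebAssembly.Module(bytes);
    try {
      await WebAssembly.instantiate(mod, registry);
    } catch (e) {
      if (e instanceof WebAssembly.LinkError) return;  // expected failure
      throw e;
    }
    throw new Error("expected instantiation to fail with a LinkError");
  });
}
```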

conrad-watt merged commit 4e2a1bb into WebAssembly:upstream-rebuild on Jun 23, 2025
bvisness commented Jun 23, 2025

Several of these new tests are very dubious, and most of them seem to be testing implementation details of the interpreter:

  • internal_names.fail.wast: Seems to be asserting something about internal names used by the interpreter. Does not fail as expected on my machine.
  • module_shadow.fail.wast: Seems to assert that the interpreter rejects duplicate module names. Does not fail as expected on my machine.
  • threads/duplicate_thread_name.fail.wast: Seems to just be asserting that the interpreter rejects duplicate thread names. Does not fail as expected on my machine.
  • threads/infinite_loop.fail.wast: Seems to be testing a race condition between threads. Does not always fail as expected on my machine.
  • threads/thread.wast: Contains spurious assert_unlinkable tests that are covered (or should be covered) by threads/unlinkable.wast. Everything below (wait $T2) seems unnecessary.
  • threads/unlinkable.wast: Contains commented-out threads and wait directives for seemingly no reason.

I particularly want to stress that we should not be testing implementation details of the interpreter and test system. For example, our scripts convert (module $M1 ...) to JS like let $M1 = instantiate(...), meaning that module_shadow.fail.wast triggers SyntaxError: redeclaration of let $M1. This is clearly worthless: there's no reason for test authors to reuse the same name, and it's never been a problem up until this point. So not only are these tests not passing in the interpreter on my machine, but they are worthless for engine implementers and need to be suppressed on our end.
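
For concreteness, the generated JS for two modules that reuse the same name would look roughly like this (a sketch — instantiate and the byte variables stand in for whatever the generated scripts actually emit):

```js
// First (module $M1 ...) in the .wast file becomes a let-declaration:
let $M1 = instantiate(moduleBytes1);

// A second (module $M1 ...) generates another let-declaration of the same name
// in the same scope, so the whole script is rejected before any test runs:
let $M1 = instantiate(moduleBytes2);  // SyntaxError: redeclaration of let $M1
```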

One final comment: many of these issues would probably have been caught by CI, but this repo seemingly does not have GitHub Actions configured like the rest of the proposal repos. It's not worth our time to try and stay up to date with spec tests from this repo if we can't be confident that they pass in the reference interpreter.

@conrad-watt (Collaborator)

@bvisness I appreciate your engagement on this.

I view these tests as documenting corner-cases of the .wast format rather than details of the interpreter. My understanding is that the .fail mixfix is not indicating that the tests should fail, but rather that they currently don't pass as desired in some interpreter backends (e.g. the JS harness) due to imperfect handling of the .wast. When you are testing whether these tests fail on your machine, are you running them directly in the interpreter, or are you using the async_harness.js backend?

So not only are these tests not passing in the interpreter on my machine, but they are worthless for engine implementers and need to be suppressed on our end.

I'm happy to work with you to come up with better conventions for organising these tests that facilitate your use case, with the caveat that at a high level there should be an expectation that the test suite may contain some tests that aren't directly applicable to all implementers/consumers.

It's late right now in Singapore, but tomorrow I can move these .fail tests into a separate folder so that exclusion is more intuitive, and tidy up threads/thread.wast and threads/unlinkable.wast while doing this. Would these changes address your immediate concerns?

bvisness commented Jun 23, 2025

I think the approach that makes the most sense to me is to extract any meta-tests about WAST and the reference interpreter to a separate folder outside the test directory. It makes sense to have some meta-tests to catch ill-formed spec tests or interpreter bugs, but since they will have zero impact on test consumers (e.g. engines), I think they ought to be kept out of the test folder. I think that is in line with what you have suggested and it would indeed address my immediate concerns.

I also think it would make the most sense to do that in a PR to the main spec repo instead of in a PR in the threads repo, although that wouldn't affect us either way.

As for what harness I was using, I was just running the main run.py script in the core folder.

tlively (Member) commented Jun 23, 2025

I think the approach that makes the most sense to me is to extract any meta-tests about WAST and the reference interpreter to a separate folder outside the test directory

I agree that separating out tests that implementers are not expected to care about makes sense, but I would prefer to move them to a separate subdirectory rather than out of the test directory entirely. Either way, this would be a break from established precedent, so it would be good to let the CG know what the plan is.

@rossberg (Member)

@bvisness:

most of them seem to be testing implementation details of the interpreter:

In principle, all consumers of .wast could implement the meta-level correctness checks (which are part of the format's semantics more than an implementation detail). But it's probably not worth enforcing that, especially if apparently not even the reference interpreter itself does it. :)

I'd just drop those couple of tests instead of building more infrastructure around them.

I believe that the linking test in thread.wast specifically verifies that a .wast-level thread has a fresh modules registry, as part of the basic semantics of the thread construct. Seems legit to me, and easy to misimplement.

@conrad-watt:

My understanding is that the .fail mixfix is not indicating that the tests should fail

Well, no, the test runner requires, not allows, such tests to fail. IIRC, .fail. was originally introduced to express negative tests, but it has become mostly obsolete with wast features like assert_malformed and especially module quote, which made it possible to internalise those tests. I think we haven't used it in ages, and it may not be worth reviving.

conrad-watt (Collaborator) commented Jun 24, 2025

Well, no, the test runner requires, not allows, such tests to fail. IIRC, .fail. was originally introduced to express negative tests

Ok, this is my misunderstanding of the testing infrastructure then, apologies.

I've just removed the offending tests for now (#232), and we can have a longer-term conversation in a separate issue about whether/how to do edge-case testing of the .wast format.
