
Translating thread and wait to js and test harness for multi-threaded programs #229


Merged — 6 commits merged into WebAssembly:upstream-rebuild on Jun 23, 2025

Conversation

zilinc commented Mar 7, 2025

This PR fixes #218 and supersedes #219.

It includes changes in the following aspects:

  1. It updates the JS test harness (test/harness/testharness*) to a newer version from the upstream WPT repo. This is required because the newer version fixes several problems around fetch_tests_from_workers and tests getting stuck (problems which also affect the upstream WebAssembly/spec repo).
  2. It fixes the JS test harness so that it supports Either expected results.
  3. It implements the generation of JS for the thread and wait constructs in wast scripts.
  4. It makes the test harness (test/harness/async_*) work with multi-threaded test cases. The names of the JS files under test are included in the test reporting, because otherwise the harness complains about duplicate test names coming from different threads.
  5. It also fixes several race conditions in the JS test harness that are present not only here but also in the upstream WebAssembly/spec repo (main branch). For instance, the assert_unlinkable function does not wait for the preceding registry updates to complete, which non-deterministically produces wrong test results (false negatives and false positives) depending on the timing and order of the tests. (A sketch of this kind of ordering fix follows the table below.)
  6. It updates the build scripts (test/build.py) with additional flags. (In the upstream WebAssembly/spec repo, some JS files are generated in the wrong paths, resulting in missing files when running the test suite; this behaviour is not present in this branch of this repo.)
  7. With all the fixes in place, there are still several failing tests in the test suite, due to differences between production web browsers and our interpreter (Chrome and Firefox give the same results). The failing tests are:
| test file | # failed tests |
| --- | --- |
| import.wast | 6 |
| global.wast | 4 |
| elem.wast | 4 |
| binary.wast | 16 |
| data.wast | 4 |
| memory.wast | 4 |
| threads/atomic.wast | 2 |
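
For illustration, below is a minimal sketch of the kind of ordering fix that point 5 refers to, assuming a promise-chained harness in which registry updates and assertions are serialised on a single shared chain. The identifiers (chain, registry) and the function shapes are hypothetical, not the harness's actual API:

```js
// Illustrative sketch only — not the actual harness code.
let chain = Promise.resolve();   // one shared chain serialises all async steps
const registry = {};             // maps registered names to exports objects

function register(name, instancePromise) {
  // Queue the registry update on the shared chain so later steps observe it.
  chain = chain.then(async () => {
    registry[name] = (await instancePromise).exports;
  });
}

function assert_unlinkable(bytes) {
  // Queue the assertion as well, so it only runs after all preceding registry
  // updates have completed; without this, the outcome depends on timing.
  chain = chain.then(async () => {
    const mod = new WebAssembly.Module(bytes);
    try {
      await WebAssembly.instantiate(mod, registry);
    } catch (e) {
      if (e instanceof WebAssembly.LinkError) return;  // expected failure
      throw e;
    }
    throw new Error("expected instantiation to fail with a LinkError");
  });
}
```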

conrad-watt merged commit 4e2a1bb into WebAssembly:upstream-rebuild on Jun 23, 2025
bvisness commented Jun 23, 2025

Several of these new tests are very dubious, and most of them seem to be testing implementation details of the interpreter:

  • internal_names.fail.wast: Seems to be asserting something about internal names used by the interpreter. Does not fail as expected on my machine.
  • module_shadow.fail.wast: Seems to assert that the interpreter rejects duplicate module names. Does not fail as expected on my machine.
  • threads/duplicate_thread_name.fail.wast: Seems to just be asserting that the interpreter rejects duplicate thread names. Does not fail as expected on my machine.
  • threads/infinite_loop.fail.wast: Seems to be testing a race condition between threads. Does not always fail as expected on my machine.
  • threads/thread.wast: Contains spurious assert_unlinkable tests that are covered (or should be covered) by threads/unlinkable.wast. Everything below (wait $T2) seems unnecessary.
  • threads/unlinkable.wast: Contains commented-out threads and wait directives for seemingly no reason.

I particularly want to stress that we should not be testing implementation details of the interpreter and test system. For example, our scripts convert (module $M1 ...) to JS like let $M1 = instantiate(...), meaning that module_shadow.fail.wast triggers SyntaxError: redeclaration of let $M1. This is clearly worthless: there's no reason for test authors to reuse the same name, and it's never been a problem up until this point. So not only are these tests not passing in the interpreter on my machine, but they are worthless for engine implementers and need to be suppressed on our end.
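
For concreteness, the generated JS for two modules that reuse the same name would look roughly like this (a sketch — instantiate and the byte variables stand in for whatever the generated scripts actually emit):

```js
// First (module $M1 ...) in the .wast file becomes a let-declaration:
let $M1 = instantiate(moduleBytes1);

// A second (module $M1 ...) generates another let-declaration of the same name
// in the same scope, so the whole script is rejected before any test runs:
let $M1 = instantiate(moduleBytes2);  // SyntaxError: redeclaration of let $M1
```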

One final comment: many of these issues would probably have been caught by CI, but this repo seemingly does not have GitHub Actions configured like the rest of the proposal repos. It's not worth our time to try and stay up to date with spec tests from this repo if we can't be confident that they pass in the reference interpreter.

@conrad-watt (Collaborator)

@bvisness I appreciate your engagement on this.

I view these tests as documenting corner-cases of the .wast format rather than details of the interpreter. My understanding is that the .fail mixfix is not indicating that the tests should fail, but rather that they currently don't pass as desired in some interpreter backends (e.g. the JS harness) due to imperfect handling of the .wast. When you are testing whether these tests fail on your machine, are you running them directly in the interpreter, or are you using the async_harness.js backend?

So not only are these tests not passing in the interpreter on my machine, but they are worthless for engine implementers and need to be suppressed on our end.

I'm happy to work with you to come up with better conventions for organising these tests that facilitate your use case, with the caveat that at a high level there should be an expectation that the test suite may contain some tests that aren't directly applicable to all implementers/consumers.

It's late right now in Singapore, but tomorrow I can move these .fail tests into a separate folder so that exclusion is more intuitive, and tidy up threads/thread.wast and threads/unlinkable.wast while doing this. Would these changes address your immediate concerns?

bvisness commented Jun 23, 2025

I think the approach that makes the most sense to me is to extract any meta-tests about WAST and the reference interpreter to a separate folder outside the test directory. It makes sense to have some meta-tests to catch ill-formed spec tests or interpreter bugs, but since they will have zero impact on test consumers (e.g. engines), I think they ought to be kept out of the test folder. I think that is in line with what you have suggested and it would indeed address my immediate concerns.

I also think it would make the most sense to do that in a PR to the main spec repo instead of in a PR in the threads repo, although that wouldn't affect us either way.

As for what harness I was using, I was just running the main run.py script in the core folder.

tlively (Member) commented Jun 23, 2025

I think the approach that makes the most sense to me is to extract any meta-tests about WAST and the reference interpreter to a separate folder outside the test directory

I agree that separating out tests that implementers are not expected to care about makes sense, but I would prefer to move them to a separate subdirectory rather than out of the test directory entirely. Either way, this would be a break from established precedent, so it would be good to let the CG know what the plan is.

@rossberg (Member)

@bvisness:

most of them seem to be testing implementation details of the interpreter:

In principle, all consumers of .wast could implement the meta-level correctness checks (which are part of the format's semantics more than an implementation detail). But it's probably not worth enforcing that, especially if apparently not even the reference interpreter itself does it. :)

I'd just drop those couple of tests instead of building more infrastructure around them.

I believe that the linking test in thread.wast specifically verifies that a .wast-level thread has a fresh modules registry, as part of the basic semantics of the thread construct. Seems legit to me, and easy to misimplement.

@conrad-watt:

My understanding is that the .fail mixfix is not indicating that the tests should fail

Well, no, the test runner requires, not allows, such tests to fail. IIRC, .fail. was originally introduced to express negative tests, but it has become mostly obsolete with wast features like assert_malformed and especially module quote, which made it possible to internalise those tests. I think we haven't used it in ages, and it may not be worth reviving.

conrad-watt (Collaborator) commented Jun 24, 2025

Well, no, the test runner requires, not allows, such tests to fail. IIRC, .fail. was originally introduced to express negative tests

Ok, this is my misunderstanding of the testing infrastructure then, apologies.

I've just removed the offending tests for now (#232), and we can have a longer-term conversation in a separate issue about whether/how to do edge-case testing of the .wast format.
