Before: sprite 3 uses the `wait` block to wait for the other two sprites
to finish. After the `wait` blocks, sprite 3 sends "end" to the test. If
the sprites take too long to finish, they might not be done before the
"end" message.
After: sprite 3 sets a variable called `finished` to zero on startup,
and increments it every time it hears a new message. Each of the other
two sprites sends that new message once it's done. When sprite 3 notices
that `finished >= 2`, it sends the "end" message.
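The "After" behavior can be sketched roughly in JavaScript as follows. This is only an illustration of the counting pattern, not the Scratch project itself: the `EventEmitter` stands in for Scratch broadcasts, and the message name `done` is a placeholder for the new message.

```javascript
const EventEmitter = require('events');

// Hypothetical stand-in for Scratch's broadcast mechanism.
const broadcasts = new EventEmitter();
const received = [];

// "Sprite 3": reset the counter on startup, count each "done" message,
// and send "end" only once both other sprites have reported in.
let finished = 0;
broadcasts.on('done', () => {
    finished += 1;
    if (finished >= 2) {
        broadcasts.emit('end');
    }
});

// The test harness listens for "end".
broadcasts.on('end', () => received.push('end'));

// Sprites 1 and 2 each send "done" when they finish, however long that takes.
broadcasts.emit('done'); // finished is 1, so "end" is not sent yet
broadcasts.emit('done'); // finished is 2, so "end" is sent now
```

Because "end" depends on the count rather than on elapsed time, a slow sprite delays the "end" message instead of racing it.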
Part of the motivation for this test is to verify compatibility with
Scratch 2.0 / SB2 behavior, so we shouldn't change the project to an SB3
file without very good reason.
Sometimes system load causes the VM to run more slowly, especially when
tests run in parallel. This change gives the `wait` block test a little
extra wiggle room to account for that. Ending early is still checked
against a fairly strict threshold.
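A minimal sketch of an asymmetric timing check like the one described, assuming millisecond timings. The helper name and both tolerance values are hypothetical and are not the numbers used in the actual test.

```javascript
// Hypothetical helper: strict about finishing early, lenient about finishing late.
// The default tolerances are illustrative only.
function waitDurationOk (elapsedMs, expectedMs, earlyToleranceMs = 10, lateToleranceMs = 500) {
    // A wait that ends early is a real bug, so allow very little slack.
    const notTooEarly = elapsedMs >= expectedMs - earlyToleranceMs;
    // A wait that ends late may just be a loaded machine, so allow more slack.
    const notTooLate = elapsedMs <= expectedMs + lateToleranceMs;
    return notTooEarly && notTooLate;
}
```

With these placeholder tolerances, a one-second wait measured at 1300 ms passes, while one measured at 950 ms fails.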
Newer versions of `tap` run more asynchronously, so ending a test with
`process.nextTick(process.exit)` would sometimes prevent the test from
completing correctly. Removing all instances of
`process.nextTick(process.exit)` sorted the tests into three categories:
* the test still worked correctly -- no fixup needed.
* the test would hang because the VM's `_steppingInterval` was keeping
  Node alive. These tests now call a new `quit()` method which ends the
  stepping interval.
* the `load-extensions` test needed special attention because the "Video
  Sensing" extension starts its own loop using `setTimeout`. I added a
  `_stopLoop()` method on the extension and call it directly from the
  test. I'm not completely happy with this solution, but anything more
  general would likely require a change to the extension spec, so I'm
  leaving that as a followup task.
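To illustrate the second category: an active `setInterval` timer keeps the Node event loop alive, so the process only exits on its own once the timer is cleared. The class below is a hypothetical stand-in, not the real VM code; only the names `_steppingInterval` and `quit()` come from this change, and the step rate is an assumption.

```javascript
// Hypothetical stand-in for the VM's stepping loop.
class FakeRuntime {
    constructor () {
        this._steppingInterval = null;
        this.steps = 0;
    }

    start () {
        if (this._steppingInterval) return; // already running
        // While this timer is active, Node will not exit on its own.
        this._steppingInterval = setInterval(() => {
            this.steps += 1;
        }, 1000 / 60); // assumed step rate, for illustration only
    }

    quit () {
        if (!this._steppingInterval) return; // nothing to stop
        clearInterval(this._steppingInterval);
        this._steppingInterval = null;
    }
}

const runtime = new FakeRuntime();
runtime.start();
// ... a test would exercise the runtime here ...
runtime.quit(); // with the timer cleared, the process can exit without process.exit
```

The same reasoning applies to a loop that keeps rescheduling itself with `setTimeout`: something has to stop the rescheduling before the process can exit cleanly.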
As part of simplifying the CI setup, I plan to stop explicitly passing
`--jobs=4` to `tap` going forward. Upgrading to a newer version of `tap`
means that it will automatically parallelize jobs according to available
CPU count, which should be better anyway. Only one of our tests was
incompatible with newer versions of `tap`, so this commit includes a
compatibility fix for it.
Also, by default newer versions of `tap` calculate coverage and fail the
test run if coverage falls below the configured thresholds. The default
is 100% coverage and we're not there, so I adjusted the thresholds to
match our current coverage for now. We can ratchet those up over time.