* Rename sync(Weave) to syncRoot(Weave) to make clear that it is not composable
* Introduce fine-grained awaitable for-loop
* fix comment in capture section
* make parallel reduction compile standalone
* Add yet-to-be-proper sync on awaitable loops
* Sometimes the task you try to split is not the current task anymore
* update changelog
* update histogram and logsumexp to use the awaitable loops
* Fighting your way through recursive imports, and static early symbol resolution
* Allow sync on a not-iterated loop (i.e. iterations = 0)
* Well, it seems like awaitable loops are not enough to describe the data dependencies of GEMM :sad_face:
* fix LazyFlowVar symbol resolution
* Fix LazyFlowVar with reduction and awaitable loops
* mention that awaitable might still change [skip ci]
`README.md` (+1 −1):

```diff
@@ -135,7 +135,7 @@ exit(Weave)
 - `init(Weave)`, `exit(Weave)` to start and stop the runtime. Forgetting this will give you nil pointer exceptions on spawn.
 - `spawn fnCall(args)` which spawns a function that may run on another thread and gives you an awaitable Flowvar handle.
 - `sync(Flowvar)` will await a Flowvar and block until you receive a result.
-- `sync(Weave)` is a global barrier for the main thread on the main task. Allowing nestable barriers for any thread is work-in-progress.
+- `syncRoot(Weave)` is a global barrier for the main thread on the main task.
 - `parallelFor`, `parallelForStrided`, `parallelForStaged`, `parallelForStagedStrided` are described above and in the experimental section.
 - `loadBalance(Weave)` gives the runtime the opportunity to distribute work. Insert this within long computations: due to Weave's design, it's busy workers that are also in charge of load balancing. This is done automatically when using `parallelFor`.
 - `isSpawned` allows you to build speculative algorithms where a thread is spawned only if certain conditions are valid. See the `nqueens` benchmark for an example.
```
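Put together, the API described above can be sketched as follows. This is a minimal illustration assuming the documented interface (`fib` and the exact call sites are illustrative, not from the diff):

```nim
import weave

proc fib(n: int): int =
  if n < 2:
    return n
  # spawn hands the call to the runtime and returns an awaitable Flowvar;
  # sync blocks until that Flowvar's result is available.
  let x = spawn fib(n - 1)
  let y = fib(n - 2)
  result = sync(x) + y

init(Weave)       # forgetting init gives nil pointer exceptions on spawn
echo fib(20)      # fib(20) = 6765
syncRoot(Weave)   # global barrier, only valid from the root task on the main thread
exit(Weave)       # stop the runtime
```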
`changelog.md` (+47):

# Changelog
### v0.3.0 - unreleased

`sync(Weave)` has been renamed `syncRoot(Weave)` to highlight that it is only valid on the root task in the main thread. In particular, a procedure that uses `syncRoot` should not be called in a multithreaded section. This is a breaking change. In the future such changes will have a deprecation path, but the library is only 2 weeks old at the moment.

`parallelFor` loops now support an "awaitable" statement to allow fine-grained sync.

Fine-grained data dependencies are under research (for example, launching a task when the first 50 iterations are done out of a 100-iteration loop); "awaitable" may change to have a unified syntax for delayed tasks depending on a task, a whole loop or a subset of it.

If possible, it is recommended to use "awaitable" instead of `syncRoot()` to allow composable parallelism; `syncRoot()` can only be called in a serial section of the code.
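As a sketch, an awaitable loop might look like this. The loop name `myLoop` is hypothetical, and since the "awaitable" syntax is experimental the exact shape (and the return value of `sync` on a loop handle, assumed here) may differ:

```nim
import weave

init(Weave)

# Sketch: naming the loop with "awaitable" creates a handle that can be
# awaited with sync, instead of requiring a global syncRoot barrier.
parallelFor i in 0 ..< 100:
  awaitable: myLoop
  discard i  # loop body: do some work with i

# Blocks until every iteration of this particular loop has completed.
# (Assumed to return a value describing how the loop ran, hence discard.)
discard sync(myLoop)

exit(Weave)
```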
Weave can now be compiled with Microsoft Visual Studio in C++ mode.
The "LastVictim" and "LastThief" WV_Target policies have been added.
The default is still "Random"; pass "-d:WV_Target=LastVictim" to explore performance on your workload.

"StealEarly" has been implemented. The default is not to steal early; pass "-d:WV_StealEarly=2" for example to allow workers to initiate a steal request when 2 tasks or fewer are left in their queue.
#### Performance
Weave has been thoroughly tested and tuned on a state-of-the-art matrix multiplication implementation against competing pure-Assembly, hand-tuned BLAS implementations, to reach high-performance computing scalability standards.
3 cases can trigger loop splitting in Weave:
- `loadBalance(Weave)`,
- sharing work to idle child threads,
- incoming thieves.

The first 2 were not working properly and resulted in pathological performance cases. This has been fixed.
Fixed strided loop iteration rounding.
Fixed compilation with metrics.
Executing a loop now counts as a single task for the adaptive steal policy. This prevents short loops from hindering the steal-half strategy, as it depends on the number of tasks executed per steal-request interval.
#### Internals
- Weave uses explicit finite state machines in several places.
- The memory pool now has the same interface as malloc/free; in the past, freeing a block required passing a threadID, as this avoided an expensive getThreadID syscall. The new solution uses assembly code to get the address of the current thread's thread-local storage as a unique threadID.
48
+
- Weave's memory subsystem now supports LLVM AddressSanitizer to detect memory bugs. Spurious (?) errors from Nim and Weave were not removed and are left as a future task.