Implement parallel ahc-ld/ahc-link
#621
Description
Is your feature request related to a problem? Please describe.
Recent performance improvements have resulted in an over 50% improvement in wall clock time when linking large programs. For further improvements, we should introduce parallelism in ahc-ld and ahc-link. There are multiple opportunities for parallelism, which are explained in the next section.
Describe the solution you'd like
Chances of parallelism
- When loading archives and object files in ahc-ld, we can parallelize the deserialization of each object file. All object files are converted to ByteStrings first, either via direct reading or ArchiveEntry, then the deserialization can be performed in parallel.
- After the gc-sections pass is run, the shrunk AsteriusModule should be fully evaluated, and this can be done in parallel as well.
- In the binaryen backend, we can parallelize the marshaling of different data segments and functions. binaryen will transparently switch to a new allocator when it notices it's allocating an IR node on a different thread, so we should ensure each Haskell worker thread is pinned using forkOn.
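To make the pinning requirement in the last item concrete, here is a minimal sketch that uses only base. The helper name runPinned is hypothetical and not part of the codebase; it runs an action on a thread created with forkOn and reports which capability the thread ended up on and whether the runtime considers it pinned.

```haskell
import Control.Concurrent (forkOn, myThreadId, threadCapability)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- | Hypothetical helper: run an action on a thread pinned to the given
-- capability. Returns the action's result together with the pair
-- (capability index, pinned flag) reported by the runtime for that thread.
-- Exceptions thrown by the action are not handled in this sketch.
runPinned :: Int -> IO a -> IO (a, (Int, Bool))
runPinned cap act = do
  done <- newEmptyMVar
  _ <- forkOn cap $ do
    r <- act
    -- threadCapability tells us where this thread runs and whether it
    -- is locked to that capability.
    placement <- myThreadId >>= threadCapability
    putMVar done (r, placement)
  takeMVar done
```

forkOn interprets its argument modulo the number of capabilities, so the reported index may differ from the requested one when fewer capabilities are available than worker threads requested.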
Method of parallelism
We cannot introduce additional dependencies like parallel, monad-par or scheduler here, since we need to strictly control our dependency surface. So we need to roll our own minimal parallelism framework first.
The need for nested parallelism can be avoided for our use cases. A simple parallel loop should be sufficient:
parallelFor :: Monoid r => Int -> [a] -> (a -> IO r) -> IO r
The first argument is the worker thread pool capacity, which should default to the number of CPU cores.
In addition, we should implement a link-time option for ahc-ld/ahc-link to allow overriding the worker thread pool size; setting it to 1 should fall back to sequential code to avoid threading overhead.
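One possible shape for parallelFor, as a rough sketch that depends only on base, is shown below. Everything beyond the signature is an assumption for illustration: the helper trySome, the shared work queue in an IORef, and the choice of one result slot per input so that results are combined in input order (which matters for non-commutative monoids).

```haskell
import Control.Concurrent (forkOn)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (SomeException, evaluate, throwIO, try)
import Data.IORef (atomicModifyIORef', newIORef)

-- | Hypothetical helper: 'try' specialised to 'SomeException'.
trySome :: IO a -> IO (Either SomeException a)
trySome = try

-- | Apply the action to every input with at most n pinned worker threads
-- and combine the results in input order. A pool size of 1 or less runs
-- sequentially on the calling thread without forking.
parallelFor :: Monoid r => Int -> [a] -> (a -> IO r) -> IO r
parallelFor n xs f
  | n <= 1 = mconcat <$> mapM f xs
  | otherwise = do
      -- Pair each input with an empty slot for its result.
      slots <- mapM (\x -> (,) x <$> newEmptyMVar) xs
      queue <- newIORef slots
      -- One worker per capability index; forkOn pins each worker.
      mapM_ (\cap -> forkOn cap (worker queue)) [0 .. n - 1]
      -- Collect in input order; rethrow the first failure encountered.
      results <- mapM (takeMVar . snd) slots
      mconcat <$> mapM (either throwIO pure) results
  where
    -- Each worker repeatedly pops one item until the queue is empty.
    worker queue = do
      next <- atomicModifyIORef' queue pop
      case next of
        Nothing -> pure ()
        Just (x, slot) -> do
          -- Evaluate the result to WHNF on the worker before publishing it.
          trySome (f x >>= evaluate) >>= putMVar slot
          worker queue
    pop [] = ([], Nothing)
    pop (y : ys) = (ys, Just y)
```

For example, parallelFor 4 inputs (\x -> pure [x]) would return the inputs in their original order. Note that evaluate only forces weak head normal form, so callers that need fully evaluated results would have to force them inside the action. When one item fails, this sketch rethrows in the caller but lets the remaining workers drain the queue in the background; a real implementation may prefer to cancel them.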