Description
To support destructible and sinkable types, in particular atomic refcounted types, tasks must zero-init their data buffer.
This is introduced in #144 to properly support the refcounted FlowEvent.
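For illustration, here is a minimal sketch of why a recycled, non-zeroed buffer breaks such types. The `Event` type, field names and buffer size below are made up for the example (this is not Weave's actual FlowEvent or Task layout): assigning into the buffer first destroys whatever "previous value" it appears to hold, and a destructor that relies on `isNil` to return early will chase a garbage pointer instead.

```nim
# Hypothetical illustration; `Event` is NOT Weave's actual FlowEvent type.
type
  EventObj = object
    refCount: int
  Event = object
    p: ptr EventObj          # expected to be nil while the slot is unused

proc `=destroy`(e: var Event) =
  # Relies on zero-initialized memory: nil means "nothing to release".
  if not e.p.isNil:
    dec e.p.refCount
    if e.p.refCount == 0:
      deallocShared(e.p)

var dataBuf: array[144, byte]        # stand-in for a recycled task data buffer
for b in dataBuf.mitems: b = 0xAA    # garbage left over from a previous task

let slot = cast[ptr Event](dataBuf.addr)
# The assignment below first destroys the "old" Event sitting in the slot.
# With garbage bytes, `e.p` is a dangling non-nil pointer and the destructor
# dereferences it; after `zeroMem(dataBuf.addr, dataBuf.len)` it would be nil
# and the destroy a harmless no-op.
slot[] = Event(p: createShared(EventObj))
```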
However, this adds a significant 17% overhead on very short-running tasks like Fibonacci(40).
Note: "significant" is relative; fib(40) spawns on the order of 2^40 tasks (i.e. trillions of them), and each task does less work than the zero-initialization itself.
The change: https://github.com/mratsim/weave/pull/144/files#diff-c5d52e34ee454756d2c729faec306b62L113
```diff
 proc newTaskFromCache*(): Task =
-  result = workerContext.taskCache.pop()
+  result = workerContext.taskCache.pop0()
   if result.isNil:
-    result = myMemPool().borrow(deref(Task))
-  # Zeroing is expensive, it's 96 bytes
-  # result.fn = nil # Always overwritten
-  # result.parent = nil # Always overwritten
-  # result.scopedBarrier = nil # Always overwritten
-  result.prev = nil
-  result.next = nil
-  result.start = 0
-  result.cur = 0
-  result.stop = 0
-  result.stride = 0
-  result.futures = nil
-  result.isLoop = false
-  result.hasFuture = false
+    result = myMemPool().borrow0(deref(Task))
+  # The task must be fully zero-ed including the data buffer
+  # otherwise datatypes that use custom destructors
+  # and that rely on "myPointer.isNil" to return early
+  # may read recycled garbage data.
+  # "FlowEvent" is such an example
+  # TODO: The perf cost to the following is 17% as measured on fib(40)
+  # # Zeroing is expensive, it's 96 bytes
+  # # result.fn = nil # Always overwritten
+  # # result.parent = nil # Always overwritten
+  # # result.scopedBarrier = nil # Always overwritten
+  # result.prev = nil
+  # result.next = nil
+  # result.start = 0
+  # result.cur = 0
+  # result.stop = 0
+  # result.stride = 0
+  # result.futures = nil
+  # result.isLoop = false
+  # result.hasFuture = false
```
The simple optimization would be to zero-init only the part of the buffer that will be overwritten.
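A rough sketch of that first option, using a hypothetical `setupTask` helper and a simplified Task layout (not Weave's actual API): only the first `sizeof(Args)` bytes can ever be interpreted as a "previous value" by the assignment, so only they need to be cleared.

```nim
const TaskDataSize = 144           # assumed buffer size, for illustration only

type
  TaskObj = object
    data: array[TaskDataSize, byte]   # stand-in for Weave's Task data buffer
  Task = ptr TaskObj

proc setupTask[Args: tuple](task: Task, args: sink Args) =
  # Option 1: zero only the bytes the argument tuple will occupy.
  # Assigning through the cast destroys the destination's previous value,
  # so those bytes (and only those) must read as nil/zero beforehand.
  zeroMem(task.data.addr, sizeof(Args))
  cast[ptr Args](task.data.addr)[] = args
```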
An alternative would be to zero-init the buffer only for non-trivial types, as detected by `supportsCopyMem`.
A third possibility would be to do both.
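A sketch of the second and third options combined, with the same hypothetical helper and field names as above: skip zeroing entirely when the argument tuple is trivially copyable, and otherwise clear only the bytes it occupies.

```nim
import std/typetraits              # supportsCopyMem

const TaskDataSize = 144           # assumed buffer size, for illustration only

type
  TaskObj = object
    data: array[TaskDataSize, byte]   # stand-in for Weave's Task data buffer
  Task = ptr TaskObj

proc setupTaskChecked[Args: tuple](task: Task, args: sink Args) =
  when not supportsCopyMem(Args):
    # Non-trivial arguments (custom destructor/copy hooks, e.g. refcounted
    # types): the destination must be zero-ed so that destroying its
    # "previous value" is a no-op.
    zeroMem(task.data.addr, sizeof(Args))
  # Trivial arguments are just bit-copied; no destructor can run on garbage,
  # so no zeroing is needed for them at all.
  cast[ptr Args](task.data.addr)[] = args
```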