
Initialization: GPU->host bandwidth could be halved #114

@poszu

Description


Currently, the code transfers all 32 bytes of each label from the GPU to the host. The full 32 B are needed to search for the POPS VRF nonce, because the whole label is compared with the difficulty. Later on, only the 16 most significant (big-endian) bytes of each label are kept in the POS data.
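A minimal sketch of the current full-width check and the later truncation, assuming labels and the difficulty are plain 32-byte big-endian arrays and that a label qualifies when it is strictly below the difficulty. `is_nonce`, `truncate_label`, and the constants are hypothetical names, not the project's API. It relies on the fact that Rust compares byte arrays lexicographically, which matches big-endian numeric order.

```rust
/// Hypothetical sizes: a full label is 32 bytes; the POS data keeps 16 of them.
const LABEL_SIZE: usize = 32;
const STORED_SIZE: usize = 16;

/// Full-width check during the POPS VRF nonce search (assumed strict `<`).
/// Lexicographic array comparison equals big-endian integer comparison.
fn is_nonce(label: &[u8; LABEL_SIZE], difficulty: &[u8; LABEL_SIZE]) -> bool {
    label < difficulty
}

/// Keeps only the most significant (big-endian) 16 bytes of a label.
fn truncate_label(label: &[u8; LABEL_SIZE]) -> [u8; STORED_SIZE] {
    let mut out = [0u8; STORED_SIZE];
    out.copy_from_slice(&label[..STORED_SIZE]);
    out
}

fn main() {
    // Hypothetical difficulty 2^248: 0x01 followed by 31 zero bytes.
    let mut difficulty = [0u8; LABEL_SIZE];
    difficulty[0] = 0x01;

    // A tiny label: only the least significant byte is set.
    let mut label = [0u8; LABEL_SIZE];
    label[31] = 0xff;

    assert!(is_nonce(&label, &difficulty));
    assert_eq!(truncate_label(&label), [0u8; STORED_SIZE]);
}
```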

It's possible to halve this bandwidth by sending only the upper 16 B of each label. The missing lower 16 B could be regenerated on the CPU in the rare case (probability 1/2^128) where the most significant bytes alone are not enough to decide.
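A back-of-the-envelope check of both claims, assuming the upper halves of labels are uniformly distributed (an assumption; the issue does not state the distribution). Here $N$ is a hypothetical number of labels, $\ell_{\text{hi}}$ is the upper 16 bytes of a label, and $d_{\text{hi}}$ is the upper 16 bytes of the difficulty:

```math
\begin{aligned}
B_{\text{now}} &= 32N \text{ bytes}, \qquad B_{\text{new}} = 16N \text{ bytes} \\
\Pr[\ell_{\text{hi}} = d_{\text{hi}}] &= 2^{-128} \\
\mathbb{E}[\#\text{CPU regenerations}] &= N \cdot 2^{-128}
\end{aligned}
```

For example, with a hypothetical $N = 2^{40}$ labels, the transfer drops from $2^{45}$ B (32 TiB) to $2^{44}$ B (16 TiB), while the expected number of CPU regenerations is only $2^{-88}$.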

Given the upper 16 B of a label and the 32 B difficulty, there are 3 possible cases to consider:

  • label > difficulty[0:16] -> NOT a nonce
  • label == difficulty[0:16] -> POSSIBLY a nonce - generate lower 16 bytes of label
  • label < difficulty[0:16] -> VALID nonce
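The three cases above map directly onto `std::cmp::Ordering`. A minimal sketch, assuming both values are big-endian byte arrays and that "valid" means strictly below the difficulty; `Check` and `classify` are hypothetical names for illustration only.

```rust
use std::cmp::Ordering;

/// Outcome of comparing a label's upper 16 bytes with the difficulty.
#[derive(Debug, PartialEq, Eq)]
enum Check {
    /// Upper bytes exceed the difficulty: cannot be a nonce.
    NotNonce,
    /// Upper bytes tie with the difficulty: the lower 16 bytes decide.
    Possibly,
    /// Upper bytes are below the difficulty: valid whatever the lower bytes are.
    Valid,
}

/// Classifies a 16-byte label prefix against a 32-byte difficulty (both big-endian).
fn classify(label_hi: &[u8; 16], difficulty: &[u8; 32]) -> Check {
    // Slice comparison is lexicographic, i.e. big-endian numeric order.
    match label_hi[..].cmp(&difficulty[..16]) {
        Ordering::Greater => Check::NotNonce,
        Ordering::Equal => Check::Possibly,
        Ordering::Less => Check::Valid,
    }
}

fn main() {
    let difficulty = [0x80u8; 32];
    assert_eq!(classify(&[0x7f; 16], &difficulty), Check::Valid);
    assert_eq!(classify(&[0x80; 16], &difficulty), Check::Possibly);
    assert_eq!(classify(&[0x81; 16], &difficulty), Check::NotNonce);
}
```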

The code would need to regenerate the full label on the CPU only when its 16 bytes are equal to the 16 most significant bytes of the difficulty.
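A host-side sketch of the proposed search, assuming the GPU hands back a slice of 16-byte label prefixes and that the search stops at the first qualifying label (the issue does not say whether the first or the lowest label is wanted). `find_nonce` is hypothetical, and `regenerate` is a hypothetical callback standing in for whatever CPU routine recomputes a full 32-byte label by index; it is invoked only on a prefix tie.

```rust
use std::cmp::Ordering;

/// Returns the index of the first label below `difficulty`, using only 16-byte
/// prefixes from the GPU. `regenerate` (hypothetical) rebuilds the full label
/// on the CPU and is called only when a prefix ties with the difficulty.
fn find_nonce<F>(prefixes: &[[u8; 16]], difficulty: &[u8; 32], mut regenerate: F) -> Option<usize>
where
    F: FnMut(usize) -> [u8; 32],
{
    for (index, prefix) in prefixes.iter().enumerate() {
        match prefix[..].cmp(&difficulty[..16]) {
            // Upper half already below the difficulty: valid nonce.
            Ordering::Less => return Some(index),
            // Upper half above the difficulty: cannot be a nonce.
            Ordering::Greater => continue,
            // Rare tie: the lower 16 bytes decide, so rebuild the whole label.
            Ordering::Equal => {
                let label = regenerate(index);
                if label < *difficulty {
                    return Some(index);
                }
            }
        }
    }
    None
}

fn main() {
    // Difficulty whose upper half is 0x10 repeated and whose last byte is 0x05.
    let mut difficulty = [0u8; 32];
    difficulty[..16].copy_from_slice(&[0x10; 16]);
    difficulty[31] = 0x05;

    let prefixes = [[0x20u8; 16], [0x10u8; 16], [0x01u8; 16]];
    let mut calls = 0;
    let found = find_nonce(&prefixes, &difficulty, |i| {
        calls += 1;
        // Fake CPU regeneration: same upper half, lower half ends in 0x01.
        let mut label = [0u8; 32];
        label[..16].copy_from_slice(&prefixes[i]);
        label[31] = 0x01;
        label
    });
    assert_eq!(found, Some(1));
    assert_eq!(calls, 1);
}
```

In this design the GPU-to-host path only ever carries 16 bytes per label, and the expensive CPU regeneration is confined to the tie branch.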

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
