I am thinking of making memset faster using DMA. Assuming I have already implemented memcpy, my memset is:
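void memset(void *s, uint8_t c, uint32_t n) {
    uint8_t *p = (uint8_t *)s; // byte pointer; arithmetic on void * is not standard C
    if (n == 0) return; // nothing to fill; also keeps n - 1 from wrapping around
    p[0] = c; // initialize the first byte
    memcpy(p + 1, p, n - 1); // copy to the remaining bytes (source and destination overlap)
}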
However, this does not work well in the simulator.
So the question is: does the DMA have defined behaviour when the source range overlaps the destination range? Is it possible to implement memset in a similar way? The docs say "Once initiated the DMA channel will transfer up to 32-bits per CPU cycle until the transfer has been completed.", so would this work if I initialized 4 bytes instead of only 1 byte?
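To make the question concrete, here is a minimal software sketch of the behaviour I am hoping for. It is only a model, not the real hardware: dma_model_copy and fill_with_overlap are hypothetical names, and the model assumes the channel reads one unit of `width` bytes, writes it, and only then reads the next unit, with addresses increasing. Whether the actual DMA behaves like this is exactly what I am unsure about.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a DMA channel (an assumption, not documented
 * hardware behaviour): each unit of up to `width` bytes is read in full,
 * then written in full, before the next unit is touched. */
static void dma_model_copy(uint8_t *dst, const uint8_t *src,
                           size_t n, size_t width)
{
    uint8_t unit[4];
    assert(width >= 1 && width <= sizeof unit);

    while (n > 0) {
        size_t chunk = n < width ? n : width;
        for (size_t i = 0; i < chunk; i++)  /* read phase */
            unit[i] = src[i];
        for (size_t i = 0; i < chunk; i++)  /* write phase */
            dst[i] = unit[i];
        src += chunk;
        dst += chunk;
        n -= chunk;
    }
}

/* Seed the first `seed` bytes by hand, then let the overlapping copy
 * propagate them through the rest of the buffer. */
static void fill_with_overlap(uint8_t *s, uint8_t c, size_t n,
                              size_t seed, size_t width)
{
    if (seed > n)
        seed = n;
    for (size_t i = 0; i < seed; i++)
        s[i] = c;
    if (n > seed)
        dma_model_copy(s + seed, s, n - seed, width);
}
```

In this model the fill only comes out right when the seed is at least as large as the transfer unit: a 4-byte seed with 4-byte transfers fills the whole buffer, while a 1-byte seed with 4-byte transfers copies stale bytes forward, because each unit is read before the previous value has been propagated into all of it. That would explain a failure with my 1-byte version, but only if the real channel reads and writes strictly unit by unit.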