The core of the adaptation is to map the REU hardware access to pointer access in the PSP's main RAM.
The PSP's MIPS CPU is little-endian. The original C64 code that writes binary data (short and float) also writes it in little-endian format (least significant byte first), so the PSP can read those values directly, with no byte swapping.
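To make the byte order concrete, here is a minimal sketch of decoding little-endian values from a raw byte buffer. The helper names read_le_u16 and read_le_f32 are hypothetical, and the sketch assumes the floats are stored as 32-bit IEEE 754 values. On a little-endian CPU like the PSP's, these helpers produce the same result as reading the value in place.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode a 16-bit value stored least significant byte first. */
uint16_t read_le_u16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical helper: decode a 32-bit float stored least significant byte first.
   Assumes the file uses IEEE 754 single precision (an assumption of this sketch). */
float read_le_f32(const uint8_t *p)
{
    uint32_t bits = (uint32_t)p[0]
                  | ((uint32_t)p[1] << 8)
                  | ((uint32_t)p[2] << 16)
                  | ((uint32_t)p[3] << 24);
    float value;
    memcpy(&value, &bits, sizeof value); /* reinterpret the bits without aliasing issues */
    return value;
}
```

Because both sides agree on byte order, the port does not actually need these helpers; they only show what "no conversion" means in practice.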
The TransformerWeights64 struct uses typedef uint32_t REUPtr.
An REUPtr is an absolute 32-bit address in the Commodore 64's REU memory.
On the PSP, I load the entire weights.psp file into a single memory block allocated with malloc. The REUPtr fields therefore no longer hold absolute addresses in external memory. Instead, each one has to resolve to a float pointer into this large block.
I can either keep the REUPtr type but change its meaning to be an offset from the beginning of my weights memory block, or just change everything to float*.
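A minimal sketch of the first option, under a few assumptions: REUPtr is reinterpreted as a byte offset from the start of the block, and load_weights and reu_to_ptr are hypothetical helper names, not functions from the original code.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Same typedef as the C64 code, but reinterpreted in this sketch as a
   byte offset from the start of the loaded weights block. */
typedef uint32_t REUPtr;

/* Hypothetical helper: read a whole file into one malloc'ed block.
   Returns NULL on failure; the caller frees the block. */
void *load_weights(const char *path, size_t *out_size)
{
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long len = ftell(f);
    if (len <= 0) { fclose(f); return NULL; }
    rewind(f);

    void *buf = malloc((size_t)len);
    if (buf && fread(buf, 1, (size_t)len, f) != (size_t)len) {
        free(buf);
        buf = NULL;
    }
    fclose(f);
    if (buf && out_size) *out_size = (size_t)len;
    return buf;
}

/* Hypothetical helper: turn an offset into a usable float pointer.
   Assumes `off` is a multiple of 4, because MIPS requires aligned
   float loads; malloc already returns a suitably aligned base. */
float *reu_to_ptr(void *base, REUPtr off)
{
    return (float *)((uint8_t *)base + off);
}
```

Keeping offsets leaves the struct layout identical to the C64 version, at the cost of one addition per access. Switching every field to float* resolves the addresses once at load time and makes the inference code read like ordinary array access.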
This file is the "brain" of the inference. It contains the implementations of the Transformer algorithms, such as matrix multiplication, normalization, and attention. Most of the heavy computational work occurs here.
In my case, I removed the REU_getf and REU_putf calls and replaced them with direct memory access on the PSP.
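As an illustration of what direct memory access looks like, here is a sketch of a matrix-vector product over plain float pointers. The signature and row-major layout follow the convention of the reference llama2.c code, which is an assumption here; the actual routine in this port may differ.

```c
/* Sketch: xout = W * x, with W stored row-major as d rows of n floats.
   Where the C64 version would fetch each operand through REU_getf and
   store results through REU_putf, the PSP version indexes RAM directly. */
void matmul(float *xout, const float *x, const float *w, int n, int d)
{
    for (int i = 0; i < d; i++) {
        float acc = 0.0f;
        const float *row = w + (long)i * n; /* start of row i inside the weights block */
        for (int j = 0; j < n; j++) {
            acc += row[j] * x[j];
        }
        xout[i] = acc;
    }
}
```

With the weights already in main RAM, the inner loop is just two array reads and a multiply-add per element, with no transfer call in between.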
We don't need it. We will use <math.h>.
generate is the main loop that produces the text, token by token.
It orchestrates the text generation process along with sampler64.c (or sampler.cpp).