Hi, are there any experiments on how speech input should be represented when training an LLM? There are two kinds of input: the codebook indices from the codec, each a single integer value, or the codebook cluster centers that those indices point to, each a vector. Is there any study showing which one is a better fit for training an autoregressive LLM?
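
To make the two options concrete, here is a minimal NumPy sketch of what I mean. Everything in it is an assumption for illustration: the sizes `K`, `D_CODEC`, and `D_MODEL`, the random stand-in for the codec codebook, and the names `llm_embedding` and `projection` are hypothetical and not taken from any particular codec or LLM.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: codebook entries, codec vector width, LLM hidden width.
K, D_CODEC, D_MODEL = 1024, 128, 512

# Stand-in for a frozen codec codebook: one cluster center per index.
codec_codebook = rng.standard_normal((K, D_CODEC))

# Codec output for a short utterance: one codebook index per frame.
indices = np.array([5, 17, 5, 900])

# Option 1: feed the integer indices. The LLM treats them like text tokens
# and learns its own embedding table from scratch.
llm_embedding = rng.standard_normal((K, D_MODEL)) * 0.02  # would be trainable
inputs_from_indices = llm_embedding[indices]

# Option 2: feed the codec's cluster centers. The vectors are looked up in the
# codec codebook and mapped to the model width by a linear projection.
projection = rng.standard_normal((D_CODEC, D_MODEL)) * 0.02  # would be trainable
inputs_from_vectors = codec_codebook[indices] @ projection

print(inputs_from_indices.shape, inputs_from_vectors.shape)
```

In this sketch both options end up as a sequence of `D_MODEL`-wide vectors, one per frame. The difference is whether the input representation starts from a freshly learned table (option 1) or from the geometry already present in the codec codebook (option 2).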