Which is better as LLM input: codebook indices or hidden vectors?

Hi, are there any experiments on training an LLM with speech input? There are two kinds of input: the codec's codebook indices, each a single integer value, or the codebook cluster centers selected by those indices, each a vector. Is there any study showing which one is a better fit for autoregressive LLM training?
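
To make the two options concrete, here is a minimal sketch of what I mean. The sizes, the random codebook, and the names (`codebook`, `frame_indices`, `index_input`, `vector_input`) are all made up for illustration and do not come from any particular codec.

```python
import random

random.seed(0)

# Hypothetical sizes, chosen only for illustration.
CODEBOOK_SIZE = 8   # number of entries in the codec codebook
CODE_DIM = 4        # dimension of each codebook vector (cluster center)

# Stand-in for a trained codec codebook: one vector per index.
codebook = [
    [random.gauss(0.0, 1.0) for _ in range(CODE_DIM)]
    for _ in range(CODEBOOK_SIZE)
]

# One codec index per speech frame (made-up values).
frame_indices = [3, 0, 7, 3]

# Option 1: feed the integer indices directly.
# The LLM would then learn its own embedding table for these tokens.
index_input = frame_indices

# Option 2: feed the codebook vectors selected by those indices.
# The LLM would then receive the codec's cluster centers as continuous input.
vector_input = [codebook[i] for i in frame_indices]
```

In both cases the sequence length is the same; the only difference is whether each frame arrives as an integer token or as a `CODE_DIM`-dimensional vector.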