Create a new utility under the cmd folder that takes two input parameters: a Hugging Face dataset (we still need to define which dataset formats will be supported) and the name of the model whose tokenizer will be used. The utility should generate a SQLite file that maps each prompt's hash value to the corresponding response.
Datasets created by this utility will be published under the llm-d organization on Hugging Face: https://huggingface.co/llm-d/datasets
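
To make the prompt-hash-to-response mapping concrete, here is a minimal sketch of the core step, written in Python for illustration only (the actual utility may well be written in another language). Everything named here is an assumption and not part of the request: the table name `responses`, its columns, the helper names, and the choice of SHA-256 over the comma-joined token IDs as the hash. The `toy_tokenize` function is a byte-level stand-in so the example runs with the standard library alone; in the real utility the token IDs would come from the tokenizer of the given model, and the (prompt, response) pairs from the Hugging Face dataset.

```python
import hashlib
import os
import sqlite3
import tempfile
from typing import Callable, Iterable, List, Optional, Sequence, Tuple

Tokenizer = Callable[[str], Sequence[int]]


def toy_tokenize(text: str) -> List[int]:
    """Stand-in tokenizer (assumption): one token ID per UTF-8 byte."""
    return list(text.encode("utf-8"))


def hash_tokens(token_ids: Sequence[int]) -> str:
    """Hash the token IDs; SHA-256 over comma-joined IDs is a hypothetical scheme."""
    payload = ",".join(str(t) for t in token_ids).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def build_response_db(
    db_path: str,
    records: Iterable[Tuple[str, str]],
    tokenize: Tokenizer,
) -> int:
    """Write (prompt hash, response) rows to a SQLite file; return records processed."""
    rows = [(hash_tokens(tokenize(prompt)), response) for prompt, response in records]
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS responses ("
                "prompt_hash TEXT PRIMARY KEY, response TEXT NOT NULL)"
            )
            # A later record with the same prompt hash overwrites the earlier one.
            conn.executemany(
                "INSERT OR REPLACE INTO responses (prompt_hash, response) VALUES (?, ?)",
                rows,
            )
    finally:
        conn.close()
    return len(rows)


def lookup_response(db_path: str, prompt: str, tokenize: Tokenizer) -> Optional[str]:
    """Return the stored response for a prompt, or None if its hash is absent."""
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT response FROM responses WHERE prompt_hash = ?",
            (hash_tokens(tokenize(prompt)),),
        ).fetchone()
    finally:
        conn.close()
    return row[0] if row else None


if __name__ == "__main__":
    # Made-up sample records, used only to exercise the sketch.
    sample = [("What is 2+2?", "4"), ("Say hi", "Hello!")]
    path = os.path.join(tempfile.mkdtemp(), "responses.sqlite")
    build_response_db(path, sample, toy_tokenize)
    print(lookup_response(path, "What is 2+2?", toy_tokenize))
```

Hashing the token IDs and not the raw prompt text is what makes the model name matter: the same prompt tokenized by two different models would produce different keys. Whether the final design hashes the full token sequence, a prefix, or fixed-size blocks is an open question this sketch does not settle.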