Conversation

@steffi4321

I think this fixes Issue #110.

In one commit of mistral-common, `InstructTokenizerBase` was moved to another file (see commit mistral-ai/mistral-common@2c9f1762f8824e5d821840414ecb1b9e5267ffb3). This requires fixing the import of `InstructTokenizerBase` in two files:

1. In `mistral-finetune/finetune/data/dataset.py`, line 15, change
   `from mistral_common.tokens.tokenizers.sentencepiece import InstructTokenizerBase`
   to
   `from mistral_common.tokens.tokenizers.instruct import InstructTokenizerBase`.
2. In `mistral-finetune/finetune/data/tokenize.py`, line 25, change
   `from mistral_common.tokens.tokenizers.sentencepiece import InstructTokenizerBase`
   to
   `from mistral_common.tokens.tokenizers.instruct import InstructTokenizerBase`.
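The two import fixes above can be written as a unified diff (context omitted; only the import lines change):

```diff
--- a/finetune/data/dataset.py
+++ b/finetune/data/dataset.py
@@ -15 +15 @@
-from mistral_common.tokens.tokenizers.sentencepiece import InstructTokenizerBase
+from mistral_common.tokens.tokenizers.instruct import InstructTokenizerBase
--- a/finetune/data/tokenize.py
+++ b/finetune/data/tokenize.py
@@ -25 +25 @@
-from mistral_common.tokens.tokenizers.sentencepiece import InstructTokenizerBase
+from mistral_common.tokens.tokenizers.instruct import InstructTokenizerBase
```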

Afterwards, I hit another error caused by the updated mistral-common, which I fixed with two more edits:

3. In `mistral-finetune/finetune/data/tokenize.py`, line 180, change
   `validator.validate_messages(messages)` to
   `validator.validate_messages(messages, False)`.
4. In `mistral-finetune/finetune/data/tokenize.py`, line 330, change
   `curr_tokens = instruct_tokenizer.encode_assistant_message(message, is_before_last_user_message=False)` to
   `curr_tokens = instruct_tokenizer.encode_assistant_message(message, is_before_last_user_message=False, continue_message=False)`.
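Fixes 3 and 4 as a unified diff (indentation in the surrounding function bodies is approximate, since only the changed lines are quoted above):

```diff
--- a/finetune/data/tokenize.py
+++ b/finetune/data/tokenize.py
@@ -180 +180 @@
-    validator.validate_messages(messages)
+    validator.validate_messages(messages, False)
@@ -330 +330 @@
-    curr_tokens = instruct_tokenizer.encode_assistant_message(message, is_before_last_user_message=False)
+    curr_tokens = instruct_tokenizer.encode_assistant_message(message, is_before_last_user_message=False, continue_message=False)
```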

For fixes 3 and 4, I compared the old mistral-common code with the new code, and I am confident that this is the intended behavior.
