Releases: mistralai/mistral-common
v1.8.6: Remove Python 3.9 support, bug fixes
What's Changed
- Remove deprecated imports in docs. by @juliendenize in #138
- Add normalizer and validator utils by @juliendenize in #140
- Refactor private aggregate messages for InstructRequestNormalizer by @juliendenize in #141
- test: improve unit test for is_opencv_installed by @PrasanaaV in #143
- Optimize spm decode function by @juliendenize in #144
- Add get_one_valid_tokenizer_file by @juliendenize in #142
- Remove Python 3.9 support by @juliendenize in #145
- Correctly pass `revision` and `token` to hf_api by @juliendenize in #149
- Fix assertion in test_convert_text_chunk and tool_call by @patrickvonplaten in #152
- Pins GH actions by @arcanis in #160
- Add usage restrictions regarding third-party rights. by @juliendenize in #161
- Improve tekken logging message for vocabulary by @juliendenize in #162
- Set version 1.8.6 by @juliendenize in #151
New Contributors
- @PrasanaaV made their first contribution in #143
- @arcanis made their first contribution in #160
Full Changelog: v1.8.5...v1.8.6
v1.8.5: Patch Release
What's Changed
- Make model field optional in TranscriptionRequest by @juliendenize in #128
- Remove all responses and embedding requests. Add transcription docs. by @juliendenize in #133
- Add chunk file by @juliendenize in #129
- allow message content to be empty string by @mingfang in #135
- Add test empty content for AssistantMessage v7 by @juliendenize in #136
- v1.8.5 by @juliendenize in #137
New Contributors
- @mingfang made their first contribution in #135
Full Changelog: v1.8.4...v1.8.5
v1.8.4: optional dependencies and allow random padding on ChatCompletionResponseStreamResponse
What's Changed
- Update experimental.md by @juliendenize in #124
- Make sentencepiece optional and refactor optional imports by @juliendenize in #126
- Improve UX for contributing by @juliendenize in #127
- feat: allow random padding on ChatCompletionResponseStreamResponse by @aac228 in #131
New Contributors
- @aac228 made their first contribution in #131
Full Changelog: v1.8.3...v1.8.4
v1.8.3: Add an experimental REST API
What's Changed
- Add a FastAPI app by @juliendenize in #113
We released an experimental REST API leveraging FastAPI to handle requests end to end: tokenization, generation via calls to an engine, and detokenization.
For detailed documentation, see https://mistralai.github.io/mistral-common/usage/experimental/.
Here is how to launch the server:

```sh
pip install mistral-common[server]

mistral_common serve mistralai/Magistral-Small-2507 \
  --host 127.0.0.1 --port 8000 \
  --engine-url http://127.0.0.1:8080 --engine-backend llama_cpp \
  --timeout 60
```

Then you can see the Swagger at: http://localhost:8000.
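As a client-side sketch, you could then build a request against the running server with only the standard library. Note that the endpoint path and payload shape below are assumptions for illustration; check the documentation linked above for the routes the server actually exposes:

```python
import json
from urllib import request as urllib_request

BASE_URL = "http://localhost:8000"

# A chat-style payload; the exact schema is an assumption here.
payload = {
    "messages": [
        {"role": "user", "content": "Hello, how are you?"},
    ],
}
body = json.dumps(payload).encode("utf-8")

req = urllib_request.Request(
    f"{BASE_URL}/v1/chat/completions",  # hypothetical endpoint path
    data=body,
    headers={"Content-Type": "application/json"},
)

# Sending the request requires the server launched above to be running:
# with urllib_request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
print(req.full_url)
```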
Full Changelog: v1.8.2...v1.8.3
v1.8.2: Add ThinkChunk
What's Changed
- Add think chunk by @juliendenize in #122
Now you can use TextChunk and ThinkChunk in your SystemMessage or AssistantMessage:
```python
from mistral_common.protocol.instruct.messages import SystemMessage, TextChunk, ThinkChunk

system_message = SystemMessage(
    content=[
        TextChunk(text="First draft your thinking process (inner monologue) until you arrive at a response. Format your response using Markdown, and use LaTeX for any mathematical equations. Write both your thoughts and the response in the same language as the input.\n\nYour thinking process must follow the template below:"),
        ThinkChunk(
            thinking="Your thoughts or/and draft, like working through an exercise on scratch paper. Be as casual and as long as you want until you are confident to generate the response. Use the same language as the input.",
            closed=True,
        ),
        TextChunk(text="Here, provide a self-contained response."),
    ],
)
```

Full Changelog: v1.8.1...v1.8.2
v1.8.1: Add AudioURLChunk
What's Changed
- Add AudioURLChunk by @juliendenize in #120
Now you can use http(s) URLs, file paths, and base64 strings (without specifying the format) in your content chunks, thanks to AudioURLChunk!
```python
from mistral_common.protocol.instruct.messages import AudioURL, AudioURLChunk, TextChunk, UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

repo_id = "mistralai/Voxtral-Mini-3B-2507"
tokenizer = MistralTokenizer.from_hf_hub(repo_id)

text_chunk = TextChunk(text="What do you think about this audio?")

user_msg = UserMessage(
    content=[
        AudioURLChunk(audio_url=AudioURL(url="https://freewavesamples.com/files/Ouch-6.wav")),
        text_chunk,
    ]
)

request = ChatCompletionRequest(messages=[user_msg])
tokenized = tokenizer.encode_chat_completion(request)

# pass tokenized.tokens to your favorite audio model
print(tokenized.tokens)
print(tokenized.audios)

# print text to visually see tokens
print(tokenized.text)
```

Full Changelog: v1.8.0...v1.8.1
v1.8.0 - Mistral welcomes 📢
What's Changed
- [Audio] Add audio by @patrickvonplaten in #119
Full Changelog: v1.7.0...v1.8.0
Audio chat example:

```python
from mistral_common.protocol.instruct.messages import TextChunk, AudioChunk, UserMessage, AssistantMessage, RawAudio
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.audio import Audio
from huggingface_hub import hf_hub_download

repo_id = "mistralai/voxtral-mini"
tokenizer = MistralTokenizer.from_hf_hub(repo_id)

obama_file = hf_hub_download("patrickvonplaten/audio_samples", "obama.mp3", repo_type="dataset")
bcn_file = hf_hub_download("patrickvonplaten/audio_samples", "bcn_weather.mp3", repo_type="dataset")

def file_to_chunk(file: str) -> AudioChunk:
    audio = Audio.from_file(file, strict=False)
    return AudioChunk.from_audio(audio)

text_chunk = TextChunk(text="Which speaker do you prefer between the two? Why? How are they different from each other?")
user_msg = UserMessage(content=[file_to_chunk(obama_file), file_to_chunk(bcn_file), text_chunk]).to_openai()

request = ChatCompletionRequest(messages=[user_msg])
tokenized = tokenizer.encode_chat_completion(request)

# pass tokenized.tokens to your favorite audio model
print(tokenized.tokens)
print(tokenized.audios)

# print text to visually see tokens
print(tokenized.text)
```

Audio transcription example:
```python
from mistral_common.protocol.transcription.request import TranscriptionRequest
from mistral_common.protocol.instruct.messages import RawAudio
from mistral_common.audio import Audio
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from huggingface_hub import hf_hub_download

repo_id = "mistralai/voxtral-mini"
tokenizer = MistralTokenizer.from_hf_hub(repo_id)

obama_file = hf_hub_download("patrickvonplaten/audio_samples", "obama.mp3", repo_type="dataset")

audio = Audio.from_file(obama_file, strict=False)
audio = RawAudio.from_audio(audio)
request = TranscriptionRequest(model=repo_id, audio=audio, language="en")

tokenized = tokenizer.encode_transcription(request)

# pass tokenized.tokens to your favorite audio model
print(tokenized.tokens)
print(tokenized.audios)

# print text to visually see tokens
print(tokenized.text)
```

v1.7.0 - v13 instruct tokenizer, rename multi-modal to image
What's Changed
- [Naming] Rename multi-modal to image by @patrickvonplaten in #114
- Add v13 Tokenizer by @juliendenize in #116
- 1.7.0 Release by @patrickvonplaten in #118
Full Changelog: v1.6.3...v1.7.0
v1.6.3 - Improved from_hf_hub, support multiprocessing, ...
What's Changed
- Improve hf hub support by @juliendenize in #95
- Fix the Python badge by @juliendenize in #96
- [Build system] Ensure UV reads more than just py files by @patrickvonplaten in #97
- Update images.md by @juliendenize in #98
- Improve decode and deprecate to_string by @juliendenize in #99
- Fix string formatting for ConnectionError by @gaby in #101
- Fix string formatting for NotImplementedError() by @gaby in #103
- Fix error message instructions in transform_image() by @gaby in #102
- Fix spelling issues across repo by @gaby in #107
- Improve integration with HF by @juliendenize in #104
- Opening tekkenizer file with utf-8 and remove deprecation warning by @juliendenize in #110
- fix: multiprocessing pickle error with tokenizer by @NanoCode012 in #111
New Contributors
- @gaby made their first contribution in #101
- @NanoCode012 made their first contribution in #111
Full Changelog: v1.6.0...v1.6.3
v1.6.2: Patch Release
Ensure that the PyPI version includes the tokenizer files.