Description
Python -VV
Python 3.11.7 (main, Dec 15 2023, 18:12:31) [GCC 11.2.0]
Pip Freeze
kanishk@anarch[~/mistral] > pip freeze
annotated-types==0.7.0
appdirs==1.4.4
asttokens==2.4.1
attrs==24.2.0
certifi==2024.8.30
charset-normalizer==3.3.2
cityscapesScripts==2.2.2
coloredlogs==15.0.1
contourpy==1.2.0
cycler==0.12.1
decorator==5.1.1
executing==2.0.1
filelock==3.13.1
fonttools==4.49.0
fsspec==2024.2.0
graphviz==0.20.3
huggingface-hub==0.24.6
humanfriendly==10.0
idna==3.8
ipython==8.22.1
jedi==0.19.1
Jinja2==3.1.3
jsonschema==4.23.0
jsonschema-specifications==2023.12.1
keyboard==0.13.5
kiwisolver==1.4.5
MarkupSafe==2.1.5
matplotlib==3.8.3
matplotlib-inline==0.1.6
mistral_common==1.4.0
mplcyberpunk==0.7.1
mpmath==1.3.0
networkx==3.2.1
numpy==1.26.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.19.3
nvidia-nvjitlink-cu12==12.3.101
nvidia-nvtx-cu12==12.1.105
opencv-python==4.9.0.80
packaging==23.2
parso==0.8.3
pexpect==4.9.0
pillow==10.4.0
progressbar==2.5
prompt-toolkit==3.0.43
ptyprocess==0.7.0
pure-eval==0.2.2
pydantic==2.9.1
pydantic_core==2.23.3
pygame==2.5.2
Pygments==2.17.2
pyparsing==3.1.2
pyquaternion==0.9.9
python-dateutil==2.9.0.post0
PyYAML==6.0.2
pyzmq==23.2.1
qbstyles==0.1.4
referencing==0.35.1
regex==2024.7.24
requests==2.32.3
rpds-py==0.20.0
sentencepiece==0.2.0
six==1.16.0
stack-data==0.6.3
sympy==1.12
tiktoken==0.7.0
torch==2.2.1
tqdm==4.66.2
traitlets==5.14.1
triton==2.2.0
typing==3.7.4.3
typing_extensions==4.12.2
urllib3==2.2.2
wcwidth==0.2.13
XPlaneApi==0.0.6
xplaneconnect @ file:///home/kanishk/X-Plane%2010/Resources/plugins/XPlaneConnect/XPlaneConnect
zmq==0.0.0
Reproduction Steps
- Run any one of the example code snippets given in the release documentation.
Expected Behavior
The Pixtral model should output some form of visualizable/interactive data, or there should be additional code snippets showing how to use the output tokens.
Additional Context
The mistral_common.multimodal module doesn't seem to have any function to make sense of the data output by the tokenizer, unless I overlooked something. I tried to open the output image(s), but they must have some read function according to the selected open function below.
TL;DR: I have no clue how to use the output image.
Suggested Solutions
Suggestions:
- Addition of modules to interact with multimodal data
- WebUI API, like Gradio
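A rough sketch of what a helper for the first suggestion might look like. Nothing here exists in mistral_common: chw_to_ppm_bytes and save_ppm are hypothetical names, and the sketch assumes the tokenizer's output images are channel-first (3, H, W) float arrays normalized with CLIP-style mean/std values. Both the layout and the constants are assumptions that would need checking against the library. It uses only the standard library and writes a PPM file, which most image viewers can open.

```python
# Assumption: CLIP-style normalization constants. Verify the values your
# mistral_common version actually uses before relying on them.
ASSUMED_MEAN = (0.48145466, 0.4578275, 0.40821073)
ASSUMED_STD = (0.26862954, 0.26130258, 0.27577711)


def chw_to_ppm_bytes(chw, mean=ASSUMED_MEAN, std=ASSUMED_STD):
    """Convert a channel-first (3, H, W) normalized float image to binary PPM bytes.

    `chw` may be a nested list or a NumPy array; only len() and indexing are used.
    """
    if len(chw) != 3:
        raise ValueError("expected 3 channels (RGB), channel-first")
    height, width = len(chw[0]), len(chw[0][0])
    pixels = bytearray()
    for y in range(height):
        for x in range(width):
            for c in range(3):
                # Undo (value - mean) / std, then rescale [0, 1] to [0, 255].
                value = float(chw[c][y][x]) * std[c] + mean[c]
                pixels.append(max(0, min(255, round(value * 255))))
    header = f"P6\n{width} {height}\n255\n".encode("ascii")
    return header + bytes(pixels)


def save_ppm(chw, path):
    """Write the de-normalized image to `path` as a binary PPM file."""
    with open(path, "wb") as f:
        f.write(chw_to_ppm_bytes(chw))


# Hypothetical usage with the tokenizer output from the release examples
# (assumes `tokenized.images` is a list of such arrays):
# for i, img in enumerate(tokenized.images):
#     save_ppm(img, f"pixtral_input_{i}.ppm")
```

If the assumptions hold, the saved files would show the resized images exactly as the model receives them, which is at least a starting point for inspecting the tokenizer output.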