How to use Pixtral tokens & outputs? #46

Open
@kanishkanarch

Description

Python -VV

Python 3.11.7 (main, Dec 15 2023, 18:12:31) [GCC 11.2.0]

Pip Freeze

kanishk@anarch[~/mistral] > pip freeze
annotated-types==0.7.0
appdirs==1.4.4
asttokens==2.4.1
attrs==24.2.0
certifi==2024.8.30
charset-normalizer==3.3.2
cityscapesScripts==2.2.2
coloredlogs==15.0.1
contourpy==1.2.0
cycler==0.12.1
decorator==5.1.1
executing==2.0.1
filelock==3.13.1
fonttools==4.49.0
fsspec==2024.2.0
graphviz==0.20.3
huggingface-hub==0.24.6
humanfriendly==10.0
idna==3.8
ipython==8.22.1
jedi==0.19.1
Jinja2==3.1.3
jsonschema==4.23.0
jsonschema-specifications==2023.12.1
keyboard==0.13.5
kiwisolver==1.4.5
MarkupSafe==2.1.5
matplotlib==3.8.3
matplotlib-inline==0.1.6
mistral_common==1.4.0
mplcyberpunk==0.7.1
mpmath==1.3.0
networkx==3.2.1
numpy==1.26.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.19.3
nvidia-nvjitlink-cu12==12.3.101
nvidia-nvtx-cu12==12.1.105
opencv-python==4.9.0.80
packaging==23.2
parso==0.8.3
pexpect==4.9.0
pillow==10.4.0
progressbar==2.5
prompt-toolkit==3.0.43
ptyprocess==0.7.0
pure-eval==0.2.2
pydantic==2.9.1
pydantic_core==2.23.3
pygame==2.5.2
Pygments==2.17.2
pyparsing==3.1.2
pyquaternion==0.9.9
python-dateutil==2.9.0.post0
PyYAML==6.0.2
pyzmq==23.2.1
qbstyles==0.1.4
referencing==0.35.1
regex==2024.7.24
requests==2.32.3
rpds-py==0.20.0
sentencepiece==0.2.0
six==1.16.0
stack-data==0.6.3
sympy==1.12
tiktoken==0.7.0
torch==2.2.1
tqdm==4.66.2
traitlets==5.14.1
triton==2.2.0
typing==3.7.4.3
typing_extensions==4.12.2
urllib3==2.2.2
wcwidth==0.2.13
XPlaneApi==0.0.6
xplaneconnect @ file:///home/kanishk/X-Plane%2010/Resources/plugins/XPlaneConnect/XPlaneConnect
zmq==0.0.0

Reproduction Steps

  1. Run any one of the example code snippets given in the release documentation.

Expected Behavior

The Pixtral model should output some form of visualizable/interactive data, or the documentation should provide additional code snippets showing how to use the output tokens.

Additional Context

Unless I overlooked something, the mistral_common.multimodal module doesn't seem to have any function for making sense of the data output by the tokenizer. I tried to open the output image(s), but judging by the open function selected in the screenshot below, they would need to provide some kind of read function.
image

TLDR: I have no clue how to use the output image

image
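
For reference, a minimal sketch of one way to view such an output image, under stated assumptions. It assumes each entry the tokenizer returns in its `images` list is a NumPy array of shape (3, H, W) that was normalized as `(pixel / 255 - mean) / std` with CLIP-style statistics. That would also explain why an `open` function that expects a file path or a readable file object rejects it. The constants and the helper names (`chw_to_uint8_hwc`, `to_pil`) are illustrative and not part of mistral_common; check the mean/std values against the installed version.

```python
import numpy as np

# Assumption: CLIP-style per-channel RGB statistics. These are not read from
# mistral_common here; verify them against the version you have installed.
ASSUMED_MEAN = np.array([0.48145466, 0.4578275, 0.40821073], dtype=np.float32)
ASSUMED_STD = np.array([0.26862954, 0.26130258, 0.27577711], dtype=np.float32)


def chw_to_uint8_hwc(arr, mean=ASSUMED_MEAN, std=ASSUMED_STD):
    """Undo (x / 255 - mean) / std on a (3, H, W) array.

    Returns an (H, W, 3) uint8 array that image libraries can display.
    """
    arr = np.asarray(arr, dtype=np.float32)
    if arr.ndim != 3 or arr.shape[0] != 3:
        raise ValueError(f"expected shape (3, H, W), got {arr.shape}")
    hwc = arr.transpose(1, 2, 0)         # channels-first -> channels-last
    pixels = (hwc * std + mean) * 255.0  # mean/std broadcast over the last axis
    return np.clip(np.rint(pixels), 0, 255).astype(np.uint8)


def to_pil(arr):
    """Hypothetical convenience wrapper; needs Pillow installed."""
    from PIL import Image  # imported lazily so the NumPy part works without Pillow

    return Image.fromarray(chw_to_uint8_hwc(arr))
```

If `tokenized` is the object returned by the tokenizer in the release example (an assumed variable name), something like `to_pil(tokenized.images[0]).save("preview.png")` should write a viewable file. The preview shows the image after the tokenizer's preprocessing, so it may be resized relative to the input.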

Suggested Solutions

Suggestions:

  1. Addition of modules to interact with multimodal data
  2. WebUI API, like Gradio

Labels: bug (Something isn't working)
