Hypothesis: Even though tool calling is not yet supported by llama vision, we can still utilise the structured output: ```python class FunctionDetails(BaseModel): function_name: Enum['click_x_y', ...] ``` MLX-VLM to improve on the speed on mac.