Llama 3.2 vision for the planner via Structured Output and MLX

Hypothesis: Even though Llama 3.2 Vision does not yet support tool calling, we can still use structured output to have the model name the function to call:

```python
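from typing import Literal
from pydantic import BaseModel

class FunctionDetails(BaseModel):
    # Constrain the output to known function names (the remaining names are elided here).
    function_name: Literal['click_x_y']  # ...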
```
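
To make the hypothesis concrete, here is a minimal sketch of how a structured reply could stand in for a tool call: the model's JSON output is parsed, the `function_name` is checked against a table of known handlers, and the matching handler is invoked. The `click_x_y` handler body, the `arguments` field, the `dispatch` helper and the example reply are all assumptions made for illustration; none of them come from the planner itself.

```python
import json
from typing import Any, Callable


# Hypothetical handler; the real planner actions are not shown in this note.
def click_x_y(x: int, y: int) -> str:
    return f"clicked at ({x}, {y})"


# Dispatch table mapping allowed function names to their handlers.
HANDLERS: dict[str, Callable[..., Any]] = {"click_x_y": click_x_y}


def dispatch(raw_reply: str) -> Any:
    """Parse the model's JSON reply and call the handler it names."""
    payload = json.loads(raw_reply)  # raises a ValueError subclass on malformed JSON
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    name = payload.get("function_name")
    if name not in HANDLERS:
        raise ValueError(f"unknown function_name: {name!r}")
    # "arguments" is an assumed field, not part of FunctionDetails as shown.
    return HANDLERS[name](**payload.get("arguments", {}))


# A hand-written reply standing in for the vision model's structured output.
example_reply = '{"function_name": "click_x_y", "arguments": {"x": 120, "y": 48}}'
result = dispatch(example_reply)
```

Rejecting unknown names before the call keeps a malformed or hallucinated reply from reaching any handler, which is the guarantee native tool calling would otherwise provide.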

Use MLX-VLM to speed up inference on Mac.