Request for image input support 

I plan to implement image-based function calling with vision models such as LLaVA and Nous-Hermes-2-Vision-Alpha, but the current implementation in the example folder seems to support only text input. It would be great to have image input support in a future version. Alternatively, if you know a workaround for adding image input support, please let me know.
Thank you,