A Streamlit-based chatbot frontend designed to analyze diagrams and flowcharts using advanced vision models. Supports both local Ollama models and Google's Gemini API for flexible deployment. 🚀
VisionLane provides an intuitive interface for analyzing diagrams and flowcharts through a chat-based system. It leverages cutting-edge vision models to interpret visual data, offering seamless integration with both local and cloud-based models. 🖼️
- Image Upload: Supports PNG and JPG formats for diagram analysis. 📤
- Interactive Chat Interface: Clearly displays user and AI roles for smooth interaction. 💬
- Persistent Chat Sessions: Maintains conversation history across sessions. 📜
- Export Functionality: Export chat history as JSON for easy record-keeping. 💾
- Minimalistic UI: Clean design with intuitive sidebar navigation. 🖥️
- Model Flexibility: Supports local (Ollama) and cloud-based (Gemini) vision models. 🌍
- Diagram Parsing: Automatically extracts structured JSON from diagrams. 📊
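For the diagram-parsing feature, the structured output might look along these lines. The node/edge schema shown here is an illustrative assumption, not taken from the app's source:

```json
{
  "nodes": [
    {"id": "start", "type": "terminator", "label": "Start"},
    {"id": "check", "type": "decision", "label": "counter > 10?"}
  ],
  "edges": [
    {"from": "start", "to": "check", "label": ""}
  ]
}
```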
- Python 3.8 or higher
- Streamlit
- Pillow
- google-generativeai (for Gemini models)
- Ollama (for local models)
Install dependencies using:
pip install -r requirements.txt
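Based on the dependency list above, the requirements file would cover at least the following packages; the repository ships the authoritative file, and any version pins are omitted here:

```text
streamlit
Pillow
google-generativeai
ollama
```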
- Install Ollama from ollama.ai.
- Pull the LLaMA 3.2 Vision model:
ollama pull llama3.2-vision
- Run the model in the background:
ollama run llama3.2-vision
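Under the hood, Ollama's local server (default port 11434) accepts chat requests where vision models receive images as base64 strings on the message. The sketch below builds such a request body with the standard library; the helper name is hypothetical and the app may use the ollama Python client instead of raw HTTP:

```python
import base64
import json

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint

def build_ollama_request(image_bytes, question, model="llama3.2-vision"):
    """Build the JSON body for Ollama's /api/chat endpoint.

    Vision models take images as base64 strings in the message's
    "images" list; "stream": False requests a single JSON response.
    """
    payload = {
        "model": model,
        "stream": False,
        "messages": [{
            "role": "user",
            "content": question,
            "images": [base64.b64encode(image_bytes).decode("ascii")],
        }],
    }
    return json.dumps(payload)
```

The resulting body can be POSTed to OLLAMA_URL with urllib.request or requests while the model is running in the background.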
- Obtain an API key from Google AI Studio.
- Enable the "Use Online Models" option in VisionLane.
- Configure your API key via the "Configure API Key" button.
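For reference, Gemini's generateContent REST endpoint takes the image as a base64 "inline_data" part next to the text prompt. The app uses the google-generativeai SDK, which wraps this call; the stdlib-only payload builder below (hypothetical helper name) just illustrates the request shape:

```python
import base64
import json

def build_gemini_request(image_bytes, question, mime_type="image/png"):
    """Build the JSON body for Gemini's generateContent REST endpoint.

    Images travel as base64 "inline_data" parts alongside the text part.
    """
    payload = {
        "contents": [{
            "parts": [
                {"text": question},
                {"inline_data": {
                    "mime_type": mime_type,
                    "data": base64.b64encode(image_bytes).decode("ascii"),
                }},
            ]
        }]
    }
    return json.dumps(payload)

# POST target (API key from Google AI Studio goes in the query string):
# https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent?key=API_KEY
```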
Launch the Streamlit app with:
streamlit run app.py
- Select the model (Ollama or Gemini) using the sidebar checkbox. ✅
- For Gemini, configure your API key as prompted. 🔑
- Upload a flowchart or diagram (PNG/JPG) using the file uploader. 🖼️
- Enter your question about the diagram in the chat input field. ❓
- Review the AI's response in the chat interface. 📝
- Continue the conversation or start a new session using sidebar controls. 🔄
- Export chat history as a JSON file for record-keeping. 💾
- Use the "Parse Diagram" button to generate structured JSON output from diagrams. 📈
The repository includes a Jupyter notebook (model_evaluation.ipynb) for assessing vision model performance on diagram analysis. It provides:
- Automated testing across multiple models. 🧪
- Response time metrics. ⏱️
- Visualization tools for performance comparison. 📉
- Manual scoring framework for response quality. ✅
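The response-time metric can be sketched as a simple wall-clock timer around each model call; the notebook's actual harness may differ, and the stub model below stands in for a real Ollama or Gemini call:

```python
import time

def time_model(ask, prompts):
    """Measure per-prompt latency for one model.

    `ask` is any callable mapping a prompt string to a response string.
    Returns mean and worst-case latency in seconds.
    """
    timings = []
    for prompt in prompts:
        start = time.perf_counter()
        ask(prompt)  # response text is discarded; only latency is recorded
        timings.append(time.perf_counter() - start)
    return {
        "mean_s": sum(timings) / len(timings),
        "max_s": max(timings),
    }

# Stub model for illustration; the notebook would call a real vision model.
stats = time_model(lambda p: p.upper(), ["Describe the flowchart.", "List the nodes."])
```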
- LLaMA 3.2 Vision
- LLaVA (Large Language-and-Vision Assistant)
- Gemini 1.5 Flash