InsightLens is an AI-powered image analysis tool designed to deliver quick, insightful, and contextually accurate information about images. Powered by Google's Gemini-1.5-Flash model and Streamlit, InsightLens generates automatic captions, offers detailed insights, and answers specific questions about image content.
Whether you're using InsightLens to enhance content creation, explore visual storytelling, or analyze images for insights, this tool provides a seamless and interactive experience that’s both informative and engaging.
Experience InsightLens in action! 👉🏻
Below is a preview of InsightLens analyzing an image and providing detailed insights! 👇
Building InsightLens was an exciting exploration at the crossroads of generative AI and interactive web development. Here’s a snapshot of my journey:
- Inspiration: Inspired by the potential of generative AI, I wanted to create a tool that could not only describe images but also engage users by answering their questions about the content.
- Why I Made It: I aimed to blend art and technology by providing dynamic insights into images, using cutting-edge AI to generate both creative captions and structured, detailed descriptions.
- Challenges Faced:
  - API Integration: Integrating with Google’s Gemini AI required careful handling of API keys and ensuring smooth communication with the service.
  - Interactive Design: Creating an engaging and intuitive Streamlit interface with custom animations and styling.
  - Error Handling: Building robust error handling to manage unexpected inputs and sensitive content.
  - Performance Optimization: Balancing quick response times with detailed analysis capabilities.
- What I Learned:
  - Effective use of generative AI for multimedia content analysis.
  - How to build visually appealing, responsive apps with Streamlit.
  - Advanced techniques in API integration and user input processing.
Every step of the process has deepened my understanding of AI and web development, and reinforced my passion for creating tools that merge creativity with technology.
- Features
- Installation
- Usage
- Technologies Used
- Results
- Directory Structure
- Future Enhancements
- Contributing
- License
- Contact
- Automatic Captioning: Generates a brief, one-line caption for uploaded images.
- Detailed Descriptions: Provides concise summaries that highlight the primary content and context of the image.
- Image Q&A: Users can ask questions about the image's content, with responses powered by Gemini AI.
- Easy-to-Use Interface: A simple, intuitive layout designed for both casual users and tech enthusiasts.
- Privacy by Design: InsightLens does not store any images or questions asked, ensuring a secure and private interaction every time.
- Clone the Repository:
  git clone https://github.com/hk-kumawat/InsightLens.git
  cd InsightLens
- Create & Activate a Virtual Environment (optional but recommended):
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install Required Packages:
  pip install -r requirements.txt
- Set Up Your Gemini API Key:
  - Create a .streamlit/secrets.toml file.
  - Add your Gemini API key:
    [GEMINI]
    api_key = "your_gemini_api_key_here"
  - Alternatively, set your API key as an environment variable.
- Run the Application:
  streamlit run app.py
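The API key step above names two sources: the `[GEMINI]` table in `.streamlit/secrets.toml` and an environment variable. Below is a minimal sketch of how a lookup with that fallback order could work. The function name `resolve_api_key` and the variable name `GEMINI_API_KEY` are assumptions made for illustration, not names confirmed by `app.py`.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    secrets: Mapping[str, Mapping[str, str]],
    environ: Mapping[str, str] = os.environ,
    env_var: str = "GEMINI_API_KEY",  # hypothetical variable name
) -> Optional[str]:
    """Return the Gemini API key, preferring the secrets mapping over the environment."""
    # Mirrors the [GEMINI] table and api_key entry shown in secrets.toml.
    key = secrets.get("GEMINI", {}).get("api_key")
    if key:
        return key
    # Fall back to the environment variable; None means no key was found.
    return environ.get(env_var) or None
```

Because both sources are passed in as plain mappings, the lookup can be exercised with ordinary dictionaries. In a Streamlit app, `st.secrets` could presumably be passed as `secrets`.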
- Launch the Application:
  streamlit run app.py
- Using the App:
  - Upload an image using the file uploader
  - View the auto-generated caption and structured description
  - Ask specific questions about the image
  - Get AI-powered responses and insights
- Features in Action:
  - Image Upload: Supports JPG, JPEG, and PNG formats
  - Auto Analysis: Receive an immediate caption and description
  - Q&A: Ask any question about the uploaded image
  - Interactive Elements: Enjoy animations and celebrations
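The flow described above (check the file type, generate a caption and description, then answer an optional question) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `app.py`: the prompt strings, `is_supported`, and `analyze_image` are hypothetical, and the model call is injected as a `generate` callable so the sketch runs without network access.

```python
from pathlib import Path
from typing import Callable, Dict, Optional

SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png"}  # formats listed in this README

# Hypothetical prompts; the actual wording used by the app may differ.
CAPTION_PROMPT = "Write a brief, one-line caption for this image."
DESCRIPTION_PROMPT = "Describe the primary content and context of this image."


def is_supported(filename: str) -> bool:
    """Check the file extension case-insensitively against the supported set."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def analyze_image(
    filename: str,
    image: object,
    generate: Callable[[str, object], str],
    question: Optional[str] = None,
) -> Dict[str, str]:
    """Run caption, description, and optional Q&A through the supplied model call."""
    if not is_supported(filename):
        raise ValueError(f"Unsupported file type: {filename}")
    results = {
        "caption": generate(CAPTION_PROMPT, image),
        "description": generate(DESCRIPTION_PROMPT, image),
    }
    # The question is only sent to the model when the user actually asked one.
    if question:
        results["answer"] = generate(question, image)
    return results
```

With the `google-generativeai` package, `generate` might wrap something like `model.generate_content([prompt, image]).text` for a `GenerativeModel("gemini-1.5-flash")` instance. Keeping the model behind a callable also makes the flow easy to test with a stub.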
- Programming Language: Python
- Libraries:
  - streamlit: For creating the user interface.
  - Pillow: For image processing.
  - python-dotenv: Manages environment variables.
  - google-generativeai: For generating captions, descriptions, and answering questions.
- Frontend:
  - HTML/CSS
  - Custom animations
- API:
  - Gemini API by Google Generative AI: Powers the core captioning, description generation, and Q&A functionalities.
InsightLens successfully analyzes images, providing an insightful, one-line caption along with a structured, emoji-based description and engaging Q&A responses. This AI-powered analysis is useful in applications ranging from social media to education and creative design.
In the above example, InsightLens provides a brief caption, structured description, and accurate answers to user questions about the image content.
hk-kumawat-insight-lens/
├── README.md # Project documentation
├── LICENSE # License information
├── app.py # Streamlit application for generating image insights
└── requirements.txt # List of dependencies
- Multi-turn Conversation: Enable the assistant to maintain conversation context across multiple interactions.
- Advanced Emotion Detection: Expand sentiment capabilities to identify a wider range of emotional tones in image context.
- Integration with External Services: Extend InsightLens’s functionality to connect with APIs for additional insights (e.g., related news or facts about image objects).
- Voice Interaction: Add voice input/output for a more dynamic user experience.
- Multi-Model Integration: Integrate additional AI models to provide diverse perspectives on image analysis, such as object detection and sentiment analysis.
- Social Sharing: Allow users to easily share generated insights and images on social media platforms.
- User Feedback Loop: Integrate a mechanism for users to provide feedback on the AI's responses, helping to continuously improve accuracy and relevance.
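As one possible starting point for the multi-turn conversation idea listed above, earlier question and answer pairs could be kept in memory and prepended to each new prompt. The sketch below is purely illustrative: the `Conversation` class, its `max_turns` cap, and the `User:`/`Assistant:` prompt layout are assumptions, not part of the current app.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Conversation:
    """Keeps recent (question, answer) pairs so later prompts can include them."""

    max_turns: int = 5  # hypothetical cap; assumed to be at least 1
    turns: List[Tuple[str, str]] = field(default_factory=list)

    def add(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))
        # Keep only the most recent max_turns exchanges.
        self.turns = self.turns[-self.max_turns:]

    def build_prompt(self, question: str) -> str:
        """Render earlier turns followed by the new question as one prompt string."""
        lines = []
        for asked, answered in self.turns:
            lines.append(f"User: {asked}")
            lines.append(f"Assistant: {answered}")
        lines.append(f"User: {question}")
        return "\n".join(lines)
```

Capping the history keeps prompts short. Holding it only in memory would also stay consistent with the stated goal of not storing user questions.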
Contributions make the open source community such an amazing place to learn, inspire, and create. 🙌 Any contributions you make are greatly appreciated! 😊
Have an idea to improve this project? Go ahead and fork the repo to create a pull request, or open an issue with the tag "enhancement". Don't forget to give the project a star! ⭐ Thanks again! 🙏
- Fork the repository.
- Create a new branch:
  git checkout -b feature/YourFeatureName
- Commit your changes with a descriptive message.
- Push to your branch:
  git push origin feature/YourFeatureName
- Open a Pull Request detailing your enhancements or bug fixes.
This project is licensed under the MIT License — see the LICENSE file for details.
Feel free to reach out for collaborations or questions:
💻 — Explore more projects by Harshal Kumawat.
🌐 — Let's connect professionally.
📧 — Reach out for inquiries or collaboration.
"Every image tells a story, let AI help you discover it." - InsightLens