By @swamy18
A production-ready mini-product that accelerates data annotation workflows for ML teams using AI assistance.
LabelMate is a comprehensive data labeling tool that combines AI automation with human oversight to create high-quality labeled datasets efficiently:
- Sentiment Analysis: Automatically classify text as Positive, Negative, or Neutral
- Topic Classification: Categorize text into Technology, Business, Politics, Sports, etc.
- Batch Processing: Label hundreds of texts with one click
- Human Review: Easy interface to correct AI mistakes
- Change Tracking: Monitor which labels were AI-generated vs human-corrected
- Multi-Label Suggestions: AI provides top-3 label suggestions per image
- Flexible Views: Grid, List, or Single-image detailed views
- Custom Labels: Add your own labels beyond AI suggestions
- Batch Operations: Re-process all images with updated AI models
- Real-time Progress: Track labeling completion rates
- AI Performance: Monitor accuracy and human correction rates
- Data Quality: Insights into label distribution and consistency
- Export Reports: Comprehensive analytics in JSON and Markdown formats
git clone https://github.com/swamy18/labelmate.git
cd labelmate
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
# Create .env file
echo "GOOGLE_API_KEY=your_gemini_key_here" > .env
# OR for OpenAI
echo "OPENAI_API_KEY=your_openai_key_here" > .env
Get API Keys:
- Google Gemini (Recommended - handles both text and images)
- OpenAI (Text only, requires paid account)
streamlit run app.py
Open http://localhost:8501 in your browser.
- Upload Data: Use the provided data/sample_texts.csv or upload your own CSV with a 'text' column
- AI Processing: Click "Auto-Label All" to get AI sentiment predictions
- Human Review: Review and correct any mislabeled items using the dropdown selectors
- Export Results: Download the labeled CSV with tracking of AI vs human labels
- Upload Images: Drop 5-10 sample images (JPG/PNG format)
- AI Suggestions: System provides 3 label suggestions per image using Gemini Vision
- Label Selection: Choose the best label or add custom ones
- Export Labels: Download CSV mapping filenames to selected labels
- Progress Tracking: Real-time charts showing completion rates
- Performance Metrics: AI accuracy rates and human correction statistics
- Data Insights: Label distribution analysis and quality metrics
- Report Export: Generate comprehensive analytics reports
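The text workflow above (AI label, human correction, export with AI-vs-human tracking) can be sketched with the standard library alone. The column names `ai_label`, `final_label`, and `label_source`, and the helpers `apply_review` and `to_csv`, are illustrative assumptions and not necessarily the schema LabelMate writes.

```python
import csv
import io

def apply_review(rows, corrections):
    """Merge human corrections into AI-labeled rows.

    rows: list of dicts with 'text' and 'ai_label' keys (hypothetical schema).
    corrections: dict mapping row index -> label chosen by the reviewer.
    """
    reviewed = []
    for i, row in enumerate(rows):
        final = corrections.get(i, row["ai_label"])
        # A row counts as human-corrected only if the label actually changed.
        source = "human" if final != row["ai_label"] else "ai"
        reviewed.append({**row, "final_label": final, "label_source": source})
    return reviewed

def to_csv(rows):
    """Serialize reviewed rows to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["text", "ai_label", "final_label", "label_source"]
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

rows = [
    {"text": "Great product!", "ai_label": "Positive"},
    {"text": "It arrived late.", "ai_label": "Neutral"},
]
reviewed = apply_review(rows, {1: "Negative"})  # reviewer fixes the second row
csv_text = to_csv(reviewed)
```

Keeping the original AI label next to the final label is what makes correction rates computable later.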
labelmate/
├── app.py                  # Main Streamlit application
├── requirements.txt        # Python dependencies
├── .env                    # API keys (create this)
├── .gitignore              # Git ignore rules
├── README.md               # This file
├── utils/                  # Helper modules
│   ├── __init__.py
│   ├── text_helper.py      # Text labeling logic
│   ├── image_helper.py     # Image labeling logic
│   └── charts.py           # Analytics and visualization
├── data/                   # Sample data and temporary files
│   ├── sample_texts.csv    # Demo text dataset
│   └── temp_images/        # Temporary image storage
└── exports/                # Output folder (auto-generated)
    ├── labeled_text.csv
    ├── labeled_images.csv
    └── analytics_report.json
- Frontend: Streamlit (Python-based web interface)
- AI Backend: Google Gemini 1.5 Flash (text + vision) or OpenAI GPT-3.5
- Data Processing: Pandas for CSV manipulation
- Visualization: Plotly for interactive charts and progress tracking
- Image Processing: PIL (Pillow) for image handling
- State Management: Streamlit session state for real-time updates
The system automatically detects available API keys:
- Gemini (preferred): Handles both text and image labeling
- OpenAI: Text labeling only (requires separate vision solution)
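A minimal sketch of how this kind of key detection might look. It assumes the `.env` values have already been loaded into the process environment (for example with python-dotenv). The function names `detect_provider` and `supports_images` are hypothetical and not taken from the LabelMate source.

```python
import os

def detect_provider(env=None):
    """Return 'gemini', 'openai', or None depending on which API key is set."""
    env = os.environ if env is None else env
    if env.get("GOOGLE_API_KEY"):
        return "gemini"   # preferred: handles text and images
    if env.get("OPENAI_API_KEY"):
        return "openai"   # text labeling only
    return None

def supports_images(provider):
    """Only the Gemini path is described as handling image labeling."""
    return provider == "gemini"

provider = detect_provider({"OPENAI_API_KEY": "sk-example"})
print(provider, supports_images(provider))
```

Passing the environment as a parameter keeps the check easy to test without touching real keys.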
Modify utils/text_helper.py to add new classification tasks:
CUSTOM_PROMPT = """
Your custom classification prompt here.
Return one of: Option1, Option2, Option3
"""
Adjust batch sizes in the UI:
- Text: Process 1-100 items at once
- Images: Upload and process multiple files simultaneously
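Batch processing amounts to slicing the input into fixed-size chunks and labeling each chunk in turn. A minimal sketch follows, in which `label_batch` stands in for whatever function calls the AI provider; that function and the helper names are hypothetical.

```python
def batched(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def label_in_batches(texts, label_batch, batch_size=20):
    """Label all texts by calling label_batch once per chunk."""
    labels = []
    for chunk in batched(texts, batch_size):
        labels.extend(label_batch(chunk))
    return labels

# Stand-in labeler for demonstration; a real one would call the AI provider.
fake_labeler = lambda chunk: ["Neutral"] * len(chunk)
print(label_in_batches(["a", "b", "c"], fake_labeler, batch_size=2))
```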
streamlit run app.py --server.port 8501 --server.address 0.0.0.0
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.headless", "true"]
- Streamlit Cloud: Direct GitHub integration
- Heroku: Use provided Procfile
- AWS/GCP: Container-based deployment
We welcome contributions! Here's how:
- Fork the repository
- Create a feature branch: git checkout -b feature/amazing-feature
- Commit changes: git commit -m 'Add amazing feature'
- Push to branch: git push origin feature/amazing-feature
- Open a Pull Request
# Install development dependencies
pip install -r requirements-dev.txt
# Run tests
python -m pytest tests/
# Code formatting
black app.py utils/
- Text: Process in batches of 10-20 for optimal speed
- Images: Resize large images before upload
- Memory: Clear session state periodically for large datasets
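For the image tip, the arithmetic behind an aspect-preserving downscale is shown below. The 1024-pixel limit and the helper name `fit_within` are assumptions chosen for illustration. In practice, Pillow's `Image.thumbnail` does a similar in-place downscale that preserves aspect ratio.

```python
def fit_within(width, height, max_side=1024):
    """Return (w, h) scaled down so the longer side is at most max_side.

    Images already within the limit are returned unchanged.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    # Clamp to 1 so extreme aspect ratios never produce a zero dimension.
    return max(1, round(width * scale)), max(1, round(height * scale))

print(fit_within(4000, 3000, max_side=1000))
```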
LabelMate tracks comprehensive metrics:
- Progress Tracking: Real-time completion percentages
- AI Performance: Accuracy rates and confidence scores
- Human Oversight: Correction rates and consistency metrics
- Data Quality: Label distribution and anomaly detection
- Export Analytics: JSON reports for further analysis
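As a rough sketch of how such metrics could be derived, the example below computes completion, correction rate, and label distribution from per-item records and serializes them to JSON. The record keys `ai_label` and `final_label`, the `summarize` helper, and the report field names are assumptions, not the actual format of analytics_report.json.

```python
import json
from collections import Counter

def summarize(records):
    """Compute progress and correction metrics.

    records: list of dicts with 'ai_label' and 'final_label';
    'final_label' is None for items not yet reviewed (hypothetical schema).
    """
    total = len(records)
    reviewed = [r for r in records if r["final_label"] is not None]
    corrected = [r for r in reviewed if r["final_label"] != r["ai_label"]]
    return {
        "total": total,
        "completion_pct": round(100 * len(reviewed) / total, 1) if total else 0.0,
        "correction_rate_pct": (
            round(100 * len(corrected) / len(reviewed), 1) if reviewed else 0.0
        ),
        "label_distribution": dict(Counter(r["final_label"] for r in reviewed)),
    }

records = [
    {"ai_label": "Positive", "final_label": "Positive"},
    {"ai_label": "Neutral", "final_label": "Negative"},
    {"ai_label": "Negative", "final_label": "Negative"},
    {"ai_label": "Positive", "final_label": None},
]
summary = summarize(records)
report_json = json.dumps(summary, indent=2)
```

If reviewed labels are treated as ground truth, AI accuracy on reviewed items is simply 100 minus the correction rate.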
- Local Processing: All data stays on your machine by default
- API Calls: Only text/image content sent to AI providers (no personal data)
- No Storage: No permanent data storage in cloud services
- Session Based: Data cleared when application restarts
- Dataset Creation: Rapidly label training data for supervised learning
- Data Augmentation: Generate labels for synthetic or scraped data
- Quality Assurance: Validate existing labels with AI assistance
- Survey Analysis: Classify open-ended survey responses
- Content Analysis: Categorize research papers, articles, or documents
- Image Annotation: Label research images for computer vision projects
- Customer Feedback: Analyze sentiment in reviews and support tickets
- Content Moderation: Classify user-generated content for policy compliance
- Market Research: Tag and categorize social media posts and comments
MIT License - see LICENSE file for details.
- Streamlit team for the amazing web framework
- Google for the powerful Gemini API
- OpenAI for GPT models and vision capabilities
- Plotly for interactive visualization components
- PIL/Pillow for robust image processing
If you find LabelMate useful, please give it a star on GitHub! ⭐
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: Contact via GitHub profile
- Documentation: Wiki
Made by @swamy18
LabelMate - Accelerating AI development through intelligent data labeling