🏷️ LabelMate - AI-Powered Data Labeling Assistant

By @swamy18

A production-ready mini-product that accelerates data annotation workflows for ML teams using AI assistance.

🚀 What LabelMate Does

LabelMate is a comprehensive data labeling tool that combines AI automation with human oversight to create high-quality labeled datasets efficiently:

📄 Text Labeling

  • Sentiment Analysis: Automatically classify text as Positive, Negative, or Neutral
  • Topic Classification: Categorize text into Technology, Business, Politics, Sports, etc.
  • Batch Processing: Label hundreds of texts with one click
  • Human Review: Easy interface to correct AI mistakes
  • Change Tracking: Monitor which labels were AI-generated vs human-corrected

🖼️ Image Labeling

  • Multi-Label Suggestions: AI provides top-3 label suggestions per image
  • Flexible Views: Grid, List, or Single-image detailed views
  • Custom Labels: Add your own labels beyond AI suggestions
  • Batch Operations: Re-process all images with updated AI models

📊 Analytics Dashboard

  • Real-time Progress: Track labeling completion rates
  • AI Performance: Monitor accuracy and human correction rates
  • Data Quality: Insights into label distribution and consistency
  • Export Reports: Comprehensive analytics in JSON and Markdown formats

⚙️ Quick Setup (< 5 minutes)

1. Clone and Install

git clone https://github.com/swamy18/labelmate.git
cd labelmate
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

2. Configure API Keys

# Create a .env file with ONE of the following keys (each command overwrites .env)
echo "GOOGLE_API_KEY=your_gemini_key_here" > .env
# OR, for OpenAI
echo "OPENAI_API_KEY=your_openai_key_here" > .env

Get API Keys:

  • Google Gemini (Recommended - handles both text and images)
  • OpenAI (Text only, requires paid account)

3. Launch Application

streamlit run app.py

Open http://localhost:8501 in your browser.

🎯 Demo Workflow

Text Labeling Demo:

  1. Upload Data: Use the provided data/sample_texts.csv or upload your own CSV with a 'text' column
  2. AI Processing: Click "Auto-Label All" to get AI sentiment predictions
  3. Human Review: Review and correct any mislabeled items using the dropdown selectors
  4. Export Results: Download the labeled CSV with tracking of AI vs human labels
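The review-and-export steps above can be sketched as follows. This is a minimal illustration, not LabelMate's actual code: the column names `ai_label`, `final_label`, and `label_source`, and the helper names `apply_review` and `to_csv`, are assumptions chosen for the example. It uses the standard `csv` module to stay self-contained, whereas the app itself uses Pandas.

```python
import csv
import io

# Hypothetical column layout for the exported CSV (an assumption for this sketch).
FIELDS = ["text", "ai_label", "final_label", "label_source"]


def apply_review(rows, corrections):
    """Merge human corrections into AI-labeled rows.

    rows: list of dicts with 'text' and 'ai_label'.
    corrections: dict mapping row index -> label chosen by the reviewer.
    """
    reviewed = []
    for index, row in enumerate(rows):
        final = corrections.get(index, row["ai_label"])
        reviewed.append({
            "text": row["text"],
            "ai_label": row["ai_label"],
            "final_label": final,
            # A row counts as human-corrected only if the label actually changed.
            "label_source": "human" if final != row["ai_label"] else "ai",
        })
    return reviewed


def to_csv(reviewed):
    """Serialize reviewed rows to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(reviewed)
    return buffer.getvalue()
```

Keeping the original AI label next to the final label is what makes the correction statistics in the analytics dashboard possible.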

Image Labeling Demo:

  1. Upload Images: Drop 5-10 sample images (JPG/PNG format)
  2. AI Suggestions: System provides 3 label suggestions per image using Gemini Vision
  3. Label Selection: Choose the best label or add custom ones
  4. Export Labels: Download CSV mapping filenames to selected labels

Analytics Demo:

  1. Progress Tracking: Real-time charts showing completion rates
  2. Performance Metrics: AI accuracy rates and human correction statistics
  3. Data Insights: Label distribution analysis and quality metrics
  4. Report Export: Generate comprehensive analytics reports

📁 Project Structure

labelmate/
├── app.py                 # Main Streamlit application
├── requirements.txt       # Python dependencies
├── .env                   # API keys (create this)
├── .gitignore             # Git ignore rules
├── README.md              # This file
├── utils/                 # Helper modules
│   ├── __init__.py
│   ├── text_helper.py     # Text labeling logic
│   ├── image_helper.py    # Image labeling logic
│   └── charts.py          # Analytics and visualization
├── data/                  # Sample data and temporary files
│   ├── sample_texts.csv   # Demo text dataset
│   └── temp_images/       # Temporary image storage
└── exports/               # Output folder (auto-generated)
    ├── labeled_text.csv
    ├── labeled_images.csv
    └── analytics_report.json

🛠️ Technical Stack

  • Frontend: Streamlit (Python-based web interface)
  • AI Backend: Google Gemini 1.5 Flash (text + vision) or OpenAI GPT-3.5
  • Data Processing: Pandas for CSV manipulation
  • Visualization: Plotly for interactive charts and progress tracking
  • Image Processing: PIL (Pillow) for image handling
  • State Management: Streamlit session state for real-time updates

🔧 Advanced Configuration

Multiple AI Providers

The system automatically detects available API keys:

  1. Gemini (preferred): Handles both text and image labeling
  2. OpenAI: Text labeling only (requires separate vision solution)
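The detection order described above could look roughly like this. It is a sketch under assumptions: the function names `detect_provider` and `supports_images` are hypothetical, and the real logic in `app.py` may differ. Only the environment variable names come from the setup section.

```python
import os
from typing import Mapping, Optional


def detect_provider(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Pick a provider from the available API keys, preferring Gemini."""
    if env.get("GOOGLE_API_KEY", "").strip():
        return "gemini"  # preferred: handles text and images
    if env.get("OPENAI_API_KEY", "").strip():
        return "openai"  # text labeling only
    return None  # no usable key found


def supports_images(provider: Optional[str]) -> bool:
    """Only the Gemini path is described as handling image labeling."""
    return provider == "gemini"
```

Passing the environment as a parameter keeps the check easy to test without touching real keys.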

Custom Labeling Tasks

Modify utils/text_helper.py to add new classification tasks:

CUSTOM_PROMPT = """
Your custom classification prompt here.
Return one of: Option1, Option2, Option3
"""

Batch Processing Settings

Adjust batch sizes in the UI:

  • Text: Process 1-100 items at once
  • Images: Upload and process multiple files simultaneously
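As an illustration of the text batch setting, a list of items can be split into fixed-size chunks before each chunk is sent for labeling. The helper name `batched` and its validation are assumptions for this sketch; the 1-100 range mirrors the limit stated above.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if not 1 <= batch_size <= 100:
        raise ValueError("batch_size must be between 1 and 100")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

For example, `list(batched(texts, 20))` produces lists of up to 20 texts each, with a shorter final batch when the total is not a multiple of 20.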

📈 Production Deployment

Local Production Mode

streamlit run app.py --server.port 8501 --server.address 0.0.0.0

Docker Deployment

FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.headless", "true"]

Cloud Deployment

  • Streamlit Cloud: Direct GitHub integration
  • Heroku: Use provided Procfile
  • AWS/GCP: Container-based deployment

🤝 Contributing

We welcome contributions! Here's how:

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Commit changes: git commit -m 'Add amazing feature'
  4. Push to branch: git push origin feature/amazing-feature
  5. Open a Pull Request

Development Setup

# Install development dependencies
pip install -r requirements-dev.txt

# Run tests
python -m pytest tests/

# Code formatting
black app.py utils/

Performance Optimization:

  • Text: Process in batches of 10-20 for optimal speed
  • Images: Resize large images before upload
  • Memory: Clear session state periodically for large datasets
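For the image tip, one way to pick a smaller size before upload is to scale the longest side down to a limit while keeping the aspect ratio. The function `fit_within` and the 1024-pixel default are assumptions for this sketch; the resulting size could then be passed to an image library such as Pillow to do the actual resizing.

```python
def fit_within(width, height, max_side=1024):
    """Return (width, height) scaled so the longest side is at most max_side.

    Images already within the limit are returned unchanged.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    # Round to whole pixels and never return a zero-sized dimension.
    return max(1, round(width * scale)), max(1, round(height * scale))
```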

📊 Analytics & Metrics

LabelMate tracks comprehensive metrics:

  • Progress Tracking: Real-time completion percentages
  • AI Performance: Accuracy rates and confidence scores
  • Human Oversight: Correction rates and consistency metrics
  • Data Quality: Label distribution and anomaly detection
  • Export Analytics: JSON reports for further analysis
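A few of these metrics can be computed from the labeled records alone. The sketch below is illustrative and not the code in `utils/charts.py`: the record keys `ai_label` and `final_label` and the function `summarize` are assumptions, and the correction rate treats the human-reviewed label as ground truth.

```python
import json
from collections import Counter


def summarize(records):
    """Compute completion rate, correction rate and label distribution.

    records: list of dicts with 'ai_label' and 'final_label' keys;
    'final_label' is None or empty while an item is still unlabeled.
    """
    total = len(records)
    labeled = [r for r in records if r.get("final_label")]
    with_ai = [r for r in labeled if r.get("ai_label")]
    corrected = [r for r in with_ai if r["final_label"] != r["ai_label"]]
    return {
        "total": total,
        "completion_rate": len(labeled) / total if total else 0.0,
        "correction_rate": len(corrected) / len(with_ai) if with_ai else 0.0,
        "label_distribution": dict(Counter(r["final_label"] for r in labeled)),
    }


def to_json_report(records):
    """Serialize the summary as JSON text for export."""
    return json.dumps(summarize(records), indent=2)
```

Under the same assumption, an approximate AI accuracy is `1 - correction_rate`.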

🔒 Data Privacy & Security

  • Local Processing: Uploaded files and labels stay on your machine by default
  • API Calls: Only the text/image content being labeled is sent to the configured AI provider (no other personal data)
  • No Storage: LabelMate does not store data permanently in cloud services
  • Session Based: Data is cleared when the application restarts

📚 Use Cases

ML/AI Teams

  • Dataset Creation: Rapidly label training data for supervised learning
  • Data Augmentation: Generate labels for synthetic or scraped data
  • Quality Assurance: Validate existing labels with AI assistance

Research Organizations

  • Survey Analysis: Classify open-ended survey responses
  • Content Analysis: Categorize research papers, articles, or documents
  • Image Annotation: Label research images for computer vision projects

Business Applications

  • Customer Feedback: Analyze sentiment in reviews and support tickets
  • Content Moderation: Classify user-generated content for policy compliance
  • Market Research: Tag and categorize social media posts and comments

📄 License

MIT License - see LICENSE file for details.

🙏 Acknowledgments

  • Streamlit team for the amazing web framework
  • Google for the powerful Gemini API
  • OpenAI for GPT models
  • Plotly for interactive visualization components
  • PIL/Pillow for robust image processing

🌟 Star History

If you find LabelMate useful, please give it a star on GitHub! ⭐

📞 Support & Contact


Made by @swamy18

LabelMate - Accelerating AI development through intelligent data labeling
