BERT Studio is a comprehensive, full-stack platform for experimenting with BERT and other transformer-based models from HuggingFace. It provides an intuitive web interface for model exploration, task execution, and custom code development with enterprise-grade MongoDB integration.
- Model Management: Browse, download, and load models from HuggingFace Hub
- Embedding Generation: Create embeddings from text using various models
- Text Classification: Perform sentiment analysis and multi-class classification
- Question Answering: Extract answers from context using extractive QA models
- Named Entity Recognition: Identify and classify entities in text
- Fill Mask: Complete masked text using language models
- Text Summarization: Generate concise summaries from longer texts
- Feature Extraction: Extract high-dimensional features from text
- Custom Tasks: Execute custom PyTorch/Transformers code with security restrictions
- MongoDB Integration: Enterprise-grade task storage and management
- API Key Management: Secure authentication and session handling
- Task Sharing: Export/import custom tasks between installations
- Real-time Processing: Fast inference with GPU acceleration support
- Docker Deployment: Production-ready containerized deployment
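The API key management feature stores keys server-side; the repository's actual scheme isn't shown in this README, but a minimal sketch of the usual pattern (stdlib only, all names illustrative) looks like:

```python
import hashlib
import hmac
import secrets

def issue_api_key() -> tuple[str, str]:
    """Generate a random API key; return (plaintext_key, stored_hash).

    Only the hash is persisted, so a database leak does not expose usable keys.
    """
    key = secrets.token_urlsafe(32)
    key_hash = hashlib.sha256(key.encode()).hexdigest()
    return key, key_hash

def verify_api_key(presented: str, stored_hash: str) -> bool:
    """Compare in constant time to avoid timing side channels."""
    candidate = hashlib.sha256(presented.encode()).hexdigest()
    return hmac.compare_digest(candidate, stored_hash)

key, stored = issue_api_key()
```

The plaintext key is shown to the user once; only the hash reaches MongoDB.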
BERT Studio follows a modern full-stack architecture:
- Frontend: React 18 + TypeScript + Vite + shadcn/ui + Tailwind CSS
- Backend: FastAPI + Python with PyTorch and Transformers
- Database: MongoDB for persistent storage
- Deployment: Docker Compose with Nginx reverse proxy
- Authentication: Session-based with API key management
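This architecture maps naturally onto Docker Compose. The repository ships its own `docker-compose.yml`; the sketch below is illustrative only (service names, images, and ports are assumptions):

```yaml
# Illustrative sketch - see the repository's docker-compose.yml for the real file
services:
  frontend:
    build: .
    ports:
      - "80:80"          # Nginx serves the built React app and proxies the API
  backend:
    build: ./backend
    ports:
      - "8000:8000"      # FastAPI + PyTorch inference service
    environment:
      - MONGODB_CONNECTION_STRING=mongodb://mongo:27017
  mongo:
    image: mongo:6
    volumes:
      - mongo-data:/data/db   # persist task storage across restarts
volumes:
  mongo-data:
```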
- Node.js 18+ and npm/yarn
- Python 3.9+
- MongoDB 6+ (or Docker)
- Docker and Docker Compose (for containerized deployment)
- CUDA (optional, for GPU acceleration)
- Clone the repository:

  ```sh
  git clone <YOUR_GIT_URL>
  cd bert-studio
  ```
- Frontend Setup:

  ```sh
  npm install
  npm run dev
  ```
- Backend Setup:

  ```sh
  cd backend
  pip install -r requirements.txt
  python start_server.py
  ```
- MongoDB Setup:

  ```sh
  # Install MongoDB locally or use MongoDB Atlas
  # Ubuntu/Debian
  sudo apt-get install mongodb
  # macOS
  brew install mongodb-community

  # Configure connection (optional)
  export MONGODB_CONNECTION_STRING="mongodb://localhost:27017"
  export MONGODB_DATABASE_NAME="bert_studio"
  ```
- Clone and configure:

  ```sh
  git clone <YOUR_GIT_URL>
  cd bert-studio
  cp backend/.env.example backend/.env.local
  # Edit backend/.env.local with your configuration
  ```
- Deploy with Docker Compose:

  ```sh
  docker-compose up -d
  ```
- Access the application:
- Frontend: http://localhost
- Backend API: http://localhost:8000
- MongoDB: localhost:27017
- Navigate to the web interface
- Browse available models in the Model Browser
- Select a task type (Classification, QA, NER, etc.)
- Choose or download a model
- Input your text and run inference
Create custom PyTorch code under these security restrictions:

- Only `transformers` and `torch` imports are allowed
- Code must be wrapped in functions
- Tokenizer, model, and function code go in separate blocks
- The function must be named `custom_function` and accept a `text` parameter
Example Custom Task:

```python
# Tokenizer Code
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# Model Code
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")

# Function Code
def custom_function(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
    outputs = model(**inputs)
    probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
    return {"prediction": probabilities[0][1].item()}
```

- Save Tasks: Store custom code with metadata (name, description, tags)
- Search & Filter: Find tasks by name, description, tags, or model
- Export/Import: Share tasks between installations
- Statistics: View usage analytics and popular tags
- Backup/Restore: Full database backup capabilities
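Export/import works well for sharing because tasks serialize cleanly to JSON. The actual document schema isn't shown in this README; the round-trip below uses an assumed shape (all field names hypothetical):

```python
import json

# Hypothetical shape of a stored custom task document
task = {
    "name": "sentiment-probability",
    "description": "Returns the positive-class probability",
    "tags": ["sentiment", "distilbert"],
    "model": "distilbert-base-uncased",
    "tokenizer_code": 'tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")',
    "function_code": "def custom_function(text): ...",
}

# Export: dump to JSON that another installation can import
exported = json.dumps(task, indent=2)

# Import: parse (and, in practice, validate) before inserting into MongoDB
imported = json.loads(exported)
```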
We welcome contributions! Please follow these guidelines:
- Fork the repository
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Make your changes
- Run tests: `npm run test` (frontend) and `pytest` (backend)
- Lint code: `npm run lint` (frontend) and `flake8` (backend)
- Commit changes: `git commit -m 'Add amazing feature'`
- Push to branch: `git push origin feature/amazing-feature`
- Open a Pull Request
- Frontend: Follow React/TypeScript best practices, use ESLint configuration
- Backend: Follow PEP 8, use type hints, add docstrings
- Database: Use proper MongoDB indexing and query optimization
- Security: Never commit API keys, follow OWASP guidelines
- Frontend: Jest + React Testing Library
- Backend: pytest with async test support
- Integration: Docker-based end-to-end testing
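Backend tests lean on pytest's async support. A self-contained sketch of the pattern (the handler here is a hypothetical stand-in; real tests would import routes from the FastAPI app):

```python
import asyncio

async def classify_stub(text: str) -> dict:
    # Stand-in for an async inference handler, returning a label
    # shaped like the real endpoint's response
    return {"label": "POSITIVE" if "good" in text else "NEGATIVE"}

async def test_classify_returns_label():
    result = await classify_stub("a good movie")
    assert result["label"] == "POSITIVE"

# pytest-asyncio would drive this automatically; plain asyncio works for a quick check
asyncio.run(test_classify_returns_label())
```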
| Feature | BERT Studio | HuggingFace Spaces | Colab | Local Scripts |
|---|---|---|---|---|
| Custom Code Execution | ✅ Secure sandbox | ⚠️ Limited | ⚠️ Full access | ⚠️ Full access |
| Persistent Storage | ✅ MongoDB | ❌ Session only | ❌ Session only | ⚠️ Local files |
| Multi-Model Support | ✅ Full HF Hub | ✅ Full HF Hub | ⚠️ Manual setup | ⚠️ Manual setup |
| Web Interface | ✅ Professional UI | ⚠️ Basic | ❌ Notebook only | ❌ CLI/Scripts |
| Task Management | ✅ Advanced search/tags | ❌ None | ❌ None | ⚠️ File-based |
| Production Ready | ✅ Docker + scaling | ⚠️ Shared resources | ❌ Development only | ⚠️ Manual setup |
| Collaboration | ✅ Export/import | ⚠️ Public only | ⚠️ Sharing | ❌ Manual |
- Research: Rapid prototyping and model comparison
- Education: Teaching ML concepts with hands-on examples
- Production: Model validation before deployment
- Enterprise: Secure, self-hosted ML experimentation platform
See the docs/ directory for detailed documentation:
- API Reference
- Frontend Architecture
- Backend Architecture
- Database Schema
- Deployment Guide
- Security Guidelines
```env
# Backend (.env.local)
MONGODB_CONNECTION_STRING=mongodb://localhost:27017
MONGODB_DATABASE_NAME=bert_studio
SECRET_KEY=your-secret-key-here
CORS_ORIGINS=http://localhost:3000,http://localhost

# Optional: HuggingFace configuration
HF_TOKEN=your-huggingface-token
TRANSFORMERS_CACHE=/path/to/cache
```

- CPU Only: Default configuration works out of the box
- GPU Support: Uncomment GPU sections in docker-compose.yml
- Custom Models: Mount model cache directories for persistence
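The exact GPU sections live in the repository's `docker-compose.yml`; a typical Compose GPU reservation (assuming the inference service is named `backend`) looks like:

```yaml
services:
  backend:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

This requires the NVIDIA Container Toolkit on the host.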
- Set strong SECRET_KEY in environment
- Configure MongoDB with authentication
- Set up SSL/TLS certificates
- Configure Nginx with security headers
- Set up monitoring and logging
- Configure automated backups
- Test disaster recovery procedures
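For the Nginx hardening step, a minimal set of security headers (illustrative; tune the values, especially any Content-Security-Policy, for your deployment) might be:

```nginx
# Inside the server block of the Nginx site config
add_header X-Content-Type-Options "nosniff" always;
add_header X-Frame-Options "DENY" always;
add_header Referrer-Policy "strict-origin-when-cross-origin" always;
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
```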
- Horizontal: Load balance multiple backend instances
- Vertical: Increase container resources for large models
- Database: Use MongoDB replica sets or sharding
- CDN: Serve static assets via CDN
This project is licensed under the MIT License - see the LICENSE file for details.
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: docs/
- HuggingFace Transformers for the ML backend
- shadcn/ui for the beautiful UI components
- FastAPI for the robust API framework
- MongoDB for reliable data persistence
Built with ❤️ for the ML community
