Release v2.0.0
Release Notes - Word Frequency Mini Project
Version 2.0.0 (January 4, 2026)
🚀 Major Release - Web API & Enhanced Architecture
Complete rewrite with a FastAPI web service, NLP processing via NLTK (English) and underthesea (Vietnamese), and automated setup.
🎯 What's New
🌐 Web API Service
- RESTful API built with the FastAPI framework
- Swagger UI at `/docs` for interactive testing
- Two endpoints: `/analyses/text` (JSON input) and `/analyses/file` (file upload)
- Multiple output formats: JSON, CSV, and PNG
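These notes do not show how the requested `format` value becomes a response. As a hedged illustration only, a server could map each supported output format to an HTTP media type before building the response. The `MEDIA_TYPES` table and the `media_type_for` helper are hypothetical names, not code from this project.

```python
# Hypothetical mapping from the "format" field to an HTTP media type.
# Illustrative only; not taken from the project's source.
MEDIA_TYPES = {
    "json": "application/json",
    "csv": "text/csv",
    "png": "image/png",
}


def media_type_for(fmt: str) -> str:
    """Return the media type for a supported output format (case-insensitive)."""
    key = fmt.strip().lower()
    if key not in MEDIA_TYPES:
        supported = ", ".join(sorted(MEDIA_TYPES))
        raise ValueError(f"unsupported format {fmt!r}; expected one of: {supported}")
    return MEDIA_TYPES[key]


print(media_type_for("CSV"))  # text/csv
```

Rejecting unknown formats up front keeps the error close to the request instead of failing later during rendering.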
🔒 Security & Validation
- File size limits (5MB default)
- Content-Type validation (text/plain only)
- Pydantic models for request/response validation
- Comprehensive error handling
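The checks above can be sketched in plain Python. This is a minimal sketch, not the project's `middleware.py`: it assumes the 5 MB limit applies to the raw uploaded bytes and that uploads must decode as UTF-8, and the `validate_upload` and `UploadError` names are hypothetical.

```python
# Minimal sketch of upload validation: text/plain only, 5 MB default limit.
# Names and the UTF-8 requirement are assumptions for illustration.
MAX_UPLOAD_BYTES = 5 * 1024 * 1024  # 5 MB default


class UploadError(ValueError):
    """Raised when an uploaded file fails validation."""


def validate_upload(content_type: str, data: bytes, max_bytes: int = MAX_UPLOAD_BYTES) -> str:
    # Compare only the media type, ignoring parameters such as "; charset=utf-8"
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != "text/plain":
        raise UploadError(f"unsupported Content-Type: {media_type or '(missing)'}")
    if len(data) > max_bytes:
        raise UploadError(f"file is {len(data)} bytes; limit is {max_bytes} bytes")
    # Decode strictly so non-UTF-8 payloads are rejected instead of garbled
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise UploadError("file is not valid UTF-8 text") from exc


print(validate_upload("text/plain; charset=utf-8", "Xin chào!".encode("utf-8")))
```

In a web service, a helper like this would run on the bytes read from the uploaded file, before any NLP processing starts.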
🛠️ Developer Experience
- One-click setup: `python start.py`, or `start.bat` (Windows) / `start.sh` (Linux, macOS)
- Automated installer: checks dependencies, downloads NLTK data, and starts the server
- Modular architecture: Clean separation (app, pipeline, models, middleware)
- Cross-platform: Windows, Linux, macOS support
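The "checks dependencies" step of an automated installer can be approximated with the standard library's `importlib.util.find_spec`, which reports whether a top-level module can be found without importing it. This is an assumed approach, not the actual contents of `start.py`; `missing_modules`, `check_dependencies`, and the `REQUIRED` list (import names taken from the dependency list in these notes) are hypothetical.

```python
import importlib.util
import sys

# Import names (not pip names) for several v2.0.0 dependencies listed in these notes
REQUIRED = ["fastapi", "uvicorn", "pandas", "numpy", "nltk", "underthesea", "matplotlib"]


def missing_modules(names):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_dependencies(names=REQUIRED):
    """Print a short report and return True when every module can be found."""
    missing = missing_modules(names)
    if missing:
        print("Missing modules:", ", ".join(missing))
        print(f"Try: {sys.executable} -m pip install -r requirements.txt")
        return False
    print("All dependencies found.")
    return True


check_dependencies()
```

Using `sys.executable -m pip` in the hint points the user at the same interpreter that ran the check, which avoids installing into the wrong environment.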
📊 Version Comparison
| Feature | v1.0.0 | v2.0.0 |
|---|---|---|
| Interface | CLI script | FastAPI REST API |
| Server | None | Uvicorn ASGI |
| Setup | Manual | Automated scripts |
| Documentation | Basic | Comprehensive + API docs |
| File Upload | Manual script execution | HTTP multipart upload |
🆕 New Dependencies
# NEW in v2.0.0
fastapi>=0.104.0 # Web framework
uvicorn[standard]>=0.24.0 # ASGI server
python-multipart>=0.0.6 # File upload support
pandas>=1.5.0 # Data manipulation (replaces csv module)
numpy>=1.21.0 # Numerical operations
nltk>=3.8 # English NLP (replaces re module)
underthesea>=1.3.0 # Vietnamese NLP (NEW)
# CARRIED OVER from v1.0.0
matplotlib>=3.5.0 # Visualization
setuptools>=65.0.0 # Build tools
wheel>=0.37.0 # Build tools
🚀 Quick Start
# Automated setup (recommended)
python start.py
# Manual setup
pip install -r requirements.txt
python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords')"
uvicorn src.app.main:app --reload --port 5000
Access: http://localhost:5000/docs
📡 API Usage Examples
Text Analysis:
curl -X POST "http://localhost:5000/analyses/text" \
-H "Content-Type: application/json" \
-d '{"text": "Xin chào! Đây là ví dụ.", "format": "json"}'
File Upload:
curl -X POST "http://localhost:5000/analyses/file" \
-F "file=@sample.txt" \
-F "format=csv"
Python:
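import requests
# Submit text for analysis and request JSON output
response = requests.post("http://localhost:5000/analyses/text",
                         json={"text": "Hello world!", "format": "json"})
response.raise_for_status()  # raise on 4xx/5xx responses
print(response.json())
⚠️ Breaking Changes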
Migration required from v1.0.0:
- Installation: Install the new dependencies (`fastapi`, `uvicorn`, `python-multipart`, `nltk`, `underthesea`, `pandas`, `numpy`)
- Execution: Run via `uvicorn` or the startup scripts instead of executing the script directly with Python
- Interface: The HTTP API is now the primary interface (the CLI script is deprecated)
- Output: Files are saved to the `output/` directory with timestamped names
- Imports: The module structure changed to `src.pipeline.text_stats`
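The exact timestamp pattern for files in `output/` is not documented here. A minimal sketch, assuming a hypothetical `<stem>_<YYYYmmdd_HHMMSS>.<ext>` scheme; `timestamped_path` is an illustrative name, not a project function.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def timestamped_path(stem: str, ext: str, out_dir: str = "output",
                     now: Optional[datetime] = None) -> Path:
    # Hypothetical naming scheme: <stem>_<YYYYmmdd_HHMMSS>.<ext> inside out_dir
    now = now or datetime.now()
    return Path(out_dir) / f"{stem}_{now:%Y%m%d_%H%M%S}.{ext.lstrip('.')}"


print(timestamped_path("word_freq", "csv", now=datetime(2026, 1, 4, 9, 30, 0)))
# output/word_freq_20260104_093000.csv on POSIX systems
```

Passing `now` explicitly keeps the function deterministic for tests. Callers that write the file should create the directory first, for example with `path.parent.mkdir(parents=True, exist_ok=True)`.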
Direct function usage is still available:
from src.pipeline import text_stats as ts
text = "Xin chào! Đây là ví dụ."  # any input string
tokens = ts.preprocessing(text)
stats = ts.statistics(tokens)
🐛 Bug Fixes
- ✅ Better handling of punctuation and special characters
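The punctuation fix is not described in detail, and the project's `ts.preprocessing` and `ts.statistics` rely on NLTK and underthesea. As a stand-in only, the hypothetical `strip_punctuation` and `word_frequencies` functions below show one standard-library approach: treat every character in a Unicode punctuation category (`P*`) as a separator, so Vietnamese letters with diacritics survive intact.

```python
import unicodedata
from collections import Counter


def strip_punctuation(text: str) -> str:
    # Replace each Unicode punctuation character (category P*) with a space;
    # letters with diacritics such as "đ" or "ụ" are left untouched.
    return "".join(" " if unicodedata.category(ch).startswith("P") else ch for ch in text)


def word_frequencies(text: str) -> Counter:
    # NFC normalization makes precomposed and decomposed accents count as one word
    normalized = unicodedata.normalize("NFC", text).lower()
    tokens = strip_punctuation(normalized).split()
    return Counter(tokens)


freq = word_frequencies("Xin chào! Đây là ví dụ. Xin chào, ví dụ...")
print(freq.most_common(3))
```

Vietnamese word segmenters such as underthesea can group multi-syllable words, whereas this whitespace split counts each syllable separately, so treat it as a rough baseline rather than a replacement.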
📁 Project Structure
word-frequency-mini-project/
├── src/
│ ├── app/
│ │ ├── main.py # FastAPI application
│ │ ├── models.py # Request/Response models
│ │ └── middleware.py # Security middleware
│ └── pipeline/
│ └── text_stats.py # Core NLP processing
├── data/ # Input files
├── output/ # Generated reports
├── start.py # Automated setup
├── start.bat/start.sh # Platform-specific launchers
└── requirements.txt # Dependencies
🔮 Roadmap
v2.1.0 (Q2 2026):
- Word cloud visualization (from v1.0.0 roadmap)
- TF-IDF analysis (from v1.0.0 roadmap)
- Batch file processing (from v1.0.0 roadmap)
- Export to PDF
v3.0.0 (Q3 2026):
- .docx and .pdf file support (from v1.0.0 roadmap)
- Database integration
- User authentication
- Docker containerization
- Web UI frontend (from v1.0.0 roadmap)
🙏 Acknowledgments
- NLTK Team - Natural Language Toolkit
- Underthesea Team - Vietnamese NLP
- FastAPI - Modern web framework
- Matplotlib - Visualization library
🛠️ Tech Stack
Language: Python 3.x
Libraries:
- matplotlib - Data visualization
- csv - CSV file handling
- re - Regular expressions for text processing
📝 Notes
- First stable release
- Tested with Vietnamese educational content
- CSV output compatible with Excel and data analysis tools
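How the CSV output is made Excel-friendly is not specified. One common approach, assumed here rather than confirmed for this project, is to write UTF-8 with a byte-order mark (`utf-8-sig`) so that Excel detects the encoding and shows Vietnamese diacritics correctly. `write_frequencies_csv` is a hypothetical helper.

```python
import csv
import os
import tempfile
from collections import Counter


def write_frequencies_csv(freq: Counter, path: str) -> None:
    """Write (word, count) rows, most frequent first, as Excel-friendly CSV."""
    # "utf-8-sig" prepends a byte-order mark so Excel detects UTF-8;
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8-sig", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["word", "count"])
        writer.writerows(freq.most_common())


# Demo: write a small table to the system temp directory
demo_path = os.path.join(tempfile.gettempdir(), "word_freq_demo.csv")
write_frequencies_csv(Counter({"xin": 2, "chào": 2, "đây": 1}), demo_path)
print("Wrote", demo_path)
```

Since v2.0.0 uses pandas, the equivalent for a DataFrame would be `df.to_csv(path, index=False, encoding="utf-8-sig")`.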
🔮 Future Improvements
- Support for additional file formats (.docx, .pdf)
- Web interface for file upload
- Word cloud visualization
- TF-IDF analysis
- Multi-file batch processing
Full Changelog: https://github.com/huypq02/word-frequency-mini-project/commits/releases/v2.0.0/