Release v2.0.0

@huypq02 huypq02 released this 04 Jan 17:02
694a19d

Release Notes - Word Frequency Mini Project

Version 2.0.0 (January 4, 2026)

🚀 Major Release - Web API & Enhanced Architecture

Complete rewrite as a FastAPI web service, with NLP processing built on NLTK (English) and underthesea (Vietnamese) and an automated setup script.


🎯 What's New

🌐 Web API Service

  • RESTful API with FastAPI framework
  • Swagger UI at /docs for interactive testing
  • Two endpoints: /analyses/text (JSON input) and /analyses/file (file upload)
  • Multiple formats: JSON, CSV, PNG outputs
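
As an illustration of the JSON and CSV output formats, here is a minimal sketch that uses only the standard library. The function names to_json and to_csv and the column names word and count are assumptions made for this example; the project itself uses pandas, and its actual response schema may differ.

```python
import csv
import io
import json


def _sorted_rows(counts: dict) -> list:
    """Order (word, count) pairs by descending count, then alphabetically."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def to_json(counts: dict) -> str:
    """Serialize word counts as a JSON array of {"word", "count"} objects."""
    rows = [{"word": word, "count": count} for word, count in _sorted_rows(counts)]
    # ensure_ascii=False keeps Vietnamese characters readable in the output.
    return json.dumps(rows, ensure_ascii=False)


def to_csv(counts: dict) -> str:
    """Serialize word counts as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["word", "count"])  # hypothetical column names
    writer.writerows(_sorted_rows(counts))
    return buffer.getvalue()
```

Both helpers take the same mapping of word to count, so a handler could pick one based on the requested format value.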

🔒 Security & Validation

  • File size limits (5MB default)
  • Content-Type validation for uploaded files (text/plain only)
  • Pydantic models for request/response validation
  • Comprehensive error handling
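
The upload checks listed above can be sketched without any framework. This is an illustrative sketch, not the project's middleware: validate_upload and UploadRejected are hypothetical names, and the status codes (415, 413, 400) are the conventional HTTP choices rather than values confirmed by these notes.

```python
from typing import Optional

MAX_UPLOAD_BYTES = 5 * 1024 * 1024  # 5 MB default, as stated in the notes
ALLOWED_CONTENT_TYPES = {"text/plain"}


class UploadRejected(ValueError):
    """Raised when an upload fails validation (hypothetical helper)."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def validate_upload(content_type: Optional[str], data: bytes,
                    max_bytes: int = MAX_UPLOAD_BYTES) -> str:
    """Return the decoded text of an upload, or raise UploadRejected."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = (content_type or "").split(";", 1)[0].strip().lower()
    if media_type not in ALLOWED_CONTENT_TYPES:
        raise UploadRejected(415, f"unsupported media type: {media_type or 'missing'}")
    if len(data) > max_bytes:
        raise UploadRejected(413, f"file exceeds {max_bytes} bytes")
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise UploadRejected(400, "file is not valid UTF-8 text") from exc
```

In a FastAPI handler, the exception's status_code and detail could be passed on to an HTTPException so that clients receive a structured error.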

🛠️ Developer Experience

  • One-click setup: python start.py or start.bat/start.sh
  • Automated installer: Checks dependencies, downloads NLTK data, starts server
  • Modular architecture: Clean separation (app, pipeline, models, middleware)
  • Cross-platform: Windows, Linux, macOS support
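
The dependency check performed by an automated installer can be approximated as follows. This is a sketch under stated assumptions, not the contents of start.py: missing_modules is a hypothetical helper, and python-multipart is left out of the list because its import name differs between releases.

```python
import importlib.util

# Import names for packages listed under New Dependencies.
REQUIRED_MODULES = [
    "fastapi", "uvicorn", "pandas", "numpy",
    "nltk", "underthesea", "matplotlib",
]


def missing_modules(names: list) -> list:
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
        print("Run: pip install -r requirements.txt")
    else:
        print("All dependencies found.")
```

Using importlib.util.find_spec avoids importing heavy packages just to test for their presence.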

📊 Version Comparison

Feature        v1.0.0                   v2.0.0
-------------  -----------------------  ------------------------
Interface      CLI script               FastAPI REST API
Server         None                     Uvicorn ASGI
Setup          Manual                   Automated scripts
Documentation  Basic                    Comprehensive + API docs
File Upload    Manual script execution  HTTP multipart upload

🆕 New Dependencies

# NEW in v2.0.0
fastapi>=0.104.0          # Web framework
uvicorn[standard]>=0.24.0 # ASGI server
python-multipart>=0.0.6   # File upload support
pandas>=1.5.0             # Data manipulation (replaces csv module)
numpy>=1.21.0             # Numerical operations
nltk>=3.8                 # English NLP (replaces re module)
underthesea>=1.3.0        # Vietnamese NLP (NEW)

# CARRIED OVER from v1.0.0
matplotlib>=3.5.0         # Visualization
setuptools>=65.0.0        # Build tools
wheel>=0.37.0             # Build tools

🚀 Quick Start

# Automated setup (recommended)
python start.py

# Manual setup
pip install -r requirements.txt
python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords')"
# On NLTK 3.9+, also run nltk.download('punkt_tab') if tokenization reports a missing resource
uvicorn src.app.main:app --reload --port 5000

Access: http://localhost:5000/docs


📡 API Usage Examples

Text Analysis:

curl -X POST "http://localhost:5000/analyses/text" \
  -H "Content-Type: application/json" \
  -d '{"text": "Xin chào! Đây là ví dụ.", "format": "json"}'

File Upload:

curl -X POST "http://localhost:5000/analyses/file" \
  -F "file=@sample.txt" \
  -F "format=csv"

Python:

import requests

response = requests.post("http://localhost:5000/analyses/text",
    json={"text": "Hello world!", "format": "json"}, timeout=10)
response.raise_for_status()  # fail fast on HTTP errors
print(response.json())
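
If requests is not installed, the same call can be made with the standard library. The sketch below assumes the endpoint and payload shown in the curl example; build_text_request and analyze_text are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # port used in the Quick Start section


def build_text_request(text: str, fmt: str = "json") -> urllib.request.Request:
    """Build (but do not send) a POST request for /analyses/text."""
    body = json.dumps({"text": text, "format": fmt}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/analyses/text",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze_text(text: str, fmt: str = "json", timeout: float = 10.0) -> bytes:
    """Send the request and return the raw response body."""
    request = build_text_request(text, fmt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

analyze_text returns raw bytes because CSV and PNG responses are not JSON; decode or parse the result according to the format you requested.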

⚠️ Breaking Changes

Migration required from v1.0.0:

  1. Installation: Install new dependencies (fastapi, uvicorn, python-multipart, nltk, underthesea, pandas, numpy)
  2. Execution: Run via uvicorn or startup scripts instead of direct Python execution
  3. Interface: Primary interface is HTTP API (CLI script deprecated)
  4. Output: Files saved to output/ directory with timestamp naming
  5. Imports: Module structure changed to src.pipeline.text_stats
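
Item 4 mentions timestamp naming for files in output/. The exact pattern is not given in these notes, so the sketch below assumes a <stem>_YYYYMMDD_HHMMSS.<ext> layout; timestamped_path is a hypothetical helper.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def timestamped_path(stem: str, ext: str, out_dir: str = "output",
                     now: Optional[datetime] = None) -> Path:
    """Build a path such as output/<stem>_YYYYMMDD_HHMMSS.<ext>."""
    # Accepting `now` as a parameter keeps the function easy to test.
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return Path(out_dir) / f"{stem}_{stamp}.{ext}"
```

Scripts that previously read a fixed output filename from v1.0.0 would need to look up the newest file in output/ instead.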

Direct function usage still available:

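from src.pipeline import text_stats as ts

text = "Xin chào! Đây là ví dụ."   # any input string
tokens = ts.preprocessing(text)    # tokenize the raw text
stats = ts.statistics(tokens)      # compute statistics from the tokens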
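
For readers without the project installed, the two-step flow above (tokenize, then count) can be imitated with the standard library. This is a simplified stand-in, not the project's implementation: the real pipeline uses NLTK and underthesea, and simple_preprocessing and simple_statistics are hypothetical names.

```python
import re
import unicodedata
from collections import Counter


def simple_preprocessing(text: str) -> list:
    """Normalize, lowercase, and split text into word tokens."""
    # NFC normalization merges base letters with combining diacritics,
    # which matters for Vietnamese input.
    normalized = unicodedata.normalize("NFC", text).lower()
    # \w+ matches runs of Unicode letters, digits, and underscores,
    # so punctuation is dropped.
    return re.findall(r"\w+", normalized)


def simple_statistics(tokens: list) -> Counter:
    """Count how often each token occurs."""
    return Counter(tokens)
```

Unlike underthesea, this regex treats every whitespace-separated syllable as its own token, so multi-syllable Vietnamese words are not grouped.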

🐛 Bug Fixes

  • ✅ Better handling of punctuation and special characters

📁 Project Structure

word-frequency-mini-project/
├── src/
│   ├── app/
│   │   ├── main.py          # FastAPI application
│   │   ├── models.py        # Request/Response models
│   │   └── middleware.py    # Security middleware
│   └── pipeline/
│       └── text_stats.py    # Core NLP processing
├── data/                    # Input files
├── output/                  # Generated reports
├── start.py                 # Automated setup
├── start.bat/start.sh       # Platform-specific launchers
└── requirements.txt         # Dependencies

🔮 Roadmap

v2.1.0 (Q2 2026):

  • Word cloud visualization (from v1.0.0 roadmap)
  • TF-IDF analysis (from v1.0.0 roadmap)
  • Batch file processing (from v1.0.0 roadmap)
  • Export to PDF

v3.0.0 (Q3 2026):

  • .docx and .pdf file support (from v1.0.0 roadmap)
  • Database integration
  • User authentication
  • Docker containerization
  • Web UI frontend (from v1.0.0 roadmap)

🙏 Acknowledgments

  • NLTK Team - Natural Language Toolkit
  • Underthesea Team - Vietnamese NLP
  • FastAPI - Modern web framework
  • Matplotlib - Visualization library

🛠️ Tech Stack

Language: Python 3.x

Libraries:

  • fastapi, uvicorn - Web API and ASGI server
  • nltk, underthesea - English and Vietnamese NLP
  • pandas, numpy - Data manipulation (replaces the csv module)
  • matplotlib - Data visualization

📝 Notes

  • First stable release of the web API
  • Tested with Vietnamese educational content
  • CSV output compatible with Excel and data analysis tools

🔮 Future Improvements

  • Support for additional file formats (.docx, .pdf)
  • Web interface for file upload
  • Word cloud visualization
  • TF-IDF analysis
  • Multi-file batch processing

Full Changelog: https://github.com/huypq02/word-frequency-mini-project/commits/releases/v2.0.0/